
I Run 9 MCP Servers in Production — Here's What 97 Million Downloads Don't Tell You

[Architecture diagram showing MCP servers connecting AI models]

The Number Everyone's Quoting

Model Context Protocol crossed 97 million monthly SDK downloads by March 2026, up from 100,000 at launch. There are over 10,000 active public MCP servers. OpenAI, Google, Microsoft, and Salesforce all ship native MCP support. The Linux Foundation has taken MCP under open governance. By any measure, the adoption curve is real.

But here's the number nobody leads with: only 12.9% of MCP servers score a "high trust" rating. That means the other 87.1% of servers in the ecosystem are unreliable, poorly maintained, or both. The protocol succeeded. The implementations mostly haven't. I know this because I've built 9 of my own, and I use them every day.

My 9 Servers and What They Do

Every server is custom-built in Node.js using the MCP SDK (there's a minimal skeleton after the list). Each one connects Claude Code to a different AI provider or capability:

gemini-mcp connects to Gemini 3.1 Flash and Pro. I use it for research, content generation, comparisons, and deep reasoning tasks. Flash handles the volume; Pro handles the hard problems.

groq-mcp connects to five models on Groq's inference platform: Llama 3.3 70B, Llama 4 Scout, Kimi K2, GPT OSS 120B, and Qwen 3 32B. Groq's speed makes it ideal for quick drafts, fast analysis, and any task where latency matters more than depth.

opencode-mcp provides access to GPT 5.4 Pro and GPT 5.3 Codex Spark for complex backend code and fast code generation tasks that need OpenAI-level reasoning.

ollama-mcp is the heavyweight. It connects to 8 models via my Ollama Pro subscription: DeepSeek V4 Flash, Devstral 2, Kimi K2.6, Qwen 3.5 (397B), Mistral Large 3 (675B), Nemotron 3 Super, GLM 5.1, and Qwen3-Coder (480B). These run on Ollama's cloud infrastructure, not locally.

nvidia-mcp provides access to roughly 80 free models via NVIDIA's NIM platform. It's the ultimate fallback — when paid APIs rate-limit me, NVIDIA usually has a comparable model available at no cost.

glm-mcp connects to GLM 5 as a free-tier fallback for general tasks.

fal-mcp handles image and video generation through fal.ai — Flux Pro for images, Seedance 2.0 for text-to-video. This is the creative production server.

mistral-mcp connects to Mistral's model lineup for tasks where Mistral's strengths in instruction following and structured output matter.

playwright-mcp is different from the others — it drives a headless browser for automated QA. Take screenshots, run Lighthouse audits, verify deployments visually. It's the quality gate at the end of every workflow.
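All nine follow the same skeleton. Below is a minimal sketch using the official TypeScript SDK; the server name, tool name, upstream endpoint, and response shape are illustrative placeholders, not my production code.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// One MCP server per provider. This skeleton exposes a single tool that
// forwards a prompt to an upstream model API and returns the text response.
const server = new McpServer({ name: "example-mcp", version: "1.0.0" });

server.tool(
  "ask_model",            // the tool name Claude Code will see
  { prompt: z.string() }, // input schema, validated by the SDK
  async ({ prompt }) => {
    // Placeholder upstream call: swap in the real provider endpoint,
    // auth scheme, and request/response shapes.
    const res = await fetch("https://api.example-provider.com/v1/chat", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.PROVIDER_API_KEY}`,
      },
      body: JSON.stringify({ prompt }),
    });
    if (!res.ok) throw new Error(`Upstream returned HTTP ${res.status}`);
    const data = await res.json();
    return { content: [{ type: "text", text: data.output ?? "" }] };
  },
);

// Claude Code talks to the server over stdio.
await server.connect(new StdioServerTransport());
```

Each of the 9 servers is this pattern plus provider-specific auth, model selection, and error handling.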

Why I Build Custom Instead of Using Public Servers

That 12.9% trust score tells the story. Public MCP servers go down without warning. They change their APIs. Some inject telemetry or advertising. None of them have an SLA. When you're in the middle of a production deploy and your MCP server disappears, you don't want to be filing an issue on someone's GitHub repo.

Custom servers give me control over the things that matter: uptime monitoring, error handling, rate limit management, and fallback chains. When Groq rate-limits me during a heavy session, my server automatically falls back to NVIDIA NIM. When a model returns garbage, I can log the failure and route to an alternative without the client (Claude Code) even knowing something went wrong.
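The fallback logic itself is nothing exotic: try providers in preference order, log the failure, move on. A minimal sketch of the pattern, with the Provider type and the names around it as illustrative assumptions:

```typescript
type Provider = {
  name: string;
  call: (prompt: string) => Promise<string>;
};

// Try each provider in order; the first success wins. The order encodes
// preference: fast paid providers first, free fallbacks (NVIDIA NIM) last.
async function callWithFallback(providers: Provider[], prompt: string): Promise<string> {
  for (const p of providers) {
    try {
      return await p.call(prompt);
    } catch (err) {
      // Log and continue so the client (Claude Code) never sees the failure.
      console.error(`[fallback] ${p.name} failed: ${(err as Error).message}`);
    }
  }
  throw new Error("All providers in the fallback chain failed");
}
```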

Building a basic MCP server takes about 2 hours. Maintaining it takes maybe 30 minutes a month. For the reliability I get in return, that's an easy trade.

The Orchestration Pattern

The servers don't work in isolation. They're part of a delegation system: Claude Opus 4.6 plans the work. Claude Sonnet 4.6 executes it. The MCP servers provide the specialist layer.

Need research and competitive analysis? Route to Gemini. Need a fast first draft? Groq Llama. Complex backend code with edge cases? GPT 5.4 Pro. Generate a hero image? fal.ai Flux. Verify the deployment looks right? Playwright screenshots.

I have 263 Claude Code skills that route tasks to the right model automatically based on the type of work. It's not about having 15+ models available — it's about knowing which one to use for each specific task. A research question doesn't need GPT 5.4 Pro pricing. A complex database migration doesn't need Groq's speed at the expense of reasoning depth.
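Stripped to its core, each skill is a lookup from task type to server. A simplified sketch (the task categories and routing table below are illustrative; the real skills carry far more context):

```typescript
// Map task categories to the MCP server best suited for them.
type TaskType = "research" | "fast-draft" | "backend-code" | "image" | "video" | "qa";

const ROUTES: Record<TaskType, { server: string; why: string }> = {
  research:       { server: "gemini-mcp",     why: "cheap, deep research" },
  "fast-draft":   { server: "groq-mcp",       why: "lowest latency" },
  "backend-code": { server: "opencode-mcp",   why: "strongest code reasoning" },
  image:          { server: "fal-mcp",        why: "Flux Pro" },
  video:          { server: "fal-mcp",        why: "Seedance 2.0" },
  qa:             { server: "playwright-mcp", why: "visual verification" },
};

function routeTask(task: TaskType): string {
  return ROUTES[task].server;
}
```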

What It Actually Costs

This is where MCP changes the economics of AI development:

Ollama Pro: $10/month for 8 frontier-class models, including the 675B-parameter Mistral Large 3. This is the best deal in AI right now.

Gemini API: roughly $5/month at my usage levels. Flash handles 90% of research tasks cheaply.

Groq: free tier covers about 90% of my fast-inference needs. I rarely hit rate limits during normal hours.

NVIDIA NIM: completely free. Around 80 models including DeepSeek V3, GLM 5.1, and Kimi K2.5.

fal.ai: roughly $10/month for image and video generation. Seedance videos cost about $4 each, so this varies.

GLM: free.

Total: $30-50 per month for access to capabilities that would cost $200+ if you purchased each API subscription separately. MCP is the abstraction layer that makes multi-model AI affordable for a solo developer.

The Honest Problems

MCP servers crash. Not often, but often enough that you need to plan for it. Cold starts add 2-3 seconds to the first request after a server has been idle, which interrupts the flow of an interactive coding session.

Error messages are bad. When a tool call fails across three layers — Claude to MCP server to external API — the error you get back is often useless. "Tool execution failed" doesn't tell you whether Groq is down, your API key expired, or you hit a rate limit. I've spent hours debugging failures that turned out to be a provider having a 5-minute outage.
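What helped me most was tagging every upstream failure with the provider and the failure class before it crosses back through the MCP layer. A sketch of that idea, with the function name and error wording as assumptions rather than my actual code:

```typescript
// Wrap an upstream API call so the error that surfaces in Claude Code says
// WHICH provider failed and WHY, instead of a bare "Tool execution failed".
async function callUpstream(provider: string, url: string, init: RequestInit): Promise<Response> {
  let res: Response;
  try {
    res = await fetch(url, init);
  } catch (err) {
    throw new Error(`[${provider}] network error (provider down?): ${(err as Error).message}`);
  }
  if (res.status === 429) throw new Error(`[${provider}] rate limited (429), route to fallback`);
  if (res.status === 401) throw new Error(`[${provider}] auth failed (401), check API key`);
  if (!res.ok) throw new Error(`[${provider}] HTTP ${res.status}: ${await res.text()}`);
  return res;
}
```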

The MCP spec itself is stable, but the tooling around it is still maturing. IDE integration varies in quality. Debugging requires checking logs across multiple processes. There's no standard monitoring dashboard — I built my own with a simple health-check script.
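For what it's worth, the health-check script amounts to hitting one cheap endpoint per provider with a timeout and printing a status line. A minimal sketch, with the endpoint URLs as assumptions (even an unauthenticated 401 proves the service is reachable):

```typescript
// Ping one endpoint per provider; requires Node 18+ for global fetch.
const ENDPOINTS: Record<string, string> = {
  "groq-mcp": "https://api.groq.com/openai/v1/models",        // assumed endpoint
  "nvidia-mcp": "https://integrate.api.nvidia.com/v1/models", // assumed endpoint
};

async function check(name: string, url: string): Promise<void> {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(5000) });
    // A 401 without an API key still means the service answered.
    const status = res.ok || res.status === 401 ? "up" : "degraded";
    console.log(`${name}: ${status} (HTTP ${res.status}, ${Date.now() - started}ms)`);
  } catch {
    console.log(`${name}: DOWN (no response within 5s)`);
  }
}

await Promise.all(Object.entries(ENDPOINTS).map(([name, url]) => check(name, url)));
```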

Is it worth it? Without question. Running 9 MCP servers gives me access to the combined capabilities of every major AI provider through a single interface. The workflow that results — plan with Opus, execute with Sonnet, specialize with MCP — is the most productive development environment I've ever used. But it's not turnkey. You're building infrastructure, not installing an app.
