Five custom Model Context Protocol servers with 28 tools — gemini-research, groq-fast, opencode, glm-free, and ollama-local — built with Node.js, exposing 10+ AI models as development tools for research, coding, content creation, and audio transcription.
The MCP Servers project is a suite of five custom Node.js servers that expose 28 AI tools through the Model Context Protocol standard. Each server wraps a different AI provider — Google Gemini, Groq, OpenCode.ai, GLM, and Ollama — giving Claude Code instant access to the right model for every task. Whether the job calls for deep research, blazing-fast code generation, premium frontier models, or fully offline inference, one of these servers handles it.
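At its core, each server registers its tools and answers two MCP requests over JSON-RPC: `tools/list` (advertise the tools) and `tools/call` (run one). The sketch below shows that dispatch shape in plain Node.js; the tool name and handler are illustrative placeholders, not the project's actual tools, and a real server would use the official MCP SDK over stdio rather than a hand-rolled handler.

```javascript
// Minimal sketch of MCP tool dispatch. The "quick_answer" tool and its
// handler are hypothetical stand-ins for a real model-backed tool.
const tools = {
  quick_answer: {
    description: "Answer a short question with a fast model",
    handler: async ({ question }) => `(model reply to: ${question})`,
  },
};

// Handle the two core MCP requests: tools/list and tools/call.
async function handleRequest(req) {
  if (req.method === "tools/list") {
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: {
        tools: Object.entries(tools).map(([name, t]) => ({
          name,
          description: t.description,
        })),
      },
    };
  }
  if (req.method === "tools/call") {
    const tool = tools[req.params.name];
    const text = await tool.handler(req.params.arguments);
    // MCP tool results are returned as typed content blocks.
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: { content: [{ type: "text", text }] },
    };
  }
  return {
    jsonrpc: "2.0",
    id: req.id,
    error: { code: -32601, message: "Method not found" },
  };
}
```

Wrapping a different provider (Gemini, Groq, Ollama, and so on) then comes down to swapping what each tool's handler calls, while the protocol surface stays identical across all five servers.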
The gemini-research server is the largest, offering 10 tools powered by Gemini 2.5 Flash, 2.5 Pro, and 3.1 Pro. It handles research, comparisons, deep reasoning, and content generation including blog posts, FAQs, and case studies. The groq-fast server provides 5 tools running on Groq's hardware-accelerated inference (~500 tok/s) with Llama 3.3 70B, Llama 4 Maverick, Llama 4 Scout, Llama 3.1 8B, and Whisper v3 Turbo for audio transcription. The opencode server routes to premium and free models through the OpenCode.ai gateway — GPT 5.1 Codex for complex backend code, Kimi K2.5 for bulk HTML templates, and Kimi K2 Thinking for second-opinion reasoning.
Rounding out the stack, glm-free provides 4 general-purpose tools via GLM 4.7 for chat, code, writing, and analysis at zero cost. The ollama-local server runs Llama 3.2 3B entirely on local hardware, providing 4 tools for summarization, rewriting, code review, and quick answers with no rate limits, no API keys, and full offline capability. All five servers are configured globally so every project in the workspace has immediate access to all 28 tools.
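Global wiring like this typically lives in Claude Code's user-level MCP configuration, where each server is registered by name with the command that launches it. The paths, server names, and env var below are illustrative assumptions, not the project's actual layout:

```json
{
  "mcpServers": {
    "ollama-local": {
      "command": "node",
      "args": ["/path/to/ollama-local/index.js"]
    },
    "groq-fast": {
      "command": "node",
      "args": ["/path/to/groq-fast/index.js"],
      "env": { "GROQ_API_KEY": "<your-key>" }
    }
  }
}
```

Because the config is user-scoped rather than per-project, every project picks up all five servers without any additional setup.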
Gemini 2.5 Flash, 2.5 Pro, and 3.1 Pro for research, comparisons, deep reasoning, blog posts, FAQs, case studies, and page content generation.
Hardware-accelerated inference at ~500 tok/s with Llama 3.3 70B, Llama 4 Maverick & Scout, Llama 3.1 8B, plus Whisper v3 Turbo audio transcription.
Premium and free models via OpenCode.ai — GPT 5.1 Codex for backend code, Kimi K2.5 for bulk templates, Kimi K2 Thinking, Trinity, and GLM 5.
GLM 4.7 provides zero-cost, general-purpose chat, code generation, writing, and analysis with no rate limits on the free tier.
Llama 3.2 3B running on local hardware — fully offline summarization, rewriting, code review, and Q&A with no network latency and no API keys.
All five servers configured globally for Claude Code, providing 28 tools across every project — pick the right model for the job automatically.
Five servers, 28 tools, 10+ AI models — from Google Gemini's research depth to Groq's hardware-accelerated speed to fully offline local inference. Every project in the workspace gets instant access to the right model for the job, whether it's deep reasoning, fast code generation, audio transcription, or content creation at scale.