7 MCP servers. 11 Ollama Pro frontier models. Xiaomi MiMo voice AI. 265+ skills. One developer. Here is how the system works and what it actually costs.
I manage over 30 production websites across Astro SSR, Laravel, FastAPI, and static HTML. The sites range from a 162-page bilingual dog grooming platform to an enterprise copier quoting system deployed in two countries. Each one needs ongoing development, SEO optimization, deployment, and monitoring.
Traditional development at this scale requires a team. I needed to deliver the same output as a small agency without the headcount, the overhead, or the latency of coordinating between people. The answer was not working harder or longer hours. It was building infrastructure that lets AI handle the volume work while I focus on architecture and quality decisions.
The system has three layers. Claude Opus 4.6 handles planning: project architecture, code review, complex decision-making. Claude Sonnet 4.6 handles execution: writing code, making edits, running commands. The third layer is 7 custom MCP servers that connect Claude Code to specialist AI models for tasks that benefit from a different model's strengths.
Each MCP server is a Node.js application built with the MCP SDK. Each one connects to a different AI provider or capability:
Gemini: Research, content generation, competitive analysis. Gemini 3.1 Flash handles volume; Pro handles deep reasoning tasks.
Ollama Pro: The primary routing layer. 11 frontier models from 120B to 1.6T parameters: DeepSeek V4 Pro, Qwen3-coder 480B, Mistral Large 3 675B, Kimi K2.6, Devstral 2, and more. $10/month flat.
Groq: The speed layer. Fast inference across 8 models when latency matters more than depth. Free tier, used as a fallback.
OpenAI: GPT 5.4 Pro for complex backend logic. GPT 5.3 Codex Spark for fast code generation.
NVIDIA NIM: ~80 free models. The fallback layer: when paid APIs rate-limit, NVIDIA usually has a comparable model at no cost.
fal.ai: Image generation (Flux Pro) and video generation (Seedance 2.0). The creative production layer.
Playwright: Headless browser for automated QA. Screenshots, Lighthouse audits, visual verification after every deploy.
Xiaomi MiMo: The voice and reasoning layer. MiMo-V2.5-Pro for long-horizon agent tasks (1T parameters, 1M context). MiMo-V2.5-TTS for text-to-speech with emotion control, voice cloning from a 30-second sample, and voice design from a text description.
GLM: Free-tier fallback coverage. NVIDIA NIM provides 80+ models as a safety net; GLM covers edge cases when the primary layer is unavailable.
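Each of these servers is registered with Claude Code through an MCP configuration file. A minimal sketch of what such a registration looks like, with hypothetical server names, paths, and environment variables (the real entries are not shown in this post):

```json
{
  "mcpServers": {
    "ollama-pro": {
      "command": "node",
      "args": ["./mcp-servers/ollama-pro/index.js"],
      "env": { "OLLAMA_PRO_API_KEY": "<key>" }
    },
    "playwright-qa": {
      "command": "node",
      "args": ["./mcp-servers/playwright-qa/index.js"]
    }
  }
}
```

Claude Code launches each listed command as a subprocess and talks to it over the MCP protocol, so adding a new specialist model is a matter of adding one entry.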
The servers are organized in tiers. Ollama Pro is the primary routing layer for all delegated work: drafts, code review, HTML generation, analysis. Gemini handles research and content where it excels. GPT 5.4 Pro handles complex backend logic. Xiaomi MiMo adds voice AI and frontier reasoning with 1M token context. Groq and NVIDIA NIM serve as speed and fallback layers. I have 265+ Claude Code skills that auto-route tasks to the right model based on work type, always preferring the highest-quality model available.
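The routing idea reduces to a lookup from work type to tier. The tier names and the `routeTask` function below are illustrative stand-ins for the 265+ skill definitions, not the actual implementation:

```typescript
// Hypothetical routing sketch. Tier names and rules are illustrative,
// mirroring the tiering described above, not the real skill files.
type Tier = "ollama-pro" | "gemini" | "openai" | "xiaomi-mimo" | "groq";

interface Task {
  kind: "draft" | "research" | "backend" | "long-context" | "quick";
  contextTokens?: number;
}

function routeTask(task: Task): Tier {
  if (task.kind === "research") return "gemini";      // research and content
  if (task.kind === "backend") return "openai";       // complex backend logic
  if (task.kind === "long-context" || (task.contextTokens ?? 0) > 200_000) {
    return "xiaomi-mimo";                             // 1M-token context window
  }
  if (task.kind === "quick") return "groq";           // latency over depth
  return "ollama-pro";                                // primary routing layer
}

console.log(routeTask({ kind: "draft" })); // "ollama-pro"
```

The fallback behavior (Groq or NVIDIA NIM when a paid API rate-limits) would wrap a router like this with retry logic rather than live inside it.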
The entire multi-model infrastructure runs for $35-55 per month. Ollama Pro is $10/month for 11 frontier models ranging from 120B to 1.6T parameters. Gemini API runs about $5/month at my usage. Xiaomi MiMo adds reasoning and voice AI; its TTS is currently free during the open beta. Groq and NVIDIA NIM are free tier. fal.ai is roughly $10/month for image and video generation. GLM is free. Accessing these same capabilities through individual subscriptions would cost $300+ per month.
On the hosting side, all 10+ Astro SSR production sites run on a single Contabo VPS for $15/month. Each site is a systemd service on its own port with Nginx as reverse proxy. The total infrastructure cost for running 30+ production sites with AI-powered development: under $100/month.
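The per-site setup looks roughly like this: one server block per site, proxying to that site's Node process. The domain and port below are illustrative, not a real site's config:

```nginx
# Hypothetical reverse-proxy block for one Astro SSR site.
# Each site runs as its own systemd service on a dedicated port.
server {
    listen 443 ssl;
    server_name example-client-site.com;

    location / {
        proxy_pass http://127.0.0.1:4321;  # this site's Node port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Because each site is an isolated systemd unit, one crashed site cannot take down the other 9+ on the same VPS, and `systemctl restart` scopes to a single project.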
162-page bilingual dog grooming site migrated from WordPress to Astro SSR.
PageSpeed score went from 62 on WordPress to 95+ on Astro. Zero JavaScript shipped by default. Full EN/ES translations with hreflang tags. Local SEO targeting Lynchburg, VA with structured data on every page.
Enterprise copier quoting platform deployed across US and UK.
Laravel 11 with Angular frontend, role-based dashboard, standalone HTML pages for specific workflows. Two separate deployments on Vultr CloudPanel. AI-assisted code review on every deploy using Codex CLI.
Custom SEO scanner built from scratch.
FastAPI backend with async crawling, 20+ SEO factors per page, PDF report generation. Running in production at crawlhound.com. Built and iterated using the multi-model workflow: Groq for fast prototyping, GPT for complex crawl logic, Playwright for automated testing.
Custom CRM for a retirement planning firm.
FastAPI with Docker deployment on Contabo. Contact management, pipeline tracking, Calendly integration. The CRM was designed, built, and deployed in production using the same AI orchestration workflow.
Blog posts across multiple sites.
Each blog post is researched (Gemini), drafted (Ollama Pro), reviewed and rewritten (Claude), formatted with Schema.org BlogPosting markup, and published with SEO verification. The full cycle takes under 30 minutes per post. Every site ships with an AI crawler access audit, llms.txt file, and structured data validation.
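The publishing cycle above is a linear pipeline, which can be sketched as a chain of stages. The stage functions here are illustrative stubs standing in for the real MCP calls to Gemini, Ollama Pro, and Claude:

```typescript
// Illustrative pipeline stubs; the real stages call out to the
// Gemini, Ollama Pro, and Claude layers described above.
type Stage = (input: string) => string;

const research: Stage = (topic) => `notes on ${topic}`;        // Gemini
const draftPost: Stage = (notes) => `draft based on ${notes}`; // Ollama Pro
const reviewPost: Stage = (draft) => `final: ${draft}`;        // Claude
const addMarkup: Stage = (post) =>
  JSON.stringify({
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    articleBody: post,
  });

// Run the stages in order: topic in, Schema.org-marked-up post out.
function publishPost(topic: string): string {
  return [research, draftPost, reviewPost, addMarkup].reduce(
    (text, stage) => stage(text),
    topic,
  );
}
```

Keeping each stage as a pure input-to-output function is what makes the sub-30-minute cycle repeatable: any stage can be rerun or swapped to a different model without touching the rest.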
The AI workflow is a force multiplier, not a replacement for engineering judgment. I architect every project, review every deployment, and make every technical decision. The AI handles the volume: generating drafts, running audits, processing repetitive transformations, and testing deployments. This means projects that would take weeks get delivered in days. Sites that would require a team get built by one person with consistent quality.
Every site I ship includes mandatory post-deploy verification: SEO file audits, AI crawler access checks, structured data validation, PageSpeed testing, and Playwright visual regression screenshots. The AI workflow makes this level of thoroughness sustainable across 30+ sites. Without it, these checks would be the first thing to get skipped under time pressure.
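One of those checks, structured data validation, reduces to scanning the deployed HTML for the expected JSON-LD block. A minimal self-contained sketch (the real audit covers far more factors):

```typescript
// Minimal sketch of one post-deploy check: does the page's HTML carry
// a JSON-LD block of the expected Schema.org type?
function hasStructuredData(html: string, type: string): boolean {
  const blocks =
    html.match(/<script type="application\/ld\+json">([\s\S]*?)<\/script>/g) ?? [];
  return blocks.some((block) => {
    try {
      const json = JSON.parse(block.replace(/<\/?script[^>]*>/g, ""));
      return json["@type"] === type;
    } catch {
      return false; // malformed JSON-LD fails the check
    }
  });
}
```

A check like this runs against every page after deploy; PageSpeed, llms.txt, and crawler-access audits follow the same pattern of cheap, automatable assertions.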
The infrastructure is the product. Not any single tool or model, but the orchestration system that routes the right work to the right model at the right cost. That is what makes solo development at agency scale possible.
I bring this entire infrastructure to every client project. Let's discuss what you need built.
Get in Touch