
Model Routing for AI Coding Tools: Bedrock vs Azure Foundry vs Vertex

Most enterprises already run AI coding tools through Bedrock, Azure AI Foundry, or Vertex. Here's what model routing means on each gateway — and how to cut cost without switching vendors.

KorPro Team
June 29, 2026

If your company has standardized AI access, your developers' coding tools almost certainly run through one of three gateways: Amazon Bedrock, Azure AI Foundry, or Google Vertex AI. That's the right call for governance: central billing, access control, and data residency.

It also means your AI coding spend is already concentrated in one place. The question isn't which gateway to use; you've already chosen. It's how efficiently you use the models on it. That's what model routing addresses.

What model routing actually is

Model routing is simple to state: send each task to the cheapest model that can still do it well.

Left at their defaults, AI coding tools tend to send every request to the most capable (and most expensive) model. But the work is wildly uneven:

  • "Rename this variable across the file" — a small model handles it fine.
  • "Refactor this service to be concurrency-safe" — you want the top model.

Routing makes that decision per task, automatically, against a quality bar — so the expensive model is reserved for work that benefits from it, and the rest runs cheaper. Crucially, it does this with the models already available on your gateway. No new vendor, no data leaving your environment.
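As a rough illustration, the per-task decision can be sketched in a few lines of Python. Everything here is an assumption made for the example: the model names are placeholders rather than real model IDs, the prices are invented, and the capability scores stand in for whatever quality evaluation you actually trust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # placeholder label, not a real model ID
    cost_per_mtok: float  # invented blended price, USD per million tokens
    capability: float     # assumed quality score in [0, 1] from your own evals


def route(required_capability: float, catalog: list[Model]) -> Model:
    """Return the cheapest model whose capability meets the quality bar."""
    capable = [m for m in catalog if m.capability >= required_capability]
    if not capable:
        # Nothing clears the bar: fall back to the most capable model.
        return max(catalog, key=lambda m: m.capability)
    return min(capable, key=lambda m: m.cost_per_mtok)


CATALOG = [
    Model("small", cost_per_mtok=1.0, capability=0.60),
    Model("medium", cost_per_mtok=5.0, capability=0.80),
    Model("large", cost_per_mtok=25.0, capability=0.95),
]

rename_choice = route(0.50, CATALOG)    # e.g. "rename this variable"
refactor_choice = route(0.90, CATALOG)  # e.g. "make this service concurrency-safe"
print(rename_choice.name, refactor_choice.name)
```

The hard part in practice is estimating the required capability for a task; this sketch simply takes it as an input.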

On AWS Bedrock

Bedrock gives you a catalog of models behind one API, with on-demand and provisioned-throughput options and its own prompt-caching support. Routing on Bedrock means choosing among the Bedrock-hosted models per task and measuring savings against what Bedrock actually bills, including the discounted rate for cached input tokens. The win: you keep Bedrock's governance and billing while dropping the "top model on everything" default.
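To see why the cache discount matters when measuring savings, here is a minimal Python sketch. The rates are invented for illustration and are not real Bedrock prices, the `Rates` and `request_cost` names are hypothetical, and the sketch ignores any separate charge for writing to the cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Invented USD prices per million tokens; not real Bedrock rates."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def request_cost(rates: Rates, input_tokens: int, cached_tokens: int,
                 output_tokens: int) -> float:
    """Cost of one request, billing cache-read tokens at the discounted rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (uncached * rates.input_per_mtok
             + cached_tokens * rates.cached_input_per_mtok
             + output_tokens * rates.output_per_mtok)
    return total / 1_000_000


TOP = Rates(input_per_mtok=10.0, cached_input_per_mtok=1.0, output_per_mtok=50.0)
SMALL = Rates(input_per_mtok=1.0, cached_input_per_mtok=0.1, output_per_mtok=5.0)

# One request: 100k input tokens, 80k of them served from cache, 2k output.
net_savings = (request_cost(TOP, 100_000, 80_000, 2_000)
               - request_cost(SMALL, 100_000, 80_000, 2_000))
naive_savings = (request_cost(TOP, 100_000, 0, 2_000)
                 - request_cost(SMALL, 100_000, 0, 2_000))
print(f"net of cache: ${net_savings:.3f}, ignoring cache: ${naive_savings:.3f}")
```

With these made-up numbers, routing the request to the small model saves about $0.34 once cached tokens are priced correctly, versus about $0.99 if the cache is ignored. An estimate that skips caching overstates the savings nearly threefold.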

On Azure AI Foundry

Foundry centers on model deployments and quota. Routing here means directing each task to the appropriate Foundry deployment rather than defaulting every request to your most capable one. Because quota and deployment cost vary, routing also helps you use provisioned capacity more evenly instead of saturating the expensive deployment.
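A simplified sketch of quota-aware routing across deployments might look like the following. The deployment names, prices, capability scores, and the single tokens-per-minute counter are all assumptions for illustration; real quota accounting on Foundry is more involved than one in-memory number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    name: str                    # placeholder deployment name
    capability: float            # assumed quality score in [0, 1]
    cost_per_mtok: float         # invented price, USD per million tokens
    tokens_per_minute_limit: int
    tokens_used_this_minute: int = 0

    def has_room(self, tokens: int) -> bool:
        return self.tokens_used_this_minute + tokens <= self.tokens_per_minute_limit


def pick_deployment(required_capability: float, estimated_tokens: int,
                    deployments: list[Deployment]) -> Optional[Deployment]:
    """Cheapest deployment that is capable enough and still has quota left."""
    candidates = [
        d for d in deployments
        if d.capability >= required_capability and d.has_room(estimated_tokens)
    ]
    if not candidates:
        return None  # caller decides whether to queue, retry, or reject
    chosen = min(candidates, key=lambda d: d.cost_per_mtok)
    chosen.tokens_used_this_minute += estimated_tokens
    return chosen


deployments = [
    Deployment("small-deploy", 0.60, 1.0, tokens_per_minute_limit=10_000),
    Deployment("large-deploy", 0.95, 25.0, tokens_per_minute_limit=10_000),
]

first = pick_deployment(0.50, 8_000, deployments)   # fits on the cheap deployment
second = pick_deployment(0.50, 8_000, deployments)  # cheap one is full, spills over
third = pick_deployment(0.90, 8_000, deployments)   # large one is now full too
print(first.name, second.name, third)
```

The point of the sketch is the ordering: easy tasks fill the cheap deployment first, so the expensive deployment's quota is kept for work that needs it.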

On Google Vertex AI

Vertex offers its own model catalog, context caching, and pricing. Routing on Vertex follows the same pattern: match the task to the cheapest capable Vertex model, and measure against Vertex's billed rates. Only the catalog and the prices differ from the other gateways.

The common thread

The routing decision — "which model is cheapest-but-capable for this task?" — is gateway-agnostic. What's gateway-specific is:

  • The model catalog available.
  • The pricing each model is billed at.
  • The caching behavior that affects real cost.

So a routing layer worth using has to be portable across Bedrock, Foundry, and Vertex, and it has to measure savings against your gateway's actual prices — not a generic token estimate. (For why token estimates mislead, see Why Token-Reduction Percentages Lie About AI Savings.)
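One way to picture that split is a single routing function that takes the gateway's catalog as data. In the Python sketch below, the catalogs, model names, capability scores, and prices are all invented placeholders; only the shape of the idea comes from the text above.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    name: str             # placeholder, not a real model ID
    capability: float     # assumed quality score in [0, 1]
    cost_per_mtok: float  # invented price, USD per million tokens


# Gateway-specific part: catalog and pricing (all values are made up).
GATEWAY_CATALOGS: dict[str, list[ModelEntry]] = {
    "bedrock": [ModelEntry("bedrock-small", 0.60, 1.0),
                ModelEntry("bedrock-large", 0.95, 20.0)],
    "foundry": [ModelEntry("foundry-small", 0.55, 0.8),
                ModelEntry("foundry-mid", 0.80, 6.0),
                ModelEntry("foundry-large", 0.95, 24.0)],
    "vertex": [ModelEntry("vertex-small", 0.70, 1.5),
               ModelEntry("vertex-large", 0.90, 18.0)],
}


def route(gateway: str, required_capability: float) -> ModelEntry:
    """Gateway-agnostic part: cheapest model that clears the quality bar."""
    catalog = GATEWAY_CATALOGS[gateway]
    capable = [m for m in catalog if m.capability >= required_capability]
    if not capable:
        return max(catalog, key=lambda m: m.capability)
    return min(capable, key=lambda m: m.cost_per_mtok)


def savings_vs_top(gateway: str, required_capability: float) -> float:
    """USD per million tokens saved versus always using the priciest model."""
    top = max(GATEWAY_CATALOGS[gateway], key=lambda m: m.cost_per_mtok)
    return top.cost_per_mtok - route(gateway, required_capability).cost_per_mtok


for gw in GATEWAY_CATALOGS:
    print(gw, route(gw, 0.65).name, savings_vs_top(gw, 0.65))
```

The same task with the same quality bar lands on a different model, with different savings, on each gateway. That is why the measurement has to use the gateway's own prices.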

Keep it self-hosted

The reason you adopted Bedrock/Foundry/Vertex in the first place was control. A routing layer should preserve that: run it self-hosted, inside your own infrastructure, in front of your gateway. Prompts and code stay in your environment; routing and measurement happen privately. Adding a routing layer shouldn't mean adding a new place your source code travels to.

Set the policy centrally

Finally, the controls belong with whoever owns the AI budget. Managers should set:

  • Routing aggressiveness (quality-first → max-savings).
  • Prompt-compression level.
  • Per-team and per-repo policy.
  • A quality floor that routing never crosses.

Developers keep their existing tools and workflow; the cost policy is set above them.
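A central policy along those lines could be modeled roughly as follows. This is a sketch under stated assumptions, not a description of any product's configuration: the field names, the team and repo names, the "repo overrides team overrides default" precedence, and the 0.2 relaxation factor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    aggressiveness: float   # 0.0 = quality-first, 1.0 = max-savings
    compression_level: int  # 0 = prompt compression off
    quality_floor: float    # routing never targets a bar below this


DEFAULT_POLICY = Policy(aggressiveness=0.3, compression_level=1, quality_floor=0.6)
TEAM_POLICIES = {"payments": Policy(0.0, 0, 0.8)}
REPO_POLICIES = {"docs-site": Policy(1.0, 2, 0.5)}


def resolve_policy(team: str, repo: str) -> Policy:
    """Assumed precedence: repo policy, then team policy, then the default."""
    if repo in REPO_POLICIES:
        return REPO_POLICIES[repo]
    if team in TEAM_POLICIES:
        return TEAM_POLICIES[team]
    return DEFAULT_POLICY


def quality_bar(task_estimate: float, policy: Policy) -> float:
    """Relax the task's estimated bar by aggressiveness, clamped at the floor."""
    relaxed = task_estimate - 0.2 * policy.aggressiveness
    return max(policy.quality_floor, relaxed)


payments = resolve_policy("payments", "ledger-api")
docs = resolve_policy("web", "docs-site")
print(quality_bar(0.70, payments), quality_bar(0.55, docs))
```

Here the payments team's floor lifts a 0.70 task to 0.80, while the docs repo's max-savings setting can relax a task only as far as its 0.50 floor.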

How Tokor does this

Tokor is KorPro's AI cost optimization product, built for exactly this setup. It's a self-hosted model router and measurement layer that runs in front of your team's AI coding usage on Bedrock, Azure Foundry, or Vertex — starting with Claude Code. It routes each task to the cheapest capable model, lets managers set routing and compression centrally, and proves the savings in net-of-cache dollars measured against your gateway's real pricing.

Tokor is in early access for teams running 20+ developers. Apply to the design partner program.

Related: How to Reduce AI Coding Costs Across a Developer Team.
