# Model Router

The Model Router selects which LLM provider handles each request based on configurable routing strategies, with automatic fallback and circuit breaker protection. It lives in `astromesh/core/model_router.py`.

```
              Incoming Request
                     │
                     ▼
          ┌──────────────────────┐
          │  Routing Strategy    │
          │  (cost / latency /   │
          │   quality / round    │
          │   robin / capability)│
          └──────────┬───────────┘
                     │
          ┌──────────▼───────────┐
          │  Circuit Breaker     │
          │  State Check         │
          │  (closed? open?      │
          │   half-open?)        │
          └──────────┬───────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │  OpenAI   │ │ Anthropic │ │  Ollama   │
  │  Provider │ │ Provider  │ │  Provider │
  └───────────┘ └───────────┘ └───────────┘
        │            │            │
        └────────────┼────────────┘
                     │ on failure
                     ▼
          ┌──────────────────────┐
          │  Fallback Provider   │
          └──────────────────────┘
```

Configure the strategy in your agent YAML under `spec.model.routing`:

```yaml
spec:
  model:
    primary:
      provider: openai
      model: gpt-4o
    fallback:
      provider: anthropic
      model: claude-sonnet-4-20250514
    routing: cost_optimized
```
| Strategy | Behavior | When to Use |
|---|---|---|
| `cost_optimized` | Selects the provider with the lowest `estimated_cost()` for the request | Budget-sensitive workloads, high-volume batch processing |
| `latency_optimized` | Selects the provider with the lowest `avg_latency_ms` | Real-time applications, chat UIs, latency-critical paths |
| `quality_first` | Always uses the primary provider; falls back only on failure | Tasks requiring the best model regardless of cost or speed |
| `round_robin` | Distributes requests evenly across all healthy providers | Load balancing, even utilization across providers |
| `capability_match` | Selects based on required capabilities (tools, vision, streaming) | Mixed workloads where some requests need tool calling and others need vision |
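The cost- and latency-based strategies reduce to picking the minimum over the healthy providers. The sketch below is illustrative, not astromesh's actual code: the `select_provider` function, the trimmed-down `Provider` protocol, and the default token counts are all assumptions; only `estimated_cost()` and `avg_latency_ms` come from the provider interface documented here.

```python
from typing import Protocol, Sequence


class Provider(Protocol):
    """Slice of the provider interface used by routing (illustrative)."""
    name: str

    def estimated_cost(self, input_tokens: int, output_tokens: int) -> float: ...

    @property
    def avg_latency_ms(self) -> float: ...


def select_provider(providers: Sequence[Provider], strategy: str,
                    input_tokens: int = 1000, output_tokens: int = 500) -> Provider:
    """Pick a provider for one request (hypothetical helper)."""
    if strategy == "cost_optimized":
        # Cheapest estimated cost for a request of this size wins.
        return min(providers, key=lambda p: p.estimated_cost(input_tokens, output_tokens))
    if strategy == "latency_optimized":
        # Lowest rolling average latency wins.
        return min(providers, key=lambda p: p.avg_latency_ms)
    # quality_first: always prefer the primary (first) provider.
    return providers[0]
```

Because both strategies are a `min()` over a per-provider metric, adding a new strategy is just a new key function.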

Each provider has an independent circuit breaker that prevents cascading failures.

```
           success
       ┌─────────────┐
       │             │
       ▼             │
  ┌────────┐   3 failures    ┌────────┐   60s cooldown   ┌───────────┐
  │ Closed │ ──────────────▶ │  Open  │ ───────────────▶ │ Half-Open │
  └────────┘                 └────────┘                   └───────────┘
       ▲                                                       │
       │                      success                          │
       └───────────────────────────────────────────────────────┘
                      failure ──▶ back to Open
```
| State | Behavior |
|---|---|
| Closed | Requests pass through normally. The failure counter is incremented on each error |
| Open | All requests are rejected immediately (no network call). Entered after 3 consecutive failures |
| Half-Open | Entered after the 60-second cooldown. Allows a single probe request through. Success → Closed, failure → Open |
Two parameters control the breaker:

| Parameter | Default | Description |
|---|---|---|
| `failure_threshold` | 3 | Consecutive failures before the circuit opens |
| `cooldown_seconds` | 60 | Seconds to wait before the half-open probe |
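The state machine above can be sketched in a few lines. This is a minimal illustration under the documented defaults, not astromesh's implementation: the `CircuitBreaker` class shape and the injectable clock are assumptions. A useful property of this encoding is that Half-Open needs no timer callback; it is derived from "opened, and the cooldown has elapsed".

```python
import time


class CircuitBreaker:
    """Three-state breaker: closed -> open -> half_open (hypothetical sketch)."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock          # injectable for testing
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            return "half_open"       # cooldown elapsed: allow one probe request
        return "open"

    def allow_request(self) -> bool:
        # Open rejects immediately; closed and half-open let the request through.
        return self.state != "open"

    def record_success(self) -> None:
        # Success in closed resets the counter; a successful probe closes the circuit.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold or self._opened_at is not None:
            # Threshold reached, or a half-open probe failed: (re)open the circuit.
            self._opened_at = self._clock()
```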

All LLM providers implement this runtime-checkable `Protocol` defined in `astromesh/providers/base.py`. The Model Router interacts with providers exclusively through this interface.

| Method | Signature | Description |
|---|---|---|
| `complete()` | `async def complete(messages, tools?, temperature?, max_tokens?) -> CompletionResponse` | Send a chat completion request. Returns the full response when complete |
| `stream()` | `async def stream(messages, tools?, temperature?, max_tokens?) -> AsyncIterator[StreamChunk]` | Stream a chat completion. Yields chunks as they arrive |
| `health_check()` | `async def health_check() -> bool` | Verify the provider is reachable and authenticated. Returns `True` if healthy |
| `supports_tools()` | `def supports_tools() -> bool` | Whether this provider supports function/tool calling |
| `supports_vision()` | `def supports_vision() -> bool` | Whether this provider supports image inputs |
| `estimated_cost()` | `def estimated_cost(input_tokens: int, output_tokens: int) -> float` | Estimated cost in USD for a request of the given size. Used by `cost_optimized` routing |
| `avg_latency_ms` | `@property avg_latency_ms -> float` | Rolling average latency in milliseconds. Used by `latency_optimized` routing |
`complete()` returns a `CompletionResponse` with the following fields:

| Field | Type | Description |
|---|---|---|
| `content` | `str` | The model's text response |
| `tool_calls` | `list[ToolCall] \| None` | Tool calls requested by the model |
| `usage` | `TokenUsage` | Input/output token counts |
| `model` | `str` | Model identifier that actually served the request |
| `provider` | `str` | Provider name |

When the primary provider fails (network error, rate limit, circuit breaker open), the router automatically tries the fallback provider if configured:

  1. The primary provider is called.
  2. If the primary fails or its circuit breaker is open, a warning is logged.
  3. The fallback provider is tried.
  4. If the fallback also fails, `ProviderUnavailableError` is raised.

Both primary and fallback have independent circuit breakers. If both circuits are open, the request fails immediately without any network call.
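The fallback order can be sketched as a loop over the two providers. Only `ProviderUnavailableError` is named in this documentation; the `route` function, the `.breaker` attribute, and the provider shape are hypothetical glue for the example.

```python
import logging

logger = logging.getLogger("model_router")


class ProviderUnavailableError(Exception):
    """Raised when neither primary nor fallback can serve the request."""


async def route(primary, fallback, messages):
    """Try primary, then fallback; skip any provider whose circuit is open."""
    for provider in (primary, fallback):
        if provider is None or not provider.breaker.allow_request():
            continue  # open circuit: skip without making a network call
        try:
            response = await provider.complete(messages)
            provider.breaker.record_success()
            return response
        except Exception as exc:
            provider.breaker.record_failure()
            logger.warning("provider %s failed: %s", provider.name, exc)
    raise ProviderUnavailableError("all providers failed or circuits open")
```

Note that each provider's breaker is updated independently, so a failing primary opens only its own circuit while the fallback keeps serving traffic.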

A complete configuration with both providers:

```yaml
spec:
  model:
    primary:
      provider: openai
      model: gpt-4o
      temperature: 0.7
      max_tokens: 4096
    fallback:
      provider: anthropic
      model: claude-sonnet-4-20250514
      temperature: 0.7
      max_tokens: 4096
    routing: cost_optimized
```
| Field | Required | Description |
|---|---|---|
| `primary.provider` | Yes | Provider name (`openai`, `anthropic`, `ollama`, etc.) |
| `primary.model` | Yes | Model identifier for the provider |
| `primary.temperature` | No | Sampling temperature (0.0–2.0). Default: provider-specific |
| `primary.max_tokens` | No | Maximum output tokens. Default: provider-specific |
| `fallback.provider` | No | Fallback provider name |
| `fallback.model` | No | Fallback model identifier |
| `routing` | No | Routing strategy. Default: `quality_first` |