Multi-Model Configuration

Commander uses 5 model roles across 3 providers, automatically selecting the right model for each workflow step.

Model Roles

Role          Default Model     Provider    Purpose                        Cost
🧠 Thinking   Claude Opus       Anthropic   Analysis, planning, debugging  $$$
💻 Coding     Claude Sonnet     Anthropic   Code generation, fixes         $$
⚡ Task       Gemini Flash      OpenRouter  Search, classification         $
📊 Embedding  nomic-embed-text  Ollama      Vector embeddings              Free
🔒 Auditor    llama3.2:3b       Ollama      Security review                Free

How Routing Works

Step type → Model role → Provider:

step_type: analyze  → Thinking → Claude Opus
step_type: code     → Coding   → Claude Sonnet
step_type: search   → Task     → Gemini Flash
step_type: execute  → (shell)  → No model needed

View Configuration

murc models
🤖 Model Configuration

  auditor    Ollama / llama3.2:3b (max $0.00/call, temp 0.1)
  coding     Anthropic / claude-sonnet-4 (max $2.00/call, temp 0.2)
  embedding  Ollama / nomic-embed-text (max $0.00/call, temp 0.0)
  task       OpenRouter / gemini-2.0-flash (max $0.50/call, temp 0.3)
  thinking   Anthropic / claude-opus-4 (max $5.00/call, temp 0.3)

Cost Tracking

murc models cost --period 7d

Commander tracks every API call's cost. The constitution enforces limits:

  • max_api_cost_per_run – per workflow execution
  • max_api_cost_per_day – daily cap across all workflows
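A minimal sketch of how these two caps might be enforced, assuming a simple running total per run and per day. Only the two limit names come from the constitution; the class, method names, and default values are illustrative.

```python
# Sketch of constitution cost-cap enforcement. Field names
# max_api_cost_per_run / max_api_cost_per_day are from the docs;
# the default dollar values below are placeholders, not real defaults.
from dataclasses import dataclass

@dataclass
class CostTracker:
    max_api_cost_per_run: float = 5.00   # placeholder cap
    max_api_cost_per_day: float = 25.00  # placeholder cap
    run_total: float = 0.0
    day_total: float = 0.0

    def record(self, cost: float) -> None:
        """Add an API call's cost, refusing calls that would exceed a cap."""
        if self.run_total + cost > self.max_api_cost_per_run:
            raise RuntimeError("max_api_cost_per_run exceeded")
        if self.day_total + cost > self.max_api_cost_per_day:
            raise RuntimeError("max_api_cost_per_day exceeded")
        self.run_total += cost
        self.day_total += cost
```

Checking before committing the charge means a call that would breach a cap is rejected up front instead of recorded and flagged after the fact.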

Environment Variables

Variable            Required For
ANTHROPIC_API_KEY   Thinking + Coding models
OPENROUTER_API_KEY  Task model (Gemini Flash)
OLLAMA_BASE_URL     Custom Ollama endpoint (default: localhost:11434)
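As an illustration, a startup check for these variables might look like the following. The function and constant names are hypothetical, and the `http://` scheme on the Ollama default is an assumption on top of the documented `localhost:11434`.

```python
# Hypothetical startup check for the variables in the table above.
# Only OLLAMA_BASE_URL has a documented default; the other two are
# required for their respective model roles.
import os

REQUIRED = {
    "ANTHROPIC_API_KEY": "Thinking + Coding models",
    "OPENROUTER_API_KEY": "Task model (Gemini Flash)",
}

def check_env(env: dict) -> list:
    """Return a message for each required variable missing from env."""
    return [f"{var} is unset (needed for {why})"
            for var, why in REQUIRED.items() if var not in env]

# OLLAMA_BASE_URL falls back to the documented default; the scheme
# prefix here is an assumption.
ollama_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
```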

Three-Layer Cache

To minimize API costs, Commander uses a 3-layer cache:

  • L1 (Struct) – Pre-loaded workflows, patterns, config
  • L2 (Query) – Search results and injection results
  • L3 (LLM) – Prompt→response pairs with TTL

View cache statistics:

murc cache stats
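The L3 layer's prompt→response caching with TTL can be sketched as below. The class name, TTL default, and evict-on-expired-read behavior are assumptions; only "prompt→response pairs with TTL" comes from the docs.

```python
# Sketch of an L3 (LLM) cache: prompt -> response pairs that expire
# after a time-to-live. Names and the default TTL are illustrative.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # prompt -> (stored_at, response)

    def get(self, prompt: str):
        """Return the cached response, or None on a miss or expiry."""
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[prompt]  # expired: evict and report a miss
            return None
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = (time.monotonic(), response)
```

Using `time.monotonic()` rather than wall-clock time keeps expiry correct even if the system clock is adjusted mid-run.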