Multi-Model Configuration

Commander uses 5 model roles across 3 providers, automatically selecting the right model for each workflow step.

Model Roles

Role          Default Model     Provider    Purpose                        Cost
🧠 Thinking   Claude Opus       Anthropic   Analysis, planning, debugging  $$$
💻 Coding     Claude Sonnet     Anthropic   Code generation, fixes         $$
⚡ Task       Gemini Flash      OpenRouter  Search, classification         $
📊 Embedding  nomic-embed-text  Ollama      Vector embeddings              Free
🔒 Auditor    llama3.2:3b       Ollama      Security review                Free

How Routing Works

Step type → Model role → Provider:

step_type: analyze  → Thinking → Claude Opus
step_type: code     → Coding   → Claude Sonnet
step_type: search   → Task     → Gemini Flash
step_type: execute  → (shell)  → No model needed

View Configuration

murc models
🤖 Model Configuration

  auditor    Ollama / llama3.2:3b (max $0.00/call, temp 0.1)
  coding     Anthropic / claude-sonnet-4 (max $2.00/call, temp 0.2)
  embedding  Ollama / nomic-embed-text (max $0.00/call, temp 0.0)
  task       OpenRouter / gemini-2.0-flash (max $0.50/call, temp 0.3)
  thinking   Anthropic / claude-opus-4 (max $5.00/call, temp 0.3)

Cost Tracking

murc models cost --period 7d

Commander tracks every API call's cost. The constitution enforces limits:

  • max_api_cost_per_run – per workflow execution
  • max_api_cost_per_day – daily cap across all workflows
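A minimal sketch of how these two caps might be enforced, assuming a simple running total per run and per day. Only the two limit names come from the constitution; the class, method names, and default values are illustrative.

```python
# Sketch of constitution cost-cap enforcement. Field names
# max_api_cost_per_run / max_api_cost_per_day are from the docs;
# the default dollar values below are placeholders, not real defaults.
from dataclasses import dataclass

@dataclass
class CostTracker:
    max_api_cost_per_run: float = 5.00   # placeholder cap
    max_api_cost_per_day: float = 25.00  # placeholder cap
    run_total: float = 0.0
    day_total: float = 0.0

    def record(self, cost: float) -> None:
        """Add an API call's cost, refusing calls that would exceed a cap."""
        if self.run_total + cost > self.max_api_cost_per_run:
            raise RuntimeError("max_api_cost_per_run exceeded")
        if self.day_total + cost > self.max_api_cost_per_day:
            raise RuntimeError("max_api_cost_per_day exceeded")
        self.run_total += cost
        self.day_total += cost
```

Checking before committing the charge means a call that would breach a cap is rejected up front instead of recorded and flagged after the fact.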

Environment Variables

Variable            Required For
ANTHROPIC_API_KEY   Thinking + Coding models
OPENROUTER_API_KEY  Task model (Gemini Flash)
OLLAMA_BASE_URL     Custom Ollama endpoint (default: localhost:11434)
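As an illustration, a startup check for these variables might look like the following. The function and constant names are hypothetical, and the `http://` scheme on the Ollama default is an assumption on top of the documented `localhost:11434`.

```python
# Hypothetical startup check for the variables in the table above.
# Only OLLAMA_BASE_URL has a documented default; the other two are
# required for their respective model roles.
import os

REQUIRED = {
    "ANTHROPIC_API_KEY": "Thinking + Coding models",
    "OPENROUTER_API_KEY": "Task model (Gemini Flash)",
}

def check_env(env: dict) -> list:
    """Return a message for each required variable missing from env."""
    return [f"{var} is unset (needed for {why})"
            for var, why in REQUIRED.items() if var not in env]

# OLLAMA_BASE_URL falls back to the documented default; the scheme
# prefix here is an assumption.
ollama_url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
```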

Three-Layer Cache

To minimize API costs, Commander uses a 3-layer cache:

  • L1 (Struct) – Pre-loaded workflows, patterns, config
  • L2 (Query) – Search results and injection results
  • L3 (LLM) – Prompt→response pairs with TTL

View cache statistics:

murc cache stats
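The L3 layer's prompt→response caching with TTL can be sketched as below. The class name, TTL default, and evict-on-expired-read behavior are assumptions; only "prompt→response pairs with TTL" comes from the docs.

```python
# Sketch of an L3 (LLM) cache: prompt -> response pairs that expire
# after a time-to-live. Names and the default TTL are illustrative.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # prompt -> (stored_at, response)

    def get(self, prompt: str):
        """Return the cached response, or None on a miss or expiry."""
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[prompt]  # expired: evict and report a miss
            return None
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = (time.monotonic(), response)
```

Using `time.monotonic()` rather than wall-clock time keeps expiry correct even if the system clock is adjusted mid-run.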