28.06.2026

Wayfinder Router for LLM Cost Control

AI tooling is becoming part of the operations stack. Agents read runbooks, summarize incidents, inspect logs, and draft changes. The next reliability problem is not only model quality but also routing: which requests can stay on a cheap local model, and which ones deserve an expensive hosted model? Wayfinder Router is a small open-source project built around that exact question.

What Is Wayfinder Router?

Wayfinder Router is a Python CLI, library, and OpenAI-compatible gateway for prompt complexity routing. It scores a prompt from structural signals such as length, headings, lists, code blocks, and constraint-heavy wording, then recommends a model tier.
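To make the idea of structural scoring concrete, here is a minimal sketch of how a prompt could be scored from the signals listed above. It is not Wayfinder's actual algorithm: the function name score_prompt, the CONSTRAINT_WORDS list, the weights, and the saturation points are all assumptions chosen for illustration.

```python
import re

# Hypothetical signal list and weights, for illustration only.
# Wayfinder's real signals and weighting may differ.
CONSTRAINT_WORDS = ("must", "exactly", "at least", "at most", "without", "only if")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def score_prompt(prompt: str) -> float:
    """Return a complexity score in [0, 1] using structural signals only."""
    lines = prompt.splitlines()
    words = len(prompt.split())
    headings = sum(1 for ln in lines if re.match(r"\s*#{1,6}\s", ln))
    list_items = sum(1 for ln in lines if re.match(r"\s*(?:[-*•]|\d+\.)\s", ln))
    code_blocks = prompt.count(FENCE) // 2
    lowered = prompt.lower()
    constraints = sum(
        len(re.findall(rf"\b{re.escape(word)}\b", lowered))
        for word in CONSTRAINT_WORDS
    )

    # Each signal saturates at 1.0 so that no single feature dominates.
    raw = (
        0.40 * min(words / 400, 1.0)
        + 0.15 * min(headings / 3, 1.0)
        + 0.15 * min(list_items / 8, 1.0)
        + 0.15 * min(code_blocks / 2, 1.0)
        + 0.15 * min(constraints / 4, 1.0)
    )
    return round(raw, 3)
```

Because the function uses only string operations and regular expressions, the same prompt always produces the same score, which is the property the deterministic, offline design depends on.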

The key design choice is that routing is deterministic and offline. Wayfinder does not call an LLM judge, hosted classifier, or external API to decide where the request should go. That keeps the routing step fast, cheap, and repeatable. For SRE teams, that matters because a routing layer should not add another unreliable dependency before every AI request.

Key Features

  • No model call for routing: score prompts locally in microseconds before calling any backend.
  • OpenAI-compatible gateway: point existing clients at Wayfinder and keep the same /v1/chat/completions shape.
  • Local and hosted tiers: route simple prompts to Ollama, vLLM, LM Studio, or another local endpoint, then send complex prompts to a cloud model.
  • Configurable thresholds: tune routing with binary cuts, ordered tiers, or a calibrated classifier.
  • Observable decisions: responses include x-wayfinder-router-model and x-wayfinder-router-score headers.
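As a rough illustration of binary cuts and ordered tiers, the sketch below maps a score to a tier name with a sorted list of cut points. The names pick_tier, CUTS, and TIERS, the tier labels, and the threshold values are hypothetical and are not taken from Wayfinder's configuration format.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical thresholds: scores below 0.3 go to "local",
# scores from 0.3 up to (but not including) 0.7 go to "mid",
# and everything else goes to "cloud".
CUTS = [0.3, 0.7]
TIERS = ["local", "mid", "cloud"]


def pick_tier(score: float,
              cuts: Sequence[float] = CUTS,
              tiers: Sequence[str] = TIERS) -> str:
    """Map a score to a tier; cuts must be sorted in ascending order."""
    if len(tiers) != len(cuts) + 1:
        raise ValueError("need exactly one more tier than cut points")
    # bisect_right counts how many cut points are <= score,
    # which is the index of the tier the score falls into.
    return tiers[bisect_right(cuts, score)]
```

A binary cut is the special case with one threshold and two tiers, for example pick_tier(score, cuts=[0.5], tiers=["local", "cloud"]).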

Installation

For a quick dry run that needs no API keys, use the terminal chat:

uvx wayfinder-router chat --dry-run

For gateway mode, install the extra dependencies:

pip install "wayfinder-router[gateway]"
wayfinder-router init
wayfinder-router doctor
wayfinder-router serve --port 8088

Usage

Once the gateway is running, applications can keep using an OpenAI-style client. Only the base URL changes:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8088/v1", api_key="unused")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this incident timeline."}],
)

Wayfinder chooses the tier and returns headers showing the decision. Operators can also force a route with model="local" or model="cloud" when a workflow needs predictable placement.
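One way to log the decision is to pull the two headers out of the HTTP response. The helper below is a sketch: the header names come from the feature list above, while the routing_decision function, the RoutingDecision type, and the assumption that the score header parses as a float are illustrative.

```python
from typing import Mapping, NamedTuple, Optional


class RoutingDecision(NamedTuple):
    model: Optional[str]
    score: Optional[float]


def routing_decision(headers: Mapping[str, str]) -> RoutingDecision:
    """Extract Wayfinder's routing headers from a response header mapping."""
    # Normalise key case so plain dicts and HTTP header objects both work.
    lowered = {key.lower(): value for key, value in headers.items()}
    model = lowered.get("x-wayfinder-router-model")
    raw_score = lowered.get("x-wayfinder-router-score")
    try:
        # Assumption: the score header holds a decimal number.
        score = float(raw_score) if raw_score is not None else None
    except ValueError:
        score = None
    return RoutingDecision(model, score)


# With the OpenAI Python SDK, headers are available on the raw response:
#   raw = client.chat.completions.with_raw_response.create(
#       model="auto",
#       messages=[{"role": "user", "content": "Summarize this incident timeline."}],
#   )
#   decision = routing_decision(raw.headers)
#   completion = raw.parse()
```

Recording the model and score next to each request makes it easier to see later which prompts were sent to which tier.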

Operational Tips

Treat routing policy like production configuration. Keep wayfinder-router.toml in Git, review threshold changes, and test it against real prompts from runbooks, incident summaries, code reviews, and support workflows. If a local model handles repetitive summarization well, route that class locally. If remediation advice or high-risk debugging needs stronger reasoning, send it to the hosted tier.
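A review of threshold changes can be backed by a small regression check over saved prompts. The sketch below assumes a route callable that returns a tier name for a prompt. The names find_regressions, toy_route, and CASES are hypothetical, and toy_route is only a stand-in for whatever entry point your routing setup exposes.

```python
from typing import Callable, Iterable, List, Tuple

Case = Tuple[str, str]  # (prompt, expected tier)


def find_regressions(route: Callable[[str], str],
                     cases: Iterable[Case]) -> List[Tuple[str, str, str]]:
    """Return (prompt, expected, actual) for every case routed differently."""
    mismatches = []
    for prompt, expected in cases:
        actual = route(prompt)
        if actual != expected:
            mismatches.append((prompt, expected, actual))
    return mismatches


def toy_route(prompt: str) -> str:
    """Stand-in router for this example: long prompts go to the cloud tier."""
    return "cloud" if len(prompt.split()) > 50 else "local"


# Hypothetical golden cases; in practice these would come from real
# runbooks, incident summaries, code reviews, and support workflows.
CASES: List[Case] = [
    ("Summarize this incident timeline.", "local"),
    ("Propose a remediation plan covering: " + "step " * 60, "cloud"),
]

if __name__ == "__main__":
    for prompt, expected, actual in find_regressions(toy_route, CASES):
        print(f"expected {expected}, got {actual}: {prompt[:60]!r}")
```

Running a check like this in CI whenever wayfinder-router.toml changes turns a threshold edit into a reviewable diff of routing outcomes.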

Pair Wayfinder with an AI gateway such as LiteLLM when you need provider failover, budgets, or tenant controls underneath the tier decision.

Conclusion

Wayfinder Router is useful because it turns model selection into a cheap, inspectable control point. It will not understand every subtle prompt, but it gives platform teams a practical starting layer for cost control, latency reduction, and safer local-first AI operations.

Looking to automate infrastructure operations? Akmatori helps SRE teams reduce toil with AI agents built for real production workflows. For reliable global infrastructure, check out Gcore.
