Introduction
Kostrack is an AI API cost governance platform. It tracks, attributes, and governs every LLM API call — by project, feature, user, and workflow — and gives everyone on the team a way to see and control where AI spend is going.
What problem does it solve?
Once you ship AI features to production, a question becomes urgent fast: where is the API spend going? Your Anthropic dashboard shows total token spend. It doesn't tell you which feature costs the most, which agent workflow is the most expensive to run, or which team is over budget.
Helicone and LangSmith solve the ML engineer's problem — prompt logging, evals, model quality. Kostrack solves the engineering lead's and the CFO's problem: financial governance of AI API spend.
Kostrack is the only open-source tool with agentic cost rollup — the ability to attribute the total cost of a multi-step LangGraph workflow to a single traceable business action, not just 15 disconnected API calls.
How it works
Kostrack wraps your existing LLM provider clients. You change one import line. Every API call is intercepted after it returns, token counts are extracted, cost is calculated using versioned pricing, and the record is written asynchronously to TimescaleDB. Grafana reads from TimescaleDB and renders your dashboards.
```
Your App
    ↓
kostrack SDK  ← wraps: Anthropic / OpenAI / Gemini / DeepSeek
    ↓  (async, <5ms)
TimescaleDB
    ↓
Grafana dashboards
```
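The wrapper pattern above can be sketched in plain Python. This is an illustrative mock, not the real SDK: `FakeProviderClient`, `TrackedClient`, and the per-token prices are invented stand-ins for a provider client and Kostrack's internals.

```python
import time

# Hypothetical per-token pricing, for illustration only.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000

class FakeProviderClient:
    """Stand-in for a real provider SDK client."""
    def create(self, prompt):
        return {"text": "hello", "input_tokens": 12, "output_tokens": 4}

class TrackedClient:
    """Wraps a provider client and intercepts each call after it returns."""
    def __init__(self, inner, sink):
        self._inner = inner
        self._sink = sink  # where cost records go (in real Kostrack: the async writer)

    def create(self, prompt, tags=None):
        response = self._inner.create(prompt)  # the original call, unmodified
        cost = (response["input_tokens"] * PRICE_PER_INPUT_TOKEN
                + response["output_tokens"] * PRICE_PER_OUTPUT_TOKEN)
        self._sink.append({
            "ts": time.time(),
            "input_tokens": response["input_tokens"],
            "output_tokens": response["output_tokens"],
            "cost_usd": cost,
            "tags": tags or {},
        })
        return response  # the caller sees the untouched provider response

records = []
client = TrackedClient(FakeProviderClient(), records)
client.create("hi", tags={"project": "docs"})
```

The key property is that interception happens after the provider call returns, so the response the application sees is byte-for-byte what the provider sent.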
The write path is non-blocking. A background thread batches records and writes them every 5 seconds. If TimescaleDB is unreachable, records buffer to a local SQLite file and flush automatically when connectivity returns. Your LLM calls are never delayed by Kostrack.
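The batch-and-fallback write path can be sketched with a queue and a daemon thread. This is a simplified model, not Kostrack's code: the real writer targets TimescaleDB and buffers to SQLite, while here a plain list stands in for the fallback buffer and `write_batch` stands in for the database insert.

```python
import queue
import threading
import time

class BufferedWriter:
    """Sketch of a non-blocking batch writer with a local fallback buffer."""
    def __init__(self, write_batch, interval=5.0):
        self._q = queue.Queue()
        self._write_batch = write_batch  # stand-in for an INSERT into TimescaleDB
        self._fallback = []              # stand-in for the SQLite fallback file
        self._interval = interval
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, record):
        self._q.put(record)              # non-blocking: the LLM call never waits

    def flush(self):
        batch = self._fallback           # retry anything buffered earlier
        self._fallback = []
        while not self._q.empty():
            batch.append(self._q.get())
        if batch:
            try:
                self._write_batch(batch)
            except Exception:
                self._fallback = batch   # keep records until the DB is reachable

    def _run(self):
        while True:                      # periodic flush off the hot path
            time.sleep(self._interval)
            self.flush()

written = []
writer = BufferedWriter(written.extend, interval=0.1)
writer.enqueue({"cost_usd": 0.002})
writer.flush()
```

Because failed batches are re-queued into the fallback buffer, a later successful flush delivers them automatically — the same recover-on-reconnect behavior described above.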
Core concepts
Three ways to interact
Engineers instrument code using the Python SDK — one import change, everything else identical. Operators and DevOps teams govern spend from the terminal using the kostrack CLI — no Python knowledge needed beyond installation. Finance teams and managers query data through the Platform API at port 8080 or open Grafana at port 3000 directly in a browser. All three read from the same TimescaleDB instance.
Tags
Every API call carries a tags dict — arbitrary key-value pairs you define. Reserved keys (project, feature, team, environment, user_id) appear as dropdown filters in Grafana. All other keys are stored in a GIN-indexed JSONB column and available for custom queries.
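The reserved-versus-custom split can be illustrated with a few lines of Python. The `split_tags` helper is hypothetical — it only mirrors the storage rule described above (reserved keys get dedicated columns, everything else lands in the JSONB column).

```python
# Reserved keys, as listed above; these become dedicated, filterable columns.
RESERVED = {"project", "feature", "team", "environment", "user_id"}

def split_tags(tags):
    """Separate reserved keys from custom keys bound for the JSONB column."""
    columns = {k: v for k, v in tags.items() if k in RESERVED}
    jsonb = {k: v for k, v in tags.items() if k not in RESERVED}
    return columns, jsonb

cols, extra = split_tags({
    "project": "support-bot",
    "feature": "summarize",
    "ticket_queue": "tier-2",   # custom key → JSONB column
})
# cols  → {'project': 'support-bot', 'feature': 'summarize'}
# extra → {'ticket_queue': 'tier-2'}
```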
Traces and spans
For agentic workflows, Kostrack provides kostrack.trace() and kostrack.span() context managers. Every LLM call made inside a trace block inherits the same trace_id. Costs roll up from child spans to the root trace, so you can query the total cost of a workflow run as a single unit.
Pricing engine
Cost is calculated at write time — not query time — so dashboard aggregation queries stay fast. Pricing is stored in a versioned table in TimescaleDB (with effective_from / effective_to dates), meaning historical cost accuracy is preserved even after provider pricing changes.
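A versioned lookup of this shape can be sketched as follows. The prices, model name, and function are invented for illustration — the real table lives in TimescaleDB — but the `effective_from` / `effective_to` selection logic matches the scheme described above.

```python
from datetime import datetime

# Illustrative pricing rows; real Kostrack stores these in a versioned table.
PRICING = [
    {"model": "example-model", "effective_from": datetime(2024, 1, 1),
     "effective_to": datetime(2024, 6, 1),
     "input_per_mtok": 8.00, "output_per_mtok": 24.00},
    {"model": "example-model", "effective_from": datetime(2024, 6, 1),
     "effective_to": None,  # current price: open-ended
     "input_per_mtok": 3.00, "output_per_mtok": 15.00},
]

def cost_usd(model, input_tokens, output_tokens, at):
    """Pick the pricing row in effect at time `at` and compute the cost."""
    for row in PRICING:
        if (row["model"] == model and row["effective_from"] <= at
                and (row["effective_to"] is None or at < row["effective_to"])):
            return (input_tokens * row["input_per_mtok"]
                    + output_tokens * row["output_per_mtok"]) / 1_000_000
    raise LookupError(f"no pricing for {model} at {at}")

# A March call is costed at the old rates even after the June price change:
march_cost = cost_usd("example-model", 1000, 500, datetime(2024, 3, 15))  # → 0.02
```

This is why historical dashboards stay accurate: each record's `cost_usd` was computed against the pricing row that was in effect when the call happened.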
Supported providers
| Provider | Drop-in class | Special token handling |
|---|---|---|
| Anthropic | kostrack.Anthropic | cache_write, cache_read, thinking tokens |
| OpenAI | kostrack.OpenAI | cached_prompt, reasoning_tokens |
| Gemini | kostrack.GenerativeModel | context_cache, thoughts_tokens |
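The "special token handling" column hints at a normalization step: each provider reports usage under different field names, which must be mapped onto one common schema before costing. The sketch below illustrates that idea; the field names in the mapping are simplified examples, not an authoritative list of any provider's actual response fields.

```python
# Illustrative mapping from provider-specific usage fields to a common schema.
# Field names are simplified examples for the sake of the sketch.
USAGE_FIELDS = {
    "anthropic": {"input_tokens": "input_tokens",
                  "output_tokens": "output_tokens",
                  "cache_read": "cache_read_input_tokens"},
    "openai": {"input_tokens": "prompt_tokens",
               "output_tokens": "completion_tokens",
               "cache_read": "cached_prompt_tokens"},
}

def normalize_usage(provider, raw):
    """Map a provider's raw usage dict onto one common schema."""
    mapping = USAGE_FIELDS[provider]
    # Missing fields default to 0 (e.g. no cache was used on this call).
    return {common: raw.get(raw_key, 0) for common, raw_key in mapping.items()}

usage = normalize_usage("openai", {"prompt_tokens": 100, "completion_tokens": 20})
# → {'input_tokens': 100, 'output_tokens': 20, 'cache_read': 0}
```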
Architecture decisions
| Decision | Rationale |
|---|---|
| Wrapper, not proxy | No extra server in the call path. One import change. Metadata injection at the call site. |
| TimescaleDB | Postgres extension with hypertables, continuous aggregates, and retention policies — perfect for time-series cost data. |
| Grafana, not custom UI | Production-grade dashboards immediately, without building frontend. Engineers already have it. |
| Write-time cost calculation | Dashboard queries aggregate pre-computed cost_usd values — fast even over millions of rows. |
| SQLite fallback | Kostrack is an observability tool. It must never be a critical path dependency. |