Introduction
Kostrack is an AI API cost governance platform. It tracks, attributes, and governs every LLM API call — by project, feature, user, and workflow — and gives everyone on the team a way to see and control where AI spend is going.
What problem does it solve?
Once you ship AI features to production, a question becomes urgent fast: where is the API spend going? Your Anthropic dashboard shows total token spend. It doesn't tell you which feature costs the most, which agent workflow is the most expensive to run, or which team is over budget.
Helicone and LangSmith solve the ML engineer's problem — prompt logging, evals, model quality. Kostrack solves the engineering lead's and the CFO's problem: financial governance of AI API spend.
Kostrack is the only open-source tool with agentic cost rollup — the ability to attribute the total cost of a multi-step LangGraph workflow to a single traceable business action, not just 15 disconnected API calls.
How it works
Kostrack wraps your existing LLM provider clients. You change one import line. Every API call is intercepted after it returns, token counts are extracted, cost is calculated using versioned pricing, and the record is written asynchronously to TimescaleDB. Grafana reads from TimescaleDB and renders your dashboards.
```
Your App
    ↓
kostrack SDK  ← wraps: Anthropic / OpenAI / Gemini / DeepSeek
    ↓  (async, <5ms)
TimescaleDB
    ↓
Grafana dashboards
```
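The wrapper pattern above can be sketched in plain Python. This is an illustrative mock, not the real SDK: `FakeProviderClient`, `TrackedClient`, and the per-token prices are invented stand-ins for a provider client and Kostrack's internals.

```python
import time

# Hypothetical per-token pricing, for illustration only.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000

class FakeProviderClient:
    """Stand-in for a real provider SDK client."""
    def create(self, prompt):
        return {"text": "hello", "input_tokens": 12, "output_tokens": 4}

class TrackedClient:
    """Wraps a provider client and intercepts each call after it returns."""
    def __init__(self, inner, sink):
        self._inner = inner
        self._sink = sink  # where cost records go (in real Kostrack: the async writer)

    def create(self, prompt, tags=None):
        response = self._inner.create(prompt)  # the original call, unmodified
        cost = (response["input_tokens"] * PRICE_PER_INPUT_TOKEN
                + response["output_tokens"] * PRICE_PER_OUTPUT_TOKEN)
        self._sink.append({
            "ts": time.time(),
            "input_tokens": response["input_tokens"],
            "output_tokens": response["output_tokens"],
            "cost_usd": cost,
            "tags": tags or {},
        })
        return response  # the caller sees the untouched provider response

records = []
client = TrackedClient(FakeProviderClient(), records)
client.create("hi", tags={"project": "docs"})
```

The key property is that interception happens after the provider call returns, so the response the application sees is byte-for-byte what the provider sent.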
The write path is non-blocking. A background thread batches records and writes them every 5 seconds. If TimescaleDB is unreachable, records buffer to a local SQLite file and flush automatically when connectivity returns. Your LLM calls are never delayed by Kostrack.
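The batch-and-fallback write path can be sketched with a queue and a daemon thread. This is a simplified model, not Kostrack's code: the real writer targets TimescaleDB and buffers to SQLite, while here a plain list stands in for the fallback buffer and `write_batch` stands in for the database insert.

```python
import queue
import threading
import time

class BufferedWriter:
    """Sketch of a non-blocking batch writer with a local fallback buffer."""
    def __init__(self, write_batch, interval=5.0):
        self._q = queue.Queue()
        self._write_batch = write_batch  # stand-in for an INSERT into TimescaleDB
        self._fallback = []              # stand-in for the SQLite fallback file
        self._interval = interval
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, record):
        self._q.put(record)              # non-blocking: the LLM call never waits

    def flush(self):
        batch = self._fallback           # retry anything buffered earlier
        self._fallback = []
        while not self._q.empty():
            batch.append(self._q.get())
        if batch:
            try:
                self._write_batch(batch)
            except Exception:
                self._fallback = batch   # keep records until the DB is reachable

    def _run(self):
        while True:                      # periodic flush off the hot path
            time.sleep(self._interval)
            self.flush()

written = []
writer = BufferedWriter(written.extend, interval=0.1)
writer.enqueue({"cost_usd": 0.002})
writer.flush()
```

Because failed batches are re-queued into the fallback buffer, a later successful flush delivers them automatically — the same recover-on-reconnect behavior described above.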
Core concepts
Three ways to interact
Engineers instrument code using the Python SDK — one import change, everything else identical. Operators and DevOps teams govern spend from the terminal using the kostrack CLI — no Python knowledge needed beyond installation. Finance teams and managers query data through the Platform API at port 8080 or open Grafana at port 3000 directly in a browser. All three read from the same TimescaleDB instance.
Tags
Every API call carries a tags dict — arbitrary key-value pairs you define. Reserved keys (project, feature, team, environment, user_id) appear as dropdown filters in Grafana. All other keys are stored in a GIN-indexed JSONB column and available for custom queries.
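The reserved-versus-custom split can be illustrated with a few lines of Python. The `split_tags` helper is hypothetical — it only mirrors the storage rule described above (reserved keys get dedicated columns, everything else lands in the JSONB column).

```python
# Reserved keys, as listed above; these become dedicated, filterable columns.
RESERVED = {"project", "feature", "team", "environment", "user_id"}

def split_tags(tags):
    """Separate reserved keys from custom keys bound for the JSONB column."""
    columns = {k: v for k, v in tags.items() if k in RESERVED}
    jsonb = {k: v for k, v in tags.items() if k not in RESERVED}
    return columns, jsonb

cols, extra = split_tags({
    "project": "support-bot",
    "feature": "summarize",
    "ticket_queue": "tier-2",   # custom key → JSONB column
})
# cols  → {'project': 'support-bot', 'feature': 'summarize'}
# extra → {'ticket_queue': 'tier-2'}
```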
Traces and spans
For agentic workflows, Kostrack provides kostrack.trace() and kostrack.span() context managers. Every LLM call made inside a trace block inherits the same trace_id. Costs roll up from child spans to the root trace, so you can query the total cost of a workflow run as a single unit.
Pricing engine
Cost is calculated at write time — not query time — so dashboard aggregation queries stay fast. Pricing is stored in a versioned table in TimescaleDB (with effective_from / effective_to dates), meaning historical cost accuracy is preserved even after provider pricing changes.
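A versioned lookup of this shape can be sketched as follows. The prices, model name, and function are invented for illustration — the real table lives in TimescaleDB — but the `effective_from` / `effective_to` selection logic matches the scheme described above.

```python
from datetime import datetime

# Illustrative pricing rows; real Kostrack stores these in a versioned table.
PRICING = [
    {"model": "example-model", "effective_from": datetime(2024, 1, 1),
     "effective_to": datetime(2024, 6, 1),
     "input_per_mtok": 8.00, "output_per_mtok": 24.00},
    {"model": "example-model", "effective_from": datetime(2024, 6, 1),
     "effective_to": None,  # current price: open-ended
     "input_per_mtok": 3.00, "output_per_mtok": 15.00},
]

def cost_usd(model, input_tokens, output_tokens, at):
    """Pick the pricing row in effect at time `at` and compute the cost."""
    for row in PRICING:
        if (row["model"] == model and row["effective_from"] <= at
                and (row["effective_to"] is None or at < row["effective_to"])):
            return (input_tokens * row["input_per_mtok"]
                    + output_tokens * row["output_per_mtok"]) / 1_000_000
    raise LookupError(f"no pricing for {model} at {at}")

# A March call is costed at the old rates even after the June price change:
march_cost = cost_usd("example-model", 1000, 500, datetime(2024, 3, 15))  # → 0.02
```

This is why historical dashboards stay accurate: each record's `cost_usd` was computed against the pricing row that was in effect when the call happened.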
Supported providers
| Provider | Drop-in class | Special token handling |
|---|---|---|
| Anthropic | kostrack.Anthropic | cache_write, cache_read, thinking tokens |
| OpenAI | kostrack.OpenAI | cached_prompt, reasoning_tokens |
| Gemini | kostrack.GenerativeModel | context_cache, thoughts_tokens |
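The "special token handling" column hints at a normalization step: each provider reports usage under different field names, which must be mapped onto one common schema before costing. The sketch below illustrates that idea; the field names in the mapping are simplified examples, not an authoritative list of any provider's actual response fields.

```python
# Illustrative mapping from provider-specific usage fields to a common schema.
# Field names are simplified examples for the sake of the sketch.
USAGE_FIELDS = {
    "anthropic": {"input_tokens": "input_tokens",
                  "output_tokens": "output_tokens",
                  "cache_read": "cache_read_input_tokens"},
    "openai": {"input_tokens": "prompt_tokens",
               "output_tokens": "completion_tokens",
               "cache_read": "cached_prompt_tokens"},
}

def normalize_usage(provider, raw):
    """Map a provider's raw usage dict onto one common schema."""
    mapping = USAGE_FIELDS[provider]
    # Missing fields default to 0 (e.g. no cache was used on this call).
    return {common: raw.get(raw_key, 0) for common, raw_key in mapping.items()}

usage = normalize_usage("openai", {"prompt_tokens": 100, "completion_tokens": 20})
# → {'input_tokens': 100, 'output_tokens': 20, 'cache_read': 0}
```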
Architecture decisions
| Decision | Rationale |
|---|---|
| Wrapper, not proxy | No extra server in the call path. One import change. Metadata injection at the call site. |
| TimescaleDB | Postgres extension with hypertables, continuous aggregates, and retention policies — perfect for time-series cost data. |
| Grafana, not custom UI | Production-grade dashboards immediately, without building frontend. Engineers already have it. |
| Write-time cost calculation | Dashboard queries aggregate pre-computed cost_usd values — fast even over millions of rows. |
| SQLite fallback | Kostrack is an observability tool. It must never be a critical path dependency. |