LLM Monitoring
Track LLM usage, costs, latency, and errors across your AI applications
Cased Telemetry provides observability for LLM calls in your applications. Track token usage, estimate costs, monitor latency, and debug errors across all your AI/LLM integrations.
Quick Start
1. Get Your DSN
Create a telemetry project in Cased and copy your DSN. It looks like:
```
https://<public_key>@app.cased.com/api/<project_id>
```

2. Instrument Your Code
Python:

```python
import re
import time

import openai
import requests

CASED_DSN = "https://abc123@app.cased.com/api/1"

def track_llm_call(model, provider, input_tokens, output_tokens, latency_ms, **kwargs):
    """Send LLM call metrics to Cased."""
    # Parse DSN
    match = re.match(r'https://([^@]+)@([^/]+)/api/(\d+)', CASED_DSN)
    public_key, host, project_id = match.groups()

    requests.post(
        f"https://{host}/api/{project_id}/llm/",
        headers={
            "X-Sentry-Auth": f"Sentry sentry_key={public_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": model,
            "provider": provider,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "latency_ms": latency_ms,
            **kwargs
        }
    )

# Example: Track an OpenAI call
start = time.time()
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
latency = (time.time() - start) * 1000

track_llm_call(
    model="gpt-4o",
    provider="openai",
    input_tokens=response.usage.prompt_tokens,
    output_tokens=response.usage.completion_tokens,
    latency_ms=latency,
    session_id="conversation-123",  # Group multi-turn conversations
    trace_id="rag-pipeline-456",    # Group related LLM calls
)
```
TypeScript:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();
const CASED_DSN = "https://abc123@app.cased.com/api/1";

async function trackLLMCall(data: {
  model: string;
  provider: string;
  input_tokens: number;
  output_tokens: number;
  latency_ms: number;
  session_id?: string;
  trace_id?: string;
  success?: boolean;
  error?: string;
}) {
  const match = CASED_DSN.match(/https:\/\/([^@]+)@([^/]+)\/api\/(\d+)/);
  if (!match) throw new Error("Invalid DSN");
  const [, publicKey, host, projectId] = match;

  await fetch(`https://${host}/api/${projectId}/llm/`, {
    method: "POST",
    headers: {
      "X-Sentry-Auth": `Sentry sentry_key=${publicKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(data),
  });
}

// Example: Track an OpenAI call
const start = Date.now();
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

await trackLLMCall({
  model: "gpt-4o",
  provider: "openai",
  input_tokens: response.usage.prompt_tokens,
  output_tokens: response.usage.completion_tokens,
  latency_ms: Date.now() - start,
  session_id: "conversation-123",
});
```

Event Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model name (e.g., `gpt-4o`, `claude-sonnet-4`) |
| `provider` | string | No | Provider name (e.g., `openai`, `anthropic`) |
| `input_tokens` | int | No | Number of input/prompt tokens |
| `output_tokens` | int | No | Number of output/completion tokens |
| `cached_tokens` | int | No | Number of cached tokens (prompt caching) |
| `latency_ms` | float | No | Request latency in milliseconds |
| `success` | bool | No | Whether the call succeeded (default: true) |
| `error` | string | No | Error message if call failed |
| `session_id` | string | No | Group multi-turn conversations |
| `trace_id` | string | No | Group related LLM calls (e.g., RAG pipeline) |
| `tags` | object | No | Custom key-value tags |
| `environment` | string | No | Environment name (default: production) |
| `release` | string | No | Application version |
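For instance, a failed call can be reported with the `success` and `error` fields alongside whatever metadata fields you use. A minimal sketch building on the `track_llm_call` helper from the Quick Start; the tag, environment, and release values are illustrative:

```python
import time
import openai

start = time.time()
try:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.APIError as exc:
    # Token counts are unknown for a failed call, so report zeros
    track_llm_call(
        model="gpt-4o",
        provider="openai",
        input_tokens=0,
        output_tokens=0,
        latency_ms=(time.time() - start) * 1000,
        success=False,
        error=str(exc),
        tags={"feature": "chat"},   # illustrative tag
        environment="staging",
        release="1.4.2",            # illustrative version string
    )
    raise
```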
Grouping Calls
Session ID
Use session_id to group multi-turn conversations:
```python
# All calls in a conversation share the same session_id
track_llm_call(model="gpt-4o", session_id="conv-abc123", ...)
track_llm_call(model="gpt-4o", session_id="conv-abc123", ...)  # Same session
```
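For example, a chat loop might generate one session_id up front and attach it to every turn. A rough illustration using the `track_llm_call` helper from the Quick Start; the session ID format shown is just a convention, not a requirement:

```python
import time
import uuid
import openai

session_id = f"conv-{uuid.uuid4()}"
messages = []

for user_input in ["Hello!", "Can you elaborate?"]:
    messages.append({"role": "user", "content": user_input})

    start = time.time()
    response = openai.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant", "content": response.choices[0].message.content})

    # Every turn in the conversation reports the same session_id
    track_llm_call(
        model="gpt-4o",
        provider="openai",
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        latency_ms=(time.time() - start) * 1000,
        session_id=session_id,
    )
```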
Trace ID

Use trace_id to group related LLM calls in a pipeline:
```python
import uuid

# RAG pipeline: embedding + retrieval + generation
trace_id = "rag-" + str(uuid.uuid4())

track_llm_call(model="text-embedding-3-small", trace_id=trace_id, ...)  # Embedding
track_llm_call(model="gpt-4o", trace_id=trace_id, ...)                  # Generation
```
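A fuller sketch of the same idea, timing each stage of a toy RAG flow and tagging both calls with one trace_id; the retrieval step is stubbed out, and the function and variable names are illustrative:

```python
import time
import uuid
import openai

def answer_with_rag(question, documents):
    """Toy RAG flow: embed the question, pick a document, then generate."""
    trace_id = f"rag-{uuid.uuid4()}"

    start = time.time()
    emb = openai.embeddings.create(model="text-embedding-3-small", input=question)
    track_llm_call(
        model="text-embedding-3-small",
        provider="openai",
        input_tokens=emb.usage.prompt_tokens,
        output_tokens=0,  # embeddings produce no completion tokens
        latency_ms=(time.time() - start) * 1000,
        trace_id=trace_id,
    )

    context = documents[0]  # stand-in for real retrieval

    start = time.time()
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{context}\n\n{question}"}],
    )
    track_llm_call(
        model="gpt-4o",
        provider="openai",
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        latency_ms=(time.time() - start) * 1000,
        trace_id=trace_id,
    )
    return response.choices[0].message.content
```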
Batch Ingestion

For high-volume applications, send multiple events in a single request:
```python
requests.post(
    f"https://{host}/api/{project_id}/llm/batch/",
    headers={"X-Sentry-Auth": f"Sentry sentry_key={public_key}"},
    json={
        "events": [
            {"model": "gpt-4o", "input_tokens": 100, ...},
            {"model": "gpt-4o", "input_tokens": 150, ...},
            {"model": "claude-sonnet-4", "input_tokens": 200, ...},
        ]
    }
)
```

Maximum batch size: 100 events.
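If you buffer events in memory, you can flush them in chunks that respect the 100-event limit. One possible approach, reusing the DSN components parsed in the Quick Start; the class name and flush strategy are illustrative, not part of the API:

```python
import requests

class LLMEventBuffer:
    """Collects LLM events and POSTs them to the batch endpoint in chunks."""

    MAX_BATCH_SIZE = 100  # documented maximum events per batch request

    def __init__(self, host, project_id, public_key):
        self.host = host
        self.project_id = project_id
        self.public_key = public_key
        self.events = []

    def add(self, event):
        self.events.append(event)
        if len(self.events) >= self.MAX_BATCH_SIZE:
            self.flush()

    def flush(self):
        # Send buffered events in chunks of at most MAX_BATCH_SIZE
        while self.events:
            batch = self.events[:self.MAX_BATCH_SIZE]
            self.events = self.events[self.MAX_BATCH_SIZE:]
            requests.post(
                f"https://{self.host}/api/{self.project_id}/llm/batch/",
                headers={"X-Sentry-Auth": f"Sentry sentry_key={self.public_key}"},
                json={"events": batch},
            )

# Usage: add events as calls complete, flush any remainder at shutdown
buffer = LLMEventBuffer(host="app.cased.com", project_id="1", public_key="abc123")
buffer.add({"model": "gpt-4o", "input_tokens": 100, "output_tokens": 40, "latency_ms": 850.0})
buffer.flush()
```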
Query API
Query your LLM metrics via the API or CLI.
Usage Statistics
```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/usage/?since=24h&group_by=model"
```

Response:
{ "usage_stats": [ { "model": "gpt-4o", "call_count": 1250, "total_input_tokens": 2500000, "total_output_tokens": 750000 } ], "totals": { "calls": 1250, "input_tokens": 2500000, "output_tokens": 750000 }}Cost Estimates
Cost Estimates

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/cost/?since=24h"
```
Latency Percentiles

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/latency/?since=1h"
```

Response includes p50, p95, p99 latencies:
{ "latency_stats": [ { "model": "gpt-4o", "avg_ms": 1234.5, "p50_ms": 1100.0, "p95_ms": 2500.0, "p99_ms": 3200.0 } ]}Error Analysis
```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/errors/?since=24h"
```

Session Rollups
Analyze cost and usage per conversation:
curl -H "Authorization: Token YOUR_API_KEY" \ "https://app.cased.com/api/v1/telemetry/query/llm/sessions/?sort_by=cost&limit=20"CLI Commands
If you have the Cased CLI installed:
```bash
# Token usage by model
cased llm usage --since 24h

# Cost estimates
cased llm cost --since 24h --model gpt-4o

# Latency percentiles
cased llm latency --since 1h

# Error rates
cased llm errors --since 24h

# Overall summary
cased llm summary --since 24h

# Per-session analysis
cased llm sessions --sort-by cost --limit 20
```

Supported Models
Cost estimates are calculated using current pricing for:
| Provider | Models |
|---|---|
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, and 3.x series |
| OpenAI | GPT-5.2, GPT-5, GPT-4.1, GPT-4o, o3, o1 |
| Google | Gemini 3 Pro/Flash, 2.5/2.0 series |
| Mistral | Large, Medium 3, Small 3 |
| DeepSeek | V3, chat, reasoner |
| Meta | Llama 3.3-70b, 3.1-405b |
Unknown models use default pricing estimates.
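As a point of reference, a per-call estimate is essentially token counts multiplied by per-million-token rates, with a fallback rate for unrecognized models. A sketch of that arithmetic; the rates below are placeholders for illustration, not Cased's pricing table:

```python
# Placeholder per-million-token rates (USD), for illustration only
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
}
DEFAULT_RATES = {"input": 1.00, "output": 2.00}  # fallback for unknown models

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate USD cost from token counts and per-million-token rates."""
    rates = PRICING.get(model, DEFAULT_RATES)
    return (input_tokens / 1_000_000) * rates["input"] \
         + (output_tokens / 1_000_000) * rates["output"]

# 2.5M input + 750k output tokens of gpt-4o at the placeholder rates above
print(estimate_cost("gpt-4o", 2_500_000, 750_000))  # 13.75
```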