
LLM Monitoring

Track LLM usage, costs, latency, and errors across your AI applications

Cased Telemetry provides observability for LLM calls in your applications. Track token usage, estimate costs, monitor latency, and debug errors across all your AI/LLM integrations.

Create a telemetry project in Cased and copy your DSN. It looks like:

```
https://<public_key>@app.cased.com/api/<project_id>
```
Then send an event for each LLM call with a POST request to your project's `/llm/` endpoint:

```python
import re

import requests

CASED_DSN = "https://abc123@app.cased.com/api/1"


def track_llm_call(model, provider, input_tokens, output_tokens, latency_ms, **kwargs):
    """Send LLM call metrics to Cased."""
    # Parse the DSN into its public key, host, and project ID
    match = re.match(r"https://([^@]+)@([^/]+)/api/(\d+)", CASED_DSN)
    public_key, host, project_id = match.groups()

    requests.post(
        f"https://{host}/api/{project_id}/llm/",
        headers={
            "X-Sentry-Auth": f"Sentry sentry_key={public_key}",
            "Content-Type": "application/json",
        },
        json={
            "model": model,
            "provider": provider,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "latency_ms": latency_ms,
            **kwargs,
        },
    )
```
```python
# Example: track an OpenAI call
import time

import openai

start = time.time()
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
latency = (time.time() - start) * 1000

track_llm_call(
    model="gpt-4o",
    provider="openai",
    input_tokens=response.usage.prompt_tokens,
    output_tokens=response.usage.completion_tokens,
    latency_ms=latency,
    session_id="conversation-123",  # Group multi-turn conversations
    trace_id="rag-pipeline-456",    # Group related LLM calls
)
```
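
The same pattern works with other providers; below is a sketch using the Anthropic SDK. The `anthropic` package and the `claude-sonnet-4` model string are assumptions for illustration, not part of Cased itself.

```python
# Sketch: track an Anthropic call with the same helper (assumes `pip install anthropic`)
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

start = time.time()
message = client.messages.create(
    model="claude-sonnet-4",  # illustrative model name, as in the fields table below
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

track_llm_call(
    model="claude-sonnet-4",
    provider="anthropic",
    input_tokens=message.usage.input_tokens,
    output_tokens=message.usage.output_tokens,
    latency_ms=(time.time() - start) * 1000,
)
```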
Events accept the following fields:

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model name (e.g., gpt-4o, claude-sonnet-4) |
| `provider` | string | No | Provider name (e.g., openai, anthropic) |
| `input_tokens` | int | No | Number of input/prompt tokens |
| `output_tokens` | int | No | Number of output/completion tokens |
| `cached_tokens` | int | No | Number of cached tokens (prompt caching) |
| `latency_ms` | float | No | Request latency in milliseconds |
| `success` | bool | No | Whether the call succeeded (default: true) |
| `error` | string | No | Error message if the call failed |
| `session_id` | string | No | Group multi-turn conversations |
| `trace_id` | string | No | Group related LLM calls (e.g., a RAG pipeline) |
| `tags` | object | No | Custom key-value tags |
| `environment` | string | No | Environment name (default: production) |
| `release` | string | No | Application version |
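
When a call fails, the `success` and `error` fields let the failure show up in error-rate queries. A minimal sketch, reusing `track_llm_call` from above with the OpenAI SDK:

```python
# Sketch: record a failed OpenAI call so it appears in error-rate metrics
import time

import openai

start = time.time()
try:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.OpenAIError as exc:
    track_llm_call(
        model="gpt-4o",
        provider="openai",
        input_tokens=0,   # token counts are unavailable when the call fails
        output_tokens=0,
        latency_ms=(time.time() - start) * 1000,
        success=False,
        error=str(exc),
    )
    raise
```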

Use `session_id` to group multi-turn conversations:

```python
# All calls in a conversation share the same session_id
track_llm_call(model="gpt-4o", session_id="conv-abc123", ...)
track_llm_call(model="gpt-4o", session_id="conv-abc123", ...)  # Same session
```
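
Expanded into a runnable sketch of a multi-turn loop where every call shares one `session_id` (the conversation content is illustrative):

```python
# Sketch: every turn in this conversation is tracked under one session_id
import time
import uuid

import openai

session_id = f"conv-{uuid.uuid4()}"
history = []

for question in ["Hello!", "What can you help me with?"]:
    history.append({"role": "user", "content": question})
    start = time.time()
    response = openai.chat.completions.create(model="gpt-4o", messages=history)
    history.append({"role": "assistant", "content": response.choices[0].message.content})

    track_llm_call(
        model="gpt-4o",
        provider="openai",
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
        latency_ms=(time.time() - start) * 1000,
        session_id=session_id,  # same value for every turn
    )
```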

Use `trace_id` to group related LLM calls in a pipeline:

```python
# RAG pipeline: embedding + retrieval + generation
import uuid

trace_id = "rag-" + str(uuid.uuid4())
track_llm_call(model="text-embedding-3-small", trace_id=trace_id, ...)  # Embedding
track_llm_call(model="gpt-4o", trace_id=trace_id, ...)  # Generation
```
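
A fuller sketch of that pipeline, using the OpenAI SDK for both steps (the prompt text and retrieval step are illustrative):

```python
# Sketch: embedding and generation calls in one pipeline share a trace_id
import time
import uuid

import openai

trace_id = f"rag-{uuid.uuid4()}"

start = time.time()
embedding = openai.embeddings.create(
    model="text-embedding-3-small",
    input="What does Cased Telemetry track?",
)
track_llm_call(
    model="text-embedding-3-small",
    provider="openai",
    input_tokens=embedding.usage.prompt_tokens,
    output_tokens=0,  # embeddings produce no completion tokens
    latency_ms=(time.time() - start) * 1000,
    trace_id=trace_id,
)

start = time.time()
completion = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Answer using the retrieved context."}],
)
track_llm_call(
    model="gpt-4o",
    provider="openai",
    input_tokens=completion.usage.prompt_tokens,
    output_tokens=completion.usage.completion_tokens,
    latency_ms=(time.time() - start) * 1000,
    trace_id=trace_id,
)
```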

For high-volume applications, send multiple events in a single request:

```python
requests.post(
    f"https://{host}/api/{project_id}/llm/batch/",
    headers={"X-Sentry-Auth": f"Sentry sentry_key={public_key}"},
    json={
        "events": [
            {"model": "gpt-4o", "input_tokens": 100, ...},
            {"model": "gpt-4o", "input_tokens": 150, ...},
            {"model": "claude-sonnet-4", "input_tokens": 200, ...},
        ]
    },
)
```

Maximum batch size: 100 events.
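
A sketch of client-side buffering that respects that limit by flushing in chunks of at most 100 events. The helper names and module-level buffer are illustrative, not part of Cased; the DSN parsing mirrors `track_llm_call` above.

```python
# Sketch: queue events locally and flush them to the batch endpoint in chunks of <= 100
import re

import requests

CASED_DSN = "https://abc123@app.cased.com/api/1"
_public_key, _host, _project_id = re.match(
    r"https://([^@]+)@([^/]+)/api/(\d+)", CASED_DSN
).groups()

_buffer = []


def queue_llm_event(**event):
    """Buffer an event and flush once the 100-event batch limit is reached."""
    _buffer.append(event)
    if len(_buffer) >= 100:
        flush_llm_events()


def flush_llm_events():
    """Send buffered events to the batch endpoint, 100 at a time."""
    while _buffer:
        chunk = _buffer[:100]
        del _buffer[:100]
        requests.post(
            f"https://{_host}/api/{_project_id}/llm/batch/",
            headers={"X-Sentry-Auth": f"Sentry sentry_key={_public_key}"},
            json={"events": chunk},
        )
```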

Query your LLM metrics via the API or CLI.

Token usage by model:

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/usage/?since=24h&group_by=model"
```

Response:

```json
{
  "usage_stats": [
    {
      "model": "gpt-4o",
      "call_count": 1250,
      "total_input_tokens": 2500000,
      "total_output_tokens": 750000
    }
  ],
  "totals": {
    "calls": 1250,
    "input_tokens": 2500000,
    "output_tokens": 750000
  }
}
```
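
These query endpoints return JSON, so they can also be called from Python; a sketch for the usage endpoint (YOUR_API_KEY is a placeholder):

```python
# Sketch: query token usage by model for the last 24 hours
import requests

resp = requests.get(
    "https://app.cased.com/api/v1/telemetry/query/llm/usage/",
    headers={"Authorization": "Token YOUR_API_KEY"},
    params={"since": "24h", "group_by": "model"},
)
for row in resp.json()["usage_stats"]:
    print(row["model"], row["call_count"], row["total_input_tokens"], row["total_output_tokens"])
```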
Cost estimates:

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/cost/?since=24h"
```
Latency percentiles:

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/latency/?since=1h"
```

Response includes p50, p95, p99 latencies:

```json
{
  "latency_stats": [
    {
      "model": "gpt-4o",
      "avg_ms": 1234.5,
      "p50_ms": 1100.0,
      "p95_ms": 2500.0,
      "p99_ms": 3200.0
    }
  ]
}
```
Error rates:

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/errors/?since=24h"
```

Analyze cost and usage per conversation:

```bash
curl -H "Authorization: Token YOUR_API_KEY" \
  "https://app.cased.com/api/v1/telemetry/query/llm/sessions/?sort_by=cost&limit=20"
```

If you have the Cased CLI installed:

```bash
# Token usage by model
cased llm usage --since 24h

# Cost estimates
cased llm cost --since 24h --model gpt-4o

# Latency percentiles
cased llm latency --since 1h

# Error rates
cased llm errors --since 24h

# Overall summary
cased llm summary --since 24h

# Per-session analysis
cased llm sessions --sort-by cost --limit 20
```

Cost estimates are calculated using current pricing for:

| Provider | Models |
|---|---|
| Anthropic | Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, and 3.x series |
| OpenAI | GPT-5.2, GPT-5, GPT-4.1, GPT-4o, o3, o1 |
| Google | Gemini 3 Pro/Flash, 2.5/2.0 series |
| Mistral | Large, Medium 3, Small 3 |
| DeepSeek | V3, chat, reasoner |
| Meta | Llama 3.3-70b, 3.1-405b |

Unknown models use default pricing estimates.
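
The estimate itself is a tokens-times-rate calculation. A sketch with placeholder per-million-token rates (the numbers below are assumptions for illustration, not Cased's pricing table):

```python
# Sketch: client-side cost estimate; the rates below are placeholders, not real pricing
PRICE_PER_MILLION_TOKENS = {
    "gpt-4o": {"input": 2.50, "output": 10.00},  # USD, assumed for illustration
}
DEFAULT_PRICE = {"input": 1.00, "output": 3.00}  # fallback for unknown models


def estimate_cost_usd(model, input_tokens, output_tokens):
    rate = PRICE_PER_MILLION_TOKENS.get(model, DEFAULT_PRICE)
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# 2.5M input + 750k output tokens at the assumed gpt-4o rates -> 13.75
print(estimate_cost_usd("gpt-4o", 2_500_000, 750_000))
```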