Quickstart
The whole idea: change one line in your app (the base URL), keep using your real API key, and watch every call show up in the dashboard broken down token by token.
1. Start the proxy.

   ```sh
   llmprof up
   ```

   It listens on http://localhost:4000 and routes each request to the right provider, so one instance profiles both OpenAI and Anthropic clients.

2. Point your client at it. Set the base URL to the proxy and leave your API key as-is.

   ```python
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:4000/v1")  # your key still works

   client.chat.completions.create(
       model="gpt-4o",
       messages=[
           {"role": "system", "content": "You are a helpful assistant."},
           {"role": "user", "content": "Summarize this contract..."},
       ],
       tools=[...],
   )
   ```

3. Open the dashboard. Go to http://localhost:4000. Your call appears in the list on the left; click it to see the context flame graph, the optimization findings, and how much is reclaimable.
The example above uses the Python OpenAI client, but nothing here is
Python-specific. The proxy works with any OpenAI- or Anthropic-compatible client
in any language: point its base URL at http://localhost:4000, leave the key
alone, and you are profiling. The launcher itself runs without Python too (npx llmprof up).
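To illustrate that no particular SDK is required, here is a minimal sketch that builds the same kind of chat request using only Python's standard library. It assumes the proxy serves the standard OpenAI-compatible path `/v1/chat/completions` (the path the OpenAI client appends to its base URL) and passes the `Authorization` header through unchanged. The helper name `build_chat_request` is hypothetical and not part of llmprof.

```python
import json
import os
import urllib.request

# Assumption: the proxy exposes the OpenAI-compatible API under /v1,
# matching the base_url used with the OpenAI client.
PROXY_BASE_URL = "http://localhost:4000/v1"


def build_chat_request(api_key, base_url=PROXY_BASE_URL):
    """Build (but do not send) a chat completion request aimed at the proxy."""
    payload = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Summarize this contract..."}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # your real provider key, unchanged
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request(os.environ.get("OPENAI_API_KEY", ""))
    print(request.get_method(), request.full_url)
    # To actually send it (requires the proxy to be running):
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

The only proxy-specific detail is the host in `PROXY_BASE_URL`; the payload and headers are what a direct call to the provider would carry.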
What you are looking at
- The flame graph shows where the prompt tokens went: system prompt, tool schemas (with each tool as a child you can click into), history, and the current input. See Context flame graph.
- The optimization panel lists concrete waste with a per-call reclaimable number. See The waste detector.
- Switch to trends and timeline at the top for day-over-day usage and per-turn context growth.
- Using a CLI agent? See Claude Code or the Codex CLI.
- Want precise component labels from your own code? Use the Python SDK or the JavaScript SDK.
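To picture how a request maps onto the four flame graph components named above, the rough sketch below groups a chat request's content into system prompt, tool schemas, history, and current input. It is an illustration only, not how llmprof attributes tokens: sizes are character counts (a crude stand-in for tokens), the last message is assumed to be the current input, and the function name `context_breakdown` is hypothetical.

```python
import json


def context_breakdown(messages, tools=()):
    """Group request content into four components, sized in characters."""
    sizes = {
        "system prompt": 0,
        "tool schemas": 0,
        "history": 0,
        "current input": 0,
    }
    last_index = len(messages) - 1
    for index, message in enumerate(messages):
        length = len(str(message.get("content", "")))
        if message["role"] == "system":
            sizes["system prompt"] += length
        elif index == last_index:
            # Assumption: the final message is the current input.
            sizes["current input"] += length
        else:
            sizes["history"] += length
    for tool in tools:
        # Each tool schema is measured by its serialized JSON length.
        sizes["tool schemas"] += len(json.dumps(tool))
    return sizes


if __name__ == "__main__":
    example_messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this contract..."},
    ]
    for component, size in context_breakdown(example_messages).items():
        print(f"{component}: {size} chars")
```

For labels that reflect how your application actually assembles its prompts, the SDKs mentioned above are the intended route.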