Quickstart

The whole idea: change one line in your app (the base URL), keep using your real API key, and watch every call show up in the dashboard broken down token by token.

  1. Start the proxy.

    llmprof up

    It listens on http://localhost:4000 and routes each request to the right provider, so one instance profiles both OpenAI and Anthropic clients.

  2. Point your client at it. Set the base URL to the proxy and leave your API key as-is.

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000/v1")  # your key still works
    client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize this contract..."},
        ],
        tools=[...],
    )

  3. Open the dashboard. Go to http://localhost:4000. Your call appears in the list on the left; click it to see the context flame graph, the optimization findings, and how much is reclaimable.

The example above uses the Python OpenAI client, but nothing here is Python-specific. The proxy works with any OpenAI- or Anthropic-compatible client in any language: point its base URL at http://localhost:4000, leave the key alone, and you are profiling. The launcher itself runs without Python too (npx llmprof up).
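To make the "no SDK required" point concrete, here is a minimal sketch that builds the same kind of request with only the Python standard library. The helper names (`build_chat_request`, `send`) are hypothetical, and the sketch assumes the proxy serves the standard OpenAI chat completions path under the `/v1` base URL used in step 2 and forwards the usual `Authorization: Bearer` header unchanged.

```python
import json
import urllib.request

# Proxy address from the quickstart; the /v1 suffix matches the OpenAI client example.
PROXY_BASE_URL = "http://localhost:4000/v1"


def build_chat_request(api_key, model, messages, base_url=PROXY_BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            # Your real provider key, unchanged (assumed to be passed through).
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request through the proxy and return the decoded JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the proxy running, `send(build_chat_request(key, "gpt-4o", messages))` should behave like the SDK call in step 2. The only proxy-specific detail is the base URL; everything else is what you would send to the provider directly.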

  • The flame graph shows where the prompt tokens went: system prompt, tool schemas (with each tool as a child you can click into), history, and the current input. See Context flame graph.
  • The optimization panel lists concrete waste with a per-call reclaimable number. See The waste detector.
  • Switch to trends and timeline at the top for day-over-day usage and per-turn context growth.
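To get a feel for the four categories the flame graph reports, the sketch below buckets an OpenAI-style request into system prompt, tool schemas, history, and current input. It is only an illustration: the function name `context_breakdown` is hypothetical, it counts characters rather than tokens, and it treats the last user message as the "current input". None of this is taken from how the profiler itself computes its numbers.

```python
import json


def context_breakdown(messages, tools=()):
    """Bucket an OpenAI-style request into four context categories.

    Sizes are character counts, a crude stand-in for tokens.
    """
    # Assumption: the last user message is the "current input".
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"),
        default=None,
    )
    sizes = {"system": 0, "tools": 0, "history": 0, "input": 0}
    for i, message in enumerate(messages):
        size = len(str(message.get("content") or ""))
        if message["role"] == "system":
            sizes["system"] += size
        elif i == last_user:
            sizes["input"] += size
        else:
            sizes["history"] += size
    # Tool schemas travel as JSON, so measure their serialized form.
    sizes["tools"] = sum(len(json.dumps(tool)) for tool in tools)
    return sizes
```

Running this over a few of your own requests gives a rough sense of which bucket dominates before you open the dashboard, where the real per-token breakdown lives.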