
Python SDK

The proxy’s heuristics are good, but sometimes you know more than they can infer: this block is a RAG chunk, that one is history, these are the tools. The Python SDK lets you label components yourself and records straight into the same local database the dashboard reads - no base-URL change needed.

    pip install llmprof

    import llmprof

    with llmprof.profile(model="gpt-4o") as p:
        p.add("system prompt", system_text)
        p.add("rag_chunk", retrieved_doc, name="kb#42")
        p.add("tool", search_schema, name="search", called=True)
        resp = client.chat.completions.create(...)
        p.usage(resp.usage)  # exact prompt/completion tokens + cost
    # the trace shows up in the dashboard with precise component labels

The trace is recorded when the with block exits. Recording is idempotent, so calling p.record() yourself is safe too.
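One way to picture idempotent recording is a context manager that guards its write with a flag. The sketch below is illustrative only: `SketchProfile` and the `sink` list are hypothetical stand-ins for the profile object and the local database, not llmprof's internals, and recording even when the body raises is a choice made for this sketch.

```python
class SketchProfile:
    """Hypothetical stand-in for a profile object; not llmprof's real class."""

    def __init__(self, sink):
        self.sink = sink          # stand-in for the local database
        self.components = []
        self._recorded = False

    def add(self, component, content):
        self.components.append((component, content))

    def record(self):
        # The guard flag turns every call after the first into a no-op.
        if self._recorded:
            return
        self.sink.append(list(self.components))
        self._recorded = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.record()             # in this sketch, record even if the body raised
        return False              # never swallow exceptions


sink = []
with SketchProfile(sink) as p:
    p.add("system", "You are helpful.")
    p.record()                    # explicit call inside the block
print(len(sink))                  # the exit-time record() did not write again
```

The explicit call and the exit-time call together produce a single entry in `sink`.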

p.add(component, content, *, name=None, called=False)


Tags a component and returns the token count of content (counted with tiktoken). content can be a string or any JSON-serializable object (a tool schema, a dict). Friendly labels map onto the dashboard’s component buckets:

| You pass | Shows as |
| --- | --- |
| system / system prompt | system prompt |
| user / input | user input |
| history / assistant | history (assistant) |
| tool / tools | tool schemas (with name as a drill-down child) |
| rag / rag_chunk / retrieved | rag chunks (with name as a child) |
| tool_result / tool_results | tool results |
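The label mapping above behaves like an alias table. As a rough sketch, the lookup could be a plain dictionary. The aliases are copied from the table, but `bucket_for` is a hypothetical helper, and passing unknown labels through unchanged is an assumption, since the behavior for unlisted labels is not documented here.

```python
# Aliases copied from the mapping above; the lookup code itself is an assumption.
ALIASES = {
    "system": "system prompt",
    "system prompt": "system prompt",
    "user": "user input",
    "input": "user input",
    "history": "history (assistant)",
    "assistant": "history (assistant)",
    "tool": "tool schemas",
    "tools": "tool schemas",
    "rag": "rag chunks",
    "rag_chunk": "rag chunks",
    "retrieved": "rag chunks",
    "tool_result": "tool results",
    "tool_results": "tool results",
}


def bucket_for(label: str) -> str:
    """Return the dashboard bucket for a friendly label (hypothetical helper)."""
    key = label.strip().lower()
    # Unknown labels are passed through unchanged in this sketch.
    return ALIASES.get(key, key)


print(bucket_for("rag_chunk"))
print(bucket_for("tools"))
```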

Pass called=True (or call p.called("search", ...)) to mark which tools the model actually used, so the waste detector can flag the unused ones.
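To make the idea of flagging unused tools concrete, here is a minimal sketch of the kind of check a waste detector might run. `ToolEntry`, `unused_tools`, and the token counts are all made up for illustration; llmprof's actual detector may work differently.

```python
from dataclasses import dataclass


@dataclass
class ToolEntry:
    """Hypothetical record of one tool schema sent with a request."""
    name: str
    tokens: int
    called: bool = False


def unused_tools(tools):
    """Return (names, wasted_tokens) for tools that were sent but never called."""
    idle = [t for t in tools if not t.called]
    return [t.name for t in idle], sum(t.tokens for t in idle)


tools = [
    ToolEntry("search", tokens=120, called=True),
    ToolEntry("calculator", tokens=80),
    ToolEntry("weather", tokens=95),
]
names, wasted = unused_tools(tools)
print(names, wasted)
```

Without the called flag every tool looks equally necessary, which is why marking the ones the model actually used matters.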

p.usage() sets exact token counts. It accepts a provider usage object or dict, or explicit numbers:

    p.usage(resp.usage)                                   # OpenAI/Anthropic usage object
    p.usage(prompt_tokens=1234, completion_tokens=56)     # explicit
    p.usage(prompt_tokens=1234, completion_tokens=56, cached_tokens=400)

If you never call usage(), llmprof falls back to the summed token count of the components you added.
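The accepted input shapes and the fallback can be sketched as a small resolver. This is a guess at the logic, not llmprof's code: `resolve_prompt_tokens` is a hypothetical name, and the precedence (explicit number, then usage object or dict, then the component sum) is an assumption. The field names follow the providers' usage objects: OpenAI reports `prompt_tokens`, Anthropic reports `input_tokens`.

```python
from types import SimpleNamespace


def _field(usage, name):
    """Read a field from either a dict or an attribute-style usage object."""
    if isinstance(usage, dict):
        return usage.get(name)
    return getattr(usage, name, None)


def resolve_prompt_tokens(component_counts, usage=None, prompt_tokens=None):
    """Pick a prompt token count: explicit number, then usage, then the sum."""
    if prompt_tokens is not None:
        return prompt_tokens
    if usage is not None:
        for name in ("prompt_tokens", "input_tokens"):
            value = _field(usage, name)
            if value is not None:
                return value
    # No usage supplied: fall back to the summed component counts.
    return sum(component_counts)


counts = [100, 50]  # made-up per-component token counts
print(resolve_prompt_tokens(counts))
print(resolve_prompt_tokens(counts, usage=SimpleNamespace(prompt_tokens=1234)))
print(resolve_prompt_tokens(counts, usage={"input_tokens": 900}))
```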

To profile a whole function, wrap it and tag components inside with the module-level helpers (they target the active profile):

    @llmprof.profiled(model="gpt-4o")
    def answer(question: str) -> str:
        llmprof.add("system prompt", SYSTEM)
        llmprof.add("user input", question)
        resp = client.chat.completions.create(...)
        llmprof.usage(resp.usage)
        return resp.choices[0].message.content

profile() and profiled() accept:

  • model - model id, used for tokenizer choice and pricing (default gpt-4o).
  • provider - "openai" (default) or "anthropic", for labeling.
  • session - an explicit run id, to group calls into a timeline.
  • db_path - write to a specific SQLite file instead of the default.

By default the SDK writes to the same store as the proxy (~/.llmprof/llmprof.db), so SDK traces and proxied traces appear together in one dashboard.
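Because the store is an ordinary SQLite file, you can inspect it with Python's built-in `sqlite3` module. The schema is not documented here, so this sketch makes no assumptions about table or column names and only lists the tables it finds; `list_tables` is a hypothetical helper, and the default path is the one given above.

```python
import sqlite3
from pathlib import Path

# Default store location, as stated in the text above.
DEFAULT_DB = Path.home() / ".llmprof" / "llmprof.db"


def list_tables(db_path=DEFAULT_DB):
    """Return the table names in a SQLite file without assuming any schema."""
    path = Path(db_path)
    if not path.exists():
        return []                 # avoid creating an empty database by accident
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


print(list_tables())
```

Passing the same path you gave as db_path lets you check a non-default store the same way.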