OpenAI-compatible clients
Anything that talks the OpenAI /v1/chat/completions API works with llmprof -
the official SDKs, LangChain, LlamaIndex, and the many providers that expose an
OpenAI-compatible endpoint (DeepInfra, Fireworks, Cerebras, DeepSeek, Together,
local servers, …).
Point the base URL at the proxy
Section titled “Point the base URL at the proxy”from openai import OpenAI
client = OpenAI(base_url="http://localhost:4000/v1") # api_key unchangedOr via environment variable, which most tools respect:
export OPENAI_BASE_URL=http://localhost:4000/v1The proxy forwards to OpenAI by default. Your API key is passed through untouched; llmprof never stores it.
Using a different OpenAI-compatible provider
Section titled “Using a different OpenAI-compatible provider”Point the proxy upstream at that provider, then send its model ids as usual:
llmprof up --upstream https://api.deepinfra.com/v1/openai# or: export LLMPROF_UPSTREAM=https://api.fireworks.ai/inference/v1llmprof recognizes 100+ model ids for pricing, including the popular open-weight models on these hosts. If a model is unknown, tokens still show; only the dollar cost is omitted. See Providers & pricing to add or override prices.
Streaming
Section titled “Streaming”Streaming responses are forwarded token-by-token without buffering, so stream=True
behaves exactly as it would against the provider directly. The trace (including
which tools were called) is recorded after the stream completes.