Documentation
Everything you need to add real-time cost enforcement to your AI agents.
Installation
pip install agentbudget

Python 3.9+ · Go 1.21+ · Node.js 18+. Zero external dependencies in all three SDKs.
For Python LangChain integration:
pip install agentbudget[langchain]

Quickstart
AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).
Drop-in Mode Recommended
Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.
import agentbudget
import openai
agentbudget.init("$5.00")
# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(agentbudget.spent()) # e.g. 0.0035
print(agentbudget.remaining()) # e.g. 4.9965
print(agentbudget.report()) # Full cost breakdown
agentbudget.teardown() # Stop tracking, get final report

agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs. Same pattern used by Sentry, Datadog, and other observability tools.

Drop-in API
| Function | Description |
|---|---|
| agentbudget.init(budget) | Start tracking. Patches OpenAI/Anthropic. Returns the session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.wrap_client(client, session) | Attach tracking to a specific client instance only. |
| agentbudget.register_model(name, input, output) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session for advanced use. |
| agentbudget.teardown() | Stop tracking, unpatch SDKs, return the final report. |
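Under the hood, init() swaps the SDK methods for cost-recording wrappers and teardown() restores the originals. A generic sketch of that wrap-and-restore pattern, using a hypothetical stand-in class rather than AgentBudget's actual internals:

```python
import functools

def patch_method(cls, name, on_result):
    """Replace cls.<name> with a wrapper that reports every result,
    and return a callable that restores the original method."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        on_result(result)          # e.g. extract token usage and record cost
        return result

    setattr(cls, name, wrapper)
    return lambda: setattr(cls, name, original)   # teardown hook

# Stand-in for an SDK client class (hypothetical):
class FakeCompletions:
    def create(self, **kwargs):
        return {"usage": {"total_tokens": 42}}

seen = []
restore = patch_method(FakeCompletions, "create",
                       lambda r: seen.append(r["usage"]["total_tokens"]))
FakeCompletions().create(model="gpt-4o")   # tracked: seen == [42]
restore()                                  # unpatched, as teardown() does
```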
Manual Mode
from agentbudget import AgentBudget
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )
    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")
    print(session.report())

Budget Envelope
A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked in real time. When exhausted, BudgetExhausted is raised.
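The envelope mechanics reduce to a running total checked on every charge. A minimal sketch of the idea, not the library's actual implementation:

```python
class BudgetExhausted(RuntimeError):
    pass

class Envelope:
    """Illustrative stand-in for the envelope bookkeeping described above."""
    def __init__(self, max_spend: float):
        self.max_spend = max_spend
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        # Reject the charge before it lands, so the cap is never crossed
        if self.spent + cost > self.max_spend:
            raise BudgetExhausted(
                f"${self.spent + cost:.2f} exceeds ${self.max_spend:.2f}"
            )
        self.spent += cost

env = Envelope(5.0)
env.charge(3.0)      # fine: $3.00 of $5.00 used
try:
    env.charge(2.5)  # would total $5.50, so this raises
except BudgetExhausted:
    pass
```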
# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)

Cost Sources
- LLM calls — Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
- Tool calls — External APIs with known per-call costs. Use session.track(result, cost=0.01).
- Decorated functions — Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
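How a track_tool-style decorator accrues cost can be sketched with a simplified stand-in session (illustrative only, not AgentBudget's implementation):

```python
import functools

class ToolTrackingSession:
    """Simplified stand-in showing how a per-call cost decorator works."""
    def __init__(self):
        self.spent = 0.0

    def track_tool(self, cost, tool_name=None):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                result = fn(*args, **kwargs)
                self.spent += cost   # charge after each successful call
                return result
            return wrapper
        return decorator

session = ToolTrackingSession()

@session.track_tool(cost=0.02, tool_name="search")
def search(query):
    return f"results for {query}"

search("a")
search("b")
# session.spent is now 0.04: two calls at $0.02 each
```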
Circuit Breaker
Three levels of protection:
budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9,             # Warn at 90%
    max_repeated_calls=10,      # Trip after 10 repeated calls
    loop_window_seconds=60.0,   # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: print("Loop detected!"),
)

- Soft limit (default 90%) — Fires a callback. Agent can wrap up gracefully.
- Hard limit (100%) — Raises BudgetExhausted. No more calls.
- Loop detection — Catches repeated calls before they drain the budget. Raises LoopDetected.
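Loop detection amounts to counting identical call keys inside a sliding time window. A sketch under that assumption (how the real detector keys calls is not specified here):

```python
import time
from collections import deque

class LoopDetector:
    """Illustrative sliding-window check: trip when the same call key repeats
    max_repeated_calls times within loop_window_seconds."""
    def __init__(self, max_repeated_calls=10, loop_window_seconds=60.0):
        self.max_calls = max_repeated_calls
        self.window = loop_window_seconds
        self.calls = {}   # key -> deque of timestamps

    def record(self, key, now=None):
        """Record one call; return True when the loop threshold is hit."""
        now = time.monotonic() if now is None else now
        times = self.calls.setdefault(key, deque())
        times.append(now)
        while times and now - times[0] > self.window:
            times.popleft()   # drop calls that aged out of the window
        return len(times) >= self.max_calls

det = LoopDetector(max_repeated_calls=3, loop_window_seconds=60.0)
assert det.record("serp:query", now=0.0) is False
assert det.record("serp:query", now=1.0) is False
assert det.record("serp:query", now=2.0) is True   # third identical call in window
```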
Cost Report
{
  "session_id": "sess_abc123",
  "budget": 5.00,
  "total_spent": 3.42,
  "remaining": 1.58,
  "breakdown": {
    "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
    "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
  },
  "duration_seconds": 34.2,
  "terminated_by": null,
  "events": [...]
}

Streaming Support
Streaming responses (stream=True) are fully tracked. Cost is recorded after the stream is exhausted — every chunk passes through to your code unchanged.
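The deferred accounting this requires can be sketched as a pass-through wrapper that yields every chunk unchanged and records usage only once the stream ends (illustrative dict-based chunks, not the SDK's own types):

```python
class CostedStream:
    """Illustrative wrapper: chunks pass through untouched; cost is
    recorded only after the underlying stream is exhausted, since
    usage arrives on the final chunk."""
    def __init__(self, chunks, on_usage):
        self._chunks = iter(chunks)
        self._on_usage = on_usage
        self._usage = None

    def __iter__(self):
        for chunk in self._chunks:
            if chunk.get("usage") is not None:   # final chunk carries usage
                self._usage = chunk["usage"]
            yield chunk
        self._on_usage(self._usage)              # record after exhaustion

recorded = []
chunks = [
    {"delta": "Hel"},
    {"delta": "lo"},
    {"usage": {"prompt_tokens": 5, "completion_tokens": 2}},
]
for _ in CostedStream(chunks, recorded.append):
    pass
# recorded now holds the usage dict from the final chunk
```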
agentbudget.init("$5.00")
client = openai.OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this report"}],
    stream=True,
    stream_options={"include_usage": True},  # required for OpenAI
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

print(agentbudget.spent()) # cost recorded after stream exhausted

Set stream_options={"include_usage": True} for token counts to appear on the final chunk. Without it, streaming calls are silently tracked as $0.00 — no error, just no cost. Anthropic streams always include usage automatically.

Both for-loop and context-manager patterns are supported, sync and async:
# async for
async for chunk in await client.chat.completions.create(
    stream=True, stream_options={"include_usage": True}, ...
):
    process(chunk)

# context manager
with client.chat.completions.create(stream=True, ...) as stream:
    for chunk in stream:
        process(chunk)

Per-Client Tracking
By default, agentbudget.init() patches all OpenAI/Anthropic calls globally. For finer control — multiple budgets, isolated clients, or production apps where global side effects are undesirable — use wrap_client():
import agentbudget
from agentbudget import AgentBudget
import openai
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Only this instance is tracked
    client = agentbudget.wrap_client(openai.OpenAI(), session)
    response = client.chat.completions.create(...) # tracked

    other = openai.OpenAI()
    other.chat.completions.create(...) # NOT tracked

Works with openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, and anthropic.AsyncAnthropic. Global patching via init() is unchanged — both approaches coexist.
Finalization Reserve
Prevent your agent from being cut off mid-task. Reserve a fraction of the budget exclusively for the final response step — the hard limit fires early, keeping that slice free.
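Following the semantics in the example below (the reserve is a fraction of the budget), the arithmetic works out as:

```python
# Worked example of the finalization-reserve arithmetic (assumed semantics:
# the reserve is a fraction of max_spend, per the $1.00 / 0.05 example).
max_spend = 1.00
finalization_reserve = 0.05

effective_hard_limit = max_spend * (1 - finalization_reserve)  # hard limit fires here
reserved_for_final = max_spend - effective_hard_limit          # kept free for the last call

assert abs(effective_hard_limit - 0.95) < 1e-9
assert abs(reserved_for_final - 0.05) < 1e-9
```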
budget = AgentBudget(
    max_spend="$1.00",
    finalization_reserve=0.05,  # hard limit at $0.95, $0.05 reserved for final call
)

For manual control, check before the final call with session.would_exceed():
with budget.session() as session:
    # ... do work ...
    if session.would_exceed(estimated_final_cost):
        return "Budget nearly exhausted — here is what was completed: ..."
    # Safe to proceed
    response = session.wrap(client.chat.completions.create(...))

would_exceed(cost) checks against the remaining budget without recording anything. Use it as a pre-flight check before expensive final steps.

Async Support
from agentbudget import AgentBudget
budget = AgentBudget(max_spend="$5.00")
async with budget.async_session() as session:
    response = await session.wrap_async(
        client.chat.completions.acreate(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}]
        )
    )

    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)

Nested Budgets
Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.
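The roll-up rule can be sketched with simplified sessions, assuming a child can never be allotted more than the parent has remaining (not AgentBudget's actual implementation):

```python
class SketchSession:
    """Illustrative parent/child bookkeeping: the child gets its own cap,
    and its total spend is charged to the parent when it closes."""
    def __init__(self, max_spend, parent=None):
        if parent is not None:
            # assumed semantics: child cap can't exceed parent's remaining balance
            max_spend = min(max_spend, parent.remaining)
        self.max_spend = max_spend
        self.spent = 0.0
        self.parent = parent

    @property
    def remaining(self):
        return self.max_spend - self.spent

    def charge(self, cost):
        self.spent += cost

    def close(self):
        if self.parent is not None:
            self.parent.charge(self.spent)   # roll the child's total up

parent = SketchSession(max_spend=10.0)
child = SketchSession(max_spend=2.0, parent=parent)
child.charge(1.50)
child.close()
# parent.spent == 1.50, parent.remaining == 8.50
```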
budget = AgentBudget(max_spend="$10.00")

with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")
    print(parent.spent)     # 1.50
    print(parent.remaining) # 8.50

A child's budget is capped at the smaller of its max_spend and the parent's remaining balance.

Webhooks
budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)

Events are sent as JSON POST requests with event_type ("soft_limit", "hard_limit", "loop_detected") and the full cost report. Failures are logged but never raise.
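A receiving endpoint only needs to parse the JSON body and branch on event_type. A minimal handler sketch; the payload fields beyond event_type are not pinned down here, so the sample body is hypothetical:

```python
import json

def handle_budget_event(raw_body: bytes) -> str:
    """Sketch of a webhook receiver. 'event_type' and its three values come
    from the docs above; everything else in the payload is assumed."""
    payload = json.loads(raw_body)
    event = payload["event_type"]   # "soft_limit", "hard_limit", or "loop_detected"
    if event == "hard_limit":
        # e.g. page someone; the agent has already been stopped by BudgetExhausted
        return "alerted"
    return "logged"

body = json.dumps({"event_type": "soft_limit", "total_spent": 4.5}).encode()
handle_budget_event(body)   # -> "logged"
```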
Event Callbacks
budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)

When webhook_url is also set, both your callback and the webhook fire.
LangChain Integration (Python only)

Go and TypeScript integrations are coming in a future release.

pip install agentbudget[langchain]

from agentbudget.integrations.langchain import LangChainBudgetCallback
callback = LangChainBudgetCallback(budget="$5.00")
agent.run(
    "Research competitors in the CRM space",
    callbacks=[callback]
)
print(callback.get_report())

CrewAI Integration (Python only)
Go and TypeScript integrations are coming in a future release.
from agentbudget.integrations.crewai import CrewAIBudgetMiddleware
with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )

print(middleware.get_report())

API Reference
AgentBudget
AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable = None,
    on_hard_limit: Callable = None,
    on_loop_detected: Callable = None,
    webhook_url: str = None,
    finalization_reserve: float = 0.0,  # fraction of budget reserved for final step
)

| Method | Returns | Description |
|---|---|---|
| .session() | BudgetSession | Create a sync budget session |
| .async_session() | AsyncBudgetSession | Create an async budget session |
| .max_spend | float | The configured budget amount |
BudgetSession
| Method / Property | Description |
|---|---|
| .wrap(response) | Extract model/tokens from an LLM response and record the cost. Returns the response. |
| .track(result, cost, tool_name) | Record a tool call cost. Returns the result. |
| .track_tool(cost, tool_name) | Decorator that tracks cost on every call. |
| .child_session(max_spend) | Create a child session with a sub-budget. Costs roll up. |
| .would_exceed(cost) | Returns True if cost would exceed the remaining budget. Does not record anything. |
| .report() | Full cost report as a dict. |
| .spent | Total dollars spent (float). |
| .remaining | Dollars remaining (float). |
Supported Models
Built-in pricing for 50+ models. Updated February 2026.
OpenAI
| Model | Input / 1M | Output / 1M |
|---|---|---|
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o3 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| o4-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |
Anthropic
| Model | Input / 1M | Output / 1M |
|---|---|---|
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
| claude-haiku-4.5 | $1.00 | $5.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |
Google Gemini
| Model | Input / 1M | Output / 1M |
|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
Mistral & Cohere
| Model | Input / 1M | Output / 1M |
|---|---|---|
| mistral-large | $0.50 | $1.50 |
| mistral-small | $0.03 | $0.11 |
| codestral | $0.30 | $0.90 |
| command-r-plus | $2.50 | $10.00 |
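Per-call cost follows directly from these per-million-token rates. For example, with gpt-4o at $2.50 in / $10.00 out per 1M tokens:

```python
def call_cost(input_tokens, output_tokens, input_per_million, output_per_million):
    """Per-call cost in dollars from per-1M-token rates, as in the tables above."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

# gpt-4o: 1,000 input tokens and 500 output tokens
cost = call_cost(1_000, 500, 2.50, 10.00)
# 1,000 * $2.50/1M + 500 * $10.00/1M = $0.0025 + $0.0050 = $0.0075
```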
Missing a model? Register it at runtime with register_model() or submit a PR to pricing.json and run python scripts/generate_pricing.py.

Custom Model Pricing
New model just launched? Don't wait for a release — register pricing at runtime.
Single model
import agentbudget
agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)

Batch register
agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})

Fuzzy matching
Dated model variants are automatically matched to their base model. For example, gpt-4o-2025-06-15 automatically uses gpt-4o pricing.
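The fallback chain can be sketched as follows (illustrative; the library's actual matching rules may differ):

```python
import re

PRICING = {"gpt-4o": (2.50, 10.00)}   # tiny slice of the built-in table

def resolve(model):
    """Illustrative lookup: exact match, then strip a trailing date suffix,
    then strip an OpenRouter-style provider prefix and retry."""
    if model in PRICING:                                  # exact / registered
        return model
    base = re.sub(r"-\d{4}-\d{2}-\d{2}$", "", model)      # "gpt-4o-2025-06-15" -> "gpt-4o"
    if base in PRICING:
        return base
    if "/" in model:                                      # "openai/gpt-4o" -> "gpt-4o"
        return resolve(model.split("/", 1)[1])
    return None

resolve("gpt-4o-2025-06-15")   # -> "gpt-4o"
resolve("openai/gpt-4o")       # -> "gpt-4o"
```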
Pricing lookup order: runtime registrations (register_model) → built-in table → fuzzy match (strip date suffixes) → OpenRouter prefix strip ("openai/gpt-4o" → "gpt-4o").

Exceptions
| Exception | When |
|---|---|
| BudgetExhausted | Session exceeded its dollar budget (hard limit). |
| LoopDetected | Repeated calls to the same tool/model detected. |
| InvalidBudget | Budget string couldn't be parsed. |
| AgentBudgetError | Base exception for all AgentBudget errors. |