Documentation
Everything you need to add real-time cost enforcement to your AI agents.
Installation
pip install agentbudget
Python 3.9+ · Go 1.21+ · Node.js 18+. Zero external dependencies in all three SDKs.
Optional Python integrations:
pip install agentbudget[langchain] # LangChain / LangGraph
pip install agentbudget[autogen] # AutoGen
Quickstart
AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).
Drop-in Mode (Recommended)
Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.
import agentbudget
import openai
agentbudget.init("$5.00")
# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(agentbudget.spent()) # e.g. 0.0035
print(agentbudget.remaining()) # e.g. 4.9965
print(agentbudget.report()) # Full cost breakdown
agentbudget.teardown() # Stop tracking, get final report
agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs, the same pattern used by Sentry, Datadog, and other observability tools. The patch is process-wide, but the active session is scoped to the current thread or async task, so concurrent requests each get their own budget and don't overwrite each other.
Drop-in API
| Function | Description |
|---|---|
| agentbudget.init(budget) | Start tracking. Patches OpenAI/Anthropic. Returns the session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.wrap_client(client, session) | Attach tracking to a specific client instance only. |
| agentbudget.register_model(name, input, output) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session for advanced use. |
| agentbudget.teardown() | Stop tracking, unpatch SDKs, return final report. |
Manual Mode
import openai
from agentbudget import AgentBudget
client = openai.OpenAI()
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # LLM call: wrap() reads the model and token usage from the response
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )
    # Tool call with a known per-call cost
    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")
print(session.report())
Budget Envelope
A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked in real time. When exhausted, BudgetExhausted is raised.
# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)
Cost Sources
- LLM calls: Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
- Tool calls: External APIs with known per-call costs. Use session.track(result, cost=0.01).
- Decorated functions: Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
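To make the LLM cost source concrete, here is a minimal sketch of how a per-call cost can be derived from token counts and per-million-token prices. The PRICING dict and llm_cost function are hypothetical names for illustration; the prices are the gpt-4o and gpt-4o-mini rows from the pricing tables on this page, and the library's own calculation may differ in detail.

```python
# USD per 1M tokens as (input, output), copied from the pricing tables on this page.
PRICING = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def llm_cost(model, input_tokens, output_tokens):
    """Return the dollar cost of one call given its token counts."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = llm_cost("gpt-4o", input_tokens=1_000, output_tokens=100)
print(f"${cost:.4f}")  # $0.0035
```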
Circuit Breaker
Three levels of protection:
budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9, # Warn at 90%
    max_repeated_calls=10, # Trip after 10 repeated calls
    loop_window_seconds=60.0, # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: print("Loop detected!"),
)
- Soft limit (default 90%): Fires a callback so the agent can wrap up gracefully.
- Hard limit (100%): Raises BudgetExhausted. No more calls.
- Loop detection: Catches repeated calls before they drain the budget. Raises LoopDetected.
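Loop detection of the kind listed above can be sketched as a sliding-window counter. This is an illustration under assumptions, not the library's code: LoopDetector is a hypothetical class, the LoopDetected defined here is a local stand-in for the library's exception, and the choice to trip when the count goes above max_repeated_calls (rather than when it reaches it) is a guess.

```python
from collections import defaultdict, deque


class LoopDetected(Exception):
    """Local stand-in for the library's exception of the same name."""


class LoopDetector:
    def __init__(self, max_repeated_calls=10, loop_window_seconds=60.0):
        self.max_repeated_calls = max_repeated_calls
        self.loop_window_seconds = loop_window_seconds
        self._calls = defaultdict(deque)  # call key -> recent timestamps

    def record(self, key, now):
        """Register one call at time `now` (seconds); raise if it looks like a loop."""
        times = self._calls[key]
        times.append(now)
        # Drop timestamps that have fallen out of the window.
        while times and now - times[0] > self.loop_window_seconds:
            times.popleft()
        if len(times) > self.max_repeated_calls:
            raise LoopDetected(
                f"{key!r} called {len(times)} times within {self.loop_window_seconds}s"
            )


detector = LoopDetector(max_repeated_calls=3, loop_window_seconds=10.0)
for t in (0.0, 1.0, 2.0):
    detector.record("web_search", t)  # three calls are still allowed
```

Passing the timestamp in explicitly keeps the sketch deterministic; in real use it would typically come from time.monotonic().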
Cost Report
{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
    },
    "duration_seconds": 34.2,
    "terminated_by": null,
    "events": [...]
}
Streaming Support
Streaming responses (stream=True) are fully tracked. Cost is recorded even if you break out of the stream early — every chunk passes through to your code unchanged.
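One way to get "recorded even on early exit" behaviour is a generator with a try/finally block. The sketch below is an assumption about the general technique, not AgentBudget's implementation: tracked_stream is a hypothetical helper, chunks are plain dicts with an optional "usage" key, and how a cost would be estimated when the usage chunk never arrives is left out.

```python
def tracked_stream(chunks, on_done):
    """Yield every chunk unchanged and report once when iteration ends."""
    usage = None
    seen = 0
    try:
        for chunk in chunks:
            seen += 1
            if chunk.get("usage") is not None:
                usage = chunk["usage"]  # usually only present on the last chunk
            yield chunk
    finally:
        # Runs on normal exhaustion and also when the generator is closed early
        # (explicitly via .close(), or when it is garbage-collected).
        on_done({"chunks_seen": seen, "usage": usage})


records = []
chunks = [{"text": "a"}, {"text": "b"}, {"text": "", "usage": {"total_tokens": 5}}]

full = list(tracked_stream(chunks, records.append))  # consumed to the end

partial = tracked_stream(chunks, records.append)
first = next(partial)
partial.close()  # simulates breaking out of the stream early
```

After both runs, records holds one entry per stream: the first with the usage dict, the second with usage set to None and a single chunk seen.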
agentbudget.init("$5.00")
client = openai.OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this report"}],
    stream=True, # include_usage is added automatically for OpenAI
)
for chunk in stream:
    # The final usage-only chunk has an empty choices list, so guard the index
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
print(agentbudget.spent()) # cost recorded after the stream
With agentbudget.init() or wrap_client(), AgentBudget adds stream_options={"include_usage": True} automatically, so token counts appear on the final chunk. You don't need to pass it (an explicit value is respected). Anthropic streams always include usage. Only set it yourself if you call OpenAI directly and pass the result to session.wrap().
Both for-loop and context-manager patterns are supported, sync and async:
# async for
async for chunk in await client.chat.completions.create(
    stream=True, ...
):
    process(chunk)
# context manager
with client.chat.completions.create(stream=True, ...) as stream:
    for chunk in stream:
        process(chunk)
Per-Client Tracking
By default, agentbudget.init() patches all OpenAI/Anthropic calls globally. For finer control — multiple budgets, isolated clients, or production apps where global side effects are undesirable — use wrap_client():
import agentbudget
from agentbudget import AgentBudget
import openai
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Only this instance is tracked
    client = agentbudget.wrap_client(openai.OpenAI(), session)
    response = client.chat.completions.create(...) # tracked
    other = openai.OpenAI()
    other.chat.completions.create(...) # NOT tracked
Works with openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, and anthropic.AsyncAnthropic. Global patching via init() is unchanged; both approaches coexist.
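The difference between per-client tracking and global patching comes down to where the wrapper is attached. As a rough sketch (FakeClient and wrap_instance are hypothetical names, and this is not how wrap_client() is necessarily implemented), shadowing a method on a single instance leaves the class and every other instance untouched:

```python
class FakeClient:
    """Hypothetical stand-in for an SDK client; not a real OpenAI or Anthropic class."""

    def create(self, prompt):
        return f"echo: {prompt}"


def wrap_instance(client, calls):
    """Shadow `create` on this one instance so that only its calls are logged."""
    original = client.create  # bound method looked up on the class

    def tracked(*args, **kwargs):
        result = original(*args, **kwargs)
        calls.append(result)
        return result

    client.create = tracked  # instance attribute; the class itself is unchanged
    return client


calls = []
tracked_client = wrap_instance(FakeClient(), calls)
other_client = FakeClient()

tracked_client.create("a")
other_client.create("b")
print(calls)  # ['echo: a']
```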
Finalization Reserve
Prevent your agent from being cut off mid-task. Reserve a fraction of the budget exclusively for the final response step — the hard limit fires early, keeping that slice free.
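The arithmetic behind the reserve is small enough to spell out. This sketch uses hypothetical helper names (effective_hard_limit, would_exceed_limit) and assumes the pre-flight check compares against the reduced limit; the library may define "remaining" differently.

```python
def effective_hard_limit(max_spend, finalization_reserve=0.0):
    """Dollar amount at which the hard limit fires once the reserve is held back."""
    return max_spend * (1.0 - finalization_reserve)


def would_exceed_limit(spent, cost, limit):
    """Pre-flight check: would recording `cost` push spending past `limit`?"""
    return spent + cost > limit


limit = effective_hard_limit(1.00, finalization_reserve=0.05)
print(f"hard limit fires at ${limit:.2f}")  # hard limit fires at $0.95
```

With $0.90 already spent, a $0.10 step would be refused against this limit while a $0.04 step would still fit.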
budget = AgentBudget(
    max_spend="$1.00",
    finalization_reserve=0.05, # hard limit at $0.95, $0.05 reserved for final call
)
For manual control, check before the final call with session.would_exceed():
with budget.session() as session:
    # ... do work ...
    if session.would_exceed(estimated_final_cost):
        return "Budget nearly exhausted — here is what was completed: ..."
    # Safe to proceed
    response = session.wrap(client.chat.completions.create(...))
would_exceed(cost) checks against the remaining budget without recording anything. Use it as a pre-flight check before expensive final steps.
Async Support
from agentbudget import AgentBudget
budget = AgentBudget(max_spend="$5.00")
async with budget.async_session() as session:
    # client is an async client such as openai.AsyncOpenAI(); its create() returns an awaitable
    response = await session.wrap_async(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}]
        )
    )
    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)
Nested Budgets
Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.
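The roll-up rule can be modelled in a few lines. SketchSession below is a hypothetical toy class, not the library's BudgetSession; it assumes a child is never granted more than the parent has left and that the child's spend is charged to the parent when the child's with-block exits.

```python
class SketchSession:
    """Toy model of parent/child budgets, for illustration only."""

    def __init__(self, budget, parent=None):
        self.budget = budget
        self.spent = 0.0
        self.parent = parent

    @property
    def remaining(self):
        return self.budget - self.spent

    def child_session(self, max_spend):
        # Assumption: the sub-budget is capped by what the parent has left.
        return SketchSession(min(max_spend, self.remaining), parent=self)

    def track(self, cost):
        if self.spent + cost > self.budget:
            raise RuntimeError("budget exhausted")
        self.spent += cost

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.parent is not None:
            self.parent.spent += self.spent  # roll the child's total up
        return False


parent = SketchSession(10.0)
with parent.child_session(max_spend=2.0) as child:
    child.track(1.50)
print(parent.spent, parent.remaining)  # 1.5 8.5
```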
with budget.session() as parent: # assumes budget = AgentBudget(max_spend="$10.00")
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")
    print(parent.spent) # 1.50
    print(parent.remaining) # 8.50
Child budgets are capped at the lesser of max_spend and the parent's remaining balance.
Webhooks
budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)
Events are sent as JSON POST requests with event_type ("soft_limit", "hard_limit", "loop_detected") and the full cost report. Failures are logged but never raise.
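On the receiving side, a handler only needs to parse the JSON body and branch on event_type. The sketch below is framework-agnostic and makes one loud assumption: that the cost report arrives under a "report" key. The page does not state the exact payload field names, so check a real request before relying on this.

```python
import json

KNOWN_EVENTS = {"soft_limit", "hard_limit", "loop_detected"}


def handle_budget_event(body: bytes) -> str:
    """Parse one webhook POST body and return a short summary line."""
    payload = json.loads(body)
    event_type = payload.get("event_type")
    if event_type not in KNOWN_EVENTS:
        return "ignored"
    report = payload.get("report", {})  # assumed key name, not confirmed by the docs
    spent = report.get("total_spent")
    budget = report.get("budget")
    return f"{event_type}: spent {spent} of {budget}"


example_body = json.dumps(
    {"event_type": "hard_limit", "report": {"total_spent": 5.0, "budget": 5.0}}
).encode()
print(handle_budget_event(example_body))  # hard_limit: spent 5.0 of 5.0
```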
Event Callbacks
budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)
When webhook_url is also set, both your callback and the webhook fire.
LangChain Integration (Python only)
Go and TypeScript integrations are coming in a future release.
pip install agentbudget[langchain]
from agentbudget.integrations.langchain import LangChainBudgetCallback
# Use as a context manager so the session is finalized.
with LangChainBudgetCallback(budget="$5.00") as callback:
    agent.invoke(
        {"input": "Research competitors in the CRM space"},
        config={"callbacks": [callback]},
    )
print(callback.get_report())
Costs are tracked from both legacy LLMResult usage and modern chat-model usage_metadata, so LangGraph runs and chat models are counted correctly. Pass tool_costs={"web_search": 0.01} to also charge tool calls against the budget.
CrewAI Integration (Python only)
Go and TypeScript integrations are coming in a future release.
from agentbudget.integrations.crewai import CrewAIBudgetMiddleware
with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )
print(middleware.get_report())
AutoGen Integration (Python only)
Drop-in subclasses for AutoGen agents with built-in budget enforcement and cost tracking.
pip install agentbudget[autogen]
from agentbudget.integrations.autogen import BudgetedAssistantAgent, BudgetedUserProxyAgent
assistant = BudgetedAssistantAgent(name="assistant", budget="$5.00")
user = BudgetedUserProxyAgent(name="user", budget="$5.00")
user.initiate_chat(
    assistant,
    message="Research competitors in the CRM space"
)
print(assistant.get_report())
For patching existing agent instances without subclassing:
from agentbudget.integrations.autogen import AutoGenBudgetTracker
tracker = AutoGenBudgetTracker(budget="$5.00")
tracker.attach(existing_assistant)
# BudgetExhausted is raised automatically when the limit is hit
print(tracker.get_report())
API Reference
AgentBudget
AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable = None,
    on_hard_limit: Callable = None,
    on_loop_detected: Callable = None,
    webhook_url: str = None,
    finalization_reserve: float = 0.0, # fraction of budget reserved for final step
)
| Method / Property | Returns | Description |
|---|---|---|
| .session() | BudgetSession | Create a sync budget session |
| .async_session() | AsyncBudgetSession | Create an async budget session |
| .max_spend | float | The configured budget amount |
BudgetSession
| Method / Property | Description |
|---|---|
| .wrap(response) | Extract model/tokens from LLM response and record cost. Returns response. |
| .track(result, cost, tool_name) | Record a tool call cost. Returns the result. |
| .track_tool(cost, tool_name) | Decorator that tracks cost on every call. |
| .child_session(max_spend) | Create child session with sub-budget. Costs roll up. |
| .would_exceed(cost) | Returns True if cost would exceed the remaining budget. Does not record anything. |
| .report() | Full cost report as a dict. |
| .spent | Total dollars spent (float). |
| .remaining | Dollars remaining (float). |
Supported Models
Built-in pricing for 50+ models. Updated February 2026.
OpenAI
| Model | Input / 1M | Output / 1M |
|---|---|---|
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o3 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| o4-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |
Anthropic
| Model | Input / 1M | Output / 1M |
|---|---|---|
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
| claude-haiku-4.5 | $1.00 | $5.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |
Google Gemini
| Model | Input / 1M | Output / 1M |
|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
Mistral & Cohere
| Model | Input / 1M | Output / 1M |
|---|---|---|
| mistral-large | $0.50 | $1.50 |
| mistral-small | $0.03 | $0.11 |
| codestral | $0.30 | $0.90 |
| command-r-plus | $2.50 | $10.00 |
For a model that is not listed, use register_model() or submit a PR to pricing.json and run python scripts/generate_pricing.py.
Custom Model Pricing
New model just launched? Don't wait for a release — register pricing at runtime.
Single model
import agentbudget
agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)
Batch register
agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})
Fuzzy matching
Dated model variants are automatically matched to their base model. For example, gpt-4o-2025-06-15 automatically uses gpt-4o pricing.
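The documented lookup order (custom pricing, then the built-in table, then date-suffix stripping, then provider-prefix stripping) could be sketched as follows. This is a guess at the shape of the logic, not the library's code: CUSTOM, BUILT_IN, and resolve_pricing are hypothetical names, and only the YYYY-MM-DD suffix form from the example above is handled.

```python
import re

# USD per 1M tokens as (input, output); the built-in rows are copied from this page.
BUILT_IN = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}
CUSTOM = {}  # runtime registrations take priority over BUILT_IN

_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def _lookup(name):
    """Exact match only: custom pricing first, then the built-in table."""
    if name in CUSTOM:
        return CUSTOM[name]
    return BUILT_IN.get(name)


def resolve_pricing(model):
    """Return (input, output) prices for a model name, or None if unknown."""
    price = _lookup(model)
    if price is None:
        # Fuzzy match: "gpt-4o-2025-06-15" falls back to "gpt-4o".
        price = _lookup(_DATE_SUFFIX.sub("", model))
    if price is None and "/" in model:
        # Provider prefix: "openai/gpt-4o" falls back to "gpt-4o".
        price = resolve_pricing(model.split("/", 1)[1])
    return price


print(resolve_pricing("gpt-4o-2025-06-15"))  # (2.5, 10.0)
print(resolve_pricing("openai/gpt-4o"))      # (2.5, 10.0)
```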
Pricing is resolved in this order: custom pricing (register_model) → built-in table → fuzzy match (strip date suffixes) → OpenRouter prefix strip ("openai/gpt-4o" → "gpt-4o").
Exceptions
| Exception | When |
|---|---|
| BudgetExhausted | Session exceeded its dollar budget (hard limit). |
| LoopDetected | Repeated calls to the same tool/model detected. |
| InvalidBudget | Budget string couldn't be parsed. |
| InvalidCost | A tracked cost was negative, NaN, or infinite. |
| AgentBudgetError | Base exception for all AgentBudget errors. |