Documentation
Everything you need to add real-time cost enforcement to your AI agents.
Installation
pip install agentbudget
Requires Python 3.9+. No external dependencies.
For LangChain integration:
pip install agentbudget[langchain]
Quickstart
AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).
Drop-in Mode (Recommended)
Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.
import agentbudget
import openai
agentbudget.init("$5.00")
# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
print(agentbudget.spent()) # e.g. 0.0035
print(agentbudget.remaining()) # e.g. 4.9965
print(agentbudget.report()) # Full cost breakdown
agentbudget.teardown() # Stop tracking, get final report
agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs. This is the same pattern used by Sentry, Datadog, and other observability tools.
Drop-in API
| Function | Description |
|---|---|
| agentbudget.init(budget) | Start tracking. Patches OpenAI/Anthropic. Returns the session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.register_model(name, input_price_per_million, output_price_per_million) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session for advanced use. |
| agentbudget.teardown() | Stop tracking, unpatch SDKs, return final report. |
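The monkey-patching that agentbudget.init() relies on can be pictured with a small stand-in. The sketch below is not AgentBudget's actual implementation: FakeCompletions, patch_create, spend_log, and the flat per-call price are hypothetical names and values chosen for illustration. It only shows the general pattern of swapping a method for a wrapper that records cost, then restoring the original on teardown.

```python
import functools


class FakeCompletions:
    """Hypothetical stand-in for an SDK class such as Completions."""

    def create(self, **kwargs):
        return {"model": kwargs.get("model"), "text": "hi"}


spend_log = []  # costs recorded by the wrapper


def patch_create(cls, cost_per_call):
    """Replace cls.create with a tracking wrapper; return an undo function."""
    original = cls.create

    @functools.wraps(original)
    def tracked(self, **kwargs):
        result = original(self, **kwargs)
        spend_log.append(cost_per_call)  # record only after a successful call
        return result

    cls.create = tracked

    def unpatch():
        cls.create = original

    return unpatch


unpatch = patch_create(FakeCompletions, cost_per_call=0.01)
FakeCompletions().create(model="demo-model")  # tracked
unpatch()
FakeCompletions().create(model="demo-model")  # no longer tracked
```

Returning the undo function keeps patching and unpatching paired, which is the role teardown() plays in the drop-in API.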
Manual Mode
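from agentbudget import AgentBudget
import openai
client = openai.OpenAI()
budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Wrap an LLM call: cost is derived from the response's model and token usage
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )
    # Track a tool call with a known per-call cost
    # (call_serp_api, query, and api stand for your own code)
    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")
    # Decorate a function so that every call is tracked
    @session.track_tool(cost=0.02, tool_name="search")
    def my_search(query):
        return api.search(query)
print(session.report())
Budget Envelope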
A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked against it in real time, and BudgetExhausted is raised once the envelope is exhausted.
# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)
Cost Sources
- LLM calls: Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
- Tool calls: External APIs with known per-call costs. Use session.track(result, cost=0.01).
- Decorated functions: Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
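To make the cost sources concrete, here is a minimal sketch of how an envelope might add them up. It is not AgentBudget's implementation: the Envelope class, the llm_cost helper, and the local BudgetExhausted stand-in are hypothetical, and the only pricing used is the gpt-4o rate from the Supported Models table ($2.50 input, $10.00 output per 1M tokens). The sketch assumes the charge that would cross the limit is the one rejected; the library may draw that line differently.

```python
class BudgetExhausted(Exception):
    """Local stand-in for the library's exception of the same name."""


# Dollars per 1M tokens as (input, output); gpt-4o rate from the pricing table.
PRICING = {"gpt-4o": (2.50, 10.00)}


def llm_cost(model, input_tokens, output_tokens):
    """Price an LLM call from token counts and per-million rates."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


class Envelope:
    """Hypothetical envelope: accumulates spend, refuses to exceed the limit."""

    def __init__(self, max_spend):
        # Accepts "$5.00", "5.00", 5.0, or 5.
        self.max_spend = float(str(max_spend).lstrip("$"))
        self.spent = 0.0

    def charge(self, cost):
        if self.spent + cost > self.max_spend:
            raise BudgetExhausted(f"limit of ${self.max_spend:.2f} reached")
        self.spent += cost

    @property
    def remaining(self):
        return self.max_spend - self.spent


envelope = Envelope("$5.00")
# LLM call: 1,000 input tokens and 100 output tokens on gpt-4o
envelope.charge(llm_cost("gpt-4o", input_tokens=1000, output_tokens=100))
# Tool call with a known flat cost
envelope.charge(0.01)
print(round(envelope.spent, 4), round(envelope.remaining, 4))
```

The LLM charge works out to (1000 × 2.50 + 100 × 10.00) / 1,000,000 = $0.0035, the same order of magnitude as the spent() value shown in the drop-in example.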
Circuit Breaker
Three levels of protection:
budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9,             # Warn at 90%
    max_repeated_calls=10,      # Trip after 10 repeated calls
    loop_window_seconds=60.0,   # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: print("Loop detected!"),
)
- Soft limit (default 90%): Fires a callback so the agent can wrap up gracefully.
- Hard limit (100%): Raises BudgetExhausted. No further calls are allowed.
- Loop detection: Catches repeated calls before they drain the budget. Raises LoopDetected.
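The soft limit and loop detection can be sketched in a few lines. This illustrates the idea rather than the library's code: the CircuitBreaker class, its record method, and the choice to key repeated calls by a string are assumptions, and LoopDetected is a local stand-in. The sketch also assumes the breaker trips on the call after max_repeated_calls has been reached, and it omits the hard limit for brevity. The clock is injectable so the window logic can be exercised without sleeping.

```python
import time
from collections import defaultdict, deque


class LoopDetected(Exception):
    """Local stand-in for the library's exception of the same name."""


class CircuitBreaker:
    def __init__(self, max_spend, soft_limit=0.9, max_repeated_calls=10,
                 loop_window_seconds=60.0, on_soft_limit=None,
                 clock=time.monotonic):
        self.max_spend = max_spend
        self.soft_limit = soft_limit
        self.max_repeated_calls = max_repeated_calls
        self.loop_window_seconds = loop_window_seconds
        self.on_soft_limit = on_soft_limit
        self.clock = clock
        self.spent = 0.0
        self._soft_fired = False
        self._calls = defaultdict(deque)  # call key -> recent timestamps

    def record(self, key, cost):
        now = self.clock()
        stamps = self._calls[key]
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and now - stamps[0] > self.loop_window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_repeated_calls:
            raise LoopDetected(f"{key!r} repeated {len(stamps)} times")
        self.spent += cost
        # Fire the soft-limit callback once, when the threshold is first crossed.
        crossed = self.spent >= self.soft_limit * self.max_spend
        if crossed and not self._soft_fired:
            self._soft_fired = True
            if self.on_soft_limit:
                self.on_soft_limit(self.spent)


warnings = []
breaker = CircuitBreaker(max_spend=1.0, max_repeated_calls=3,
                         on_soft_limit=warnings.append, clock=lambda: 0.0)
for _ in range(3):
    breaker.record("search:crm", cost=0.31)  # third call crosses 90% of $1.00
```

A fourth identical call inside the same window would raise LoopDetected in this sketch.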
Cost Report
{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
    },
    "duration_seconds": 34.2,
    "terminated_by": null,
    "events": [...]
}
Async Support
from agentbudget import AgentBudget
budget = AgentBudget(max_spend="$5.00")
# client is assumed to be an openai.AsyncOpenAI() instance; run this inside an async function
async with budget.async_session() as session:
    response = await session.wrap_async(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}]
        )
    )
    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)
Nested Budgets
Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.
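The roll-up can be pictured with a minimal sketch. The Session class below is hypothetical and much simpler than the library's sessions. It assumes a child's limit is capped at the parent's remaining balance and that the child's total is charged to the parent when the child's with block exits.

```python
class Session:
    """Hypothetical session: tracks spend and rolls it up to a parent on exit."""

    def __init__(self, max_spend, parent=None):
        self.max_spend = max_spend
        self.parent = parent
        self.spent = 0.0

    @property
    def remaining(self):
        return self.max_spend - self.spent

    def child_session(self, max_spend):
        # Assumption: a child is never granted more than the parent has left.
        return Session(min(max_spend, self.remaining), parent=self)

    def track(self, result, cost):
        if self.spent + cost > self.max_spend:
            raise RuntimeError("budget exhausted")
        self.spent += cost
        return result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Charge the child's total to the parent once the child finishes.
        if self.parent is not None:
            self.parent.spent += self.spent
        return False


with Session(max_spend=10.0) as parent:
    with parent.child_session(max_spend=2.0) as child:
        child.track("result", cost=1.50)
    print(parent.spent, parent.remaining)  # 1.5 8.5
```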
# Assumes budget = AgentBudget(max_spend="$10.00")
with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")
    print(parent.spent) # 1.50
    print(parent.remaining) # 8.50
A child's budget is capped at the lesser of its max_spend and the parent's remaining balance.
Webhooks
budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)
Events are sent as JSON POST requests containing event_type ("soft_limit", "hard_limit", or "loop_detected") and the full cost report. Delivery failures are logged but never raise.
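A sender with these properties might look like the following sketch. It is an illustration under assumptions, not the library's code: send_event, http_post, and the payload key "report" are hypothetical names (the text only states that event_type and the full cost report are included). The transport is passed in as a parameter so the error handling can be exercised without a network.

```python
import json
import logging
import urllib.request

logger = logging.getLogger("budget.webhook")


def http_post(url, body, timeout=5.0):
    """POST a JSON body using only the standard library."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_event(url, event_type, report, post=http_post):
    """Deliver one event; log failures instead of raising.

    Returns True on success and False if delivery failed.
    """
    payload = {"event_type": event_type, "report": report}
    body = json.dumps(payload).encode("utf-8")
    try:
        post(url, body)
        return True
    except Exception as exc:  # delivery problems must never break the agent
        logger.warning("webhook delivery failed: %s", exc)
        return False
```

Catching the broad Exception is deliberate here: a webhook is a side channel, so a timeout or DNS error should be logged and swallowed rather than interrupt the agent's run.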
Event Callbacks
budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)
When webhook_url is also set, both your callback and the webhook fire.
LangChain Integration
pip install agentbudget[langchain]
from agentbudget.integrations.langchain import LangChainBudgetCallback
callback = LangChainBudgetCallback(budget="$5.00")
agent.run(
    "Research competitors in the CRM space",
    callbacks=[callback]
)
print(callback.get_report())
CrewAI Integration
from agentbudget.integrations.crewai import CrewAIBudgetMiddleware
with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )
    print(middleware.get_report())
API Reference
AgentBudget
AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable = None,
    on_hard_limit: Callable = None,
    on_loop_detected: Callable = None,
    webhook_url: str = None,
)
| Method / Property | Returns | Description |
|---|---|---|
| .session() | BudgetSession | Create a sync budget session |
| .async_session() | AsyncBudgetSession | Create an async budget session |
| .max_spend | float | The configured budget amount |
BudgetSession
| Method / Property | Description |
|---|---|
| .wrap(response) | Extract model/tokens from LLM response and record cost. Returns response. |
| .track(result, cost, tool_name) | Record a tool call cost. Returns the result. |
| .track_tool(cost, tool_name) | Decorator that tracks cost on every call. |
| .child_session(max_spend) | Create child session with sub-budget. Costs roll up. |
| .report() | Full cost report as a dict. |
| .spent | Total dollars spent (float). |
| .remaining | Dollars remaining (float). |
Supported Models
Built-in pricing for 50+ models. Updated February 2026.
OpenAI
| Model | Input / 1M | Output / 1M |
|---|---|---|
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o3 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| o4-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |
Anthropic
| Model | Input / 1M | Output / 1M |
|---|---|---|
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
| claude-haiku-4.5 | $1.00 | $5.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |
Google Gemini
| Model | Input / 1M | Output / 1M |
|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |
Mistral & Cohere
| Model | Input / 1M | Output / 1M |
|---|---|---|
| mistral-large | $0.50 | $1.50 |
| mistral-small | $0.03 | $0.11 |
| codestral | $0.30 | $0.90 |
| command-r-plus | $2.50 | $10.00 |
To add a model that is not listed, use register_model() or submit a PR to agentbudget/pricing.py.
Custom Model Pricing
New model just launched? Don't wait for a release — register pricing at runtime.
Single model
import agentbudget
agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)
Batch register
agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})
Fuzzy matching
Dated model variants are matched to their base model automatically. For example, gpt-4o-2025-06-15 uses gpt-4o pricing.
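One way to implement this kind of lookup is to try an exact match first and then strip a trailing date. The sketch below is an assumption about how such matching could work, not the library's actual code: resolve_pricing, the PRICING dict, and the -YYYY-MM-DD / -YYYYMMDD suffix pattern are illustrative.

```python
import re

# Dollars per 1M tokens as (input, output).
PRICING = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

# Trailing date such as -2025-06-15 or -20250615 (assumed suffix formats).
DATE_SUFFIX = re.compile(r"-\d{4}-?\d{2}-?\d{2}$")


def resolve_pricing(model, custom=None):
    """Look up pricing: custom entries, then the built-in table, then fuzzy match."""
    custom = custom or {}
    for table in (custom, PRICING):
        if model in table:
            return table[model]
    # Fuzzy match: strip a trailing date and retry against both tables.
    base = DATE_SUFFIX.sub("", model)
    if base != model:
        for table in (custom, PRICING):
            if base in table:
                return table[base]
    return None  # unknown model


print(resolve_pricing("gpt-4o-2025-06-15"))       # (2.5, 10.0)
print(resolve_pricing("gpt-4o-mini-2025-06-15"))  # (0.15, 0.6)
```

Anchoring the pattern at the end of the string keeps names like gpt-4o-mini intact, since only a full trailing date is removed.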
Lookup order: Custom pricing (register_model) → Built-in table → Fuzzy match (strip date suffixes).
Exceptions
| Exception | When |
|---|---|
| BudgetExhausted | Session exceeded its dollar budget (hard limit). |
| LoopDetected | Repeated calls to the same tool/model detected. |
| InvalidBudget | Budget string couldn't be parsed. |
| AgentBudgetError | Base exception for all AgentBudget errors. |
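Because every error derives from AgentBudgetError, callers can handle specific cases first and fall back to the base class. The sketch below defines local stand-ins with the same names so that it runs without the library installed. The flat hierarchy (each exception subclassing the base directly) and the run_step helper are assumptions.

```python
class AgentBudgetError(Exception):
    """Stand-in base class for all budget errors."""


class BudgetExhausted(AgentBudgetError):
    pass


class LoopDetected(AgentBudgetError):
    pass


class InvalidBudget(AgentBudgetError):
    pass


def run_step(step):
    """Run one agent step and map budget errors to a status string."""
    try:
        step()
        return "ok"
    except BudgetExhausted:
        return "stop: budget spent"
    except LoopDetected:
        return "stop: loop"
    except AgentBudgetError as exc:
        # Catch-all for any other library error, e.g. InvalidBudget.
        return f"error: {type(exc).__name__}"


def failing_step():
    raise InvalidBudget("could not parse 'five dollars'")


print(run_step(failing_step))  # error: InvalidBudget
```

Ordering the except clauses from most to least specific matters: if the base class came first, it would swallow BudgetExhausted and LoopDetected before their dedicated handlers ran.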