Documentation

Everything you need to add real-time cost enforcement to your AI agents.

Installation

pip install agentbudget

Python 3.9+ · Go 1.21+ · Node.js 18+. Zero external dependencies in all three SDKs.

For Python LangChain integration:

pip install agentbudget[langchain]

Quickstart

AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).

Drop-in Mode (Recommended)

Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.

import agentbudget
import openai

agentbudget.init("$5.00")

# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

print(agentbudget.spent())      # e.g. 0.0035
print(agentbudget.remaining())  # e.g. 4.9965
print(agentbudget.report())     # Full cost breakdown

agentbudget.teardown()  # Stop tracking, get final report
How it works: agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs. Same pattern used by Sentry, Datadog, and other observability tools.

Drop-in API

| Function | Description |
| --- | --- |
| agentbudget.init(budget) | Start tracking. Patches OpenAI/Anthropic. Returns the session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.wrap_client(client, session) | Attach tracking to a specific client instance only. |
| agentbudget.register_model(name, input, output) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session for advanced use. |
| agentbudget.teardown() | Stop tracking, unpatch SDKs, return final report. |

Manual Mode

from agentbudget import AgentBudget
import openai

client = openai.OpenAI()
budget = AgentBudget(max_spend="$5.00")

with budget.session() as session:
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )

    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")

print(session.report())

Budget Envelope

A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked in real time. When exhausted, BudgetExhausted is raised.

# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)
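All four forms normalize to the same float. A sketch of how such permissive parsing might work (assumed logic, not the library's source; the library raises InvalidBudget where this sketch raises ValueError):

```python
def parse_budget(value):
    """Normalize "$5.00", "5.00", 5.0, or 5 to a float; reject anything else."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.lstrip("$"))
        except ValueError:
            pass
    raise ValueError(f"Cannot parse budget: {value!r}")  # library: InvalidBudget

assert parse_budget("$5.00") == parse_budget("5.00") == parse_budget(5.0) == parse_budget(5)
```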

Cost Sources

  • LLM calls — Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
  • Tool calls — External APIs with known per-call costs. Use session.track(result, cost=0.01).
  • Decorated functions — Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
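The decorator form can be sketched as a plain closure that charges a fixed cost on every invocation (hypothetical internals, modeled on the track_tool signature):

```python
import functools

class Session:
    """Minimal stand-in for a budget session: just accumulates spend."""
    def __init__(self):
        self.spent = 0.0

    def track_tool(self, cost, tool_name=None):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                self.spent += cost  # charge the fixed per-call cost
                return fn(*args, **kwargs)
            return wrapper
        return decorator

session = Session()

@session.track_tool(cost=0.02, tool_name="search")
def search(query):
    return f"results for {query}"

search("crm vendors")
search("crm pricing")
print(session.spent)  # 0.04
```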

Circuit Breaker

Three levels of protection:

budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9,               # Warn at 90%
    max_repeated_calls=10,        # Trip after 10 repeated calls
    loop_window_seconds=60.0,     # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: print("Loop detected!"),
)
  • Soft limit (default 90%) — Fires a callback. Agent can wrap up gracefully.
  • Hard limit (100%) — Raises BudgetExhausted. No more calls.
  • Loop detection — Catches repeated calls before they drain the budget. Raises LoopDetected.
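Loop detection can be sketched as counting identical calls inside a sliding time window, matching the max_repeated_calls and loop_window_seconds knobs (the mechanism shown here is an assumption, not the library's source):

```python
import time
from collections import defaultdict, deque

class LoopDetected(Exception):
    pass

class LoopDetector:
    def __init__(self, max_repeated_calls=10, window_seconds=60.0):
        self.max_calls = max_repeated_calls
        self.window = window_seconds
        self.calls = defaultdict(deque)  # call signature -> recent timestamps

    def record(self, signature):
        now = time.monotonic()
        timestamps = self.calls[signature]
        timestamps.append(now)
        # Drop timestamps that have fallen out of the window
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()
        if len(timestamps) > self.max_calls:
            raise LoopDetected(f"{signature} called {len(timestamps)} times")

detector = LoopDetector(max_repeated_calls=3, window_seconds=60.0)
for _ in range(3):
    detector.record("serp:query=crm")  # first three calls pass
try:
    detector.record("serp:query=crm")  # fourth call trips the breaker
except LoopDetected as exc:
    print(exc)
```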

Cost Report

{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
    },
    "duration_seconds": 34.2,
    "terminated_by": null,
    "events": [...]
}
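The report is a plain dict, so the fields above can be consumed directly. For example, a consistency check and a per-category summary (field names taken from the report shown; values abbreviated):

```python
report = {
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8},
        "tools": {"total": 0.30, "calls": 6},
    },
}

# remaining should always equal budget minus total_spent
assert abs(report["budget"] - report["total_spent"] - report["remaining"]) < 1e-9

# category totals should sum to the session total
category_sum = sum(c["total"] for c in report["breakdown"].values())
assert abs(category_sum - report["total_spent"]) < 1e-9

for name, cat in report["breakdown"].items():
    print(f"{name}: ${cat['total']:.2f} over {cat['calls']} calls")
```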

Streaming Support

Streaming responses (stream=True) are fully tracked. Cost is recorded after the stream is exhausted — every chunk passes through to your code unchanged.

agentbudget.init("$5.00")
client = openai.OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this report"}],
    stream=True,
    stream_options={"include_usage": True},  # required for OpenAI
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

print(agentbudget.spent())  # cost recorded after stream exhausted
OpenAI note: You must pass stream_options={"include_usage": True} for token counts to appear on the final chunk. Without it, streaming calls are silently tracked as $0.00 — no error, just no cost. Anthropic streams always include usage automatically.

Both for-loop and context-manager patterns are supported, sync and async:

# async for
async for chunk in await client.chat.completions.create(
    stream=True, stream_options={"include_usage": True}, ...
):
    process(chunk)

# context manager
with client.chat.completions.create(stream=True, ...) as stream:
    for chunk in stream:
        process(chunk)
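Under the hood, tracking a stream means wrapping the iterator so cost is recorded only once the final chunk (the one carrying usage) has been yielded. A standalone sketch of that idea, using dummy dict chunks rather than the SDK's types:

```python
def tracked_stream(chunks, on_usage):
    """Yield every chunk unchanged; report usage from the last chunk that has it."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
        yield chunk
    if usage is not None:
        on_usage(usage)  # stream exhausted: record the cost now

spent = []
raw = [{"delta": "Hel"}, {"delta": "lo"}, {"delta": "", "usage": {"total_tokens": 12}}]
for chunk in tracked_stream(raw, on_usage=lambda u: spent.append(u["total_tokens"])):
    pass
print(spent)  # tokens recorded only after the loop finished
```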

Per-Client Tracking

By default, agentbudget.init() patches all OpenAI/Anthropic calls globally. For finer control — multiple budgets, isolated clients, or production apps where global side effects are undesirable — use wrap_client():

import agentbudget
from agentbudget import AgentBudget
import openai

budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Only this instance is tracked
    client = agentbudget.wrap_client(openai.OpenAI(), session)
    response = client.chat.completions.create(...)  # tracked

    other = openai.OpenAI()
    other.chat.completions.create(...)              # NOT tracked

Works with openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, and anthropic.AsyncAnthropic. Global patching via init() is unchanged — both approaches coexist.

Finalization Reserve

Prevent your agent from being cut off mid-task. Reserve a fraction of the budget exclusively for the final response step — the hard limit fires early, keeping that slice free.

budget = AgentBudget(
    max_spend="$1.00",
    finalization_reserve=0.05,  # hard limit at $0.95, $0.05 reserved for final call
)
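The arithmetic behind the comment above: the hard limit moves to max_spend × (1 − finalization_reserve). A sketch of that check (the formula is inferred from the $0.95 figure, not quoted from the library's source):

```python
max_spend = 1.00
finalization_reserve = 0.05

effective_hard_limit = max_spend * (1 - finalization_reserve)
print(effective_hard_limit)  # 0.95

def hard_limit_tripped(spent):
    return spent >= effective_hard_limit

print(hard_limit_tripped(0.90))  # False: still inside the working budget
print(hard_limit_tripped(0.96))  # True: only the reserve remains
```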

For manual control, check before the final call with session.would_exceed():

with budget.session() as session:
    # ... do work ...

    if session.would_exceed(estimated_final_cost):
        return "Budget nearly exhausted — here is what was completed: ..."

    # Safe to proceed
    response = session.wrap(client.chat.completions.create(...))
would_exceed(cost) checks against the remaining budget without recording anything. Use it as a pre-flight check before expensive final steps.

Async Support

from agentbudget import AgentBudget

budget = AgentBudget(max_spend="$5.00")

async with budget.async_session() as session:
    response = await session.wrap_async(
        client.chat.completions.acreate(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}]
        )
    )

    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)

Nested Budgets

Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.

budget = AgentBudget(max_spend="$10.00")

with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")

    print(parent.spent)      # 1.50
    print(parent.remaining)  # 8.50
The child budget is automatically capped at the lesser of max_spend and the parent's remaining balance.
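That cap is just a min() over the two amounts. A sketch of the documented behavior (illustrative helper, not the library's API):

```python
def child_budget(requested, parent_remaining):
    """Child budget is capped at whatever the parent can still spend."""
    return min(requested, parent_remaining)

print(child_budget(2.0, 8.50))   # 2.0  — parent has plenty left
print(child_budget(2.0, 0.75))   # 0.75 — capped at the parent's remainder
```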

Webhooks

budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)

Events are sent as JSON POST requests with event_type ("soft_limit", "hard_limit", "loop_detected") and the full cost report. Failures are logged but never raise.
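On the receiving side, a handler only needs to branch on event_type. A sketch of such an endpoint body (the handler name is hypothetical, and nesting the report under a "report" key is an assumption about the payload shape):

```python
import json

def handle_budget_event(body: bytes):
    """Dispatch an AgentBudget webhook payload posted as JSON."""
    event = json.loads(body)
    event_type = event["event_type"]
    report = event.get("report", {})
    if event_type == "soft_limit":
        return f"warn: {report.get('total_spent')} spent"
    if event_type == "hard_limit":
        return "page the on-call: budget exhausted"
    if event_type == "loop_detected":
        return "agent loop detected"
    return "ignored"

payload = json.dumps({"event_type": "soft_limit", "report": {"total_spent": 4.5}})
print(handle_budget_event(payload.encode()))  # warn: 4.5 spent
```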

Event Callbacks

budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)

When webhook_url is also set, both your callback and the webhook fire.

LangChain Integration (Python only)

Go and TypeScript integrations are coming in a future release.

pip install agentbudget[langchain]

from agentbudget.integrations.langchain import LangChainBudgetCallback

callback = LangChainBudgetCallback(budget="$5.00")

agent.run(
    "Research competitors in the CRM space",
    callbacks=[callback]
)

print(callback.get_report())

CrewAI Integration (Python only)

Go and TypeScript integrations are coming in a future release.

from agentbudget.integrations.crewai import CrewAIBudgetMiddleware

with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )

print(middleware.get_report())

API Reference

AgentBudget

AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable = None,
    on_hard_limit: Callable = None,
    on_loop_detected: Callable = None,
    webhook_url: str = None,
    finalization_reserve: float = 0.0,  # fraction of budget reserved for final step
)
| Method / Property | Returns | Description |
| --- | --- | --- |
| .session() | BudgetSession | Create a sync budget session |
| .async_session() | AsyncBudgetSession | Create an async budget session |
| .max_spend | float | The configured budget amount |

BudgetSession

| Method / Property | Description |
| --- | --- |
| .wrap(response) | Extract model/tokens from LLM response and record cost. Returns response. |
| .track(result, cost, tool_name) | Record a tool call cost. Returns the result. |
| .track_tool(cost, tool_name) | Decorator that tracks cost on every call. |
| .child_session(max_spend) | Create child session with sub-budget. Costs roll up. |
| .would_exceed(cost) | Returns True if cost would exceed the remaining budget. Does not record anything. |
| .report() | Full cost report as a dict. |
| .spent | Total dollars spent (float). |
| .remaining | Dollars remaining (float). |

Supported Models

Built-in pricing for 50+ models. Updated February 2026.

OpenAI

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o3 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| o4-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |

Anthropic

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
| claude-haiku-4.5 | $1.00 | $5.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |

Google Gemini

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |

Mistral & Cohere

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| mistral-large | $0.50 | $1.50 |
| mistral-small | $0.03 | $0.11 |
| codestral | $0.30 | $0.90 |
| command-r-plus | $2.50 | $10.00 |
Missing a model? Register it at runtime with register_model() or submit a PR to pricing.json and run python scripts/generate_pricing.py.

Custom Model Pricing

New model just launched? Don't wait for a release — register pricing at runtime.

Single model

import agentbudget

agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)

Batch register

agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})

Fuzzy matching

Dated model variants are automatically matched to their base model. For example, gpt-4o-2025-06-15 automatically uses gpt-4o pricing.

Resolution order: Custom pricing (via register_model) → Built-in table → Fuzzy match (strip date suffixes) → OpenRouter prefix strip ("openai/gpt-4o" → "gpt-4o").
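That resolution order can be sketched as a chain of lookups (illustrative only; the date-suffix regex and table contents are assumptions):

```python
import re

BUILTIN = {"gpt-4o": (2.50, 10.00)}  # excerpt of the built-in pricing table
CUSTOM = {}                          # filled by register_model()

def resolve_pricing(model):
    """Custom -> built-in -> date-suffix strip -> provider-prefix strip."""
    if model in CUSTOM:
        return CUSTOM[model]
    if model in BUILTIN:
        return BUILTIN[model]
    base = re.sub(r"-\d{4}-\d{2}-\d{2}$", "", model)  # gpt-4o-2025-06-15 -> gpt-4o
    if base in BUILTIN:
        return BUILTIN[base]
    _, _, stripped = model.partition("/")             # openai/gpt-4o -> gpt-4o
    if stripped in BUILTIN:
        return BUILTIN[stripped]
    return None

print(resolve_pricing("gpt-4o-2025-06-15"))  # (2.5, 10.0)
print(resolve_pricing("openai/gpt-4o"))      # (2.5, 10.0)
```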

Exceptions

| Exception | When |
| --- | --- |
| BudgetExhausted | Session exceeded its dollar budget (hard limit). |
| LoopDetected | Repeated calls to the same tool/model detected. |
| InvalidBudget | Budget string couldn't be parsed. |
| AgentBudgetError | Base exception for all AgentBudget errors. |
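Since the first three all inherit from AgentBudgetError, one except clause covers every failure mode. The hierarchy can be sketched as follows (class names from the table; the inheritance structure is implied by "base exception"):

```python
class AgentBudgetError(Exception):
    """Base exception for all AgentBudget errors."""

class BudgetExhausted(AgentBudgetError):
    """Session exceeded its dollar budget (hard limit)."""

class LoopDetected(AgentBudgetError):
    """Repeated calls to the same tool/model detected."""

class InvalidBudget(AgentBudgetError):
    """Budget string couldn't be parsed."""

try:
    raise BudgetExhausted("hard limit hit at $5.00")
except AgentBudgetError as exc:   # catches all three subclasses
    print(type(exc).__name__, exc)
```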