Documentation

Everything you need to add real-time cost enforcement to your AI agents.

Installation

pip install agentbudget

Requires Python 3.9+. No external dependencies.

For LangChain integration:

pip install agentbudget[langchain]

Quickstart

AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).

Drop-in Mode (Recommended)

Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.

import agentbudget
import openai

agentbudget.init("$5.00")

# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

print(agentbudget.spent())      # e.g. 0.0035
print(agentbudget.remaining())  # e.g. 4.9965
print(agentbudget.report())     # Full cost breakdown

agentbudget.teardown()  # Stop tracking, get final report

How it works: agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs. This is the same pattern used by Sentry, Datadog, and other observability tools.
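The patching idea can be illustrated with a minimal, self-contained sketch. FakeCompletions, patch_create, and the flat per-call cost are hypothetical stand-ins chosen for illustration; this is not AgentBudget's internal code, and it does not touch the real OpenAI or Anthropic SDKs.

```python
import functools


class FakeCompletions:
    """Hypothetical stand-in for an SDK class such as Completions."""

    def create(self, **kwargs):
        return {"model": kwargs.get("model"), "usage": {"total_tokens": 42}}


spent = []  # costs recorded by the patched method


def patch_create(cls, cost_per_call):
    """Replace cls.create with a wrapper that records a cost after each call."""
    original = cls.create

    @functools.wraps(original)
    def tracked_create(self, *args, **kwargs):
        response = original(self, *args, **kwargs)
        spent.append(cost_per_call)  # assumed flat cost, for illustration only
        return response

    cls.create = tracked_create
    return original  # keep a handle so the patch can be undone later


original_create = patch_create(FakeCompletions, cost_per_call=0.01)
response = FakeCompletions().create(model="gpt-4o")  # calling code is unchanged
FakeCompletions.create = original_create  # undo the patch, like a teardown step
```

Because the wrapper is installed on the class, existing call sites pick it up without modification, which is what makes a drop-in mode possible.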

Drop-in API

Function                                         Description
agentbudget.init(budget)                         Start tracking. Patches OpenAI/Anthropic. Returns the session.
agentbudget.spent()                              Total dollars spent so far.
agentbudget.remaining()                          Dollars left in the budget.
agentbudget.report()                             Full cost breakdown as a dict.
agentbudget.track(result, cost, tool_name)       Manually track a tool/API call cost.
agentbudget.register_model(name, input, output)  Add pricing for a new model at runtime.
agentbudget.register_models(dict)                Batch register pricing for multiple models.
agentbudget.get_session()                        Get the active session for advanced use.
agentbudget.teardown()                           Stop tracking, unpatch SDKs, return final report.

Manual Mode

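import openai
from agentbudget import AgentBudget

client = openai.OpenAI()
budget = AgentBudget(max_spend="$5.00")

with budget.session() as session:
    # Wrap an LLM response to record its cost
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )

    # Record a tool call with a known per-call cost
    # (call_serp_api and query stand in for your own code)
    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")

    # Or decorate a function so every call is tracked
    # (api stands in for your own client)
    @session.track_tool(cost=0.02, tool_name="search")
    def my_search(query):
        return api.search(query)

print(session.report())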

Budget Envelope

A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked against it in real time. When the envelope is exhausted, BudgetExhausted is raised.

# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)
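
To make the accepted formats concrete, here is a rough sketch of how such values could be normalized to a float. parse_budget is a hypothetical helper written for this page, not the library's parser; it raises ValueError where the library documents InvalidBudget.

```python
def parse_budget(value):
    """Convert "$5.00", "5.00", 5.0, or 5 into a float dollar amount."""
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it explicitly
        raise ValueError(f"not a budget: {value!r}")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("$"):
            text = text[1:]
        return float(text)  # raises ValueError on unparseable text
    raise ValueError(f"not a budget: {value!r}")


amounts = [parse_budget(v) for v in ("$5.00", "5.00", 5.0, 5)]
```

All four inputs normalize to the same float, so the rest of the accounting only ever deals with one numeric type.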

Cost Sources

  • LLM calls — Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
  • Tool calls — External APIs with known per-call costs. Use session.track(result, cost=0.01).
  • Decorated functions — Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
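
As a worked example of table-based costing, the sketch below prices one call from its token counts. The gpt-4o prices are taken from the Supported Models table on this page; llm_cost and the token counts are illustrative assumptions, not the library's internals.

```python
# Dollars per 1M tokens (input, output), copied from this page's pricing table.
PRICING = {"gpt-4o": (2.50, 10.00)}


def llm_cost(model, input_tokens, output_tokens):
    """Price a single LLM call from its token usage."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 1,000 input tokens and 100 output tokens:
# 1000 * 2.50 / 1M + 100 * 10.00 / 1M = 0.0025 + 0.0010
cost = llm_cost("gpt-4o", input_tokens=1000, output_tokens=100)
print(round(cost, 6))  # 0.0035
```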

Circuit Breaker

Three levels of protection:

budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9,               # Warn at 90%
    max_repeated_calls=10,        # Trip after 10 repeated calls
    loop_window_seconds=60.0,     # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: print("Loop detected!"),
)

  • Soft limit (default 90%) — Fires a callback. Agent can wrap up gracefully.
  • Hard limit (100%) — Raises BudgetExhausted. No more calls.
  • Loop detection — Catches repeated calls before they drain the budget. Raises LoopDetected.
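
The three checks can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the library's implementation: CircuitBreaker, BudgetExceeded, and LoopError are hypothetical names, the hard limit is checked before a cost is accepted, and the loop check trips on the call that exceeds max_repeated_calls within the window.

```python
from collections import deque


class BudgetExceeded(Exception):
    """Hypothetical stand-in for BudgetExhausted."""


class LoopError(Exception):
    """Hypothetical stand-in for LoopDetected."""


class CircuitBreaker:
    def __init__(self, max_spend, soft_limit=0.9, max_repeated_calls=10,
                 loop_window_seconds=60.0, on_soft_limit=None):
        self.max_spend = max_spend
        self.soft_limit = soft_limit
        self.max_repeated_calls = max_repeated_calls
        self.loop_window_seconds = loop_window_seconds
        self.on_soft_limit = on_soft_limit
        self.spent = 0.0
        self._soft_fired = False
        self._calls = {}  # call key -> deque of timestamps

    def record(self, key, cost, now):
        # Loop detection: count calls with the same key inside the window.
        stamps = self._calls.setdefault(key, deque())
        stamps.append(now)
        while now - stamps[0] > self.loop_window_seconds:
            stamps.popleft()
        if len(stamps) > self.max_repeated_calls:
            raise LoopError(key)

        # Hard limit: refuse a cost that would push spend past the budget.
        if self.spent + cost > self.max_spend:
            raise BudgetExceeded(f"{self.spent + cost:.2f} > {self.max_spend:.2f}")
        self.spent += cost

        # Soft limit: fire the callback once when the threshold is crossed.
        if not self._soft_fired and self.spent >= self.soft_limit * self.max_spend:
            self._soft_fired = True
            if self.on_soft_limit is not None:
                self.on_soft_limit(self.spent)
```

Passing the timestamp in as now keeps the sketch deterministic; a real implementation would more likely read a monotonic clock.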

Cost Report

{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
    },
    "duration_seconds": 34.2,
    "terminated_by": null,
    "events": [...]
}
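
Since the report is a plain dict, it can be summarized with ordinary Python. The sketch below reuses figures from the example report above; the derived metrics (percent used, average cost per call) are illustrative and are not fields the library provides.

```python
# A subset of the example report shown above.
report = {
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8},
        "tools": {"total": 0.30, "calls": 6},
    },
}

# Share of the budget consumed so far, as a percentage.
percent_used = 100 * report["total_spent"] / report["budget"]

# Average cost per call for each cost source.
average_cost = {
    source: part["total"] / part["calls"]
    for source, part in report["breakdown"].items()
}

print(f"{percent_used:.1f}% of budget used")
for source, avg in average_cost.items():
    print(f"{source}: ${avg:.3f} per call")
```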

Async Support

import openai
from agentbudget import AgentBudget

client = openai.AsyncOpenAI()  # the async client; create() returns an awaitable
budget = AgentBudget(max_spend="$5.00")

async with budget.async_session() as session:
    response = await session.wrap_async(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}]
        )
    )

    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)
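
One way a single decorator can serve both sync and async functions is to branch on whether the wrapped function is a coroutine function. The sketch below shows that technique in isolation; this track_tool is a standalone, hypothetical version that appends to a list, not the session method from the library.

```python
import asyncio
import functools
import inspect

costs = []  # costs recorded by decorated functions


def track_tool(cost):
    """Decorator factory that records `cost` after each successful call."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                result = await func(*args, **kwargs)
                costs.append(cost)
                return result

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            costs.append(cost)
            return result

        return sync_wrapper

    return decorator


@track_tool(cost=0.01)
async def async_search(query):
    await asyncio.sleep(0)  # stands in for real I/O
    return f"results for {query}"


@track_tool(cost=0.02)
def my_search(query):
    return f"results for {query}"


async_result = asyncio.run(async_search("crm"))
sync_result = my_search("crm")
```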

Nested Budgets

Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.

budget = AgentBudget(max_spend="$10.00")

with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")

    print(parent.spent)      # 1.50
    print(parent.remaining)  # 8.50

The child budget is automatically capped at the lesser of max_spend and the parent's remaining balance.
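
The capping and roll-up rules can be sketched with a toy session class. Session and its close method are hypothetical simplifications written for this page, not the library's BudgetSession; only the two rules stated above are modeled.

```python
class Session:
    """Hypothetical, simplified session used only to illustrate the rules."""

    def __init__(self, max_spend, parent=None):
        self.max_spend = max_spend
        self.parent = parent
        self.spent = 0.0

    @property
    def remaining(self):
        return self.max_spend - self.spent

    def track(self, cost):
        if self.spent + cost > self.max_spend:
            raise RuntimeError("budget exhausted")
        self.spent += cost

    def child_session(self, max_spend):
        # Cap the child at the lesser of its request and what the parent has left.
        return Session(min(max_spend, self.remaining), parent=self)

    def close(self):
        # Charge the child's total spend to the parent when the child finishes.
        if self.parent is not None:
            self.parent.track(self.spent)


parent = Session(max_spend=10.0)
child = parent.child_session(max_spend=2.0)
child.track(1.50)
child.close()

# A child that asks for more than the parent has left gets the remainder.
capped = parent.child_session(max_spend=20.0)
```

With a $10.00 parent, the first child spends 1.50, leaving 8.50, so the second child is capped at 8.50 rather than the 20.00 it requested.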

Webhooks

budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)

Events are sent as JSON POST requests with event_type ("soft_limit", "hard_limit", "loop_detected") and the full cost report. Failures are logged but never raise.
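
A delivery routine with these properties might look roughly like the sketch below, using only the standard library. send_event, the "report" payload key, the 5-second timeout, and the injectable opener parameter are assumptions made for illustration; the library's actual payload layout may differ.

```python
import json
import logging
import urllib.request

logger = logging.getLogger("budget.webhook")


def send_event(url, event_type, report, opener=urllib.request.urlopen):
    """POST an event as JSON; log failures instead of raising."""
    payload = json.dumps({"event_type": event_type, "report": report}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=5) as response:
            return 200 <= response.status < 300
    except Exception as exc:  # delivery problems must not interrupt the agent
        logger.warning("webhook delivery failed: %s", exc)
        return False
```

Swallowing the exception is deliberate here: a monitoring endpoint being down should not take the agent down with it. The opener parameter exists only so the function can be exercised without a network.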

Event Callbacks

budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)

When webhook_url is also set, both your callback and the webhook fire.

LangChain Integration

pip install agentbudget[langchain]

from agentbudget.integrations.langchain import LangChainBudgetCallback

callback = LangChainBudgetCallback(budget="$5.00")

agent.run(
    "Research competitors in the CRM space",
    callbacks=[callback]
)

print(callback.get_report())

CrewAI Integration

from agentbudget.integrations.crewai import CrewAIBudgetMiddleware

with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )

print(middleware.get_report())

API Reference

AgentBudget

AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable = None,
    on_hard_limit: Callable = None,
    on_loop_detected: Callable = None,
    webhook_url: str = None,
)

Method / Property  Returns             Description
.session()         BudgetSession       Create a sync budget session
.async_session()   AsyncBudgetSession  Create an async budget session
.max_spend         float               The configured budget amount

BudgetSession

Method / Property                Description
.wrap(response)                  Extract model/tokens from LLM response and record cost. Returns response.
.track(result, cost, tool_name)  Record a tool call cost. Returns the result.
.track_tool(cost, tool_name)     Decorator that tracks cost on every call.
.child_session(max_spend)        Create child session with sub-budget. Costs roll up.
.report()                        Full cost report as a dict.
.spent                           Total dollars spent (float).
.remaining                       Dollars remaining (float).

Supported Models

Built-in pricing for 50+ models. Updated February 2026.

OpenAI

Model               Input / 1M tokens  Output / 1M tokens
gpt-4.1             $2.00              $8.00
gpt-4.1-mini        $0.40              $1.60
gpt-4.1-nano        $0.10              $0.40
gpt-4o              $2.50              $10.00
gpt-4o-mini         $0.15              $0.60
o3                  $2.00              $8.00
o3-mini             $1.10              $4.40
o4-mini             $1.10              $4.40
o1                  $15.00             $60.00

Anthropic

Model               Input / 1M tokens  Output / 1M tokens
claude-opus-4-6     $5.00              $25.00
claude-sonnet-4.5   $3.00              $15.00
claude-haiku-4.5    $1.00              $5.00
claude-3.5-sonnet   $3.00              $15.00
claude-3.5-haiku    $0.80              $4.00

Google Gemini

Model               Input / 1M tokens  Output / 1M tokens
gemini-2.5-pro      $1.25              $10.00
gemini-2.5-flash    $0.30              $2.50
gemini-2.0-flash    $0.10              $0.40
gemini-1.5-pro      $1.25              $5.00

Mistral & Cohere

Model               Input / 1M tokens  Output / 1M tokens
mistral-large       $0.50              $1.50
mistral-small       $0.03              $0.11
codestral           $0.30              $0.90
command-r-plus      $2.50              $10.00

Missing a model? Register it at runtime with register_model() or submit a PR to agentbudget/pricing.py.

Custom Model Pricing

New model just launched? Don't wait for a release — register pricing at runtime.

Single model

import agentbudget

agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)

Batch register

agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})

Fuzzy matching

Dated model variants are automatically matched to their base model. For example, gpt-4o-2025-06-15 uses gpt-4o pricing.

Resolution order: Custom pricing (via register_model) → Built-in table → Fuzzy match (strip date suffixes).
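
The resolution order can be sketched as a three-step lookup. resolve_pricing, the two tables, and the assumed suffix shape ("-YYYY-MM-DD" at the end of the name) are illustrative guesses, not the library's matching rules; in particular, checking the stripped name against custom pricing before the built-in table is an assumption.

```python
import re

# Dollars per 1M tokens (input, output); a two-entry excerpt for illustration.
BUILT_IN = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}
CUSTOM = {}  # filled at runtime, e.g. by something like register_model

# Assumed suffix shape: "-YYYY-MM-DD" at the end of the model name.
DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def resolve_pricing(model):
    """Return (input, output) prices, or None if the model is unknown."""
    if model in CUSTOM:        # 1. custom pricing wins
        return CUSTOM[model]
    if model in BUILT_IN:      # 2. then the built-in table
        return BUILT_IN[model]
    base = DATE_SUFFIX.sub("", model)
    if base != model:          # 3. finally, retry without the date suffix
        return CUSTOM.get(base, BUILT_IN.get(base))
    return None


dated = resolve_pricing("gpt-4o-2025-06-15")  # falls back to gpt-4o pricing
```

Anchoring the pattern at the end of the name keeps gpt-4o-mini from being mistaken for a dated gpt-4o variant.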

Exceptions

Exception         When
BudgetExhausted   Session exceeded its dollar budget (hard limit).
LoopDetected      Repeated calls to the same tool/model detected.
InvalidBudget     Budget string couldn't be parsed.
AgentBudgetError  Base exception for all AgentBudget errors.
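
Because every error shares one base class, callers can handle a specific case first and fall back to the base for everything else. The classes below are local stand-ins that mirror the names in the table so the sketch runs on its own; the import path for the real exceptions is not shown on this page, and run_step is a hypothetical helper.

```python
class AgentBudgetError(Exception):
    """Local stand-in for the library's base exception."""


class BudgetExhausted(AgentBudgetError):
    pass


class LoopDetected(AgentBudgetError):
    pass


class InvalidBudget(AgentBudgetError):
    pass


def run_step(error):
    """Raise `error` and report which handler caught it."""
    try:
        raise error
    except BudgetExhausted:
        # Specific handler: stop the agent and surface the final report.
        return "stop: budget spent"
    except AgentBudgetError as exc:
        # Catch-all for any other budget-related failure.
        return f"handled: {type(exc).__name__}"


outcomes = [run_step(BudgetExhausted()), run_step(LoopDetected())]
```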