Documentation

Everything you need to add real-time cost enforcement to your AI agents.

Installation

pip install agentbudget

Python 3.9+ · Go 1.21+ · Node.js 18+. Zero external dependencies in all three SDKs.

For Python LangChain integration:

pip install agentbudget[langchain]

Quickstart

AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).

Drop-in Mode (Recommended)

Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.

import agentbudget
import openai

agentbudget.init("$5.00")

# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

print(agentbudget.spent())      # e.g. 0.0035
print(agentbudget.remaining())  # e.g. 4.9965
print(agentbudget.report())     # Full cost breakdown

agentbudget.teardown()  # Stop tracking, get final report
How it works: agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs. Same pattern used by Sentry, Datadog, and other observability tools.

Drop-in API

| Function | Description |
| --- | --- |
| agentbudget.init(budget) | Start tracking. Patches OpenAI/Anthropic. Returns the session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.wrap_client(client, session) | Attach tracking to a specific client instance only. |
| agentbudget.register_model(name, input, output) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session for advanced use. |
| agentbudget.teardown() | Stop tracking, unpatch SDKs, return final report. |

Manual Mode

from agentbudget import AgentBudget
import openai

client = openai.OpenAI()
budget = AgentBudget(max_spend="$5.00")

with budget.session() as session:
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )

    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")

print(session.report())

Budget Envelope

A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked in real time. When exhausted, BudgetExhausted is raised.

# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)
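All four forms normalize to the same float. A sketch of how such permissive parsing might work (assumed logic, not the library's source; the library raises InvalidBudget where this sketch raises ValueError):

```python
def parse_budget(value):
    """Normalize "$5.00", "5.00", 5.0, or 5 to a float; reject anything else."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.lstrip("$"))
        except ValueError:
            pass
    raise ValueError(f"Cannot parse budget: {value!r}")  # library: InvalidBudget

assert parse_budget("$5.00") == parse_budget("5.00") == parse_budget(5.0) == parse_budget(5)
```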

Cost Sources

  • LLM calls — Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
  • Tool calls — External APIs with known per-call costs. Use session.track(result, cost=0.01).
  • Decorated functions — Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
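The decorator form can be sketched as a plain closure that charges a fixed cost on every invocation (hypothetical internals, modeled on the track_tool signature):

```python
import functools

class Session:
    """Minimal stand-in for a budget session: just accumulates spend."""
    def __init__(self):
        self.spent = 0.0

    def track_tool(self, cost, tool_name=None):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                self.spent += cost  # charge the fixed per-call cost
                return fn(*args, **kwargs)
            return wrapper
        return decorator

session = Session()

@session.track_tool(cost=0.02, tool_name="search")
def search(query):
    return f"results for {query}"

search("crm vendors")
search("crm pricing")
print(session.spent)  # 0.04
```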

Circuit Breaker

Three levels of protection:

budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9,               # Warn at 90%
    max_repeated_calls=10,        # Trip after 10 repeated calls
    loop_window_seconds=60.0,     # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: print("Loop detected!"),
)
  • Soft limit (default 90%) — Fires a callback. Agent can wrap up gracefully.
  • Hard limit (100%) — Raises BudgetExhausted. No more calls.
  • Loop detection — Catches repeated calls before they drain the budget. Raises LoopDetected.
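Loop detection can be sketched as counting identical calls inside a sliding time window, matching the max_repeated_calls and loop_window_seconds knobs (the mechanism shown here is an assumption, not the library's source):

```python
import time
from collections import defaultdict, deque

class LoopDetected(Exception):
    pass

class LoopDetector:
    def __init__(self, max_repeated_calls=10, window_seconds=60.0):
        self.max_calls = max_repeated_calls
        self.window = window_seconds
        self.calls = defaultdict(deque)  # call signature -> recent timestamps

    def record(self, signature):
        now = time.monotonic()
        timestamps = self.calls[signature]
        timestamps.append(now)
        # Drop timestamps that have fallen out of the window
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()
        if len(timestamps) > self.max_calls:
            raise LoopDetected(f"{signature} called {len(timestamps)} times")

detector = LoopDetector(max_repeated_calls=3, window_seconds=60.0)
for _ in range(3):
    detector.record("serp:query=crm")  # first three calls pass
try:
    detector.record("serp:query=crm")  # fourth call trips the breaker
except LoopDetected as exc:
    print(exc)
```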

Cost Report

{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
    },
    "duration_seconds": 34.2,
    "terminated_by": null,
    "events": [...]
}
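The report is a plain dict, so the fields above can be consumed directly. For example, a consistency check and a per-category summary (field names taken from the report shown; values abbreviated):

```python
report = {
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8},
        "tools": {"total": 0.30, "calls": 6},
    },
}

# remaining should always equal budget minus total_spent
assert abs(report["budget"] - report["total_spent"] - report["remaining"]) < 1e-9

# category totals should sum to the session total
category_sum = sum(c["total"] for c in report["breakdown"].values())
assert abs(category_sum - report["total_spent"]) < 1e-9

for name, cat in report["breakdown"].items():
    print(f"{name}: ${cat['total']:.2f} over {cat['calls']} calls")
```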

Streaming Support

Streaming responses (stream=True) are fully tracked. Cost is recorded after the stream is exhausted — every chunk passes through to your code unchanged.

agentbudget.init("$5.00")
client = openai.OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this report"}],
    stream=True,
    stream_options={"include_usage": True},  # required for OpenAI
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

print(agentbudget.spent())  # cost recorded after stream exhausted
OpenAI note: You must pass stream_options={"include_usage": True} for token counts to appear on the final chunk. Without it, streaming calls are silently tracked as $0.00 — no error, just no cost. Anthropic streams always include usage automatically.

Both for-loop and context-manager patterns are supported, sync and async:

# async for
async for chunk in await client.chat.completions.create(
    stream=True, stream_options={"include_usage": True}, ...
):
    process(chunk)

# context manager
with client.chat.completions.create(stream=True, ...) as stream:
    for chunk in stream:
        process(chunk)
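Under the hood, tracking a stream means wrapping the iterator so cost is recorded only once the final chunk (the one carrying usage) has been yielded. A standalone sketch of that idea, using dummy dict chunks rather than the SDK's types:

```python
def tracked_stream(chunks, on_usage):
    """Yield every chunk unchanged; report usage from the last chunk that has it."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
        yield chunk
    if usage is not None:
        on_usage(usage)  # stream exhausted: record the cost now

spent = []
raw = [{"delta": "Hel"}, {"delta": "lo"}, {"delta": "", "usage": {"total_tokens": 12}}]
for chunk in tracked_stream(raw, on_usage=lambda u: spent.append(u["total_tokens"])):
    pass
print(spent)  # tokens recorded only after the loop finished
```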

Per-Client Tracking

By default, agentbudget.init() patches all OpenAI/Anthropic calls globally. For finer control — multiple budgets, isolated clients, or production apps where global side effects are undesirable — use wrap_client():

import agentbudget
from agentbudget import AgentBudget
import openai

budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Only this instance is tracked
    client = agentbudget.wrap_client(openai.OpenAI(), session)
    response = client.chat.completions.create(...)  # tracked

    other = openai.OpenAI()
    other.chat.completions.create(...)              # NOT tracked

Works with openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, and anthropic.AsyncAnthropic. Global patching via init() is unchanged — both approaches coexist.

Finalization Reserve

Prevent your agent from being cut off mid-task. Reserve a fraction of the budget exclusively for the final response step — the hard limit fires early, keeping that slice free.

budget = AgentBudget(
    max_spend="$1.00",
    finalization_reserve=0.05,  # hard limit at $0.95, $0.05 reserved for final call
)
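The arithmetic behind the comment above: the hard limit moves to max_spend × (1 − finalization_reserve). A sketch of that check (the formula is inferred from the $0.95 figure, not quoted from the library's source):

```python
max_spend = 1.00
finalization_reserve = 0.05

effective_hard_limit = max_spend * (1 - finalization_reserve)
print(effective_hard_limit)  # 0.95

def hard_limit_tripped(spent):
    return spent >= effective_hard_limit

print(hard_limit_tripped(0.90))  # False: still inside the working budget
print(hard_limit_tripped(0.96))  # True: only the reserve remains
```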

For manual control, check before the final call with session.would_exceed():

with budget.session() as session:
    # ... do work ...

    if session.would_exceed(estimated_final_cost):
        return "Budget nearly exhausted — here is what was completed: ..."

    # Safe to proceed
    response = session.wrap(client.chat.completions.create(...))
would_exceed(cost) checks against the remaining budget without recording anything. Use it as a pre-flight check before expensive final steps.

Async Support

from agentbudget import AgentBudget

budget = AgentBudget(max_spend="$5.00")

async with budget.async_session() as session:
    response = await session.wrap_async(
        client.chat.completions.acreate(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello"}]
        )
    )

    @session.track_tool(cost=0.01)
    async def async_search(query):
        return await api.search(query)

Nested Budgets

Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.

budget = AgentBudget(max_spend="$10.00")

with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")

    print(parent.spent)      # 1.50
    print(parent.remaining)  # 8.50
The child budget is automatically capped at the lesser of max_spend and the parent's remaining balance.
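That cap is just a min() over the two amounts. A sketch of the documented behavior (illustrative helper, not the library's API):

```python
def child_budget(requested, parent_remaining):
    """Child budget is capped at whatever the parent can still spend."""
    return min(requested, parent_remaining)

print(child_budget(2.0, 8.50))   # 2.0  — parent has plenty left
print(child_budget(2.0, 0.75))   # 0.75 — capped at the parent's remainder
```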

Webhooks

budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)

Events are sent as JSON POST requests with event_type ("soft_limit", "hard_limit", "loop_detected") and the full cost report. Failures are logged but never raise.
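On the receiving side, a handler only needs to branch on event_type. A sketch of such an endpoint body (the handler name is hypothetical, and nesting the report under a "report" key is an assumption about the payload shape):

```python
import json

def handle_budget_event(body: bytes):
    """Dispatch an AgentBudget webhook payload posted as JSON."""
    event = json.loads(body)
    event_type = event["event_type"]
    report = event.get("report", {})
    if event_type == "soft_limit":
        return f"warn: {report.get('total_spent')} spent"
    if event_type == "hard_limit":
        return "page the on-call: budget exhausted"
    if event_type == "loop_detected":
        return "agent loop detected"
    return "ignored"

payload = json.dumps({"event_type": "soft_limit", "report": {"total_spent": 4.5}})
print(handle_budget_event(payload.encode()))  # warn: 4.5 spent
```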

Event Callbacks

budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)

When webhook_url is also set, both your callback and the webhook fire.

LangChain Integration (Python only)

Go and TypeScript integrations are coming in a future release.

pip install agentbudget[langchain]

from agentbudget.integrations.langchain import LangChainBudgetCallback

callback = LangChainBudgetCallback(budget="$5.00")

agent.run(
    "Research competitors in the CRM space",
    callbacks=[callback]
)

print(callback.get_report())

CrewAI Integration (Python only)

Go and TypeScript integrations are coming in a future release.

from agentbudget.integrations.crewai import CrewAIBudgetMiddleware

with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )

print(middleware.get_report())

API Reference

AgentBudget

AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable = None,
    on_hard_limit: Callable = None,
    on_loop_detected: Callable = None,
    webhook_url: str = None,
    finalization_reserve: float = 0.0,  # fraction of budget reserved for final step
)
| Method / Property | Returns | Description |
| --- | --- | --- |
| .session() | BudgetSession | Create a sync budget session |
| .async_session() | AsyncBudgetSession | Create an async budget session |
| .max_spend | float | The configured budget amount |

BudgetSession

| Method / Property | Description |
| --- | --- |
| .wrap(response) | Extract model/tokens from LLM response and record cost. Returns response. |
| .track(result, cost, tool_name) | Record a tool call cost. Returns the result. |
| .track_tool(cost, tool_name) | Decorator that tracks cost on every call. |
| .child_session(max_spend) | Create child session with sub-budget. Costs roll up. |
| .would_exceed(cost) | Returns True if cost would exceed the remaining budget. Does not record anything. |
| .report() | Full cost report as a dict. |
| .spent | Total dollars spent (float). |
| .remaining | Dollars remaining (float). |

Supported Models

Built-in pricing for 50+ models. Updated February 2026.

OpenAI

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o3 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| o4-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |

Anthropic

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
| claude-haiku-4.5 | $1.00 | $5.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |

Google Gemini

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |

Mistral & Cohere

| Model | Input / 1M | Output / 1M |
| --- | --- | --- |
| mistral-large | $0.50 | $1.50 |
| mistral-small | $0.03 | $0.11 |
| codestral | $0.30 | $0.90 |
| command-r-plus | $2.50 | $10.00 |
Missing a model? Register it at runtime with register_model() or submit a PR to pricing.json and run python scripts/generate_pricing.py.

Custom Model Pricing

New model just launched? Don't wait for a release — register pricing at runtime.

Single model

import agentbudget

agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)

Batch register

agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})

Fuzzy matching

Dated model variants are automatically matched to their base model. For example, gpt-4o-2025-06-15 automatically uses gpt-4o pricing.

Resolution order: Custom pricing (via register_model) → Built-in table → Fuzzy match (strip date suffixes) → OpenRouter prefix strip ("openai/gpt-4o" → "gpt-4o").
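That resolution order can be sketched as a chain of lookups (illustrative only; the date-suffix regex and table contents are assumptions):

```python
import re

BUILTIN = {"gpt-4o": (2.50, 10.00)}  # excerpt of the built-in pricing table
CUSTOM = {}                          # filled by register_model()

def resolve_pricing(model):
    """Custom -> built-in -> date-suffix strip -> provider-prefix strip."""
    if model in CUSTOM:
        return CUSTOM[model]
    if model in BUILTIN:
        return BUILTIN[model]
    base = re.sub(r"-\d{4}-\d{2}-\d{2}$", "", model)  # gpt-4o-2025-06-15 -> gpt-4o
    if base in BUILTIN:
        return BUILTIN[base]
    _, _, stripped = model.partition("/")             # openai/gpt-4o -> gpt-4o
    if stripped in BUILTIN:
        return BUILTIN[stripped]
    return None

print(resolve_pricing("gpt-4o-2025-06-15"))  # (2.5, 10.0)
print(resolve_pricing("openai/gpt-4o"))      # (2.5, 10.0)
```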

Exceptions

| Exception | When |
| --- | --- |
| BudgetExhausted | Session exceeded its dollar budget (hard limit). |
| LoopDetected | Repeated calls to the same tool/model detected. |
| InvalidBudget | Budget string couldn't be parsed. |
| AgentBudgetError | Base exception for all AgentBudget errors. |
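Since the first three all inherit from AgentBudgetError, one except clause covers every failure mode. The hierarchy can be sketched as follows (class names from the table; the inheritance structure is implied by "base exception"):

```python
class AgentBudgetError(Exception):
    """Base exception for all AgentBudget errors."""

class BudgetExhausted(AgentBudgetError):
    """Session exceeded its dollar budget (hard limit)."""

class LoopDetected(AgentBudgetError):
    """Repeated calls to the same tool/model detected."""

class InvalidBudget(AgentBudgetError):
    """Budget string couldn't be parsed."""

try:
    raise BudgetExhausted("hard limit hit at $5.00")
except AgentBudgetError as exc:   # catches all three subclasses
    print(type(exc).__name__, exc)
```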