Documentation

Everything you need to add real-time cost enforcement to your AI agents.

Installation

pip install agentbudget

Python 3.9+ · Go 1.21+ · Node.js 18+. Zero external dependencies in all three SDKs.

Optional Python integrations (quote the extras so shells such as zsh don't expand the brackets):

pip install "agentbudget[langchain]"   # LangChain / LangGraph
pip install "agentbudget[autogen]"     # AutoGen

Quickstart

AgentBudget offers two modes: drop-in (zero code changes) and manual (explicit wrapping).

Drop-in Mode (Recommended)

Add two lines to the top of your script. Every OpenAI and Anthropic call is tracked automatically.

import agentbudget
import openai

agentbudget.init("$5.00")

# Your existing code — no changes needed
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

print(agentbudget.spent())      # e.g. 0.0035
print(agentbudget.remaining())  # e.g. 4.9965
print(agentbudget.report())     # Full cost breakdown

agentbudget.teardown()  # Stop tracking, get final report
How it works: agentbudget.init() monkey-patches Completions.create and Messages.create on the OpenAI and Anthropic SDKs, the same pattern used by Sentry, Datadog, and other observability tools. The patch is process-wide, but the active session is scoped to the current thread or async task, so concurrent requests each get their own budget and don't overwrite each other.

Drop-in API

| Function | Description |
|---|---|
| agentbudget.init(budget) | Start tracking. Patches OpenAI/Anthropic. Returns the session. |
| agentbudget.spent() | Total dollars spent so far. |
| agentbudget.remaining() | Dollars left in the budget. |
| agentbudget.report() | Full cost breakdown as a dict. |
| agentbudget.track(result, cost, tool_name) | Manually track a tool/API call cost. |
| agentbudget.wrap_client(client, session) | Attach tracking to a specific client instance only. |
| agentbudget.register_model(name, input_price_per_million, output_price_per_million) | Add pricing for a new model at runtime. |
| agentbudget.register_models(dict) | Batch register pricing for multiple models. |
| agentbudget.get_session() | Get the active session for advanced use. |
| agentbudget.teardown() | Stop tracking, unpatch SDKs, return final report. |

Manual Mode

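import openai
from agentbudget import AgentBudget

client = openai.OpenAI()
budget = AgentBudget(max_spend="$5.00")

with budget.session() as session:
    # Wrap the LLM response so its token usage is costed
    response = session.wrap(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Analyze this..."}]
        )
    )

    # Track a tool call with a known per-call cost
    # (call_serp_api and query stand in for your own code)
    data = session.track(call_serp_api(query), cost=0.01, tool_name="serp")

print(session.report())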

Budget Envelope

A budget envelope is a dollar amount assigned to a unit of work. Every cost is tracked in real time, and when the envelope is exhausted, BudgetExhausted is raised.

# All of these work:
AgentBudget(max_spend="$5.00")
AgentBudget(max_spend="5.00")
AgentBudget(max_spend=5.0)
AgentBudget(max_spend=5)
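To make the envelope semantics concrete, here is a minimal sketch of how the accepted formats could be normalized and how an exhausted envelope stops work. parse_budget and Envelope are hypothetical helpers, and InvalidBudget/BudgetExhausted are local stand-ins that mirror the documented exception names rather than imports from agentbudget. Recording a cost before checking the limit is an assumption, based on the fact that an LLM call's cost is only known after it returns.

```python
import math


class InvalidBudget(ValueError):
    """Local stand-in for the documented exception of the same name."""


class BudgetExhausted(RuntimeError):
    """Local stand-in for the documented exception of the same name."""


def parse_budget(value) -> float:
    """Normalize "$5.00", "5.00", 5.0 or 5 to a positive float (hypothetical helper)."""
    if isinstance(value, bool):  # bool is an int subclass; reject it explicitly
        raise InvalidBudget(f"not a budget: {value!r}")
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("$"):
            text = text[1:]
        try:
            amount = float(text)
        except ValueError:
            raise InvalidBudget(f"could not parse budget: {value!r}") from None
    elif isinstance(value, (int, float)):
        amount = float(value)
    else:
        raise InvalidBudget(f"unsupported budget type: {type(value).__name__}")
    if not math.isfinite(amount) or amount <= 0:
        raise InvalidBudget(f"budget must be positive and finite: {value!r}")
    return amount


class Envelope:
    """Toy envelope: records each cost, then stops work once the budget is exceeded."""

    def __init__(self, max_spend) -> None:
        self.max_spend = parse_budget(max_spend)
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        # Assumption: the cost was already incurred (e.g. an LLM call returned),
        # so it is recorded before the limit check.
        self.spent += cost
        if self.spent > self.max_spend:
            raise BudgetExhausted(f"spent ${self.spent:.2f} of ${self.max_spend:.2f}")


env = Envelope("$0.10")
env.charge(0.06)
try:
    env.charge(0.06)
except BudgetExhausted as exc:
    print("stopped:", exc)
```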

Cost Sources

  • LLM calls — Automatically costed using a built-in pricing table. Use session.wrap(response) or drop-in mode.
  • Tool calls — External APIs with known per-call costs. Use session.track(result, cost=0.01).
  • Decorated functions — Annotate with @session.track_tool(cost=0.02) to auto-track on every call.
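The decorator from the last bullet can be sketched as a thin wrapper around track(). This is a hypothetical, self-contained illustration: ToySession stands in for a session, and defaulting tool_name to the function's name is an assumption, not documented behavior.

```python
import functools
from typing import Any, Callable, Dict, Optional


class ToySession:
    """Hypothetical stand-in for a session; records fixed per-call tool costs."""

    def __init__(self) -> None:
        self.spent = 0.0
        self.by_tool: Dict[str, float] = {}

    def track(self, result: Any, cost: float, tool_name: str) -> Any:
        self.spent += cost
        self.by_tool[tool_name] = self.by_tool.get(tool_name, 0.0) + cost
        return result  # the result passes through unchanged

    def track_tool(self, cost: float, tool_name: Optional[str] = None) -> Callable:
        def decorator(func: Callable) -> Callable:
            name = tool_name or func.__name__  # assumption: default to the function name

            @functools.wraps(func)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                return self.track(func(*args, **kwargs), cost=cost, tool_name=name)

            return wrapper

        return decorator


session = ToySession()


@session.track_tool(cost=0.02)
def lookup(query: str) -> str:
    return f"results for {query}"


lookup("crm vendors")
lookup("pricing pages")
print(session.spent, session.by_tool)
```

In this sketch a call that raises is never charged, because track() only runs after the wrapped function returns.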

Circuit Breaker

Three levels of protection:

from agentbudget import AgentBudget

budget = AgentBudget(
    max_spend="$5.00",
    soft_limit=0.9,               # Warn at 90%
    max_repeated_calls=10,        # Trip after 10 repeated calls
    loop_window_seconds=60.0,     # Within a 60-second window
    on_soft_limit=lambda r: print("Warning: 90% budget used"),
    on_hard_limit=lambda r: alert_ops_team(r),  # alert_ops_team is your own hook
    on_loop_detected=lambda r: print("Loop detected!"),
)
  • Soft limit (default 90%) — Fires a callback. Agent can wrap up gracefully.
  • Hard limit (100%) — Raises BudgetExhausted. No more calls.
  • Loop detection — Catches repeated calls before they drain the budget. Raises LoopDetected.
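Loop detection of this kind is commonly implemented as a sliding-window counter. The sketch below is a hypothetical illustration, not AgentBudget's code: Breaker and the local LoopDetected class are invented for the example, calls are keyed by a tool or model name, and the loop trips when a key is seen more than max_repeated_calls times within loop_window_seconds (the library's exact keying and threshold may differ). It also fires the soft-limit callback once, when spend first reaches soft_limit of the budget. The clock is injectable so the behavior can be shown without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class LoopDetected(RuntimeError):
    """Local stand-in for the documented exception of the same name."""


class Breaker:
    """Hypothetical soft-limit + loop-detection guard (not AgentBudget's code)."""

    def __init__(
        self,
        max_spend: float,
        soft_limit: float = 0.9,
        max_repeated_calls: int = 10,
        loop_window_seconds: float = 60.0,
        on_soft_limit: Optional[Callable[[float], None]] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.soft_threshold = max_spend * soft_limit
        self.max_repeated_calls = max_repeated_calls
        self.window = loop_window_seconds
        self.on_soft_limit = on_soft_limit
        self.clock = clock
        self.spent = 0.0
        self.soft_fired = False
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)

    def record_call(self, key: str, cost: float) -> None:
        now = self.clock()
        recent = self.calls[key]
        while recent and now - recent[0] > self.window:
            recent.popleft()  # forget calls that fell out of the window
        recent.append(now)
        if len(recent) > self.max_repeated_calls:
            raise LoopDetected(f"{key!r} called {len(recent)} times within {self.window}s")
        self.spent += cost
        if not self.soft_fired and self.spent >= self.soft_threshold:
            self.soft_fired = True  # warn once, not on every later call
            if self.on_soft_limit is not None:
                self.on_soft_limit(self.spent)


fake_now = [0.0]
alerts = []
guard = Breaker(max_spend=1.0, max_repeated_calls=3,
                on_soft_limit=alerts.append, clock=lambda: fake_now[0])

for _ in range(3):
    guard.record_call("search", cost=0.01)  # 3 calls in the window: allowed
fake_now[0] = 61.0                          # more than 60 s later
guard.record_call("search", cost=0.01)      # old calls expired: allowed
guard.record_call("llm", cost=0.90)         # total reaches 90%: callback fires
```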

Cost Report

{
    "session_id": "sess_abc123",
    "budget": 5.00,
    "total_spent": 3.42,
    "remaining": 1.58,
    "breakdown": {
        "llm": {"total": 3.12, "calls": 8, "by_model": {"gpt-4o": 2.80}},
        "tools": {"total": 0.30, "calls": 6, "by_tool": {"serp_api": 0.05}}
    },
    "duration_seconds": 34.2,
    "terminated_by": null,
    "events": [...]
}

Streaming Support

Streaming responses (stream=True) are fully tracked. Every chunk passes through to your code unchanged, and cost is recorded even if you break out of the stream early.

import agentbudget
import openai

agentbudget.init("$5.00")
client = openai.OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this report"}],
    stream=True,  # include_usage is added automatically for OpenAI
)
for chunk in stream:
    # The final usage chunk arrives with an empty choices list
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")

print(agentbudget.spent())  # cost recorded after the stream
OpenAI note: In drop-in mode and with wrap_client(), AgentBudget adds stream_options={"include_usage": True} automatically, so token counts appear on the final chunk — you don't need to pass it (an explicit value is respected). Anthropic streams always include usage. Only set it yourself if you call OpenAI directly and pass the result to session.wrap().

Both for-loop and context-manager patterns are supported, sync and async:

# async for
async for chunk in await client.chat.completions.create(
    stream=True, ...
):
    process(chunk)

# context manager
with client.chat.completions.create(stream=True, ...) as stream:
    for chunk in stream:
        process(chunk)
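The "recorded even after an early break" behavior can be modeled with a generator that reports in a finally block. This is a hypothetical sketch, not the library's stream wrapper: Chunk and fake_stream stand in for SDK objects, and usage arriving on the last chunk mirrors OpenAI's include_usage behavior. Breaking out of a for loop does not close a generator by itself; contextlib.closing (or dropping the last reference to it) calls close(), which runs the finally block.

```python
import contextlib
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Chunk:
    text: str = ""
    usage: Optional[dict] = None  # set only on the final chunk in this toy


def fake_stream() -> Iterator[Chunk]:
    for word in ["Budgets ", "keep ", "agents ", "honest."]:
        yield Chunk(text=word)
    yield Chunk(usage={"prompt_tokens": 12, "completion_tokens": 4})


def tracked(stream: Iterator[Chunk],
            on_done: Callable[[Optional[dict], bool], None]) -> Iterator[Chunk]:
    usage, completed = None, False
    try:
        for chunk in stream:
            if chunk.usage is not None:
                usage = chunk.usage
            yield chunk  # chunks reach the caller unchanged
        completed = True
    finally:
        # Runs when the stream is exhausted and also when the caller closes early.
        on_done(usage, completed)


records = []
with contextlib.closing(tracked(fake_stream(), lambda u, c: records.append((u, c)))) as stream:
    for chunk in stream:
        if chunk.text == "agents ":
            break  # early exit; closing() still triggers the finally block
print(records)  # [(None, False)]
```

An early exit also skips the usage chunk, so a real tracker has to decide how to cost a partial stream; this sketch only reports completed=False and leaves that policy open.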

Per-Client Tracking

By default, agentbudget.init() patches all OpenAI/Anthropic calls globally. For finer control — multiple budgets, isolated clients, or production apps where global side effects are undesirable — use wrap_client():

import agentbudget
from agentbudget import AgentBudget
import openai

budget = AgentBudget(max_spend="$5.00")
with budget.session() as session:
    # Only this instance is tracked
    client = agentbudget.wrap_client(openai.OpenAI(), session)
    response = client.chat.completions.create(...)  # tracked

    other = openai.OpenAI()
    other.chat.completions.create(...)              # NOT tracked

Works with openai.OpenAI, openai.AsyncOpenAI, anthropic.Anthropic, and anthropic.AsyncAnthropic. Global patching via init() is unchanged — both approaches coexist.
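Per-instance tracking can be done by replacing a resource on one client object instead of patching the class. A hypothetical sketch of that idea; FakeCompletions, make_fake_client, TrackedCompletions, and wrap_instance are stand-ins, not the OpenAI SDK or AgentBudget's wrap_client:

```python
from types import SimpleNamespace
from typing import Any, Callable, List


class FakeCompletions:
    """Hypothetical stand-in for client.chat.completions."""

    def create(self, **kwargs: Any) -> dict:
        return {"model": kwargs.get("model"), "usage": {"total_tokens": 10}}


def make_fake_client() -> SimpleNamespace:
    return SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))


class TrackedCompletions:
    """Proxy around one client's completions resource."""

    def __init__(self, inner: Any, on_response: Callable[[dict], None]) -> None:
        self._inner = inner
        self._on_response = on_response

    def create(self, **kwargs: Any) -> dict:
        response = self._inner.create(**kwargs)
        self._on_response(response)  # report this instance's calls only
        return response

    def __getattr__(self, name: str) -> Any:
        return getattr(self._inner, name)  # everything else passes straight through


def wrap_instance(client: SimpleNamespace, on_response: Callable[[dict], None]) -> SimpleNamespace:
    client.chat.completions = TrackedCompletions(client.chat.completions, on_response)
    return client


seen: List[dict] = []
tracked = wrap_instance(make_fake_client(), seen.append)
other = make_fake_client()

tracked.chat.completions.create(model="demo")  # reported
other.chat.completions.create(model="demo")    # not reported
print(len(seen))  # 1
```

Only the wrapped instance reports its responses; the second client is untouched, which is why per-client tracking avoids process-wide side effects.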

Finalization Reserve

Prevent your agent from being cut off mid-task. Reserve a fraction of the budget exclusively for the final response step — the hard limit fires early, keeping that slice free.

budget = AgentBudget(
    max_spend="$1.00",
    finalization_reserve=0.05,  # hard limit at $0.95, $0.05 reserved for final call
)
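The reserve moves the hard limit to max_spend * (1 - finalization_reserve). A quick check of the example above, using Decimal to avoid binary rounding in dollar amounts; effective_hard_limit is a hypothetical helper, and restricting the reserve to [0, 1) is an assumption:

```python
from decimal import Decimal


def effective_hard_limit(max_spend: str, finalization_reserve: str) -> Decimal:
    """Dollar amount at which the hard limit fires when part of the budget is reserved."""
    budget = Decimal(max_spend)
    reserve = Decimal(finalization_reserve)
    if not Decimal(0) <= reserve < Decimal(1):  # assumption: reserve is a fraction in [0, 1)
        raise ValueError("finalization_reserve must be in [0, 1)")
    return budget * (1 - reserve)


limit = effective_hard_limit("1.00", "0.05")
reserved = Decimal("1.00") - limit
print(limit, reserved)  # 0.9500 0.0500
```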

For manual control, check before the final call with session.would_exceed():

def final_step(budget, client, estimated_final_cost):
    with budget.session() as session:
        # ... do work ...

        # Pre-flight check; nothing is recorded
        if session.would_exceed(estimated_final_cost):
            return "Budget nearly exhausted. Here is what was completed: ..."

        # Safe to proceed
        return session.wrap(client.chat.completions.create(...))
would_exceed(cost) checks against the remaining budget without recording anything. Use it as a pre-flight check before expensive final steps.
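The contract of would_exceed (compare, never record) fits in a few lines. A hypothetical sketch with an invented ToyBudget class and finish() helper; comparing with a strict > at the exact boundary is an assumption:

```python
class ToyBudget:
    """Hypothetical stand-in exposing spent/remaining and a would_exceed check."""

    def __init__(self, max_spend: float) -> None:
        self.max_spend = max_spend
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.max_spend - self.spent

    def would_exceed(self, cost: float) -> bool:
        # Pure comparison against what is left: nothing is recorded here.
        return cost > self.remaining

    def track(self, cost: float) -> None:
        self.spent += cost


def finish(budget: ToyBudget, estimated_final_cost: float) -> str:
    if budget.would_exceed(estimated_final_cost):
        return "partial summary"  # wrap up instead of starting a call we can't afford
    budget.track(estimated_final_cost)  # stands in for the real final call
    return "full answer"


demo = ToyBudget(max_spend=1.00)
demo.track(0.90)
print(finish(demo, 0.25), demo.spent)
```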

Async Support

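import asyncio

import openai
from agentbudget import AgentBudget

client = openai.AsyncOpenAI()
budget = AgentBudget(max_spend="$5.00")

async def main():
    async with budget.async_session() as session:
        # Pass the un-awaited call; wrap_async awaits it and records the cost
        response = await session.wrap_async(
            client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello"}]
            )
        )

        @session.track_tool(cost=0.01)
        async def async_search(query):
            return await api.search(query)  # api stands in for your own client

asyncio.run(main())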

Nested Budgets

Parent sessions allocate sub-budgets to child tasks. When the child finishes, its total spend is charged to the parent.

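from agentbudget import AgentBudget

budget = AgentBudget(max_spend="$10.00")

with budget.session() as parent:
    child = parent.child_session(max_spend=2.0)
    with child:
        child.track("result", cost=1.50, tool_name="sub_task")

    # The child's $1.50 rolls up to the parent
    print(parent.spent)      # 1.5
    print(parent.remaining)  # 8.5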
The child budget is automatically capped at the lesser of max_spend and the parent's remaining balance.
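The capping and roll-up rules can be sketched as follows. Node is a hypothetical stand-in for a session, modeled on the usage above (child_session() returns a child that is then used as a context manager); it assumes the cap is computed when the child is created and that the child's spend is added to the parent when its block exits.

```python
from typing import Optional


class Node:
    """Hypothetical stand-in for a budget session with an optional parent."""

    def __init__(self, max_spend: float, parent: Optional["Node"] = None) -> None:
        self.max_spend = max_spend
        self.parent = parent
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.max_spend - self.spent

    def track(self, cost: float) -> None:
        self.spent += cost

    def child_session(self, max_spend: float) -> "Node":
        # Cap: a child never gets more than the parent has left right now.
        return Node(min(max_spend, self.remaining), parent=self)

    def __enter__(self) -> "Node":
        return self

    def __exit__(self, *exc_info) -> bool:
        if self.parent is not None:
            self.parent.spent += self.spent  # roll the child's spend up
        return False  # never swallow exceptions


parent = Node(10.0)
child = parent.child_session(max_spend=2.0)
with child:
    child.track(1.50)
print(parent.spent, parent.remaining)  # 1.5 8.5
```

In this sketch a child created when the parent has $0.20 left gets a $0.20 envelope even though it asked for $2.00.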

Webhooks

budget = AgentBudget(
    max_spend="$5.00",
    webhook_url="https://your-app.com/api/budget-events",
)

Events are sent as JSON POST requests with event_type ("soft_limit", "hard_limit", "loop_detected") and the full cost report. Failures are logged but never raise.
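On the receiving side, an endpoint only needs to decode the JSON body and branch on event_type. A minimal, framework-free sketch; handle_budget_event, the handler functions, and anything in the payload beyond event_type are hypothetical, so inspect a real delivery before relying on other field names:

```python
import json
from typing import Callable, Dict


def log_warning(payload: dict) -> str:
    return "logged warning"


def page_on_call(payload: dict) -> str:
    return "paged on-call"


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "soft_limit": log_warning,
    "hard_limit": page_on_call,
    "loop_detected": page_on_call,
}


def handle_budget_event(body: bytes) -> str:
    """Decode a webhook POST body and route it by event_type."""
    payload = json.loads(body)
    handler = HANDLERS.get(payload.get("event_type"))
    if handler is None:
        return "ignored"  # unknown or future event types are not an error
    return handler(payload)


print(handle_budget_event(b'{"event_type": "soft_limit"}'))
```

Returning "ignored" for unrecognized types keeps the receiver tolerant of event types added later.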

Event Callbacks

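import logging

from agentbudget import AgentBudget

logger = logging.getLogger(__name__)

budget = AgentBudget(
    max_spend="$5.00",
    on_soft_limit=lambda r: logger.warning(f"90% used: {r}"),
    on_hard_limit=lambda r: alert_ops_team(r),  # alert_ops_team is your own hook
    on_loop_detected=lambda r: logger.error(f"Loop: {r}"),
)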

When webhook_url is also set, both your callback and the webhook fire.

LangChain Integration (Python only)

Go and TypeScript integrations are coming in a future release.

pip install "agentbudget[langchain]"
from agentbudget.integrations.langchain import LangChainBudgetCallback

# Use as a context manager so the session is finalized.
with LangChainBudgetCallback(budget="$5.00") as callback:
    agent.invoke(
        {"input": "Research competitors in the CRM space"},
        config={"callbacks": [callback]},
    )

print(callback.get_report())

Costs are tracked from both legacy LLMResult usage and modern chat-model usage_metadata, so LangGraph runs and chat models are counted correctly. Pass tool_costs={"web_search": 0.01} to also charge tool calls against the budget.

CrewAI Integration (Python only)

Go and TypeScript integrations are coming in a future release.

from agentbudget.integrations.crewai import CrewAIBudgetMiddleware

with CrewAIBudgetMiddleware(budget="$3.00") as middleware:
    result = middleware.track(
        crew.kickoff(),
        cost=0.50,
        tool_name="crew_run"
    )

print(middleware.get_report())

AutoGen Integration (Python only)

Drop-in subclasses for AutoGen agents with built-in budget enforcement and cost tracking.

pip install "agentbudget[autogen]"
from agentbudget.integrations.autogen import BudgetedAssistantAgent, BudgetedUserProxyAgent

assistant = BudgetedAssistantAgent(name="assistant", budget="$5.00")
user = BudgetedUserProxyAgent(name="user", budget="$5.00")

user.initiate_chat(
    assistant,
    message="Research competitors in the CRM space"
)

print(assistant.get_report())

For patching existing agent instances without subclassing:

from agentbudget.integrations.autogen import AutoGenBudgetTracker

tracker = AutoGenBudgetTracker(budget="$5.00")
tracker.attach(existing_assistant)

# BudgetExhausted is raised automatically when the limit is hit
print(tracker.get_report())

API Reference

AgentBudget

AgentBudget(
    max_spend: str | float | int,
    soft_limit: float = 0.9,
    max_repeated_calls: int = 10,
    loop_window_seconds: float = 60.0,
    on_soft_limit: Callable | None = None,
    on_hard_limit: Callable | None = None,
    on_loop_detected: Callable | None = None,
    webhook_url: str | None = None,
    finalization_reserve: float = 0.0,  # fraction of budget reserved for final step
)
| Method / Property | Returns | Description |
|---|---|---|
| .session() | BudgetSession | Create a sync budget session. |
| .async_session() | AsyncBudgetSession | Create an async budget session. |
| .max_spend | float | The configured budget amount (property). |

BudgetSession

| Method / Property | Description |
|---|---|
| .wrap(response) | Extract model/tokens from an LLM response and record its cost. Returns the response. |
| .track(result, cost, tool_name) | Record a tool call cost. Returns the result. |
| .track_tool(cost, tool_name) | Decorator that tracks cost on every call. |
| .child_session(max_spend) | Create a child session with a sub-budget. Costs roll up to the parent. |
| .would_exceed(cost) | Returns True if cost would exceed the remaining budget. Does not record anything. |
| .report() | Full cost report as a dict. |
| .spent | Total dollars spent (float). |
| .remaining | Dollars remaining (float). |

Supported Models

Built-in pricing for 50+ models. Updated February 2026.

OpenAI

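| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| gpt-4.1 | $2.00 | $8.00 |
| gpt-4.1-mini | $0.40 | $1.60 |
| gpt-4.1-nano | $0.10 | $0.40 |
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| o3 | $2.00 | $8.00 |
| o3-mini | $1.10 | $4.40 |
| o4-mini | $1.10 | $4.40 |
| o1 | $15.00 | $60.00 |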

Anthropic

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| claude-opus-4-6 | $5.00 | $25.00 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
| claude-haiku-4.5 | $1.00 | $5.00 |
| claude-3.5-sonnet | $3.00 | $15.00 |
| claude-3.5-haiku | $0.80 | $4.00 |

Google Gemini

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-1.5-pro | $1.25 | $5.00 |

Mistral & Cohere

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| mistral-large | $0.50 | $1.50 |
| mistral-small | $0.03 | $0.11 |
| codestral | $0.30 | $0.90 |
| command-r-plus | $2.50 | $10.00 |
Missing a model? Register it at runtime with register_model() or submit a PR to pricing.json and run python scripts/generate_pricing.py.
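Prices are quoted per million tokens, so one call costs (input_tokens × input_price + output_tokens × output_price) / 1,000,000. A worked example for gpt-4o at the table prices above, with made-up token counts: 1,200 input tokens at $2.50/1M is $0.003 and 300 output tokens at $10.00/1M is $0.003, for $0.006 total. The call_cost helper is hypothetical:

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def call_cost(input_tokens: int, output_tokens: int,
              input_price: Decimal, output_price: Decimal) -> Decimal:
    """Dollar cost of one call, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / PER_MILLION


# gpt-4o: $2.50 input / $10.00 output per 1M tokens
cost = call_cost(1_200, 300, Decimal("2.50"), Decimal("10.00"))
print(cost)
```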

Custom Model Pricing

New model just launched? Don't wait for a release — register pricing at runtime.

Single model

import agentbudget

agentbudget.register_model(
    "gpt-5",
    input_price_per_million=5.00,
    output_price_per_million=20.00,
)

Batch register

agentbudget.register_models({
    "gpt-5": (5.00, 20.00),
    "gpt-5-mini": (0.50, 2.00),
})

Fuzzy matching

Dated model variants are automatically matched to their base model. For example, gpt-4o-2025-06-15 automatically uses gpt-4o pricing.

Resolution order: Custom pricing (via register_model) → Built-in table → Fuzzy match (strip date suffixes) → OpenRouter prefix strip ("openai/gpt-4o" → "gpt-4o").
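The resolution order can be sketched as a small lookup function. This is a hypothetical illustration, not the library's code: the pricing dictionaries are tiny local samples, the date-suffix patterns (-YYYY-MM-DD and -YYYYMMDD) are assumptions about what "strip date suffixes" covers, and re-running the whole lookup after stripping a provider/ prefix is a design choice of this sketch.

```python
import re
from typing import Dict, Optional, Tuple

Price = Tuple[float, float]  # (input $/1M tokens, output $/1M tokens)

# Tiny local samples; the real tables are much larger.
BUILTIN: Dict[str, Price] = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}
CUSTOM: Dict[str, Price] = {}  # filled by a register_model-style call

# Assumed date formats: -YYYY-MM-DD (e.g. gpt-4o-2025-06-15) and -YYYYMMDD.
DATE_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def resolve_price(model: str) -> Optional[Price]:
    # 1. Custom pricing registered at runtime wins.
    if model in CUSTOM:
        return CUSTOM[model]
    # 2. Exact match in the built-in table.
    if model in BUILTIN:
        return BUILTIN[model]
    # 3. Fuzzy match: strip a trailing date and look the base name up again.
    base = DATE_SUFFIX.sub("", model)
    if base != model:
        for table in (CUSTOM, BUILTIN):
            if base in table:
                return table[base]
    # 4. OpenRouter-style prefix: "openai/gpt-4o" -> "gpt-4o", then retry all steps.
    if "/" in model:
        return resolve_price(model.split("/", 1)[1])
    return None


print(resolve_price("gpt-4o-2025-06-15"), resolve_price("openai/gpt-4o"))
```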

Exceptions

| Exception | When |
|---|---|
| BudgetExhausted | Session exceeded its dollar budget (hard limit). |
| LoopDetected | Repeated calls to the same tool/model detected. |
| InvalidBudget | Budget string couldn't be parsed. |
| InvalidCost | A tracked cost was negative, NaN, or infinite. |
| AgentBudgetError | Base exception for all AgentBudget errors. |