Per-Feature AI Cost Tracking: Tag Every LLM Call — Tokonomics

"Which feature is driving our AI bill?" is one of the most common questions SaaS CTOs ask when they see an AI invoice that doesn't match their expectations. The answer is almost always: "We don't know — we only see the total."

In 2025, Benchmarkit/Mavvrik found only 34% of companies have mature AI cost management — and among those, 57% still rely on spreadsheets. Per-feature cost attribution is the gap between knowing your total monthly AI spend and knowing which feature to fix.

This guide explains the tagging pattern, which metadata fields matter, and how to implement attribution in any stack.

Key Takeaways

Only 34% of companies have mature AI cost management; 57% use spreadsheets (Benchmarkit/Mavvrik, n=372, 2025)

Per-feature tracking transforms "our AI bill is $8,500 this month" into "the support bot costs $5,200, the chat assistant $2,100, and the code reviewer $1,200"

The proxy-layer tagging approach works across all languages with zero modification to feature code

Most teams find one feature accounts for 60–70% of total AI spend — and it's rarely the one they expected

This post is part of our SaaS AI Features Cost Guide.

What Per-Feature Attribution Actually Looks Like

Without feature-level tracking, your monitoring dashboard looks like this:

Month	Total AI Cost
April	$6,200
May	$7,800
June	$8,500

With per-feature tagging, it looks like this:

Feature	June Cost	MoM Change	Avg Input Tokens
`support-bot`	$5,200	+42%	2,847
`chat-assistant`	$2,100	+8%	891
`code-reviewer`	$1,200	-3%	1,203

Now you know: the support-bot's prompt grew (average input tokens up about 40%), and that growth accounts for most of the June increase. One prompt audit solves the problem. Without feature attribution, you could have spent three weeks investigating every feature.
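The breakdown in the table above can be produced from tagged cost records with a simple group-by. The following is a minimal sketch, assuming each logged record carries `feature`, `month`, and `cost` keys; those field names and both helper functions are illustrative, not part of any particular proxy's schema.

```python
from collections import defaultdict


def cost_by_feature(records):
    """Sum cost per (feature, month) pair."""
    totals = defaultdict(float)
    for record in records:
        totals[(record["feature"], record["month"])] += record["cost"]
    return dict(totals)


def mom_change(totals, feature, prev_month, month):
    """Month-over-month change as a fraction, or None if there was no prior spend."""
    prev = totals.get((feature, prev_month), 0.0)
    curr = totals.get((feature, month), 0.0)
    if prev == 0:
        return None
    return (curr - prev) / prev
```

A feature whose change is far above the others is the first candidate for a prompt audit.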

Typical feature cost distribution in a SaaS product with 3 AI features. One feature almost always accounts for 50–70% of total AI spend. Source: illustrative data based on production cost patterns across monitored SaaS apps.

The Tagging Pattern

The implementation has two parts: tagging outbound requests, and logging/aggregating by tag.

Part 1: Tag Every LLM Request

The cleanest approach is custom HTTP headers passed through your proxy layer. These headers don't affect the LLM API — your proxy strips them before forwarding — but the proxy logs them with every cost record.

HTTP header approach (works in any language):

POST /proxy/openai/chat/completions
Authorization: Bearer mk_your_token
X-Feature-Name: support-bot
X-Tenant-ID: tenant_abc123
X-Environment: production
X-User-Tier: pro
Content-Type: application/json

{ ...your normal OpenAI request body... }
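On the proxy side, the headers have to be separated from the rest of the request before it is forwarded. The sketch below shows one way that step could look; the function name `split_tag_headers` and the tag key names are assumptions for illustration, and a real proxy would also swap the `Authorization` value for the provider's API key, which is not shown here.

```python
# Map of tag headers (lowercased) to the keys stored on each cost record.
TAG_HEADERS = {
    "x-feature-name": "feature_name",
    "x-tenant-id": "tenant_id",
    "x-environment": "environment",
    "x-user-tier": "user_tier",
}


def split_tag_headers(headers):
    """Return (tags, forward_headers) from an incoming header mapping.

    Tag headers are collected for logging and left out of the forwarded
    set, so the LLM provider never sees them.
    """
    tags = {}
    forward = {}
    for name, value in headers.items():
        key = TAG_HEADERS.get(name.lower())  # header names are case-insensitive
        if key is not None:
            tags[key] = value
        else:
            forward[name] = value
    return tags, forward
```

The `tags` dictionary is what gets written alongside the token counts and cost of each request.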

Alternative: SDK wrapper approach (for teams not using a proxy):

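# Python example
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(prompt, feature_name, tenant_id, model="gpt-4o-mini"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Log cost manually. calculate_cost and log_cost are your own helpers:
    # one maps token counts to dollars, the other writes the tagged record.
    cost = calculate_cost(
        response.usage.prompt_tokens,
        response.usage.completion_tokens,
        model,
    )
    log_cost(feature=feature_name, tenant=tenant_id, cost=cost)
    return response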

The header approach is preferred because it's language-agnostic and leaves feature logic untouched: you only change the base URL and add headers at the call site.
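The wrapper above leaves `calculate_cost` and `log_cost` undefined. One possible shape for them is sketched below. The price table is a placeholder (values are USD per one million tokens and should be checked against your provider's current pricing page), and the JSON-lines log format is an assumption, not a required schema.

```python
import json
import sys

# Placeholder prices in USD per 1M tokens. Verify against current provider pricing.
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def calculate_cost(prompt_tokens, completion_tokens, model):
    """Convert token counts to dollars using the price table."""
    price = PRICES_PER_MILLION[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


def log_cost(feature, tenant, cost, stream=sys.stdout):
    """Write one tagged cost record as a JSON line and return it."""
    record = {"feature": feature, "tenant": tenant, "cost_usd": round(cost, 6)}
    stream.write(json.dumps(record) + "\n")
    return record
```

Raising a `KeyError` for an unknown model is deliberate in this sketch: a missing price is easier to notice as an error than as a silent zero-cost record.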

Part 2: The Metadata Fields That Matter

Field	Type	Required	Example	Why
`feature_name`	string	✅	`support-bot`	Core attribution key
`tenant_id`	string	✅	`tenant_abc123`	Multi-tenant cost isolation
`environment`	enum	✅	`production`	Exclude staging from billing
`model`	string	Auto	`gpt-4o-mini`	Model cost validation
`user_tier`	string	Recommended	`pro`	Plan-based cap enforcement
`request_type`	string	Optional	`classification`	Task-level optimization signals
`version`	string	Optional	`v2.1`	A/B test cost comparison

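The recommended baseline is feature_name + tenant_id + environment. Single-tenant products can start with feature_name and environment alone and add tenant_id once they need per-customer attribution. The remaining fields add analytical depth.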
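The required and optional fields in the table can be enforced at the point where tags are created, so that untagged or mistyped requests fail early instead of landing in an "unknown" bucket. The sketch below is one way to do that; the class name `RequestTags` and the set of allowed environment values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of environments; adjust to match your deployment names.
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}


@dataclass(frozen=True)
class RequestTags:
    feature_name: str
    tenant_id: str
    environment: str
    user_tier: Optional[str] = None
    request_type: Optional[str] = None
    version: Optional[str] = None

    def __post_init__(self):
        # Reject empty values for the three required tags.
        for field_name in ("feature_name", "tenant_id", "environment"):
            if not getattr(self, field_name):
                raise ValueError(f"missing required tag: {field_name}")
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment}")
```

The `model` field is left out because the table marks it as automatic: it can be read from the request body rather than supplied by the caller.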

What You Can Do With Per-Feature Data

Once you have feature-level cost attribution, four use cases become possible:

1. Cost spike diagnosis: When your monthly bill jumps, you know which feature caused it, and often why (token count growth, usage spike, new deployment). Investigation time drops from days to minutes.

2. Feature-level optimization decisions: You can calculate the ROI of optimizing each feature separately. "The support-bot costs $5,200/month. If I add prompt caching, projected savings are $3,640/month. Caching takes 4 hours to implement. That's $3,640/month for 4 hours of work."

3. Pricing model validation: If your Pro plan includes "unlimited AI" and one feature costs $12/month per Pro user on average, you can see whether your plan pricing covers it.

4. A/B test cost comparison: Tag requests with version: v1 vs version: v2. Compare cost per equivalent outcome. Before shipping a new model or prompt, run both versions in parallel and measure cost-per-task, not just quality.
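The cost-per-task comparison from the fourth use case can be sketched as follows. It assumes each logged record has a `version` tag, a `cost`, and a `task_completed` flag; the flag is a hypothetical field that your application would have to set based on its own definition of a successful outcome.

```python
from collections import defaultdict


def cost_per_task(records):
    """Return {version: total cost / number of completed tasks}.

    Versions with no completed tasks are omitted, since their
    cost per task is undefined.
    """
    cost = defaultdict(float)
    completed = defaultdict(int)
    for record in records:
        cost[record["version"]] += record["cost"]
        if record["task_completed"]:
            completed[record["version"]] += 1
    return {v: cost[v] / completed[v] for v in cost if completed[v] > 0}
```

Dividing by completed tasks, not by requests, is the point of the comparison: a cheaper version that fails more often can cost more per useful result.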

Implementing in Specific Stacks

PHP (Laravel):

$response = Http::withHeaders([
    'X-Feature-Name' => 'support-bot',
    'X-Tenant-ID'    => auth()->user()->tenant_id,
    'X-Environment'  => config('app.env'),
    'Authorization'  => 'Bearer ' . config('services.tokonomics.key'),
])->post(config('services.tokonomics.url') . '/proxy/openai/chat/completions', $payload);

Node.js:

const response = await fetch(`${PROXY_URL}/proxy/openai/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TOKONOMICS_KEY}`,
    'X-Feature-Name': 'support-bot',
    'X-Tenant-ID': req.user.tenantId,
    'X-Environment': process.env.NODE_ENV,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(payload)
});

Python:

import os

import httpx
response = httpx.post(
    f"{PROXY_URL}/proxy/anthropic/messages",
    headers={
        "Authorization": f"Bearer {TOKONOMICS_KEY}",
        "X-Feature-Name": "code-reviewer",
        "X-Tenant-ID": current_tenant.id,
        "X-Environment": os.environ.get("ENV", "production"),
    },
    json=payload
)

The pattern is identical across languages: change the base URL and add three headers. The proxy handles the rest.

Frequently Asked Questions

What's the minimum tagging setup that gives useful insights?

At minimum, tag feature_name and environment on every request. This immediately tells you cost per feature and filters out staging noise. Add tenant_id when you need per-customer attribution. Start minimal and expand.

Does tagging slow down my API calls?

No, not measurably. The proxy parses the headers in under 1ms and strips them before forwarding, so the LLM provider receives the same request it would have received without tags.

How do I retroactively add tags to existing features?

If you're adding a proxy layer to existing code, update each feature to pass the headers. This is typically a one-line change per feature (add headers to the HTTP client config). If you're instrumenting hundreds of endpoints, start with your highest-cost features.

Can I use tags for automated optimization decisions?

Yes. Your routing config can use tags to select models: if feature == "classification" → use deepseek-v4-flash. Tags become the routing key that determines which model, which provider, and which budget counter to use. The tagging layer is the foundation for all downstream intelligence.
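A tag-driven routing table could be sketched like this. The `classification` route uses the model named above; the default route, the `budget_counter` names, and the `select_route` function are assumptions for illustration, not a description of any specific router's configuration format.

```python
# Routes keyed by the feature_name tag. Entries other than the
# classification model are hypothetical.
ROUTES = {
    "classification": {
        "model": "deepseek-v4-flash",
        "budget_counter": "classification",
    },
}

# Fallback for features without an explicit route.
DEFAULT_ROUTE = {"model": "gpt-4o-mini", "budget_counter": "default"}


def select_route(tags):
    """Pick the model and budget counter for a request from its tags."""
    return ROUTES.get(tags.get("feature_name"), DEFAULT_ROUTE)
```

Keeping the table as data means a route can be changed without touching the features that send the tags.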

The Bottom Line

Per-feature cost attribution is the single most impactful monitoring change most SaaS teams can make. It costs one afternoon to implement. It makes every subsequent optimization decision faster, cheaper, and more precise.

Tag every LLM call. Know which feature owns every dollar of your AI spend. Optimize from data, not from guesses.

Tokonomics processes your feature tags automatically — real-time cost breakdown by feature, tenant, model, and environment — with zero changes to your LLM call logic.

About the authors: Written by the engineers behind Tokonomics.