"Which feature is driving our AI bill?" is one of the most common questions SaaS CTOs ask when they see an AI invoice that doesn't match their expectations. The answer is almost always: "We don't know — we only see the total."
In 2025, Benchmarkit/Mavvrik found only 34% of companies have mature AI cost management — and among those, 57% still rely on spreadsheets. Per-feature cost attribution is the gap between knowing your total monthly AI spend and knowing which feature to fix.
This guide explains the tagging pattern, which metadata fields matter, and how to implement attribution in any stack.
Key Takeaways
- Only 34% of companies have mature AI cost management; 57% use spreadsheets (Benchmarkit/Mavvrik, n=372, 2025)
- Per-feature tracking transforms "our AI bill is $8,500 this month" into "the support bot costs $5,200, the chat assistant $2,100, and the code reviewer $1,200"
- The proxy-layer tagging approach works across all languages; feature code changes only its base URL and request headers
- Most teams find one feature accounts for 60–70% of total AI spend — and it's rarely the one they expected
This post is part of our SaaS AI Features Cost Guide.
What Per-Feature Attribution Actually Looks Like
Without feature-level tracking, your monitoring dashboard looks like this:
| Month | Total AI Cost |
|---|---|
| April | $6,200 |
| May | $7,800 |
| June | $8,500 |
With per-feature tagging, it looks like this:
| Feature | June Cost | MoM Change | Avg Input Tokens |
|---|---|---|---|
| support-bot | $5,200 | +42% | 2,847 |
| chat-assistant | $2,100 | +8% | 891 |
| code-reviewer | $1,200 | -3% | 1,203 |
Now you know: the support-bot's prompt grew (token count up 40%), and that growth is driving most of the June increase. One prompt audit solves the problem. Without feature attribution, you'd have spent three weeks investigating everything.
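A per-feature view like the table above comes from grouping logged cost records by their feature tag. Here is a minimal sketch, assuming each record is a dict with hypothetical `feature`, `month`, and `cost` keys. The record shape and the sample numbers are illustrative, not the log format of any particular proxy.

```python
from collections import defaultdict

def cost_by_feature(records, month):
    """Sum cost per feature for one month of logged cost records."""
    totals = defaultdict(float)
    for rec in records:
        if rec["month"] == month:
            totals[rec["feature"]] += rec["cost"]
    return dict(totals)

def month_over_month(records, prev_month, month):
    """Return {feature: fractional change} between two months."""
    prev = cost_by_feature(records, prev_month)
    curr = cost_by_feature(records, month)
    return {
        feature: (cost - prev[feature]) / prev[feature]
        for feature, cost in curr.items()
        if prev.get(feature)  # skip features with no prior spend
    }

# Illustrative records only (assumed shape)
records = [
    {"feature": "support-bot", "month": "2025-05", "cost": 100.0},
    {"feature": "support-bot", "month": "2025-06", "cost": 150.0},
    {"feature": "chat-assistant", "month": "2025-05", "cost": 80.0},
    {"feature": "chat-assistant", "month": "2025-06", "cost": 60.0},
    {"feature": "chat-assistant", "month": "2025-06", "cost": 20.0},
]
june = cost_by_feature(records, "2025-06")
changes = month_over_month(records, "2025-05", "2025-06")
```

In practice the same grouping would probably run as a SQL `GROUP BY` over the cost log; the Python version only shows the shape of the calculation.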
The Tagging Pattern
The implementation has two parts: tagging outbound requests, and logging/aggregating by tag.
Part 1: Tag Every LLM Request
The cleanest approach is custom HTTP headers passed through your proxy layer. These headers don't affect the LLM API — your proxy strips them before forwarding — but the proxy logs them with every cost record.
HTTP header approach (works in any language):
POST /proxy/openai/chat/completions
Authorization: Bearer mk_your_token
X-Feature-Name: support-bot
X-Tenant-ID: tenant_abc123
X-Environment: production
X-User-Tier: pro
Content-Type: application/json

{ ...your normal OpenAI request body... }
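On the proxy side, the work amounts to separating the tag headers from everything that gets forwarded. The sketch below is one way this could look, assuming the four header names from the request above. The function name `split_tag_headers` and the header-to-field mapping are hypothetical, and a real proxy would also swap the proxy token for the provider's API key, which is not shown here.

```python
# Assumed mapping from tag header (lowercased) to logged field name.
TAG_HEADERS = {
    "x-feature-name": "feature_name",
    "x-tenant-id": "tenant_id",
    "x-environment": "environment",
    "x-user-tier": "user_tier",
}

def split_tag_headers(headers):
    """Separate attribution tags from the headers that get forwarded upstream."""
    tags, forwarded = {}, {}
    for name, value in headers.items():
        field = TAG_HEADERS.get(name.lower())  # header names are case-insensitive
        if field is not None:
            tags[field] = value       # logged with the cost record
        else:
            forwarded[name] = value   # passed on to the LLM provider
    return tags, forwarded

incoming = {
    "Authorization": "Bearer mk_your_token",
    "X-Feature-Name": "support-bot",
    "X-Tenant-ID": "tenant_abc123",
    "X-Environment": "production",
    "X-User-Tier": "pro",
    "Content-Type": "application/json",
}
tags, forwarded = split_tag_headers(incoming)
```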
Alternative: SDK wrapper approach (for teams not using a proxy):
# Python example
import openai  # the module-level client reads OPENAI_API_KEY from the environment

def call_llm(prompt, feature_name, tenant_id):
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Log cost manually; calculate_cost and log_cost are your own helpers
    cost = calculate_cost(
        response.usage.prompt_tokens,
        response.usage.completion_tokens,
        "gpt-4o-mini",
    )
    log_cost(feature=feature_name, tenant=tenant_id, cost=cost)
    return response
The header approach is preferred because it's language-agnostic and leaves feature logic untouched: you only change the base URL and add headers.
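The SDK wrapper above relies on a `calculate_cost` helper that it does not define. One possible sketch follows, assuming prices are quoted per million tokens. The price table is illustrative and should be replaced with your provider's current rates, and the 300 output tokens in the example call is an assumed figure.

```python
# Illustrative USD prices per 1M tokens; replace with your provider's current rates.
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def calculate_cost(prompt_tokens, completion_tokens, model):
    """Return the estimated USD cost of one request."""
    try:
        price = PRICES_PER_MILLION[model]
    except KeyError:
        # Fail loudly rather than silently logging a zero cost
        raise ValueError(f"no price configured for model {model!r}") from None
    return (
        prompt_tokens * price["input"] + completion_tokens * price["output"]
    ) / 1_000_000

# Example: a request with 2,847 input tokens and an assumed 300 output tokens
example_cost = calculate_cost(2_847, 300, "gpt-4o-mini")
```

Raising on an unknown model is a deliberate choice in this sketch: a missing price entry would otherwise show up as a feature that looks free.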
Part 2: The Metadata Fields That Matter
| Field | Type | Required | Example | Why |
|---|---|---|---|---|
| feature_name | string | ✅ | support-bot | Core attribution key |
| tenant_id | string | ✅ | tenant_abc123 | Multi-tenant cost isolation |
| environment | enum | ✅ | production | Exclude staging from billing |
| model | string | Auto | gpt-4o-mini | Model cost validation |
| user_tier | string | Recommended | pro | Plan-based cap enforcement |
| request_type | string | Optional | classification | Task-level optimization signals |
| version | string | Optional | v2.1 | A/B test cost comparison |
The minimum viable tag is feature_name + tenant_id + environment. The rest add analytical depth.
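The fields above can be captured in a small validated record so that a request missing a required tag fails early instead of producing an unattributed cost row. This is a sketch under assumptions: the class name `RequestTags` and the set of allowed environment values are hypothetical, and `model` is left out because the table marks it as filled in automatically.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed enum values; adjust to the environments you actually run.
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}

@dataclass(frozen=True)
class RequestTags:
    # Required: the minimum viable tag
    feature_name: str
    tenant_id: str
    environment: str
    # Recommended / optional fields
    user_tier: Optional[str] = None
    request_type: Optional[str] = None
    version: Optional[str] = None

    def __post_init__(self):
        for name in ("feature_name", "tenant_id", "environment"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment!r}")

tags = RequestTags("support-bot", "tenant_abc123", "production", user_tier="pro")
```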
What You Can Do With Per-Feature Data
Once you have feature-level cost attribution, four use cases become possible:
1. Cost spike diagnosis
When your monthly bill jumps, you know which feature caused it, and often why (token count growth, usage spike, new deployment). Investigation time drops from days to minutes.
2. Feature-level optimization decisions
You can calculate the ROI of optimizing each feature separately. "The support-bot costs $5,200/month. If I add prompt caching, projected savings are $3,640/month. Caching takes 4 hours to implement. That's $3,640/month for 4 hours of work."
3. Pricing model validation
If your Pro plan includes "unlimited AI" and one feature costs $12/month per Pro user on average, you can see whether your plan pricing covers it.
4. A/B test cost comparison
Tag requests with version: v1 vs version: v2. Compare cost per equivalent outcome. Before shipping a new model or prompt, run both versions in parallel and measure cost-per-task, not just quality.
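The A/B comparison in item 4 can be sketched as a cost-per-completed-task calculation grouped by the `version` tag. The record shape, including the `task_completed` flag, is an assumption for illustration; how you decide that a task succeeded depends on the feature.

```python
from collections import defaultdict

def cost_per_task(records):
    """Return {version: total cost / completed tasks} from tagged cost records."""
    cost = defaultdict(float)
    tasks = defaultdict(int)
    for rec in records:
        cost[rec["version"]] += rec["cost"]  # failed attempts still cost money
        if rec["task_completed"]:
            tasks[rec["version"]] += 1
    # Versions with no completed task are omitted rather than divided by zero
    return {v: cost[v] / tasks[v] for v in cost if tasks[v] > 0}

# Illustrative records only (assumed shape)
records = [
    {"version": "v1", "cost": 0.5, "task_completed": True},
    {"version": "v1", "cost": 0.5, "task_completed": False},
    {"version": "v2", "cost": 0.25, "task_completed": True},
    {"version": "v2", "cost": 0.25, "task_completed": True},
]
result = cost_per_task(records)
```

Dividing by completed tasks rather than by requests is the point of the exercise: a cheaper prompt that fails more often can still cost more per useful outcome.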
Implementing in Specific Stacks
PHP (Laravel / vanilla):
use Illuminate\Support\Facades\Http;

$response = Http::withHeaders([
    'X-Feature-Name' => 'support-bot',
    'X-Tenant-ID' => auth()->user()->tenant_id,
    'X-Environment' => config('app.env'),
    'Authorization' => 'Bearer ' . config('services.tokonomics.key'),
])->post(config('services.tokonomics.url') . '/proxy/openai/chat/completions', $payload);
Node.js:
const response = await fetch(`${PROXY_URL}/proxy/openai/chat/completions`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TOKONOMICS_KEY}`,
    'X-Feature-Name': 'support-bot',
    'X-Tenant-ID': req.user.tenantId,
    'X-Environment': process.env.NODE_ENV,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(payload)
});
Python:
import os

import httpx

response = httpx.post(
    f"{PROXY_URL}/proxy/anthropic/messages",
    headers={
        "Authorization": f"Bearer {TOKONOMICS_KEY}",
        "X-Feature-Name": "code-reviewer",
        "X-Tenant-ID": current_tenant.id,
        "X-Environment": os.environ.get("ENV", "production"),
    },
    json=payload,
)
The pattern is identical across languages: change the base URL and add three headers. The proxy handles the rest.
Frequently Asked Questions
What's the minimum tagging setup that gives useful insights?
For a multi-tenant product, the minimum is feature_name, tenant_id, and environment. If you only need per-feature numbers to start, feature_name and environment on every request already tell you cost per feature and filter out staging noise; add tenant_id when you need per-customer attribution. Start minimal and expand.
Does tagging slow down my API calls?
No. The proxy processes the headers in under 1ms and strips them before forwarding, so the LLM provider receives the same request it would have received without tags.
How do I retroactively add tags to existing features?
If you're adding a proxy layer to existing code, update each feature to pass the headers. This is typically a one-line change per feature (add headers to the HTTP client config). If you're instrumenting hundreds of endpoints, start with your highest-cost features first.
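One way to keep the change to a single line per feature is a shared helper that builds the headers. This is a minimal sketch; the function name `tag_headers` and its parameters are hypothetical, and the header names are the ones used in the examples earlier in this post.

```python
def tag_headers(feature_name, tenant_id, environment, api_key):
    """Build the proxy auth header plus the three attribution headers."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Feature-Name": feature_name,
        "X-Tenant-ID": str(tenant_id),  # tolerate numeric tenant IDs
        "X-Environment": environment,
    }

# Each feature then needs one extra line at its call site, for example:
headers = tag_headers("support-bot", "tenant_abc123", "production", "mk_your_token")
```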
Can I use tags for automated optimization decisions?
Yes. Your routing config can use tags to select models: if feature == "classification" → use deepseek-v4-flash. Tags become the routing key that determines which model, which provider, and which budget counter to use. The tagging layer is the foundation for all downstream intelligence.
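A tag-driven router could be as small as a lookup table. The sketch below assumes tags arrive as a dict with `feature_name` and `tenant_id` keys; the table contents, the default model, and the budget-key format are all illustrative choices, not a prescribed configuration.

```python
DEFAULT_MODEL = "gpt-4o-mini"

# Hypothetical routing table: feature tag -> model name.
MODEL_BY_FEATURE = {
    "classification": "deepseek-v4-flash",
}

def select_model(tags):
    """Pick a model from the feature tag, falling back to the default."""
    return MODEL_BY_FEATURE.get(tags.get("feature_name"), DEFAULT_MODEL)

def budget_key(tags):
    """Identify which budget counter a request should be charged to."""
    return f"{tags['tenant_id']}:{tags['feature_name']}"

tags = {"feature_name": "classification", "tenant_id": "tenant_abc123"}
model = select_model(tags)
counter = budget_key(tags)
```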
The Bottom Line
Per-feature cost attribution is the single most impactful monitoring change most SaaS teams can make. It costs one afternoon to implement. It makes every subsequent optimization decision faster, cheaper, and more precise.
Tag every LLM call. Know which feature owns every dollar of your AI spend. Optimize from data, not from guesses.
Tokonomics processes your feature tags automatically — real-time cost breakdown by feature, tenant, model, and environment — with zero changes to your LLM call logic.
About the authors: Written by the engineers behind Tokonomics.