A field guide · Bugs × Agentforce Architects

Estimating Salesforce API Consumption in Agentic Workloads

How to size, scope and price LLM-driven Salesforce work — when the agent (not a script) is making the calls.

By Bugs 🐕‍🦺 · For Salesforce architects building with Claude + MCP

What's inside

Why this question is harder than it looks
The 3 cost layers
Baseline: what 1 operation actually costs
Walkthrough: Quote-to-Cash from a portal
The hidden multipliers that blow your limits
Now make it agentic: the variance problem
The Stainless pattern (and why we built it)
Where skill files fit
Preflight > retry: the killer pattern
The scoping formula
TL;DR

TL;DR for the architect in a hurry

Agentic Salesforce isn't deterministic. The same user intent can cost 2 calls or 47, depending on your MCP design. Move cost from runtime (token + API burn) to design-time (skills + a searchable schema index + preflight). Naive MCP = pay-per-discovery. Stainless-style MCP + skills = pay-per-intent.

1 · Why this question is harder than it looks

If you're writing a deterministic integration — Mulesoft, a Node script, a workflow — estimating Salesforce API calls is arithmetic. Count the operations, multiply by volume, add a fudge factor.

The moment an LLM is the one calling Salesforce, that arithmetic breaks. The model decides what to do. Missing fields, validation rules, picklist mismatches, ambiguous lookups — every one of those becomes a tool call the agent makes to recover. And the agent doesn't know what it doesn't know until it tries.

So you're not estimating a script. You're estimating a probability distribution of tool-call trajectories. The goal of good design is to collapse the variance.

2 · Three cost layers to think about

Layer	What it measures	Where it bleeds
1. Intent → Tool Calls	How many turns the agent takes to figure out what to do	LLM tokens, latency, user patience
2. Tool Calls → API Calls	How many Salesforce calls each tool invocation actually makes	Daily API limit, governor limits
3. API Calls → Outcomes	How much server-side automation each call triggers	Hidden callouts, CPU, async queue

Most teams only think about layer 2. The expensive failure mode is layer 1 → layer 2 amplification: the agent gets confused, takes 9 tool calls to do a 1-tool-call job, and each of those tool calls fans out to 1-3 SF API calls.
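The amplification is plain multiplication, and a few lines make the range concrete. This is a minimal sketch using the figures from the paragraph above (9 tool calls, each fanning out to 1 to 3 SF API calls); the function name is illustrative only.

```python
def sf_call_range(tool_calls: int, min_fanout: int, max_fanout: int) -> tuple[int, int]:
    """Return the (low, high) Salesforce API call count for one agent trajectory."""
    return tool_calls * min_fanout, tool_calls * max_fanout


# The confused trajectory from the text versus the 1-tool-call job it should have been.
confused = sf_call_range(tool_calls=9, min_fanout=1, max_fanout=3)
ideal = sf_call_range(tool_calls=1, min_fanout=1, max_fanout=3)

print(confused)  # (9, 27)
print(ideal)     # (1, 3)
```

The same intent lands anywhere between 1 and 27 API calls, and the width of that range is what the rest of this guide tries to shrink.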

3 · Baseline: what 1 operation actually costs

This is your floor — what a well-designed deterministic call would consume.

Creating an Opportunity

Approach	API calls	Notes
Single POST to `/sobjects/Opportunity`	1	Just the Opp, no children
Opp + 3 OpportunityLineItems, naively	4	1 per record. Anti-pattern.
Opp + 3 OLIs via `/composite/tree`	1	Up to 200 records w/ parent-child refs in one call
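To show what the single-call row looks like in practice, here is a hedged sketch that builds an sObject Tree request body for one Opportunity with three line items. The helper name, field values and ID placeholders are assumptions for illustration; the `attributes` / `referenceId` / nested `records` shape follows the sObject Tree request format, and the body would be POSTed to your instance's `/composite/tree/Opportunity` resource.

```python
import json


def build_opportunity_tree(opp_fields: dict, line_items: list[dict]) -> dict:
    """Build an sObject Tree body: one Opportunity with nested OpportunityLineItems."""
    children = [
        {"attributes": {"type": "OpportunityLineItem", "referenceId": f"oli{i}"}, **item}
        for i, item in enumerate(line_items, start=1)
    ]
    record = {
        "attributes": {"type": "Opportunity", "referenceId": "opp1"},
        **opp_fields,
        # Children hang off the child relationship name, not the object name.
        "OpportunityLineItems": {"records": children},
    }
    return {"records": [record]}


payload = build_opportunity_tree(
    {
        "Name": "Acme - Q3",
        "StageName": "Prospecting",
        "CloseDate": "2026-09-30",
        "Pricebook2Id": "<pricebook-id>",  # placeholder, not a real ID
    },
    [{"PricebookEntryId": "<pricebook-entry-id>", "Quantity": 1, "UnitPrice": 100.0}] * 3,
)
body = json.dumps(payload)  # one request body carrying 4 records
```

Four records travel in one request, so the API-call count stays at 1 no matter how many line items you add, up to the tree's record limit.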

Creating a Quote

Standard Quote object: same as Opp — 1 composite call covers the Quote + line items.
Salesforce CPQ: quote creation triggers an internal price-rule cascade. From your side it's still 1 API call, but the cascade runs server-side and consumes CPU governor limits.
Revenue Cloud: quote.generate via Apex REST. Still 1 call, but a long one (5–15s typical), and it counts against the daily API limit like any other request.

Creating a full quote bundle: Account + Contact + Opp + Quote + 5 LIs

Naive way

9 calls

One per record; any lookups come on top of that. Easy to write, easy to blow limits.

Composite way

1 call

Single /composite/tree payload with parent-child references resolved server-side.

Rule of thumb: if you're not using /composite/tree or /composite/sobjects for any multi-record headless write, you're doing it wrong.

4 · Walkthrough: Quote-to-Cash from an external portal

A "real" headless use case is rarely 1 call. Let's walk a credible one end-to-end and see where the calls actually go.

#	Step	API calls	Notes
1	OAuth token refresh	~0.1	Cached for ~1h, amortized across all txns
2	Lookup Account by external ID	1	SOQL via `/query`
3	Create Opp + Quote + 5 LIs	1	Composite tree
4	Trigger pricing	1	CPQ / Revenue Cloud
5	Generate PDF	1	Apex REST
6	Email send	1	Connect API / Messaging
7	Update status	1	PATCH on Quote
Total per transaction		~6	Deterministic, well-designed

Now: 6 calls × N transactions/day = your daily burn. Compare that to your org's daily limit, and multiply by 22 working days if you also need a monthly figure.

Your daily limit

daily_limit = 15,000 + (1,000 × license_count)
hard_cap = 5,000,000 / day / org

Edition / Licenses	Daily API limit
Enterprise · 100 users	115,000
Enterprise · 1,000 users	1,015,000
Anything over 5M	Capped at 5M (need API Bundles)
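The formula and table above can be sketched as a small helper. The constants are this guide's figures, passed as parameters because base allocation and per-license amounts differ by edition and license type; treat them as assumptions and check the current Salesforce limits documentation for your org. Function names are illustrative.

```python
def daily_api_limit(
    license_count: int,
    base: int = 15_000,           # base allocation used in this guide
    per_license: int = 1_000,     # per-license allocation used in this guide
    hard_cap: int = 5_000_000,    # org-wide cap used in this guide
) -> int:
    """Daily API request limit: base plus per-license allocation, capped."""
    return min(base + per_license * license_count, hard_cap)


def share_of_limit(calls_per_txn: float, txns_per_day: int, license_count: int) -> float:
    """Fraction of the daily limit consumed by a given transaction volume."""
    return calls_per_txn * txns_per_day / daily_api_limit(license_count)


print(daily_api_limit(100))     # 115000
print(daily_api_limit(1_000))   # 1015000
print(daily_api_limit(10_000))  # 5000000 (capped)
```

For example, `share_of_limit(6, 5_000, 200)` comes out a little under 0.14: the deterministic walkthrough at 5,000 transactions/day uses about 14% of a 200-license org's allowance.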

5 · Hidden multipliers that blow your limits

This is where most estimates go wrong. The baseline assumes a perfect world. The real world has these taxes:

Multiplier	Cost	Mitigation
Bulk vs REST	REST = 1 call/record. Bulk = 1 call per 10k.	Use Bulk API 2.0 for any >200-record batch.
Triggers / Flows with callouts	+1–4 hidden calls per record write	Audit org for callouts in automation. Move to async.
Polling	1 call × poll-frequency × hours = huge	Use Platform Events or CDC. Event delivery doesn't count against the daily API request limit (it has its own delivery allocation).
OAuth refresh	~0.1 calls/txn if cached, 1 if not	Cache tokens for their full lifetime.
Retry logic	+10–30% for 429s, timeouts	Exponential backoff, idempotency keys.
UI API / Connect API	Same daily bucket as REST	Don't assume "UI API" is free — it isn't.
Metadata API	Separate limit but expensive ops	Don't use it in transactional paths.
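The retry row is worth a sketch, because every retry is another API call against the same daily bucket. Below is a minimal exponential-backoff helper; `TransientError` is a hypothetical stand-in for whatever your client raises on a timeout or rate-limit response, and the delays are illustrative defaults rather than recommended values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or rate-limit response."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on TransientError wait base_delay * 2**attempt (plus jitter), then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Up to 10% jitter so parallel callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Capping `max_attempts` bounds the worst-case call count per operation, which is what lets you put a number like +10–30% on retries. For writes, pair this with something idempotent (for instance an upsert keyed on an external ID) so a retried request cannot create a duplicate.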

The polling killer: an external system polling every 30s for "is the quote done yet?" burns 2,880 calls/day per polling source. Same job with a Platform Event subscription: ~1 call/day. This single change has saved more orgs than any other optimization.
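The 2,880 figure is easy to reproduce, and just as easy to scale to more than one poller. A small sketch (function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_calls_per_day(interval_seconds: int, sources: int = 1) -> int:
    """API calls burned per day by `sources` clients polling every `interval_seconds`."""
    return SECONDS_PER_DAY // interval_seconds * sources


print(polling_calls_per_day(30))      # 2880
print(polling_calls_per_day(30, 10))  # 28800
```

Ten such pollers burn 28,800 calls/day, about a quarter of the 115,000 allowance for a 100-user org in the table above, before any real work happens.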

6 · Now make it agentic: the variance problem

Everything above assumes a deterministic caller. Now imagine the caller is an LLM tool-calling against an MCP server. Same user intent — "create an opp for Acme for $50k" — can play out very differently:

Best case

2 calls

Agent has full context, sends one composite create. Done.

Worst case

25+ calls

Describe, query, fail validation, re-describe, retry picklist, fail FLS, retry…

What "worst case" actually looks like

With a naive MCP that exposes one tool per endpoint (the anti-pattern most teams ship first):

User: "Create an opp for Acme for $50k"

Agent trace:
 1. describe_Opportunity()           → 1 SF call, ~3k tokens back
 2. query_Account("Acme")            → 1 SF call, returns 4 matches
 3. ask_user("which Acme?")          → stalls, costs a round-trip
 4. create_Opportunity({...})        → 1 SF call, FAIL: StageName required
 5. describe_picklist_Stage()        → 1 SF call
 6. create_Opportunity({...})        → 1 SF call, FAIL: CloseDate required
 7. create_Opportunity({...})        → 1 SF call, FAIL: validation rule "Segment__c required for Amount > $10k"
 8. query_CustomField_Segment()      → 1 SF call
 9. create_Opportunity({...})        → 1 SF call, finally succeeds

Total: 9 tool calls · 8 SF API calls (rejected creates still count) · ~40k tokens · 1 frustrated user

That's real. I've seen it in production traces. And it's not the model being dumb — it's the tool surface forcing it to discover the org one failure at a time.
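Tallying a trace like this is mechanical once each step records how many Salesforce requests it made. The sketch below encodes the nine steps above as data. It assumes a rejected create still consumes an API request (worth verifying for your org), which is why the tally can come out higher than a count of successful calls alone.

```python
# (tool name, Salesforce requests made by that step)
trace = [
    ("describe_Opportunity", 1),
    ("query_Account", 1),
    ("ask_user", 0),                   # user round-trip, no Salesforce request
    ("create_Opportunity", 1),         # rejected: StageName required
    ("describe_picklist_Stage", 1),
    ("create_Opportunity", 1),         # rejected: CloseDate required
    ("create_Opportunity", 1),         # rejected: validation rule
    ("query_CustomField_Segment", 1),
    ("create_Opportunity", 1),         # succeeds
]

tool_calls = len(trace)
sf_calls = sum(requests for _, requests in trace)
# Every create before the final one was a rejected write.
wasted_writes = sum(1 for name, _ in trace[:-1] if name == "create_Opportunity")

print(tool_calls, sf_calls, wasted_writes)  # 9 8 3
```

Logging traces in this shape gives you the raw data for the variance multiplier: run the same intent many times and look at the spread of `sf_calls`, not just the mean.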

7 · The Stainless pattern (and why we built it)

The fix is to stop exposing Salesforce as "one tool per endpoint" and instead expose it as two tools: one to discover, one to act. We built this for bugs-sf-stainless, and the variance collapse is dramatic.

`search(query)`

Fuses 5 layers: SDK catalog, cookbook skills, RAG over 6,400 SF doc chunks, live web, and live org introspection. Cohere-reranked. Returns the schema, existing examples, validation rules, and patterns relevant to what the agent is about to do.

`execute(python_code)`

Runs Python in a 25s sandbox against a pre-authenticated sf SDK: sf.query, sf.create, sf.tooling.*, sf.metadata.deploy, sf.apex(code). The agent can batch a whole transaction in one block.

Same user request, Stainless-style

User: "Create an opp for Acme for $50k"

Agent trace:
 1. search("create opportunity required fields validation rules")
    → Returns: required fields, active VRs, picklist values,
      similar Opps in the org, a working code example.
      (1 search call, no SF API call yet)

 2. execute("""
      acmes = sf.query("SELECT Id,Name FROM Account WHERE Name LIKE 'Acme%'")
      # sketch assumes a single match; with several, return the list and ask the user once
      opp = sf.create('Opportunity', {
        'Name':'Acme - Q3','AccountId':acmes[0]['Id'],
        'Amount':50000,'CloseDate':'2026-09-30',
        'StageName':'Prospecting','Segment__c':'Enterprise'
      })
    """)
    → 2 SF API calls (1 SOQL, 1 create)

Total: 2 tool calls · 2 SF API calls · ~8k tokens · 0 retries

Roughly 3× to 4× fewer SF API calls in this trace, and 5× to 10× fewer once you compare the variance multipliers in section 10. The agent paid for discovery once (via search) instead of per failure (via retry).

8 · Where skill files fit

Skills are playbooks for known patterns. They're the third leg of the stool alongside search and execute. They cut variance hard in some situations and barely help in others:

Good for	Not so good for
Repeatable workflows: "Create Opp," "Quote-to-Cash," "Convert Lead," encoded as a checklist of required fields, recommended composite payload, and common pitfalls	Novel / exploratory intents: the user asks something the skill doesn't cover, so the skill provides no lift
Org-specific quirks: "This org requires `Segment__c` and uses RecordType 'Enterprise Sale' for >$50k," pre-loaded so the agent doesn't have to discover it	Fast-moving metadata: a skill written 6 months ago doesn't know about the VR added last week
Multi-step compound use cases: the skill encodes the order, the composite shape, the cleanup	As a replacement for live introspection: skills go stale. They guide, they don't replace fresh state.
Anti-patterns to avoid: "Don't query LIs one at a time, use a subquery"

Best of both worlds: the skill says "before creating an Opp, run search('current Opportunity required fields and active validation rules in this org')." The skill gives the pattern; search gives the fresh state. Same shape, current data.

9 · Preflight > retry: the killer pattern

What "preflight" means. Borrowed from aviation — the checklist a pilot runs before takeoff to catch problems on the ground instead of in the air. In our context: a single tool call you make before any write that returns everything that could make the write fail (missing required fields, validation rules that will fire, FLS issues, picklist values, the right RecordType, the ideal payload shape). The agent learns the failure modes in one cheap call instead of discovering them one expensive failure at a time.

If you remember one thing from this page, remember this:

Failure-driven discovery is the worst possible API consumption pattern. Every validation rule the agent learns about by triggering it is a tax you pay forever.

The architectural fix is a preflight tool:

preflight_create('Opportunity', {Name:'Acme', Amount:50000})
→ {
    missing_required:  ['CloseDate','StageName'],
    validation_rules:  ['Segment__c required when Amount > 10000'],
    fls_issues:        [],   // for running user
    picklist_values:   {StageName:[...], Type:[...]},
    record_type_hint:  'Enterprise Sale',
    composite_payload: {...} // ready-to-POST shape
  }

One call, before any write. Agent now has everything it needs to either ask the user once for all missing fields or auto-fill from context. Failure trajectory: 2 tool calls instead of 9.
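To make the shape of such a tool concrete, here is a minimal local sketch of the checking logic. Everything in it is an assumption for illustration: `ObjectRules` is a hypothetical container, the rules are hard-coded rather than loaded from the org, and validation rules appear as hand-written Python predicates because rule formulas cannot generally be evaluated client-side. FLS and RecordType checks are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ObjectRules:
    """Hypothetical, pre-fetched rules for one sObject."""
    required: list[str]
    picklists: dict[str, list[str]] = field(default_factory=dict)
    # (message, predicate) pairs; the predicate returns True when the rule would fire.
    validation_rules: list[tuple[str, Callable[[dict], bool]]] = field(default_factory=list)


def preflight_create(rules: ObjectRules, payload: dict) -> dict:
    """Report everything that would make a create fail, without calling Salesforce."""
    missing = [f for f in rules.required if payload.get(f) in (None, "")]
    bad_picklists = {
        f: allowed
        for f, allowed in rules.picklists.items()
        if f in payload and payload[f] not in allowed
    }
    firing = [message for message, fires in rules.validation_rules if fires(payload)]
    return {
        "missing_required": missing,
        "invalid_picklist_values": bad_picklists,
        "validation_rules": firing,
        "ok": not (missing or bad_picklists or firing),
    }


opportunity_rules = ObjectRules(
    required=["Name", "StageName", "CloseDate"],
    picklists={"StageName": ["Prospecting", "Qualification", "Closed Won"]},
    validation_rules=[(
        "Segment__c required when Amount > 10000",
        lambda p: p.get("Amount", 0) > 10_000 and not p.get("Segment__c"),
    )],
)

report = preflight_create(opportunity_rules, {"Name": "Acme", "Amount": 50000})
print(report["missing_required"])  # ['StageName', 'CloseDate']
print(report["validation_rules"])  # ['Segment__c required when Amount > 10000']
```

In a real MCP server the rules would come from cached describe and metadata calls, refreshed on a schedule, so the per-write cost of preflight stays close to zero API calls.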

This belongs in your MCP layer, not in every skill.

10 · The scoping formula

expected_api_calls
= baseline_calls × variance_multiplier × hidden_multipliers

Baseline

What a perfect deterministic script would do. Use the per-operation tables from section 3 and the per-transaction total from the walkthrough in section 4.

Variance multiplier (the agent tax)

Architecture	Multiplier	Why
Stainless-style search + execute + skills + preflight	1.2×–1.5×	One discovery pass, one batched write, few retries.
Multi-tool MCP + skills, no preflight	3×–5×	Each fail = a new tool call. Skills help but don't replace introspection.
Naive multi-tool MCP, no skills, no preflight	8×–15×	Pay-per-discovery. Worst-case the agent loops on validation rules.

Hidden multipliers (apply on top)

+20% — OAuth refresh overhead
+30% — Retry logic (timeouts, 429s, idempotency)
+50% — Triggers / Flows that issue callouts
+100% — Polling instead of CDC / Platform Events
+30% — "Agent gets confused and explores" tax (apply to any agentic estimate)

Worked example

Use case: 5,000 Quote-to-Cash transactions/day from a customer portal, driven by Claude + MCP.

Baseline = 6 calls/txn
Stainless architecture → variance × 1.4
Hidden: OAuth × 1.2, retry × 1.3, no polling
Per-txn = 6 × 1.4 × 1.2 × 1.3 ≈ 13 calls
Daily = 13 × 5,000 = ~65,000 calls/day
Org has 200 EE licenses → limit = 215,000/day
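~30% of daily limit consumed. Safe, with headroom. Adding the 30% "agent gets confused" tax from the list above raises this to ≈ 17 calls/txn, ~85,000 calls/day, about 40% of the limit: still safe.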

Same use case, naive multi-tool MCP: 6 × 10 × 1.2 × 1.3 = ~94 calls/txn → 470,000/day → over the limit. Org gets throttled by 11 AM.
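The worked example can be reproduced with a few lines, which also makes it easy to swap in your own multipliers. This is a sketch of the formula as stated in this section; function and variable names are illustrative, and the limit uses this guide's 15,000 + 1,000 × licenses figure.

```python
from math import prod


def expected_calls_per_txn(baseline: float, variance: float, hidden: list[float]) -> float:
    """baseline_calls × variance_multiplier × product of hidden multipliers."""
    return baseline * variance * prod(hidden)


def daily_usage(per_txn: float, txns_per_day: int, daily_limit: int) -> tuple[float, float]:
    """Return (calls per day, fraction of the daily limit consumed)."""
    daily = per_txn * txns_per_day
    return daily, daily / daily_limit


LIMIT = 15_000 + 1_000 * 200   # 200 licenses, per this guide's formula
OAUTH, RETRY, CONFUSION = 1.2, 1.3, 1.3

stainless = expected_calls_per_txn(6, 1.4, [OAUTH, RETRY])
naive = expected_calls_per_txn(6, 10, [OAUTH, RETRY])
stainless_taxed = expected_calls_per_txn(6, 1.4, [OAUTH, RETRY, CONFUSION])

for label, per_txn in [("stainless", stainless), ("naive", naive), ("stainless + tax", stainless_taxed)]:
    daily, share = daily_usage(round(per_txn), 5_000, LIMIT)
    print(f"{label}: {round(per_txn)} calls/txn, {daily:,.0f}/day, {share:.0%} of limit")
```

Rounding to whole calls per transaction before scaling matches the arithmetic in the text (13 and 94 calls/txn); the third row shows the effect of also applying the 30% confusion tax from the hidden-multiplier list.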

11 · TL;DR

Baseline first. Count the deterministic calls per operation, multiply by volume.
Composite everything. If you're writing multiple records, you should be making 1 API call, not N.
Naive MCP is a tax. One-tool-per-endpoint forces failure-driven discovery. Plan for 8×–15× variance.
Stainless-style (search + execute) collapses variance to 1.2×–1.5×. Pay for discovery once, not per failure.
Skills encode patterns. Search returns fresh state. Use both. They're complementary, not alternatives.
Preflight beats retry every time. One preflight_create call before a write saves 5+ retry calls.
Push, don't poll. CDC and Platform Event delivery doesn't count against your daily API request limit.
Add a 30% "agent gets confused" tax to every agentic estimate. It will happen.

Naive MCP = pay-per-discovery.
Stainless MCP + skills + preflight = pay-per-intent.