A field guide · Bugs × Agentforce Architects

Estimating Salesforce API Consumption in Agentic Workloads

How to size, scope and price LLM-driven Salesforce work when the agent (not a script) is making the calls.

By Bugs 🐕‍🦺 · For Salesforce architects building with Claude + MCP
What's inside
  1. Why this question is harder than it looks
  2. The 3 cost layers
  3. Baseline: what 1 operation actually costs
  4. Walkthrough: Quote-to-Cash from a portal
  5. The hidden multipliers that blow your limits
  6. Now make it agentic: the variance problem
  7. The Stainless pattern (and why we built it)
  8. Where skill files fit
  9. Preflight > retry: the killer pattern
  10. The scoping formula
  11. TL;DR

TL;DR for the architect in a hurry

Agentic Salesforce isn't deterministic. The same user intent can cost 2 calls or 47, depending on your MCP design. Move cost from runtime (token + API burn) to design-time (skills + a searchable schema index + preflight). Naive MCP = pay-per-discovery. Stainless-style MCP + skills = pay-per-intent.

1 · Why this question is harder than it looks

If you're writing a deterministic integration (MuleSoft, a Node script, a workflow), estimating Salesforce API calls is arithmetic. Count the operations, multiply by volume, add a fudge factor.

The moment an LLM is the one calling Salesforce, that arithmetic breaks. The model decides what to do. Missing fields, validation rules, picklist mismatches, ambiguous lookups: every one of those becomes a tool call the agent makes to recover. And the agent doesn't know what it doesn't know until it tries.

So you're not estimating a script. You're estimating a probability distribution of tool-call trajectories. The goal of good design is to collapse the variance.

2 · Three cost layers to think about

| Layer | What it measures | Where it bleeds |
| --- | --- | --- |
| 1. Intent → Tool Calls | How many turns the agent takes to figure out what to do | LLM tokens, latency, user patience |
| 2. Tool Calls → API Calls | How many Salesforce calls each tool invocation actually makes | Daily API limit, governor limits |
| 3. API Calls → Outcomes | How much server-side automation each call triggers | Hidden callouts, CPU, async queue |

Most teams only think about layer 2. The expensive failure mode is layer 1 → layer 2 amplification: the agent gets confused, takes 9 tool calls to do a 1-tool-call job, and each of those tool calls fans out to 1-3 SF API calls.
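The amplification is plain multiplication, which a few lines make concrete. This is a minimal sketch using the numbers from the paragraph above; the function name is illustrative, not part of any tool.

```python
def api_call_range(tool_calls: int, fanout_min: int = 1, fanout_max: int = 3) -> tuple[int, int]:
    """Return the (low, high) number of Salesforce API calls for one trajectory."""
    return tool_calls * fanout_min, tool_calls * fanout_max


ideal = api_call_range(1)     # the 1-tool-call job, done cleanly
confused = api_call_range(9)  # the same job after the agent gets confused

print(f"ideal:    {ideal[0]}-{ideal[1]} API calls")
print(f"confused: {confused[0]}-{confused[1]} API calls")
```

The point of the range is that layer 1 sets the multiplier and layer 2 sets the multiplicand, so trimming either one helps, but trimming layer 1 helps most.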

3 · Baseline: what 1 operation actually costs

This is your floor: what a well-designed deterministic call would consume.

Creating an Opportunity

| Approach | API calls | Notes |
| --- | --- | --- |
| Single POST to /sobjects/Opportunity | 1 | Just the Opp, no children |
| Opp + 3 OpportunityLineItems, naively | 4 | 1 per record. Anti-pattern. |
| Opp + 3 OLIs via /composite/tree | 1 | Up to 200 records w/ parent-child refs in one call |

Creating a Quote

Creating a full quote bundle: Account + Contact + Opp + Quote + 5 LIs

Naive way

9 calls

One per record, plus lookups. Easy to write, easy to blow limits.

Composite way

1 call

Single /composite/tree payload with parent-child references resolved server-side.

Rule of thumb: if you're not using /composite/tree or /composite/sobjects for any multi-record headless write, you're doing it wrong.
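To make the composite shape concrete, here is a sketch that builds an sObject Tree request body for one Opportunity with three nested line items. It only constructs and prints the payload; it sends nothing. The API version, the placeholder PricebookEntryId, and the helper name are assumptions for illustration, and a real org will likely need more fields (for example a price book on the Opportunity).

```python
import json

API_VERSION = "v60.0"  # assumption: substitute the version your org uses


def opportunity_tree(name, stage, close_date, line_items):
    """Build an sObject Tree body: one Opportunity with nested line items."""
    return {
        "records": [{
            "attributes": {"type": "Opportunity", "referenceId": "opp1"},
            "Name": name,
            "StageName": stage,
            "CloseDate": close_date,
            # Children are nested under the child relationship name.
            "OpportunityLineItems": {
                "records": [
                    {
                        "attributes": {"type": "OpportunityLineItem",
                                       "referenceId": f"oli{i}"},
                        **item,
                    }
                    for i, item in enumerate(line_items, start=1)
                ]
            },
        }]
    }


# Placeholder values only; real IDs come from your org.
items = [
    {"PricebookEntryId": "<pricebook-entry-id>", "Quantity": 1, "UnitPrice": 100.0}
    for _ in range(3)
]
body = opportunity_tree("Acme - Q3", "Prospecting", "2026-09-30", items)
path = f"/services/data/{API_VERSION}/composite/tree/Opportunity"

print("POST", path)
print(json.dumps(body, indent=2))
```

Four records travel in one request, which is the whole argument of the table above: the parent-child wiring happens server-side instead of costing a round trip per record.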

4 · Walkthrough: Quote-to-Cash from an external portal

A "real" headless use case is rarely 1 call. Let's walk a credible one end-to-end and see where the calls actually go.

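| # | Step | API calls | Notes |
| --- | --- | --- | --- |
| 1 | OAuth token refresh | ~0.1 | Cached for ~1h, amortized across all txns |
| 2 | Lookup Account by external ID | 1 | SOQL via /query |
| 3 | Create Opp + Quote + 5 LIs | 1 | Composite tree |
| 4 | Trigger pricing | 1 | CPQ / Revenue Cloud |
| 5 | Generate PDF | 1 | Apex REST |
| 6 | Email send | 1 | Connect API / Messaging |
| 7 | Update status | 1 | PATCH on Quote |
| | Total per transaction | ~6 | Deterministic, well-designed |

Now: 6 calls × N transactions/day = your daily burn. Compare that to your org's daily limit, because the daily limit is what throttles you. Multiply by 22 working days when you need a monthly figure for pricing.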

Your daily limit

daily_limit = 15,000 + (1,000 × license_count)
hard_cap = 5,000,000 / day / org

| Edition / Licenses | Daily API limit |
| --- | --- |
| Enterprise · 100 users | 115,000 |
| Enterprise · 1,000 users | 1,015,000 |
| Anything over 5M | Capped at 5M (need API Bundles) |
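The formula and table above translate directly into a small budget check. This sketch uses the constants exactly as stated in this guide (confirm them against your own org's limits page before relying on them); the function names and the 5,000 transactions/day figure are illustrative.

```python
BASE_ALLOCATION = 15_000  # base term from the formula above
PER_LICENSE = 1_000       # per-license term from the formula above
HARD_CAP = 5_000_000      # per day, per org, as stated above


def daily_limit(license_count: int) -> int:
    """Daily API allocation for an org with the given number of licenses."""
    return min(BASE_ALLOCATION + PER_LICENSE * license_count, HARD_CAP)


def utilization(calls_per_txn: float, txns_per_day: int, license_count: int) -> float:
    """Fraction of the daily limit a workload would consume."""
    return calls_per_txn * txns_per_day / daily_limit(license_count)


print(f"100 users:   {daily_limit(100):,}")
print(f"1,000 users: {daily_limit(1_000):,}")
# Illustrative workload: 6 calls/txn at 5,000 txns/day on a 100-user org.
print(f"utilization: {utilization(6, 5_000, 100):.0%}")
```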

5 · Hidden multipliers that blow your limits

This is where most estimates go wrong. The baseline assumes a perfect world. The real world has these taxes:

| Multiplier | Cost | Mitigation |
| --- | --- | --- |
| Bulk vs REST | REST = 1 call/record. Bulk = 1 call per 10k. | Use Bulk API 2.0 for any >200-record batch. |
| Triggers / Flows with callouts | +1–4 hidden calls per record write | Audit org for callouts in automation. Move to async. |
| Polling | 1 call × poll-frequency × hours = huge | Use Platform Events or CDC. CDC delivery doesn't count. |
| OAuth refresh | ~0.1 calls/txn if cached, 1 if not | Cache tokens for their full lifetime. |
| Retry logic | +10–30% for 429s, timeouts | Exponential backoff, idempotency keys. |
| UI API / Connect API | Same daily bucket as REST | Don't assume "UI API" is free; it isn't. |
| Metadata API | Separate limit but expensive ops | Don't use it in transactional paths. |

The polling killer: an external system polling every 30s for "is the quote done yet?" burns 2,880 calls/day per polling source. Same job with a Platform Event subscription: ~1 call/day. This single change has saved more orgs than any other optimization.
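The polling figure is worth deriving once so you can plug in your own interval. A minimal sketch of the arithmetic; the function name and the extra intervals are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_calls_per_day(interval_seconds: int, sources: int = 1) -> int:
    """API calls burned per day by `sources` clients polling at a fixed interval."""
    return sources * SECONDS_PER_DAY // interval_seconds


for interval in (30, 60, 300):
    print(f"every {interval:>3}s: {polling_calls_per_day(interval):,} calls/day")

# Three independent pollers at 30s each.
print(f"3 sources at 30s: {polling_calls_per_day(30, sources=3):,} calls/day")
```

Note that the cost scales with the number of polling sources, not with the number of quotes, which is why it stays invisible until someone adds a second or third integration.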

6 · Now make it agentic: the variance problem

Everything above assumes a deterministic caller. Now imagine the caller is an LLM tool-calling against an MCP server. The same user intent ("create an opp for Acme for $50k") can play out very differently:

Best case
2 calls

Agent has full context, sends one composite create. Done.

Worst case
25+ calls

Describe, query, fail validation, re-describe, retry picklist, fail FLS, retry…

What "worst case" actually looks like

With a naive MCP that exposes one tool per endpoint (the anti-pattern most teams ship first):

User: "Create an opp for Acme for $50k"

Agent trace:
 1. describe_Opportunity()           → 1 SF call, ~3k tokens back
 2. query_Account("Acme")            → 1 SF call, returns 4 matches
 3. ask_user("which Acme?")          → stalls, costs a round-trip
 4. create_Opportunity({...})        → FAIL: StageName required
 5. describe_picklist_Stage()        → 1 SF call
 6. create_Opportunity({...})        → FAIL: CloseDate required
 7. create_Opportunity({...})        → FAIL: validation rule "Segment__c required for Amount > $10k"
 8. query_CustomField_Segment()      → 1 SF call
 9. create_Opportunity({...})        → finally succeeds

Total: 9 tool calls · 8 SF API calls (the failed creates are still requests) · ~40k tokens · 1 frustrated user

That's real. I've seen it in production traces. And it's not the model being dumb β€” it's the tool surface forcing it to discover the org one failure at a time.
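If you want to audit your own traces the same way, a tally like the following is enough. It encodes the nine steps above and counts every request sent to the org, including rejected writes. That last part is an assumption about metering: if your accounting excludes failed writes, the figure drops accordingly. The `Step` type and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str
    hits_salesforce: bool  # did this tool call send a request to the org?
    succeeded: bool


TRACE = [
    Step("describe_Opportunity", True, True),
    Step("query_Account", True, True),
    Step("ask_user", False, True),             # round trip to the user, no SF request
    Step("create_Opportunity", True, False),   # StageName required
    Step("describe_picklist_Stage", True, True),
    Step("create_Opportunity", True, False),   # CloseDate required
    Step("create_Opportunity", True, False),   # validation rule fired
    Step("query_CustomField_Segment", True, True),
    Step("create_Opportunity", True, True),
]

tool_calls = len(TRACE)
sf_requests = sum(s.hits_salesforce for s in TRACE)
wasted_writes = sum(s.hits_salesforce and not s.succeeded for s in TRACE)

print(f"tool calls:    {tool_calls}")
print(f"SF requests:   {sf_requests}")
print(f"wasted writes: {wasted_writes}")
```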

7 · The Stainless pattern (and why we built it)

The fix is to stop exposing Salesforce as "one tool per endpoint" and instead expose it as two tools: one to discover, one to act. We built this for bugs-sf-stainless, and the variance collapse is dramatic.

search(query)

Fuses 5 layers: SDK catalog, cookbook skills, RAG over 6,400 SF doc chunks, live web, and live org introspection. Cohere-reranked. Returns the schema, existing examples, validation rules, and patterns relevant to what the agent is about to do.

execute(python_code)

Runs Python in a 25s sandbox against a pre-authenticated sf SDK: sf.query, sf.create, sf.tooling.*, sf.metadata.deploy, sf.apex(code). The agent can batch a whole transaction in one block.

Same user request, Stainless-style

User: "Create an opp for Acme for $50k"

Agent trace:
 1. search("create opportunity required fields validation rules")
    → Returns: required fields, active VRs, picklist values,
      similar Opps in the org, a working code example.
      (1 search call, no SF API call yet)

 2. execute("""
      acmes = sf.query("SELECT Id,Name FROM Account WHERE Name LIKE 'Acme%'")
      # agent shows list to user, picks one
      opp = sf.create('Opportunity', {
        'Name':'Acme - Q3','AccountId':acmes[0]['Id'],
        'Amount':50000,'CloseDate':'2026-09-30',
        'StageName':'Prospecting','Segment__c':'Enterprise'
      })
    """)
    → 2 SF API calls (1 SOQL, 1 create)

Total: 2 tool calls · 2 SF API calls · ~8k tokens · 0 retries

5× to 10× fewer API calls for the same outcome, because the agent paid for discovery once (via search) instead of per failure (via retry).
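One practical consequence of a single execute tool is that API consumption can be metered per block. The sketch below is not the bugs-sf-stainless implementation: `CountingSF` is a hypothetical stand-in for the pre-authenticated `sf` object that returns canned data and counts calls, and plain `exec` is used only for illustration (it is not a sandbox).

```python
class CountingSF:
    """Hypothetical stand-in for `sf`: counts calls instead of hitting an org."""

    def __init__(self):
        self.api_calls = 0

    def query(self, soql):
        self.api_calls += 1
        return [{"Id": "001-FAKE-ACME", "Name": "Acme Corp"}]  # canned result

    def create(self, sobject, fields):
        self.api_calls += 1
        return {"id": "006-FAKE-OPP", "success": True}  # canned result


def execute(python_code, sf):
    """Run one agent-authored block and report how many API calls it made."""
    before = sf.api_calls
    exec(python_code, {"sf": sf})  # illustration only; a real tool needs isolation
    return sf.api_calls - before


block = '''
acmes = sf.query("SELECT Id, Name FROM Account WHERE Name LIKE 'Acme%'")
opp = sf.create("Opportunity", {
    "Name": "Acme - Q3", "AccountId": acmes[0]["Id"],
    "Amount": 50000, "CloseDate": "2026-09-30",
    "StageName": "Prospecting", "Segment__c": "Enterprise",
})
'''

sf = CountingSF()
calls = execute(block, sf)
print(f"1 tool call, {calls} SF API calls")
```

Returning the per-block count alongside the result gives you the layer 1 to layer 2 ratio for free, which is the number you need for the scoping exercise later.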

8 · Where skill files fit

Skills are playbooks for known patterns. They're the third leg of the stool alongside search and execute. They cut variance hard on:

| Good for | Not so good for |
| --- | --- |
| Repeatable workflows: "Create Opp," "Quote-to-Cash," "Convert Lead," encoded as a checklist of required fields, recommended composite payload, common pitfalls | Novel / exploratory intents: the user asks something the skill doesn't cover, so the skill provides no lift |
| Org-specific quirks: "This org requires Segment__c and uses RecordType 'Enterprise Sale' for >$50k," pre-loaded so the agent doesn't have to discover it | Fast-moving metadata: a skill written 6 months ago doesn't know about the VR added last week |
| Multi-step compound use cases: the skill encodes the order, the composite shape, the cleanup | As a replacement for live introspection: skills go stale. They guide, they don't replace fresh state. |
| Anti-patterns to avoid: "Don't query LIs one at a time, use a subquery" | |
Best of both worlds: the skill says "before creating an Opp, run search('current Opportunity required fields and active validation rules in this org')." The skill gives the pattern; search gives the fresh state. Same shape, current data.
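One way to picture that split is a skill that stores the pattern plus the query to refresh it, and a planner that treats the live answer as authoritative. Everything here is a hypothetical sketch: the `Skill` shape, `plan_fields`, and the fake search function are invented for illustration and are not the format of any real skill file.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    refresh_query: str  # what to ask search() before acting
    known_required: set[str] = field(default_factory=set)  # as of authoring time


def plan_fields(skill: Skill, search) -> tuple[set[str], set[str]]:
    """Return (fields required today, fields the skill did not know about)."""
    fresh = set(search(skill.refresh_query))
    return fresh, fresh - skill.known_required


create_opp = Skill(
    name="Create Opp",
    refresh_query="current Opportunity required fields and active validation rules in this org",
    known_required={"Name", "StageName", "CloseDate"},
)


def fake_search(query: str) -> list[str]:
    # Stand-in for live search: pretend a rule requiring Segment__c was added last week.
    return ["Name", "StageName", "CloseDate", "Segment__c"]


required, drift = plan_fields(create_opp, fake_search)
print("required today:", sorted(required))
print("skill drift:   ", sorted(drift))
```

A non-empty drift set is also a cheap signal that the skill file is due for an update.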

9 · Preflight > retry: the killer pattern

What "preflight" means. Borrowed from aviation: the checklist a pilot runs before takeoff to catch problems on the ground instead of in the air. In our context: a single tool call you make before any write that returns everything that could make the write fail (missing required fields, validation rules that will fire, FLS issues, picklist values, the right RecordType, the ideal payload shape). The agent learns the failure modes in one cheap call instead of discovering them one expensive failure at a time.

If you remember one thing from this page, remember this:

Failure-driven discovery is the worst possible API consumption pattern. Every validation rule the agent learns about by triggering it is a tax you pay forever.

The architectural fix is a preflight tool:

preflight_create('Opportunity', {Name:'Acme', Amount:50000})
→ {
    missing_required:  ['CloseDate','StageName'],
    validation_rules:  ['Segment__c required when Amount > 10000'],
    fls_issues:        [],   // for running user
    picklist_values:   {StageName:[...], Type:[...]},
    record_type_hint:  'Enterprise Sale',
    composite_payload: {...} // ready-to-POST shape
  }

One call, before any write. Agent now has everything it needs to either ask the user once for all missing fields or auto-fill from context. Failure trajectory: 2 calls instead of 9.

This belongs in your MCP layer, not in every skill.
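A partial sketch of how such a tool might compute its answer from describe-style metadata. It covers only required fields and picklist values, using the field properties that a Salesforce describe result exposes (`createable`, `nillable`, `defaultedOnCreate`, `picklistValues`); validation rules, FLS and record types need other sources and are left out. The function body, the `_field` helper and the sample metadata are assumptions for illustration.

```python
def preflight_create(describe: dict, payload: dict) -> dict:
    """Compare a draft payload against describe-style metadata before any write."""
    fields = {f["name"]: f for f in describe["fields"]}

    # Required on create: writable, not nullable, and no server-side default.
    missing = [
        name for name, f in fields.items()
        if f["createable"] and not f["nillable"] and not f["defaultedOnCreate"]
        and name not in payload
    ]
    picklists = {
        name: [v["value"] for v in f["picklistValues"] if v["active"]]
        for name, f in fields.items() if f.get("picklistValues")
    }
    bad_picklist = {
        name: payload[name] for name, allowed in picklists.items()
        if name in payload and payload[name] not in allowed
    }
    return {
        "missing_required": missing,
        "unknown_fields": [name for name in payload if name not in fields],
        "picklist_values": picklists,
        "invalid_picklist_values": bad_picklist,
    }


def _field(name, nillable=True, picklist=None):
    """Build one fake describe entry (illustrative helper)."""
    return {"name": name, "createable": True, "nillable": nillable,
            "defaultedOnCreate": False,
            "picklistValues": [{"value": v, "active": True} for v in (picklist or [])]}


DESCRIBE = {"fields": [
    _field("Name", nillable=False),
    _field("CloseDate", nillable=False),
    _field("StageName", nillable=False, picklist=["Prospecting", "Closed Won"]),
    _field("Amount"),
]}

report = preflight_create(DESCRIBE, {"Name": "Acme", "Amount": 50000})
print(report["missing_required"])
print(report["picklist_values"])
```

Because the describe result can be cached in the MCP layer, a check like this costs the org nothing on most invocations.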

10 · The scoping formula

expected_api_calls
  = baseline_calls × variance_multiplier × hidden_multipliers

Baseline

What a perfect deterministic script would do. Use the per-operation table from section 3.

Variance multiplier (the agent tax)

| Architecture | Multiplier | Why |
| --- | --- | --- |
| Stainless-style search + execute + skills + preflight | 1.2×–1.5× | One discovery pass, one batched write, few retries. |
| Multi-tool MCP + skills, no preflight | 3×–5× | Each fail = a new tool call. Skills help but don't replace introspection. |
| Naive multi-tool MCP, no skills, no preflight | 8×–15× | Pay-per-discovery. Worst-case the agent loops on validation rules. |

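Hidden multipliers (apply on top)

These are the taxes from section 5 that apply to your org: automation callouts, polling, uncached OAuth, retries.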

Worked example

Use case: 5,000 Quote-to-Cash transactions/day from a customer portal, driven by Claude + MCP.

Same use case, naive multi-tool MCP: 6 × 10 × 1.2 × 1.3 = ~94 calls/txn → 470,000/day → over the limit. Org gets throttled by 11 AM.
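The formula is easy to turn into a calculator for your own scoping. This sketch reproduces the naive figure above (the text rounds 93.6 up to ~94, hence 470,000 rather than 468,000). The Stainless-style line uses a variance multiplier of 1.3, which is an assumed value picked from inside the 1.2× to 1.5× range in the table, not a number taken from the worked example.

```python
from math import prod


def expected_calls_per_txn(baseline: float, variance: float, hidden=()) -> float:
    """baseline_calls x variance_multiplier x product of hidden multipliers."""
    return baseline * variance * prod(hidden)


BASELINE = 6              # Quote-to-Cash walkthrough total
HIDDEN = (1.2, 1.3)       # hidden multipliers used in the worked example
TXNS_PER_DAY = 5_000

naive = expected_calls_per_txn(BASELINE, 10, HIDDEN)
stainless = expected_calls_per_txn(BASELINE, 1.3, HIDDEN)  # 1.3 is an assumption

for label, per_txn in (("naive", naive), ("stainless-style", stainless)):
    print(f"{label:>15}: {per_txn:6.1f} calls/txn, {per_txn * TXNS_PER_DAY:>9,.0f} calls/day")
```

Swapping in your own baseline, volume and multipliers is the whole scoping exercise; the only judgment call is which row of the variance table your architecture really belongs in.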

11 · TL;DR

  1. Baseline first. Count the deterministic calls per operation, multiply by volume.
  2. Composite everything. If you're writing multiple records, you should be making 1 API call, not N.
  3. Naive MCP is a tax. One-tool-per-endpoint forces failure-driven discovery. Plan for 8×–15× variance.
  4. Stainless-style (search + execute) collapses variance to 1.2×–1.5×. Pay for discovery once, not per failure.
  5. Skills encode patterns. Search returns fresh state. Use both. They're complementary, not alternatives.
  6. Preflight beats retry every time. One preflight_create call before a write saves 5+ retry calls.
  7. Push, don't poll. CDC and Platform Events don't count against your API limit.
  8. Add a 30% "agent gets confused" tax to every agentic estimate. It will happen.
Naive MCP = pay-per-discovery.
Stainless MCP + skills + preflight = pay-per-intent.