Salesforce MCP Tool Optimization · The entire headless Salesforce platform exposed as just 2 MCP server tools

The thesis

100 tool calls → ~5.

Most Salesforce MCP servers expose every REST endpoint as its own tool. The model sees a wall of 800+ function signatures, burns tokens parsing them, and still can't compose multi-step workflows without losing context.

This server flips it: hand the model one tool that runs code against a pre-authenticated SDK, and one tool that finds the right method to call. The model writes a Python snippet, the server runs it in a sandbox, and you get the result back. SOQL, REST, Tooling, Composite, Bulk, Metadata: all one call. The fused search() returns the right SDK method and a clonable org template in one hop, so the model usually completes the whole task in 4–6 tool calls instead of a tangled 100-step dance.

✗ Traditional: 800 tool wrappers

• 50,000+ tokens of tool schemas in every prompt
• Model picks the wrong function constantly
• No composition: 4-step workflow = 4 tool calls + glue
• Adding a new endpoint = new tool, new schema, new release
• Costs more, slower, worse results

✓ Code mode: search + execute

• ~500 tokens of tool schemas. Total.
• Model writes Python — what it's already best at
• Multi-step workflows in one execute() call
• New endpoints just appear in the SDK, no release
• 93.3% pass on 135 stress tests · ~89% pass on 117 GenAI/agent E2E
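To make the ~500-token figure concrete, here is a sketch of what a two-tool `tools/list` payload could look like. The `name` / `description` / `inputSchema` shape follows the MCP tool definition. The descriptions and the `query` and `code` argument names are hypothetical, because the server's actual schemas aren't published. The token estimate uses the rough four-characters-per-token rule of thumb rather than a real tokenizer.

```python
import json

# Hypothetical tool definitions: the server's real descriptions and
# argument names are not published.
TOOLS = [
    {
        "name": "search",
        "description": "Find SDK methods, doc snippets, web results, and org templates for a task.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "What you want to do"}},
            "required": ["query"],
        },
    },
    {
        "name": "execute",
        "description": "Run Python against a pre-authenticated Salesforce SDK bound to `sf`; "
                       "assign the answer to `_result_`.",
        "inputSchema": {
            "type": "object",
            "properties": {"code": {"type": "string", "description": "Python snippet to run"}},
            "required": ["code"],
        },
    },
]

payload = json.dumps({"tools": TOOLS})
approx_tokens = len(payload) // 4  # rough 4-characters-per-token heuristic, not a tokenizer
```

Every capability lives behind these two entries. Adding an SDK method changes what `execute` can do without changing this payload.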

Architecture · search v2

One search(). Four layers. Reranked.

The model asks once. The server fans out across the local catalog, a 6,410-chunk RAG index of official Salesforce docs, the live web, AND your connected org — then Cohere reranks the fused candidates for relevance. The model gets a tight, ranked list with the right SDK method, the right doc snippet, AND a clonable template from your own org.

v1 · catalog only

search(q) → catalog grep → hits

Fast, but only knows SDK method names. Missed half the time when the docs called something by a different name.

v2 · fused + reranked

search(q) ⇒ catalog + RAG + web + org → rerank → top hits

Knows your SDK, the docs, what shipped last week, AND what already exists in your org so the model can clone a working template.

1 · catalog grep

local · ~5ms · free

Keyword index over ~1,700 SDK methods, sObjects, REST endpoints. Fast, deterministic baseline.
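Here is a toy version of the catalog layer. It assumes the layer behaves like a plain inverted index: catalog names and descriptions are split into lowercase words (including camelCase splits), and each entry is scored by how many query words it matches. `CatalogIndex` and the three sample entries are illustrative, not the server's real catalog.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase words, splitting camelCase (runTestsSynchronous -> run tests synchronous)."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", spaced.lower())


class CatalogIndex:
    """Toy inverted index: word -> names of catalog entries containing it."""

    def __init__(self, entries: dict[str, str]):
        self.postings: dict[str, set[str]] = defaultdict(set)
        for name, description in entries.items():
            for word in tokenize(f"{name} {description}"):
                self.postings[word].add(name)

    def search(self, query: str, k: int = 5) -> list[str]:
        scores: dict[str, int] = defaultdict(int)
        for word in set(tokenize(query)):
            for name in self.postings.get(word, ()):
                scores[name] += 1
        # Most matched words first; ties broken by name so results are deterministic.
        return sorted(scores, key=lambda name: (-scores[name], name))[:k]


# Hypothetical sample entries, not the real catalog.
SAMPLE_CATALOG = {
    "sf.query": "Run a SOQL query and return records",
    "sf.tooling.post": "POST to a Tooling API endpoint such as runTestsSynchronous",
    "sf.rest.patch": "PATCH a REST resource such as composite/sobjects",
}
hits = CatalogIndex(SAMPLE_CATALOG).search("run soql query")
```

Because lookups are exact word matches, this layer is fast and repeatable. It also misses entries when the docs use a different name for something, which is the gap the semantic RAG layer covers.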

2 · RAG · Pinecone

6,410 chunks · embed-3-large

Official SF dev docs: Metadata, Apex, REST, Tooling, Connect, Agentforce, GenAI. Semantic match on intent, not keywords.

3 · live web · Tavily

domain-biased · ~1.2s

Recent blog posts, Stack Exchange threads, release notes. Covers features the official docs haven't caught up to yet.

4 · live org introspection

tooling.query · per type

When you mention Bot, Flow, ApexClass, GenAiFunction… it lists existing instances in your org so the model can clone a working template.
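Here is one way the org layer might turn a request into Tooling API queries, assuming it keys off the type names mentioned in the request. ApexClass (Name) and FlowDefinition (DeveloperName) are standard Tooling API objects. The mapping itself, `introspection_queries`, and the limit of 20 are hypothetical. Other types such as Bot or GenAiFunction would need their object and field names checked against your org's API version. Each resulting string would be passed to `sf.tooling.query`.

```python
import re

# Hypothetical keyword -> (Tooling API object, name field) mapping.
# ApexClass.Name and FlowDefinition.DeveloperName are standard Tooling fields;
# extend with other types only after checking their object and field names.
INTROSPECTABLE = {
    "apexclass": ("ApexClass", "Name"),
    "flow": ("FlowDefinition", "DeveloperName"),
}


def introspection_queries(request: str, limit: int = 20) -> list[str]:
    """Build one Tooling SOQL query per metadata type named in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    return [
        f"SELECT Id, {field} FROM {sobject} LIMIT {limit}"
        for keyword, (sobject, field) in INTROSPECTABLE.items()
        if keyword in words
    ]
```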

fused candidates → Cohere rerank-3 → top 8–15 hits tagged [catalog] [docs] [web] [org]
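Here is the fuse-then-rerank step, assuming each layer returns plain text snippets. The sketch pools the candidates, drops exact duplicates across layers (the first source wins), scores everything against the query, and returns the top k tagged by source. `overlap_score` is a deliberately naive word-overlap stand-in for Cohere rerank-3, which scores with a trained model. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    source: str  # "catalog", "docs", "web", or "org"
    text: str


def overlap_score(query: str, text: str) -> float:
    """Naive stand-in for a reranker: fraction of query words found in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / (len(q) or 1)


def fuse_and_rerank(
    query: str,
    by_source: dict[str, list[str]],
    score: Callable[[str, str], float] = overlap_score,
    top_k: int = 8,
) -> list[str]:
    pool, seen = [], set()
    for source, texts in by_source.items():
        for text in texts:
            if text not in seen:  # the same snippet can surface from two layers
                seen.add(text)
                pool.append(Candidate(source, text))
    # sorted() is stable, so equal scores keep layer order.
    ranked = sorted(pool, key=lambda c: score(query, c.text), reverse=True)
    return [f"[{c.source}] {c.text}" for c in ranked[:top_k]]


ranked_hits = fuse_and_rerank(
    "soql group by",
    {
        "catalog": ["sf.query run soql"],
        "docs": ["SOQL aggregate functions GROUP BY"],
        "web": ["Release notes"],
        "org": ["sf.query run soql"],
    },
    top_k=2,
)
```

Passing the scorer in as a parameter keeps fusion separate from ranking. A real reranker client can replace `overlap_score` without touching the pooling logic.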

execute(code)

Python sandbox with a pre-bound sf SDK: sf.query, sf.create, sf.tooling.*, sf.metadata.*, sf.apex(), composite, bulk. Limits: 25s wall-clock, 512MB RAM, filesystem writes blocked. The model writes one snippet that assigns its output to _result_, and the server returns that value. SOQL + DML + Apex in a single tool call.
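Here is a minimal sketch of an execute() harness. It assumes the server runs each snippet in a fresh interpreter and reads back `_result_`. It enforces a wall-clock timeout and an address-space cap using only the standard library, and it is POSIX-only because of `preexec_fn`. It does not bind the authenticated `sf` SDK or make the filesystem read-only, since this page doesn't describe the server's real isolation mechanism. Names like `run_snippet` and `SENTINEL` are hypothetical.

```python
import json
import subprocess
import sys

try:
    import resource  # POSIX only
except ImportError:
    resource = None

WALL_CLOCK_S = 25                 # mirrors the 25s limit described above
MEMORY_BYTES = 512 * 1024 * 1024  # mirrors the 512MB limit described above
SENTINEL = "__RESULT__"

# Appended to every snippet: print whatever the snippet bound to _result_ as JSON.
EPILOGUE = (
    "\nimport json as _json\n"
    f"print({SENTINEL!r} + _json.dumps(globals().get('_result_'), default=str))\n"
)


def _limit_memory() -> None:
    # Runs in the child just before it starts; caps its address space.
    if resource is not None:
        try:
            resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))
        except (ValueError, OSError):
            pass  # some platforms reject RLIMIT_AS


def run_snippet(code: str, timeout: float = WALL_CLOCK_S) -> dict:
    """Run `code` in a fresh, isolated interpreter and return its _result_."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code + EPILOGUE],
            capture_output=True, text=True, timeout=timeout,
            preexec_fn=_limit_memory,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        return {"ok": False, "error": f"timed out after {timeout}s"}
    if proc.returncode != 0:
        tail = proc.stderr.strip().splitlines()
        return {"ok": False, "error": tail[-1] if tail else f"exit code {proc.returncode}"}
    marker = proc.stdout.rfind(SENTINEL)
    if marker == -1:
        return {"ok": False, "error": "snippet exited before producing _result_"}
    return {"ok": True, "result": json.loads(proc.stdout[marker + len(SENTINEL):])}
```

Errors come back as data, for example the last traceback line, rather than as a crashed tool call. That matches the "clean errors, no 5xx" behavior reported further down this page.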

Cost telemetry @ 200 searches/day: Tavily $30 · OpenAI embeds $5 · Cohere $10 · Pinecone free tier = ~$45/mo
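The cost line adds up, and it also yields a per-search figure. This sketch assumes a flat 30-day month. The dictionary keys are just labels for the line items above.

```python
# Monthly line items (USD) from the cost telemetry above; Pinecone is on its free tier.
monthly_costs = {"tavily": 30, "openai_embeddings": 5, "cohere_rerank": 10, "pinecone": 0}
searches_per_day = 200
days_per_month = 30  # assumption: flat 30-day month

total_per_month = sum(monthly_costs.values())
searches_per_month = searches_per_day * days_per_month
cost_per_search = total_per_month / searches_per_month
tavily_share = monthly_costs["tavily"] / total_per_month
```

At these rates a search costs about three-quarters of a cent, and Tavily accounts for two-thirds of the monthly bill.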

Receipts

252 stress tests. Zero server failures.

Two end-to-end sweeps: 135 calls across REST / Tooling / Composite / Bulk / SOSL / Connect, and 117 calls exercising GenAI + Agentforce + AgentScript + LWC + prompt-template deploys. Both executed against a real production-grade org. Here are the results.

93.3% · REST / Tooling pass (126/135)

~89% · GenAI / agent E2E pass (117 calls)

0 · MCP server failures

~$45/mo · Total infra @ 200 searches/day

🌙 One overnight run deployed (and cleaned up):

• 17 prompt templates
• 10 AgentScript agents
• 5 GenAiFunctions
• 10 LWCs

All deployed, exercised, then cleanly torn down. Zero org pollution. Zero retry storms.

170ms · search p50 (catalog-only)

651ms · execute p50

~100→~5 · Tool calls per task

6,410 · RAG chunks indexed

🚀 Best single workflow

Full Apex deploy → run → cleanup round trip in 3 MCP calls totalling 3.5s: deployed a class, executed it via executeAnonymous, queried the results, and deleted it. Zero manual cleanup.
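Here is a sketch of that deploy → run → cleanup sequence collapsed into one function. It assumes `sf.tooling` exposes `get` and `delete` alongside the `post` used in the examples further down this page; only `post` actually appears in this document. executeAnonymous is a Tooling API REST resource. It takes URL-encoded Apex in the `anonymousBody` query parameter and returns fields such as `compiled`, `success`, and `exceptionMessage`. `apex_round_trip` and `execute_anonymous_path` are hypothetical helper names.

```python
from urllib.parse import quote


def execute_anonymous_path(apex: str) -> str:
    """Tooling REST path for executeAnonymous with the Apex URL-encoded."""
    return "executeAnonymous/?anonymousBody=" + quote(apex, safe="")


def apex_round_trip(sf, name: str, body: str, call: str) -> dict:
    """Deploy a class, run `call` anonymously, and always delete the class afterwards.

    Assumes sf.tooling.get and sf.tooling.delete exist (hypothetical) and that
    sf.tooling.post returns the standard create response with an "id" key.
    """
    created = sf.tooling.post("sobjects/ApexClass", {"Name": name, "Body": body})
    try:
        run = sf.tooling.get(execute_anonymous_path(call))
        return {
            "compiled": run.get("compiled"),
            "success": run.get("success"),
            "exception": run.get("exceptionMessage"),
        }
    finally:
        # Cleanup runs even if the anonymous call raised.
        sf.tooling.delete(f"sobjects/ApexClass/{created['id']}")
```

The `try` / `finally` is what keeps the org clean. The class is removed whether the anonymous call succeeds, fails to compile, or throws.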

🔒 Sandbox holds up

25-second timeout enforced to the millisecond. FS writes blocked (read-only filesystem). SOQL injection attempts return clean errors. No 5xx. No 429s. No hung connections.

🌐 Surface coverage

✅ Sales · Service · Platform · Apex / Tooling
✅ Composite · Bulk · Chatter / Connect · SOSL
✅ Analytics · GenAI / Agentforce · Data Cloud
✅ Industries / OmniStudio · DML round-trip · Aggregates

📈 Aggregate SOQL = one shot

Win-rate calculation (closed-won / total-closed × 100) — normally 3 REST calls plus glue code — done in a single execute(). CALENDAR_MONTH histograms, GROUP BY / HAVING, all native.
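Here is a sketch of the one-shot win rate. It assumes `sf.query` returns aggregate rows as dicts keyed by field name or alias, as the case-backlog example further down implies. Grouping closed opportunities by `IsWon` yields at most two rows, and the percentage is computed in Python from their counts. `win_rate` and `WIN_RATE_SOQL` are illustrative names.

```python
from typing import Optional

# One aggregate query: closed opportunities counted per IsWon value.
WIN_RATE_SOQL = (
    "SELECT IsWon, COUNT(Id) c "
    "FROM Opportunity "
    "WHERE IsClosed = true "
    "GROUP BY IsWon"
)


def win_rate(rows: list[dict]) -> Optional[float]:
    """closed-won / total-closed × 100, rounded to one decimal; None if nothing is closed."""
    won = sum(r["c"] for r in rows if r["IsWon"])
    total = sum(r["c"] for r in rows)
    return round(100 * won / total, 1) if total else None


# Inside execute():
# _result_ = {"win_rate_pct": win_rate(sf.query(WIN_RATE_SOQL))}
```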

Caveat: every failure across both sweeps was a caller-side mistake (wrong path, wrong field name, perm scope) — the MCP itself never broke. Full report on request.

What you can actually do

Six things that work today.

Representative execute() snippets of the kind the model writes. Once the server is available and your org is connected, each one runs as a single execute() call.

▸ 📈 Top opportunities closing this month

"Show me my top opps closing this month."

_result_ = sf.query(
    "SELECT Name, StageName, Amount, CloseDate "
    "FROM Opportunity "
    "WHERE CloseDate = THIS_MONTH "
    "ORDER BY Amount DESC NULLS LAST LIMIT 10"
)

▸ 🚀 Deploy a new Apex class

"Deploy this Apex class to my org."

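# Creates the class directly through the Tooling API ApexClass sObject.
# This suits sandbox and developer orgs; production orgs generally require
# a Metadata API deploy with tests instead.
_result_ = sf.tooling.post("sobjects/ApexClass", {
    "Name": "HelloMcp",
    "Body": "public class HelloMcp {\n"
            "  public static String greet() { return 'Hello from MCP'; }\n"
            "}"
})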

▸ 🎯 Find all unconverted leads from last week

"Find me leads I haven't touched from last week."

_result_ = sf.query(
    "SELECT Id, Name, Company, Status, CreatedDate "
    "FROM Lead "
    "WHERE IsConverted = false AND CreatedDate = LAST_WEEK "
    "ORDER BY CreatedDate DESC"
)

▸ 🧪 Run Apex test classes and tell me what failed

"Run my Apex test suite and summarize the failures."

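# Heuristic: any class with "Test" in its name is treated as a test class.
classes = sf.query(
    "SELECT Id, Name FROM ApexClass WHERE Name LIKE '%Test%'"
)
class_ids = [c["Id"] for c in classes]
if not class_ids:
    _result_ = {"ran": 0, "failed": 0, "failures": []}
else:
    # Synchronous Tooling API run over the first 25 matches;
    # maxFailedTests = -1 means the run doesn't stop early on failures.
    run = sf.tooling.post("runTestsSynchronous", {
        "classids": ",".join(class_ids[:25]),
        "maxFailedTests": -1
    })
    _result_ = {
        "ran": run.get("numTestsRun"),
        "failed": run.get("numFailures"),
        "failures": [
            {"class": f.get("name"), "method": f.get("methodName"),
             "msg": f.get("message")} for f in (run.get("failures") or [])
        ]
    }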

▸ 🆘 Case backlog by priority

"How bad is my case backlog right now?"

_result_ = sf.query(
    "SELECT Priority, Status, COUNT(Id) c "
    "FROM Case "
    "WHERE IsClosed = false "
    "GROUP BY Priority, Status "
    "ORDER BY Priority"
)

▸ ⚡ Bulk update 50 records in one round-trip

"Mark these 50 accounts as Tier 1."

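# Tier__c is a custom Account field assumed to exist in the org.
accts = sf.query(
    "SELECT Id FROM Account WHERE AnnualRevenue > 10000000 LIMIT 50"
)
records = [{"attributes": {"type": "Account"}, "Id": a["Id"],
            "Tier__c": "Tier 1"} for a in accts]
# sObject Collections update: up to 200 records per request; allOrNone
# rolls back every record in the request if any one fails.
_result_ = sf.rest.patch(
    "composite/sobjects",
    {"allOrNone": True, "records": records}
)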
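sObject Collections requests accept at most 200 records, so an update larger than that needs batching. Below is a minimal sketch with a hypothetical `batches` helper. Note that `allOrNone` rolls back only within a single request: if a later batch fails, batches that already succeeded stay committed.

```python
from typing import Iterator

MAX_COLLECTION_SIZE = 200  # sObject Collections limit per request


def batches(records: list[dict], size: int = MAX_COLLECTION_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Inside execute(), reusing `records` from the snippet above:
# _result_ = [
#     sf.rest.patch("composite/sobjects", {"allOrNone": True, "records": chunk})
#     for chunk in batches(records)
# ]
```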

Availability

Coming soon for internal Salesforce use.

The server itself is functionally complete and battle-tested against a real org. Before opening it up for general internal availability, we are deliberately gating it behind a user-level access and permission-enforcement layer. That way, every action an LLM client performs is anchored to a real Salesforce user and respects that user's profile, permission sets, sharing rules, and field-level security (FLS) end-to-end.

✓ Shipped · Today

Server + fused search

• 2-tool MCP (search + execute)
• 4-layer fused search with Cohere rerank
• 252 stress tests across REST / Tooling / GenAI / Agentforce
• Sandboxed Python execution (25s, FS read-only)

In progress · Now

User-level access & permission enforcement

• Per-user OAuth bound to Salesforce identity
• All calls executed in the running user's context
• Profile / Permission Set / Sharing / FLS respected end-to-end
• Audit-friendly request and execution logging

Next · After gating lands

Internal Salesforce rollout

• Internal-only registration for Salesforce employees
• Client guidance for Claude / Cursor / ChatGPT connectors
• Curated example workflows per persona
• Feedback loop with early internal partners

No public sign-up, no waitlist, and no production endpoint to point a client at right now. This page exists to explain what the optimization does, why two tools beat hundreds, and where it is going.

About the project

Salesforce MCP — optimized for code-mode LLMs.

A purpose-built Model Context Protocol server that collapses the entire headless Salesforce platform into two well-designed tools. Inspired by the Stainless code-mode pattern: instead of generating hundreds of endpoint-specific wrappers, hand the model a real SDK and let it write code.

Built on FastAPI, deployed on Heroku, exercised against a real Salesforce production org. 252 end-to-end stress tests across REST, Tooling, Composite, Bulk, Connect, GenAI / Agentforce, AgentScript, and LWC deploys — with zero server failures attributable to the MCP itself.

Designed for enterprise architects, RevOps automation teams, and AI engineers who want a deterministic, audit-friendly bridge between LLM clients and Salesforce orgs.

Stack

• FastAPI + Python sandbox
• MCP 2025-03-26 protocol
• Heroku · Salesforce API v63.0
• Catalog: 1,701 entries indexed
• FS read-only · 25s execution timeout
• Per-user OAuth + permission enforcement (in progress)

Endpoint not published while access controls are being finalized.