Skip to main content

MCP Attack Surface

The Model Context Protocol (MCP) introduces a new attack surface in AI agent systems. ThoughtJack models these attack vectors to help security researchers test MCP client resilience.

Why MCP is vulnerable

MCP enables AI agents to interact with external tools, resources, and prompts. This creates trust boundaries that can be exploited:

  1. Implicit trust in tool definitions - agents treat tool descriptions as ground truth without verification
  2. No authentication of tool servers - MCP clients typically launch tool servers as subprocesses with no mutual authentication
  3. Dynamic capability mutation - tools can change after initialization via list_changed notifications
  4. Unstructured text channels - tool descriptions and responses are free-form text, enabling prompt injection
  5. Bidirectional communication - the server can send unsolicited notifications and requests to the client

Attack vectors

ThoughtJack organizes attacks into five categories:

Injection attacks

Exploit the text-based nature of MCP to inject instructions into tool responses or descriptions.

AttackVectorThoughtJack scenario
Prompt injectionMalicious text in tool responsesprompt-injection
Tool description injectionInstructions hidden in tool descriptionsrug-pull (exploit phase)
Credential harvestingFake credential prompts in descriptionscredential-harvester
Unicode obfuscationInvisible/lookalike characters in responsesunicode-obfuscation

Why it works: AI agents process tool output as context for their next action. If the output contains instructions, the agent may follow them - especially if the instructions mimic system-level directives.

Temporal attacks

Exploit the time dimension to build trust before attacking.

AttackVectorThoughtJack scenario
Rug pullBenign behavior → malicious tool swaprug-pull
Escalation ladderBenign responses → injection payloadescalation-ladder
Sleeper agentTime-delayed phase transition(phased config)
Resource mutationResource content changes over timeresource-rug-pull

Why it works: MCP clients typically evaluate tools at initialization and trust them going forward. An attacker can build trust with legitimate behavior, then mutate capabilities after the client has stopped monitoring.

Denial of service

Exhaust client resources through malformed or excessive data.

AttackVectorThoughtJack scenario
Nested JSONDeep nesting exhausts parser stacknested-json-dos
Slow lorisByte-by-byte delivery ties up connectionsslow-loris
Notification floodHigh-rate notifications overwhelm clientnotification-flood
Pipe deadlockFill stdout buffer to block I/O(side effect config)
Batch amplificationOversized notification batches(side effect config)

Why it works: MCP clients must parse all server responses. If the server sends malformed data, the client's parser may crash, hang, or consume excessive memory.

Resource attacks

Exploit MCP's resource system for data exfiltration.

AttackVectorThoughtJack scenario
Resource exfiltrationResources that request sensitive dataresource-exfiltration
Resource rug pullResource content changes over timeresource-rug-pull

Why it works: Resources in MCP can contain instructions that agents follow. If a resource URI looks legitimate but contains injection payloads, the agent may act on them.

Protocol attacks

Exploit JSON-RPC and MCP protocol mechanics.

AttackVectorThoughtJack mechanism
Duplicate request IDsID collision in responsesduplicate_request_ids side effect
Unbounded messagesMissing newline terminatorsunbounded_line behavior
Batch abuseOversized JSON-RPC batchesbatch_amplify side effect

Why it works: JSON-RPC implementations may not validate message framing, ID uniqueness, or batch sizes. Malformed protocol messages can cause crashes or undefined behavior.

Trust model

The fundamental issue: the MCP client trusts the tool server to provide accurate tool definitions and honest responses. An attacker who controls the tool server can abuse this trust.

Defense considerations

ThoughtJack scenarios test for these defensive capabilities:

DefenseWhat it protects against
Tool definition pinningRug pulls, capability mutation
Response content scanningPrompt injection, credential phishing
Unicode normalizationUnicode obfuscation
Parser depth limitsNested JSON DoS
Timeout enforcementSlow loris, response delay
Notification rate limitingNotification flood
Resource content validationResource exfiltration
ID deduplicationDuplicate request IDs

See the Security Framework Mappings page for how these attacks map to MITRE ATT&CK and OWASP MCP Top 10.