Skip to main content

Architecture

ThoughtJack is a monolithic Rust crate that implements an adversarial agent security testing tool. It supports multiple protocols (MCP server/client, A2A server/client, AG-UI client), multi-actor orchestration, and verdict-based evaluation.

Module architecture

CLI (cli/)

Parses command-line arguments using Clap derive mode. Routes to command handlers for run, validate, scenarios, and version.

Config (config/)

Loads YAML configuration files, resolves directives ($include, $file, $generate, $handler), substitutes environment variables, and validates the result.

Engine (engine/)

The v0.5 core engine. Key components:

  • PhaseEngine - the state machine that tracks phase index and evaluates triggers
  • PhaseLoop - owns the event loop: trace append, extractor capture, trigger evaluation, phase advancement. Generic over PhaseDriver.
  • PhaseDriver trait - implemented by each protocol driver. Handles protocol I/O and emits ProtocolEvents.
  • GenerationProvider - handles $generate and synthesize output validation

Orchestration (orchestration/)

Multi-actor coordination:

  • Orchestrator - spawns ActorRunner tasks (one per actor in the OATF document), collects results
  • ActorRunner - creates a PhaseLoop<SpecificDriver> based on the actor's mode
  • ExtractorStore - DashMap-based shared state for cross-actor extractor publication
  • SharedTrace - append-only trace buffer merged across actors for verdict evaluation

Protocol drivers (protocol/, engine/mcp_server.rs)

Each protocol mode has a driver implementing the PhaseDriver trait:

DriverModeLocation
MCP servermcp_serverengine/mcp_server.rs
MCP clientmcp_clientprotocol/mcp_client.rs
A2A servera2a_serverprotocol/a2a_server.rs
A2A clienta2a_clientprotocol/a2a_client.rs
AG-UI clientagui_clientprotocol/agui.rs

Verdict (verdict/)

Post-execution evaluation pipeline:

  • Grace period - configurable wait after execution for delayed effects
  • Indicator evaluation - pattern matching and CEL expressions (semantic LLM-as-judge evaluation is planned)
  • Verdict computation - exploited, not_exploited, partial, error with tier-based severity
  • Output - JSON verdict file and human-readable summary

Transport (transport/)

Abstracts the communication channel for traffic mode:

  • stdio - JSON-RPC over stdin/stdout, single connection
  • HTTP - Axum-based HTTP server with SSE for server-to-client messages, multi-connection

Context transport (transport/context/)

The context mode transport bypasses real network connections. Instead of HTTP or stdio, it:

  1. Builds a conversation history from OATF scenario state
  2. Injects adversarial payloads as tool-call results in the history
  3. Calls an LLM API via the LlmProvider trait (OpenAI or Anthropic implementations in transport/provider/)
  4. Observes the LLM's response - particularly any tool calls it makes
  5. Routes tool calls to server actors via channel-based handles (AgUiHandle, ServerHandle)

The ContextTransport owns the conversation loop and coordinates with server actors through mpsc and watch channels, keeping the PhaseDriver implementations transport-agnostic. Server actors run their normal PhaseLoop/PhaseDriver machinery - only the transport layer differs.

Generators (generator/)

Factory objects that produce attack payloads lazily at response time. Each generator implements the PayloadGenerator trait. Large payloads (> 1 MB) use streaming via PayloadStream.

Behavior (behavior/)

Controls response delivery and side effects:

  • Delivery - how bytes are transmitted (slow loris, unbounded line, etc.)
  • Side effects - additional actions (notification flood, pipe deadlock, etc.)

Scenarios (scenarios/)

91 built-in OATF attack scenarios embedded at compile time. Scenarios are sourced from the OATF scenarios git submodule and validated with oatf::load() during the build. Supports listing, detail display, fuzzy name matching, and YAML export.

Observability (observability/)

  • Logging - tracing-subscriber with human and JSON formatters
  • Metrics - Prometheus counters, gauges, and histograms
  • Events - structured JSONL event stream for post-run analysis

Data flow (multi-actor orchestration)

The key flow:

  1. Orchestrator parses the OATF document and spawns one ActorRunner per actor
  2. Each ActorRunner creates a PhaseLoop with the appropriate protocol driver
  3. PhaseLoop runs a tokio::select! between driver execution and event consumption
  4. Drivers emit ProtocolEvents; PhaseLoop evaluates triggers, captures extractors, advances phases
  5. ExtractorStore enables cross-actor communication via watch channels
  6. On completion, the Verdict pipeline evaluates indicators against the merged trace

Concurrency model

  • Async runtime: Tokio multi-threaded runtime
  • Phase state: AtomicU64 for the phase index within PhaseEngine
  • Extractor publication: tokio::sync::watch channel - PhaseLoop publishes, drivers consume
  • Cross-actor state: DashMap-based ExtractorStore for shared extractor values
  • Trace: SharedTrace with append-only semantics
  • Shutdown: Cooperative via CancellationToken (tokio-util)

Key design decisions

PhaseLoop owns the event loop: Drivers only handle protocol I/O. All state management (trace append, extractor capture, trigger evaluation, phase advancement) lives in PhaseLoop. This keeps drivers simple and testable.

SDK delegates, ThoughtJack orchestrates: The oatf-rs SDK handles document parsing, template interpolation, trigger evaluation, and extractor capture. ThoughtJack handles protocol transport, concurrency, and attack execution.

Lazy generator evaluation: Generators create lightweight factory objects at config load time. Actual payload bytes are produced at response time. This keeps startup fast and memory usage predictable.

Embedded scenarios: Built-in scenarios are compiled into the binary via include_str!. This eliminates runtime file I/O and makes the binary self-contained.

Transport abstraction: The Transport trait abstracts stdio and HTTP, so the server runtime doesn't need to know which transport is active. Side effects that are transport-specific (e.g., pipe_deadlock for stdio only) check compatibility at runtime.

Verdict-based exit codes: Exit codes map to verdicts and severity tiers (0 = not exploited, 1 = exploited, 2 = local action, 3 = boundary breach), making CI integration straightforward.

Context mode reuses server actors: In context mode, server actors (MCP server, A2A server) run their normal PhaseLoop/PhaseDriver machinery. The only difference is the transport layer - channel-based handles instead of network sockets. This means rug pulls, phased attacks, and all temporal behaviors work identically in both modes.