Integrate with CI/CD

ThoughtJack can run in CI/CD pipelines to validate configurations and test agent implementations against attack scenarios.

Validate configs in CI

Add config validation to your pipeline to catch errors before deployment:

.github/workflows/validate.yml
name: Validate ThoughtJack Configs
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install ThoughtJack
        run: |
          curl --proto '=https' --tlsv1.2 -LsSf \
            https://github.com/thoughtgate/thoughtjack/releases/latest/download/thoughtjack-installer.sh | sh

      - name: Validate all configs
        run: |
          for f in scenarios/*.yaml; do
            thoughtjack validate "$f"
          done

Machine-readable output

Use --format json for structured output from listing commands:

# JSON scenario listing
thoughtjack scenarios list --format json
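A structured listing can be filtered programmatically before feeding scenario IDs into a CI matrix. A minimal sketch: the JSON shape below (an array of objects with `id`, `name`, and `tags` keys) is an assumption for illustration; adjust the keys to whatever your installed version actually emits.

```python
import json

# Hypothetical sample of `thoughtjack scenarios list --format json` output;
# the field names here are assumptions, not the documented schema.
listing = json.loads("""
[
  {"id": "oatf-001", "name": "Tool poisoning", "tags": ["mcp"]},
  {"id": "oatf-002", "name": "Rug pull", "tags": ["mcp", "trust"]}
]
""")

# Select scenario IDs by tag, e.g. to build a CI job matrix.
mcp_ids = [s["id"] for s in listing if "mcp" in s["tags"]]
print(mcp_ids)
```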

JSON log format

When running scenarios in CI, use JSON logs for machine parsing:

thoughtjack run my_config.yaml --log-format json -vv

Each log line is a JSON object:

{"timestamp":"2025-01-15T10:30:00Z","level":"INFO","target":"thoughtjack::engine","message":"Phase transition","phase":"exploit"}
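Because each line is a standalone JSON object, logs can be filtered in CI without fragile regexes. A short sketch using the field names from the sample line above:

```python
import json

# One line of ThoughtJack's JSON log output (from the example above).
log_line = ('{"timestamp":"2025-01-15T10:30:00Z","level":"INFO",'
            '"target":"thoughtjack::engine","message":"Phase transition",'
            '"phase":"exploit"}')

record = json.loads(log_line)

# Keep only phase-transition records, e.g. to summarize a run in CI.
if record["message"] == "Phase transition":
    print(f'{record["timestamp"]} entered phase {record["phase"]}')
```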

Event file output

Write structured events to a JSONL file for post-run analysis:

thoughtjack run my_config.yaml \
  --events-file events.jsonl \
  --log-format json

After the run, parse events.jsonl:

# Count phase transitions (-c prints one compact object per line,
# so wc -l counts events rather than pretty-printed lines)
jq -c 'select(.type == "PhaseEntered")' events.jsonl | wc -l

# Get verdict
jq 'select(.type == "VerdictComputed")' events.jsonl
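If jq is not available on your runner, the same post-run analysis can be done with a few lines of Python. The event `type` values mirror the jq filters above; the `verdict` payload field in the sample data is an assumption for illustration.

```python
import json

# Sample events in the JSONL shape used by the jq examples above;
# the "verdict" field inside VerdictComputed is a hypothetical payload.
sample = """\
{"type":"PhaseEntered","phase":"setup"}
{"type":"PhaseEntered","phase":"exploit"}
{"type":"VerdictComputed","verdict":"not_exploited"}
"""

events = [json.loads(line) for line in sample.splitlines()]

# Count phase transitions and pull out the computed verdict.
phase_count = sum(1 for e in events if e["type"] == "PhaseEntered")
verdict_event = next(e for e in events if e["type"] == "VerdictComputed")

print(phase_count)
print(verdict_event["verdict"])
```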

Exit codes

ThoughtJack uses verdict-based exit codes that CI pipelines can check. Exit codes 1–3 indicate exploitation at increasing severity tiers:

Code  Name                        CI Interpretation
----  --------------------------  ---------------------------
0     not_exploited               Pass
1     exploited                   Fail (no tier, or Ingested)
2     exploited_local_action      Fail (LocalAction tier)
3     exploited_boundary_breach   Fail (BoundaryBreach tier)
4     partial                     Warning
5     error                       Unstable
10    Runtime error               Infra failure
64    Usage error                 Invalid args
130   Interrupted                 SIGINT
143   Terminated                  SIGTERM

# Context mode (self-contained, no agent needed):
- name: Run scenario
  run: |
    thoughtjack run test.yaml \
      --context --context-provider openai --context-model gpt-4o \
      --context-api-key ${{ secrets.OPENAI_API_KEY }} \
      -o verdict.json
  continue-on-error: false

# Traffic mode requires a running agent as a service in the CI job.
# See the "Test Agent Frameworks" guide for setup.
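The exit-code table can be turned into an explicit gate in a CI wrapper script. A minimal sketch of that mapping (the pass/fail policy beyond the documented codes is your choice):

```python
def ci_status(exit_code: int) -> str:
    """Map a ThoughtJack exit code to a CI outcome per the table above."""
    if exit_code == 0:
        return "pass"
    if exit_code in (1, 2, 3):
        return "fail"           # exploited, increasing severity tiers
    if exit_code == 4:
        return "warning"        # partial
    if exit_code == 5:
        return "unstable"       # verdict error
    if exit_code == 10:
        return "infra-failure"  # runtime error
    if exit_code == 64:
        return "usage-error"    # invalid args
    if exit_code in (130, 143):
        return "interrupted"    # SIGINT / SIGTERM
    return "unknown"

print(ci_status(3))  # fail
```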

Context mode in CI

Context mode is self-contained: no agent infrastructure is needed. Set API credentials via environment variables:

.github/workflows/context-test.yml
- name: Run context-mode scenario
  env:
    THOUGHTJACK_CONTEXT_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    THOUGHTJACK_CONTEXT_MODEL: gpt-4o
  run: |
    thoughtjack scenarios run oatf-001 \
      --context \
      -o verdict.json \
      --export-trace trace.jsonl

Use --max-turns to control LLM API cost per run. Use --context-timeout to set a per-request deadline that matches your CI timeout budget.

Prometheus metrics

For long-running test sessions, expose metrics:

thoughtjack run my_config.yaml \
  --mcp-server 127.0.0.1:8080 \
  --metrics-port 9090

Scrape http://localhost:9090/metrics for request counts, phase transitions, delivery durations, and error rates.
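The scraped payload uses the standard Prometheus text exposition format, which is easy to post-process without a full Prometheus deployment. A sketch of a minimal parser: the metric names in the sample are hypothetical placeholders, but the `name{labels} value` line format is the standard one.

```python
# Sample /metrics payload; the metric names are hypothetical,
# the text exposition format itself is standard Prometheus.
sample_metrics = """\
# HELP requests_total Total MCP requests served.
# TYPE requests_total counter
requests_total{method="tools/call"} 42
requests_total{method="tools/list"} 7
"""

def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition into {metric_with_labels: value}."""
    values = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE metadata
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values

metrics = parse_metrics(sample_metrics)
print(metrics['requests_total{method="tools/call"}'])  # 42.0
```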