Known CI Test Failures

These 8 `cursor-tester` scenarios fail consistently in CI. The failures stem from environment limitations, not from bugs in the planner or search engine.

The scenarios are valid E2E integration tests. They fail in CI because the runner lacks external infrastructure (Docker, GitHub API network access, AI model credentials). Fixing them requires provisioning that infrastructure or mocking dependencies, not improving the planner or search engine.

The Level 2 planner backlog is separate from these failures and is tracked in `docs/planner/`.

External infrastructure (6)

| Scenario | Root cause |
|---|---|
| `docker_timeout` | Requires a Docker daemon, which is not available in CI |
| `failure_without_logs` | Requires live GitHub workflow JSON; the CI runner has no network access to the GitHub API beyond checkout |
| `multiple_failed_jobs` | Same as above: requires live workflow data |
| `structural_extraction` | Same as above: requires live workflow data |
| `truncated_logs` | Same as above: requires live workflow data |
| `workflow_cancelled` | Same as above: requires live workflow data |

Diagnostics / Telemetry (2)

| Scenario | Root cause |
|---|---|
| `evidence_export` | Requires `cursor-agent --export-evidence` to complete a full end-to-end run; the CI runner lacks model/AI service configuration |
| `replay_integrity` | Requires reading and writing trace files through the full execution pipeline; CI runner limitations prevent deterministic replay |

Validation pattern

To verify that a change introduced no new failures, compare the run against this baseline:

```shell
./build/bin/cursor-tester scenarios/
# Expected: 41 scenarios, 33 passed, 8 failed
# Failed should be exactly the 8 above
```
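A matching count does not guarantee matching names, so it helps to diff the failing scenarios against the two tables above. The following POSIX shell sketch assumes that the failing scenario names have already been extracted from the `cursor-tester` output, one name per line. How to extract them depends on an output format that this page does not describe. `check_baseline` and `KNOWN_FAILURES` are hypothetical names and are not part of the tester.

```shell
#!/bin/sh
# Sketch only: check_baseline and KNOWN_FAILURES are hypothetical helpers,
# not part of cursor-tester. The list mirrors the 8 known CI failures above.
KNOWN_FAILURES="docker_timeout failure_without_logs multiple_failed_jobs structural_extraction truncated_logs workflow_cancelled evidence_export replay_integrity"

# Reads failing scenario names (one per line) on stdin.
# Prints each unexpected failure and each baseline scenario that no longer
# fails; returns 0 only when the two sets are identical.
check_baseline() {
  awk -v known="$KNOWN_FAILURES" '
    BEGIN {
      # split() with the default field separator splits on whitespace
      n = split(known, names)
      for (i = 1; i <= n; i++) expected[names[i]] = 1
    }
    NF { actual[$1] = 1 }   # skip blank lines, keep the first field
    END {
      status = 0
      for (name in actual)
        if (!(name in expected)) { print "new failure: " name; status = 1 }
      for (name in expected)
        if (!(name in actual)) { print "no longer failing: " name; status = 1 }
      exit status
    }'
}
```

The sketch compares the two sets with awk associative arrays rather than `sort` and `comm`, so neither input needs to be pre-sorted. A nonzero return means the baseline has drifted in one direction or the other. Either way, review the change and update the tables above if it is intended.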

CI exit code

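The CI job exits with code 8 when exactly these 8 scenarios fail; this is the expected value. A different failure count signals a change. More failures indicate a regression. Fewer failures mean that a known scenario now passes and this baseline needs updating. If the exit code reflects only the failure count, it cannot catch a swap in which one known failure starts passing while a new one fails. Always check that the failing scenario names match the lists above.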
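The exit-code rule can be expressed as a small gate. This sketch assumes that the exit code equals the number of failed scenarios, which the baseline implies but this page does not state outright. `classify_exit_code` and `BASELINE_FAILURES` are hypothetical names. The gate cannot tell a crash of the tester itself (for example, a nonzero code from a startup error) from a change in the failure count. It also does not replace comparing the failing names.

```shell
#!/bin/sh
# Sketch only: assumes the cursor-tester exit code equals the number of failed
# scenarios. classify_exit_code and BASELINE_FAILURES are hypothetical names.
BASELINE_FAILURES=8

# Prints one word describing how an exit code relates to the known baseline.
classify_exit_code() {
  if [ "$1" -eq "$BASELINE_FAILURES" ]; then
    echo "baseline"     # expected count; the failing names still need checking
  elif [ "$1" -gt "$BASELINE_FAILURES" ]; then
    echo "regression"   # more failures than the 8 known ones
  else
    echo "improved"     # fewer failures; a known scenario may now pass
  fi
}
```

CI shells often run with `set -e`. To keep the expected code 8 from ending the step early, capture the status with `status=0; ./build/bin/cursor-tester scenarios/ || status=$?` before calling `classify_exit_code "$status"`.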