
Known CI Test Failures

These 8 cursor-tester scenarios fail consistently in CI. The failures come from environment limitations, not from bugs in the planner or search engine.

The scenarios are valid end-to-end (E2E) integration tests. They fail in CI because the runner lacks the external infrastructure they depend on: Docker, network access to the GitHub API, and AI model credentials. Fixing them requires provisioning that infrastructure or mocking the dependencies, not changing the planner or search engine.

The Level 2 planner backlog is separate from these failures and is tracked in docs/planner/.

CI-dependent (6)

Scenario                Root cause
docker_timeout          Requires the Docker daemon, which is not available in CI
failure_without_logs    Requires live GitHub workflow JSON; the CI runner has no network access to the GitHub API beyond checkout
multiple_failed_jobs    Same as failure_without_logs: requires live workflow data
structural_extraction   Same as failure_without_logs: requires live workflow data
truncated_logs          Same as failure_without_logs: requires live workflow data
workflow_cancelled      Same as failure_without_logs: requires live workflow data

Diagnostics / Telemetry (2)

Scenario                Root cause
evidence_export         Requires a complete end-to-end cursor-agent --export-evidence run; the CI runner has no model/AI service configuration
replay_integrity        Requires reading and writing trace files through the full execution pipeline; limitations of the CI runner prevent deterministic replay

Validation pattern

To verify that a change introduced no new failures, compare the test run against this baseline:

./build/bin/cursor-tester scenarios/
# Expected: 41 scenarios, 33 passed, 8 failed
# The failed scenarios should be exactly the 8 listed above
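The name comparison can be scripted. Below is a minimal sketch of a hypothetical bash helper (check_baseline and KNOWN_FAILURES are illustrative names, not part of cursor-tester). It assumes the failed scenario names have already been extracted from the cursor-tester output, one per line, since that output format is not described here. It reports any failure outside the baseline and any baseline scenario that no longer fails.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cursor-tester): compare failed scenario
# names, one per line on stdin, with the known CI baseline. Extracting the
# names from cursor-tester output is assumed to happen before this step.

KNOWN_FAILURES=(
  docker_timeout
  failure_without_logs
  multiple_failed_jobs
  structural_extraction
  truncated_logs
  workflow_cancelled
  evidence_export
  replay_integrity
)

# Print the baseline names in C-locale sorted order, as comm expects.
baseline_sorted() {
  printf '%s\n' "${KNOWN_FAILURES[@]}" | LC_ALL=C sort -u
}

# Trim surrounding whitespace, drop blank lines, sort in C-locale order.
normalize() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -v '^$' | LC_ALL=C sort -u
}

# Return 0 only when the failed set equals the baseline exactly.
check_baseline() {
  local input new_failures now_passing status=0
  input=$(cat)
  # comm -13 keeps lines found only in the second input: unexpected failures.
  new_failures=$(LC_ALL=C comm -13 <(baseline_sorted) <(printf '%s\n' "$input" | normalize))
  # comm -23 keeps lines found only in the first input: baseline scenarios that passed.
  now_passing=$(LC_ALL=C comm -23 <(baseline_sorted) <(printf '%s\n' "$input" | normalize))

  if [[ -n "$new_failures" ]]; then
    echo "New failures (possible regression):"
    printf '%s\n' "$new_failures" | sed 's/^/  /'
    status=1
  fi
  if [[ -n "$now_passing" ]]; then
    echo "Baseline scenarios no longer failing (update the baseline):"
    printf '%s\n' "$now_passing" | sed 's/^/  /'
    status=1
  fi
  return "$status"
}

# Usage (check_baseline.sh and failed.txt are hypothetical file names):
#   source check_baseline.sh
#   check_baseline < failed.txt
```

Comparing sorted sets rather than counts catches the case where one baseline scenario starts passing while a different scenario starts failing, which leaves the count unchanged.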

CI exit code

The CI job exits with code 8 when exactly these 8 scenarios fail; this is the expected value. A new regression shows up as a different failure count or as a failed scenario that is not listed above, so check the failed scenario names as well as the exit code.
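A minimal sketch of interpreting the exit code in a script, assuming the exit code equals the number of failed scenarios (as the 8 failures / exit code 8 pairing suggests). classify_exit and EXPECTED_FAILURES are hypothetical names, not part of cursor-tester or the CI configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: interpret the CI exit code against the documented
# baseline. Assumes the exit code equals the number of failed scenarios;
# the names here are illustrative, not part of cursor-tester.
EXPECTED_FAILURES=8

classify_exit() {
  local code=$1
  if (( code == EXPECTED_FAILURES )); then
    # The count matches, but a swapped scenario would also give 8,
    # so the failed names still need checking.
    echo "baseline: exit code $code matches the expected count"
    return 0
  elif (( code < EXPECTED_FAILURES )); then
    echo "changed: exit code $code; a baseline scenario may now pass"
    return 1
  else
    echo "regression: exit code $code exceeds $EXPECTED_FAILURES (more failures or an abnormal exit)"
    return 1
  fi
}

# Usage, right after the test run:
#   ./build/bin/cursor-tester scenarios/
#   classify_exit $?
```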
