Release Checklist
This checklist gates every release of cursor-agent.
Gates are ordered. A failure at any gate blocks everything below it. Search Correctness is a mandatory gate – this is a code-search product. Operational correctness (exit codes, routing) is necessary but not sufficient.
Gate 1 – Build
| # | Check | How to verify | Status |
|---|---|---|---|
| 1.1 | Clean build | cmake --build build/ 2>&1 \| grep -c error → 0 | ☐ |
| 1.2 | Self-test passes | cursor-agent --self-test → 8/8 passed | ☐ |
| 1.3 | Doctor passes | cursor-agent --doctor → 0 failed | ☐ |
| 1.4 | Benchmark suite passes | cursor-agent --benchmark → 14/14 | ☐ |
Gate 2 – Execution
| # | Check | How to verify | Status |
|---|---|---|---|
| 2.1 | Commands execute correctly | audit_runner.py all exit 0 | ☐ |
| 2.2 | Tool routing verified | --self-test direct command routing 9/9 | ☐ |
| 2.3 | Exit codes verified | --capabilities exits 0; --version exits 0 | ☐ |
| 2.4 | Checkpoint contamination absent | Search results from include/ and src/ only | ☐ |
| 2.5 | cmake / ctest routing correct | cursor-agent "build the project" no stderr | ☐ |
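Check 2.4 can be partly automated by filtering result paths against the allowed roots. The following is a minimal sketch, not the project's actual audit code: it assumes result paths are relative to the repository root, and the names `ALLOWED_ROOTS` and `contaminated_paths` are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: these are the only roots a search result may come from (check 2.4).
ALLOWED_ROOTS = ("include", "src")


def contaminated_paths(result_paths):
    """Return the result paths whose top-level directory is not an allowed root.

    Paths are assumed to be relative to the repository root; absolute or
    empty paths are reported as contaminated rather than guessed at.
    """
    bad = []
    for raw in result_paths:
        parts = PurePosixPath(raw).parts  # "./src/a.cpp" -> ("src", "a.cpp")
        if not parts or parts[0] not in ALLOWED_ROOTS:
            bad.append(raw)
    return bad
```

An empty return value means the check passes for that result set; any returned path is a candidate blocking item.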
Gate 3 – Search Correctness (mandatory product gate)
Definition: A query is semantically correct if and only if the engine returns
the canonical declaration, definition, or use site – not a fixture, document, or
forward-declaration header – for the queried symbol or intent.
Exit code 0 does not satisfy this gate.
| # | Check | Target | Status |
|---|---|---|---|
| 3.1 | Search correctness audit run | All queries in benchmark set audited | ☐ |
| 3.2 | No Critical semantic failures | 0 queries with goal_type: General Chat for named-symbol inputs | ☐ |
| 3.3 | Declaration queries resolve to source | find struct/class X → include/ or src/, never scenarios/ or docs/ | ☐ |
| 3.4 | Reference queries resolve to call sites | who calls X / where is X referenced → .cpp caller, not defining .h | ☐ |
| 3.5 | Ownership queries trigger investigation | what owns X / what depends on X → references search, not General Chat | ☐ |
| 3.6 | Git intent queries execute live commands | what is the git diff → git diff, not codebase grep | ☐ |
| 3.7 | Phrase queries tokenize correctly | Multi-word phrases searched as tokens, not underscore-joined symbols | ☐ |
| 3.8 | Grep fallback rate below target | grep_fallback_rate ≤ 20% across benchmark set | ☐ |
| 3.9 | Semantic correctness ≥ 90% | Pass + Partial-correct ≥ 90% of audited queries | ☐ |
Source: docs/telemetry/search_correctness_report.md
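The two numeric targets in this gate (3.8 and 3.9) reduce to simple ratios over the audited query set. The sketch below shows one way to compute them. The record shape (`AuditedQuery`), the verdict labels, and the function names are assumptions for illustration; the real report format in docs/telemetry/search_correctness_report.md may differ.

```python
from dataclasses import dataclass


@dataclass
class AuditedQuery:
    """Hypothetical audit record; field names are illustrative only."""
    query: str
    verdict: str              # assumed labels: "pass", "partial", or "fail"
    used_grep_fallback: bool


def gate3_rates(audits):
    """Return (semantic_correctness, grep_fallback_rate) as fractions of all audits."""
    total = len(audits)
    if total == 0:
        raise ValueError("no audited queries; check 3.1 has not been run")
    correct = sum(a.verdict in ("pass", "partial") for a in audits)
    fallbacks = sum(a.used_grep_fallback for a in audits)
    return correct / total, fallbacks / total


def gate3_thresholds_met(audits, min_correct=0.90, max_fallback=0.20):
    """True when 3.9 (Pass + Partial-correct >= 90%) and 3.8 (fallback <= 20%) both hold."""
    correctness, fallback = gate3_rates(audits)
    return correctness >= min_correct and fallback <= max_fallback
```

Meeting these thresholds covers only checks 3.8 and 3.9. Checks 3.2 through 3.7 are evaluated per query and must pass independently.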
Gate 4 – UX
| # | Check | How to verify | Status |
|---|---|---|---|
| 4.1 | Progress timeline visible | Every investigation query shows stage headers before JSON | ☐ |
| 4.2 | Evidence visible | files_examined populated in JSON output | ☐ |
| 4.3 | Insufficient evidence distinguishable | outcome: insufficient_evidence does not report exit 1 | ☐ |
| 4.4 | Errors distinguishable from empty results | Stderr captures tool errors; stdout clean | ☐ |
Gate 5 – Telemetry
| # | Check | How to verify | Status |
|---|---|---|---|
| 5.1 | Dashboard metrics current | cursor-agent --dashboard reflects recent sessions | ☐ |
| 5.2 | Failure topology updated | docs/telemetry/failure_topology.md reflects current failure classes | ☐ |
| 5.3 | Search correctness report current | docs/telemetry/search_correctness_report.md dated within this cycle | ☐ |
| 5.4 | Calibration report reviewed | cursor-agent --calibrate confidence bands reviewed | ☐ |
Decision
| Field | Value |
|---|---|
| Gate 1 – Build | |
| Gate 2 – Execution | |
| Gate 3 – Search Correctness | |
| Gate 4 – UX | |
| Gate 5 – Telemetry | |
| Overall decision | ☐ READY ☐ BLOCKED |
| Blocking items | |
| Release authorized by | |
| Date | |
Release Pipeline
Gate 1: Build
↓ (fail → fix build)
Gate 2: Execution
↓ (fail → fix routing / exit codes)
Gate 3: Search Correctness ← primary product gate
↓ (fail → fix classifier / ranking / normalization)
Gate 4: UX
↓ (fail → fix progress / evidence visibility)
Gate 5: Telemetry
↓ (fail → update reports)
Release
A gate may be skipped only with an explicit written waiver that documents:
- which checks are skipped
- why they are not blocking for this specific release
- what telemetry will be monitored post-release
No waivers are valid for Gate 3 items 3.2, 3.3, 3.4, or 3.5. Those are unconditional requirements for a code-search product.
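The ordering and waiver rules above can be expressed as a small decision function. This is a sketch under stated assumptions, not an existing tool: check results are assumed to arrive as a mapping from check ID (such as "2.1") to a pass/fail boolean covering every check, and the names `GATE_ORDER`, `NON_WAIVABLE`, and `decide` are hypothetical.

```python
# Gate names in pipeline order; the gate number is the position in this list.
GATE_ORDER = ["Build", "Execution", "Search Correctness", "UX", "Telemetry"]

# Checks that may never be waived, per the rule stated above.
NON_WAIVABLE = {"3.2", "3.3", "3.4", "3.5"}


def decide(results, waived=frozenset()):
    """Return (decision, blocking_items) for a mapping of check ID -> passed.

    Gates are walked in order; the first gate with an unwaived failure
    blocks the release, and later gates are not evaluated.
    """
    invalid = sorted(NON_WAIVABLE & set(waived))
    if invalid:
        raise ValueError(f"checks cannot be waived: {', '.join(invalid)}")

    for gate_number, _name in enumerate(GATE_ORDER, start=1):
        prefix = f"{gate_number}."
        failed = sorted(
            check_id
            for check_id, passed in results.items()
            if check_id.startswith(prefix) and not passed and check_id not in waived
        )
        if failed:
            return "BLOCKED", failed
    return "READY", []
```

For example, `decide({"1.1": True, "2.1": False, "3.2": False})` reports only `2.1` as blocking, because a Gate 2 failure stops evaluation before Gate 3 is reached.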