Query Journey Audit: 50 Developer Tasks
Date: 2026-06-28
Method: Static trace through classify_goal() → select_next_tool() → execution loop → user-facing output
Goal: Identify every hesitation point, incorrect classification, and surprising behavior
How to Read This Audit
Each journey records:
| Column | Meaning |
|---|---|
| Task | What the developer wants |
| Input | What they type |
| GoalType | How it’s classified |
| Tools | What runs (and how many) |
| Time | Approximate wall-clock (tool count × typical latency) |
| What user sees | Exact normal-mode output sequence |
| Friction | What feels wrong |
| Severity | Critical / High / Medium / Low |
1. STATUS QUERIES (Working Tree)
The most fragmented intent in the system.
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 01 | Show modified files | "show modified files" | CommitHistory | 1 (git status) | ~100ms | Investigating... → ✓ complete → answer | OK | – |
| 02 | Check changed files | "check changed files" | CommitHistory | 1 (git status) | ~100ms | Same as above | OK | – |
| 03 | What files changed | "what files changed" | CommitHistory | 1 (git status) | ~100ms | Same as above | OK | – |
| 04 | Show uncommitted changes | "show me uncommitted changes" | GeneralChat | 0 | 0ms | cursor\n\n[AI answers from knowledge, no tools run] | Wrong answer. No git status run. AI hallucinates. | Critical |
| 05 | What files are modified | "what files are modified" | CodebaseQuery | 2-3 (find → grep → read) | ~2s | Investigating... → find/grep/read → "Insufficient evidence" | Wrong investigation path. Runs code search on a status query. | Critical |
| 06 | Did I edit anything | "did I edit anything" | GeneralChat | 0 | 0ms | cursor\n\n[AI general answer] | No tools run. AI guesses. | Critical |
| 07 | Current status | "git status" | CommitHistory | 1 (git status) | ~100ms | Status shown | OK | – |
| 08 | What branch | "what branch am I on" | CommitHistory | 1 (git status) | ~100ms | Branch shown | OK | – |
Pattern: Phrasings of the same working-tree status intent route to three different GoalTypes (CommitHistory, GeneralChat, CodebaseQuery), and the two non-CommitHistory routes produce wrong results. The CommitHistory keyword list for status queries is long but has gaps: “uncommitted changes”, “are modified”, and “did I edit” all fall through.
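One way to close these gaps is to match the status intent by phrase pattern rather than by exact keyword. The following is a minimal sketch under stated assumptions: `STATUS_PATTERNS` and `is_status_query` are hypothetical names, and the patterns are illustrative, not the project's actual keyword list.

```python
import re

# Hypothetical patterns for the working-tree status intent.
# A state word and an object word may appear in either order.
STATUS_PATTERNS = [
    re.compile(r"\b(modified|changed|uncommitted|unstaged|staged)\b.*\b(files?|changes?)\b"),
    re.compile(r"\b(files?|changes?)\b.*\b(modified|changed|uncommitted|unstaged|staged)\b"),
    re.compile(r"\bdid i (edit|change|modify|touch)\b"),
    re.compile(r"\bgit status\b"),
]


def is_status_query(query: str) -> bool:
    """Return True if the query looks like a working-tree status request."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    normalized = " ".join(query.lower().split())
    return any(pattern.search(normalized) for pattern in STATUS_PATTERNS)
```

With these patterns, journeys #01 through #07 all match, including the three that currently fall through. Journey #08 ("what branch am I on") is not covered by this sketch and would still rely on the existing keyword list.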
2. COMMIT HISTORY
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 09 | Last commit | "what is the last commit" | CommitHistory | 2 (git log -10 + git log -1) | ~200ms | Investigating... → ✓ complete → answer | OK | – |
| 10 | Recent commits | "show recent commits" | CommitHistory | 2 | ~200ms | Same | OK | – |
| 11 | What changed | "what changed" | CommitHistory | 1 (git status) | ~100ms | Status shown (not log!) | Minor: “what changed” does git status, not git log. Shows working tree diff, not commit history. | Low |
| 12 | What changed last week | "what changed last week" | CommitHistory | 1 (git status) | ~100ms | Status shown (ignores “last week”) | Ignored time qualifier. git status has no date concept. | Medium |
| 13 | Commit history | "commit history" | CommitHistory | 2 (git log -10 + git log -1) | ~200ms | Same | OK | – |
| 14 | Check the files we changed | "check the files we changed" | CommitHistory | 1 (git status) | ~100ms | Same | OK | – |
Pattern: Journeys #11-12 show the limit of keyword matching: “what changed” always maps to git status, never to git log --since=last.week. The planner has no concept of time qualifiers.
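A time qualifier could be detected before tool selection and used to pick `git log --since` over `git status`. The sketch below assumes a hypothetical `TIME_QUALIFIERS` table and `git_args_for` helper; only the git options themselves (`--since`, `--oneline`, `--short`) are real.

```python
import re

# Hypothetical mapping from time phrases to values for git log's --since option.
TIME_QUALIFIERS = {
    r"\byesterday\b": "yesterday",
    r"\blast week\b": "1 week ago",
    r"\blast month\b": "1 month ago",
}


def git_args_for(query: str) -> list:
    """Choose git log when the query names a time window, else git status."""
    normalized = query.lower()
    for pattern, since in TIME_QUALIFIERS.items():
        if re.search(pattern, normalized):
            # A time window only makes sense against commit history.
            return ["git", "log", f"--since={since}", "--oneline"]
    # No time qualifier: keep the current working-tree behavior.
    return ["git", "status", "--short"]
```

Under this sketch, "what changed last week" yields a `git log` invocation, while a bare "what changed" keeps its current `git status` behavior.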
3. ARCHITECTURE & DESIGN QUESTIONS
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 15 | How is this designed | "how is this agent designed" | CodebaseOverview | 2 (discovery + read README) | ~3s | Investigating... → ✓ complete → answer | OK | – |
| 16 | Explain architecture | "explain the architecture" | CodebaseOverview | 2 | ~3s | Same | OK | – |
| 17 | Tell me how repo investigation works | "tell me how repository investigation works" | CodebaseQuery | 2-3 (find/grep + read) | ~2s | Runs code search instead of overview | Wrong path. Should be CodebaseOverview (high-level). Runs file search on implementation. | High |
| 18 | How does the build system work | "how does the build system work" | CodebaseQuery | 2-3 (find/grep + read about “build”) | ~2s | File search on “build” instead of project overview | Wrong path. Should describe build system, not grep for “build”. | High |
| 19 | Review architecture | "review the architecture" | ArchitectureReview | 11 (full audit) | ~15-30s | Investigating... (stalls for 15-30s with no progress change) | No progress updates during 11-tool audit. User sees frozen terminal. | Medium |
| 20 | Review codebase | "review codebase" | ArchitectureReview | 11 | ~15-30s | Same as above | Same stall problem | Medium |
| 21 | Explain this codebase | "explain this codebase" | CodebaseQuery | 2-3 | ~2s | Code search instead of overview | Wrong path. Should be CodebaseOverview. | High |
| 22 | What is this repository | "what is this repository" | CodebaseQuery | 2-3 | ~2s | Code search instead of overview | Wrong path. | High |
Pattern: Queries starting with “tell me how”, “how does the”, “explain this”, or “what is this” are ambiguous between architecture explanation and code search. The current classifier sends them to CodebaseQuery. A human would recognize these as overview questions.
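A possible tie-breaker for queries already headed to CodebaseQuery: prefer the overview path when an explanatory opener is combined with a repo-level subject and no concrete code target is named. This is a sketch only; the cue lists and the `overview_or_query` helper are assumptions, and GoalTypes are represented as plain strings for brevity.

```python
import re

# Hypothetical cues; the real classifier's keyword lists are not shown in this audit.
EXPLAIN_OPENERS = ("tell me how", "how does", "how is", "explain", "what is")
REPO_LEVEL_CUES = ("codebase", "repository", "architecture", "build system", "design")
# A CamelCase identifier or a slash-separated path suggests a concrete code target.
CODE_TARGET = re.compile(r"\b[A-Z][a-z0-9]+[A-Z]\w*\b|\S+/\S+")


def overview_or_query(query: str) -> str:
    """Break the tie between CodebaseOverview and CodebaseQuery."""
    if CODE_TARGET.search(query):
        # A named symbol or file means the user wants code-level results.
        return "CodebaseQuery"
    normalized = query.lower().strip()
    has_opener = normalized.startswith(EXPLAIN_OPENERS)
    has_repo_cue = any(cue in normalized for cue in REPO_LEVEL_CUES)
    if has_opener and has_repo_cue:
        return "CodebaseOverview"
    return "CodebaseQuery"
```

Journeys #17, #18, #21, and #22 resolve to CodebaseOverview under this rule, while #23 ("find CommandRouter") and #26 ("how does auth work in this project") stay on CodebaseQuery.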
4. CODE SEARCH (Working Well)
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 23 | Find CommandRouter | "find CommandRouter" | CodebaseQuery | 2 (find + read) | ~1s | Investigating... → ✓ complete → answer | OK | – |
| 24 | Grep for Agent | "grep Agent" | CodebaseQuery | 2-3 (find+grep+read) | ~2s | Same | OK | – |
| 25 | Where is ReplayService used | "where is ReplayService used" | CodebaseQuery | 2 (find + read) | ~1s | Same | OK | – |
| 26 | How does auth work (code-level) | "how does auth work in this project" | CodebaseQuery | 2-3 | ~2s | Same | OK | – |
| 27 | Find binary | "find the cursor binary" | CodebaseQuery | 2-3 (find fails, grep fallback) | ~2s | Now works (was a retrieval bug, fixed) | Resolved | – |
| 28 | Read file | "read file src/main.cpp" | CodebaseQuery | 1 (read src/main.cpp) | ~100ms | OK | OK | – |
5. CI / GITHUB INVESTIGATION
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 29 | Why did CI fail | "why did CI fail" | CICheck | 1-3 (gh + optional grep/read) | ~3-10s | Investigating... → ✓ complete → answer | OK | – |
| 30 | Check workflow | "check my CI workflow" | CICheck | 1 (gh list) | ~3s | Same | OK | – |
| 31 | Investigate run URL | paste of github.com/.../actions/runs/12345 | GitHubInvestigation | 2 (gh run view + logs) | ~5-15s | Same | OK | – |
| 32 | Check this log | "can you check this log https://..." | GeneralChat | 0 | 0ms | AI answers without running gh | Wrong. Should be GitHubInvestigation. The URL is present but “check this log” is a GeneralChat pattern that matches before the URL check. | Critical |
Pattern: Journey #32 is a priority-ordering bug. The GeneralChat exclusion check (Level 8) fires before the GitHubInvestigation URL check (Level 6) if the query starts with general-language patterns. The order matters: URL-containing queries should be checked early regardless of surrounding language.
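The ordering fix can be illustrated with a two-rule classifier in which URL evidence is tested first. This is a sketch under assumptions: `RUN_URL`, `CHAT_OPENERS`, and `classify` are hypothetical, the regex assumes the pasted link is a GitHub Actions run URL of the shape seen in journey #31, and the real level structure of classify_goal() is not reproduced here.

```python
import re

# Matches a GitHub Actions run URL, with or without a scheme.
RUN_URL = re.compile(r"(?:https?://)?github\.com/\S+/actions/runs/\d+")
# Hypothetical general-language openers that currently win the match.
CHAT_OPENERS = ("can you", "could you", "please")


def classify(query: str) -> str:
    """Classify with the URL check placed ahead of the general-chat check."""
    # URL evidence is checked first, regardless of surrounding wording.
    if RUN_URL.search(query):
        return "GitHubInvestigation"
    if query.lower().strip().startswith(CHAT_OPENERS):
        return "GeneralChat"
    return "Unclassified"
```

Because the URL rule runs first, a query that opens with "can you" still reaches GitHubInvestigation whenever it carries a run URL.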
6. SESSION STATE (Well-Handled)
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 33 | What model am I on | "what model am i on" | SessionState | 0 | ~0ms | cursor\n\n[state answer] | OK | – |
| 34 | Am I online | "am i online" | SessionState | 0 | ~0ms | OK | OK | – |
| 35 | What provider | "what provider am i using" | SessionState | 0 | ~0ms | OK | OK | – |
7. CODE CHANGES (Long Running)
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 36 | Add a CLI command | "add a new CLI command" | CodeChange | 5 (discovery, grep, read, cmake, ctest) | ~30s+ | Investigating... (no change for 30s) | No progress updates for 5 tools across ~30s. Terminal appears frozen. | High |
| 37 | Fix failing test | "fix the failing unit test" | CodeChange | 5 | ~30s+ | Same | Same stall problem + after completion, user prompted to apply | High |
| 38 | Refactor auth | "refactor the authentication service" | CodeChange | 5 | ~30s+ | Same | Same stall problem | High |
| 39 | Build project | "build the project" | CodeChange | 5 | ~30s+ | Same | Same | High |
Pattern: CodeChange is the slowest path (5 tools, 30s+) and has no progress differentiation. The user sees Investigating... for 30+ seconds. Build and test steps (cmake, ctest) are particularly slow. No feedback on which phase is running.
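Per-phase feedback could be as simple as announcing each tool before it runs. The sketch below assumes a hypothetical `run_with_progress` helper that takes `(name, action)` pairs; the message format is illustrative and not the tool's current output.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[], Any]]


def run_with_progress(steps: Iterable[Step],
                      report: Callable[[str], None] = print) -> List[Any]:
    """Run each step in order, reporting which phase is active."""
    ordered = list(steps)
    results = []
    for index, (name, action) in enumerate(ordered, start=1):
        # Announce the phase before the (possibly slow) tool runs.
        report(f"[{index}/{len(ordered)}] {name}...")
        results.append(action())
    report(f"✓ complete ({len(ordered)} tools)")
    return results
```

For the CodeChange path this would surface five distinct lines (discovery, grep, read, cmake, ctest) instead of one static Investigating... message.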
8. GENERAL CHAT (Correctly Routed)
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 40 | How are you | "how are you" | GeneralChat | 0 | ~0ms | AI chat answer | OK | – |
| 41 | What can you do | "what can you do" | GeneralChat | 0 | ~0ms | AI chat answer | OK | – |
| 42 | How do I install python | "how do I install python" | GeneralChat | 0 | ~0ms | AI chat answer | OK | – |
| 43 | What is the difference | "what is the difference between X and Y" | GeneralChat | 0 | ~0ms | AI chat answer | OK (correct) | – |
| 44 | Hello | "hello" | GeneralChat | 0 | ~0ms | AI chat answer | OK | – |
9. EDGE CASES & COMMAND OVERRIDES
| # | Task | Input | GoalType | Tools | Time | What user sees | Friction | Sev |
|---|---|---|---|---|---|---|---|---|
| 45 | Explicit git prefix | "git:status" | (direct command) | 1 | ~100ms | Direct output | Only if user knows git: prefix exists. Not discoverable. | Medium |
| 46 | Typo: comit | "last comit" | CommitHistory | 2 | ~200ms | OK (“comit” is in the keyword list) | OK | – |
| 47 | Typo: codbease | "codbease overview" | CodebaseOverview | 2 | ~3s | Normalized by command_router before classify_goal() | OK (if routed through command_router) | – |
| 48 | Ambiguous: plan | "plan the implementation" | CodeChange | 5 | ~30s+ | Full investigation | “Plan” is not in any keyword list. Might miss CodeChange path. | Medium |
| 49 | Long query | 200-character multi-sentence question | CodebaseQuery | 2-3 | ~2s | Works (keyword matching doesn’t penalize length) | OK | – |
| 50 | Multi-step: log + status | "show me the last commit and the current branch" | CommitHistory | 2 (log + status) | ~200ms | Both shown | OK (both are CommitHistory) | – |
Summary of Findings
Friction Count by Severity
| Severity | Count | Issues |
|---|---|---|
| Critical | 4 | #04, #05, #06 (status queries misclassified) + #32 (URL check bypassed) |
| High | 8 | #17, #18, #21, #22 (overview queries misclassified) + #36-39 (no progress during long ops) |
| Medium | 5 | #12 (time qualifier ignored), #19-20 (11-tool audit, no progress), #45 (git: prefix hidden), #48 (plan keyword missing) |
| Low | 1 | #11 (“what changed” = status not log) |
Classification Failures (Wrong Path)
| Wrong classification | Count | Examples |
|---|---|---|
| GeneralChat instead of CommitHistory | 2 | “show me uncommitted changes”, “did I edit anything” |
| CodebaseQuery instead of CommitHistory | 1 | “what files are modified” |
| CodebaseQuery instead of CodebaseOverview | 4 | “tell me how repository investigation works”, “how does the build system work”, “explain this codebase”, “what is this repository” |
| GeneralChat instead of GitHubInvestigation | 1 | “can you check this log https://…” |
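These failures lend themselves to a table-driven regression check. The sketch below is a hypothetical harness: `EXPECTED_GOALS` lists the inputs and intended GoalTypes from the table above (the URL case is omitted because its link is elided in this audit), and `misclassified` accepts any callable that maps a query to a GoalType name. Representing GoalTypes as strings is an assumption.

```python
from typing import Callable, List, Tuple

# Inputs and intended GoalTypes taken from the classification failures above.
EXPECTED_GOALS = [
    ("show me uncommitted changes", "CommitHistory"),
    ("did I edit anything", "CommitHistory"),
    ("what files are modified", "CommitHistory"),
    ("tell me how repository investigation works", "CodebaseOverview"),
    ("how does the build system work", "CodebaseOverview"),
    ("explain this codebase", "CodebaseOverview"),
    ("what is this repository", "CodebaseOverview"),
]


def misclassified(classify: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (query, expected, actual) for every journey that routes wrongly."""
    failures = []
    for query, expected in EXPECTED_GOALS:
        actual = classify(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures
```

An empty result means every listed journey routes as intended; any returned tuple names the query and the wrong GoalType it received.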
Progress Visibility Failures
| Path | Tools | Typical time | Current feedback | Gap |
|---|---|---|---|---|
| ArchitectureReview | 11 | 15-30s | Static Investigating... | No progress for 11 tools across 15-30s |
| CodeChange | 5 | 30s+ | Static Investigating... | No progress for 5 tools across 30s+ |
| CICheck (with failure) | 2-3 | 5-15s | Static Investigating... | No progress during long gh calls |
Terminal Quality Failures
| Issue | Affected interactions | Effort to fix | Impact |
|---|---|---|---|
| Ctrl+C kills process | All interactions | Low | Critical: lost state |
| No bracketed paste | Pasting code/stacktraces | Low | Medium: accidental execution |
| No clickable paths | All file outputs | Medium | Medium: navigation friction |
| No resize handling | Long sessions | Low | Low: visual glitch |
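The two low-effort items can be sketched briefly. The example below is illustrative Python, not the project's implementation: `run_interruptible` and `extract_paste` are hypothetical helpers. The `ESC[200~` and `ESC[201~` markers are the standard delimiters a terminal sends around pasted text once bracketed paste mode has been enabled.

```python
from typing import Any, Callable, Tuple

# Delimiters a terminal wraps around pasted text in bracketed paste mode.
PASTE_START = "\x1b[200~"
PASTE_END = "\x1b[201~"


def run_interruptible(task: Callable[[], Any],
                      on_cancel: Callable[[], None]) -> Any:
    """Run task(); on Ctrl+C, call on_cancel() and return None instead of exiting."""
    try:
        return task()
    except KeyboardInterrupt:
        # Restore the terminal and redraw the prompt rather than killing the process.
        on_cancel()
        return None


def extract_paste(raw: str) -> Tuple[bool, str]:
    """Return (is_paste, text), stripping paste delimiters when both are present."""
    if raw.startswith(PASTE_START) and raw.endswith(PASTE_END):
        return True, raw[len(PASTE_START):-len(PASTE_END)]
    return False, raw
```

Input flagged as a paste can then be inserted into the line buffer as text, so embedded newlines do not trigger execution.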
Journey Heatmap
| Task type | Status | Commit | Arch | Search | CI | Change | Chat |
|---|---|---|---|---|---|---|---|
| Classification OK? | ❌ 5/8 | ✅ 6/6 | ❌ 4/8 | ✅ 6/6 | ✅ 3/4 | ✅ 4/4 | ✅ 5/5 |
| Progress visible? | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ |
| Ctrl+C safe? | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Result useful? | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Takeaway: Status queries, architecture requests, and CI/Change operations have the most friction. Ctrl+C safety is a universal gap. ArchitectureReview and CodeChange lack progress feedback during their long execution paths.
Regression Checklist
Every UX change must be verified against these journeys before closing:
| # | Journey | What to check |
|---|---|---|
| 01 | Normal code question | Classification correct, tools run, answer useful |
| 02 | Long investigation (>5s) | Progress visible, terminal not frozen |
| 03 | Review mode | Read-only enforcement visible, prompt reflects mode |
| 04 | Apply mode | Confirmation prompt shown, changes applied only on approval |
| 05 | /inspect | Full investigation detail shown, no planner metadata leaks |
| 06 | Shell command (!) | Command executes, output shown, mode toggles work |
| 07 | Ctrl+C during investigation | Terminal restored, prompt redraws, no corruption |
| 08 | Ctrl+C during answer generation | Terminal restored, no partial output |
| 09 | Terminal resize | Status line width adapts, prompt not displaced |
| 10 | Paste 100+ lines | Bracketed paste handled, no accidental execution |
| 11 | Unicode file paths | Non-ASCII paths display and navigate correctly |
| 12 | Windows terminal (conhost) | ANSI rendering, line editor, menus functional |
| 13 | Linux terminal (gnome, xterm) | All features work on common Linux terminals |
| 14 | macOS terminal (Terminal.app, iTerm2) | All features work on common macOS terminals |
| 15 | First launch | Startup hint shown, disappears after first prompt |
| 16 | Recovery – low confidence | Planner attempts recovery, user sees progress, no tool internals |
Each journey must pass before the change is considered complete. If a journey produces unexpected output or terminal corruption, the change is not ready.