Query Journey Audit: 50 Developer Tasks

Date: 2026-06-28
Method: Static trace through classify_goal() → select_next_tool() → execution loop → user-facing output
Goal: Identify every hesitation point, incorrect classification, and surprising behavior

How to Read This Audit

Each journey records:

Column	Meaning
Task	What the developer wants
Input	What they type
GoalType	How it’s classified
Tools	What runs (and how many)
Time	Approximate wall-clock (tool count × typical latency)
What user sees	Exact normal-mode output sequence
Friction	What feels wrong
Severity	Critical / High / Medium / Low

1. STATUS QUERIES (Working Tree)

The most fragmented intent in the system.

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
01	Show modified files	`"show modified files"`	CommitHistory	1 (`git status`)	~100ms	`Investigating... → ✓ complete → answer`	OK	–
02	Check changed files	`"check changed files"`	CommitHistory	1 (`git status`)	~100ms	Same as above	OK	–
03	What files changed	`"what files changed"`	CommitHistory	1 (`git status`)	~100ms	Same as above	OK	–
04	Show uncommitted changes	`"show me uncommitted changes"`	GeneralChat	0	0ms	`cursor\n\n[AI answers from knowledge, no tools run]`	Wrong answer. No git status run. AI hallucinates.	Critical
05	What files are modified	`"what files are modified"`	CodebaseQuery	2-3 (find → grep → read)	~2s	`Investigating... → find/grep/read → "Insufficient evidence"`	Wrong investigation path. Runs code search on a status query.	Critical
06	Did I edit anything	`"did I edit anything"`	GeneralChat	0	0ms	`cursor\n\n[AI general answer]`	No tools run. AI guesses.	Critical
07	Current status	`"git status"`	CommitHistory	1 (`git status`)	~100ms	Status shown	OK	–
08	What branch	`"what branch am I on"`	CommitHistory	1 (`git status`)	~100ms	Branch shown	OK	–

Pattern: Eight phrasings of the same intent route to three different GoalTypes (CommitHistory, GeneralChat, CodebaseQuery), and the latter two produce wrong results. The CommitHistory keyword list for status queries is long but has gaps: “uncommitted changes”, “are modified”, and “did I edit” all fall through.
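
One way to close gaps like these is to match on a small set of normalized status phrases instead of exact keyword strings. The sketch below is illustrative only: `STATUS_PHRASES`, `normalize`, and `is_status_query` are hypothetical names, and the phrase list is an assumption assembled from the eight inputs in the table above, not the actual keyword list inside `classify_goal()`.

```python
import re

# Hypothetical phrase list, assembled from the eight inputs audited above.
# The real keyword list inside classify_goal() is not shown in this audit.
STATUS_PHRASES = (
    "git status",
    "modified",
    "changed files",
    "files changed",
    "uncommitted",
    "did i edit",
    "what branch",
)


def normalize(query: str) -> str:
    """Lowercase the query and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", query.lower()).strip()


def is_status_query(query: str) -> bool:
    """Return True if any status phrase occurs in the normalized query."""
    text = normalize(query)
    return any(phrase in text for phrase in STATUS_PHRASES)
```

With this list, all eight inputs from the table match, including the three that currently fall through, while unrelated inputs such as `"find CommandRouter"` do not. Substring matching is deliberately loose here; a real fix would need to be checked against the other GoalTypes for false positives.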

2. COMMIT HISTORY

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
09	Last commit	`"what is the last commit"`	CommitHistory	2 (`git log -10` + `git log -1`)	~200ms	`Investigating... → ✓ complete → answer`	OK	–
10	Recent commits	`"show recent commits"`	CommitHistory	2	~200ms	Same	OK	–
11	What changed	`"what changed"`	CommitHistory	1 (`git status`)	~100ms	Status shown (not log!)	Minor: “what changed” runs `git status`, not `git log`, so it shows working-tree status rather than commit history.	Low
12	What changed last week	`"what changed last week"`	CommitHistory	1 (`git status`)	~100ms	Status shown (ignores “last week”)	Ignored time qualifier. `git status` has no date concept.	Medium
13	Commit history	`"commit history"`	CommitHistory	2 (`git log -10` + `git log -1`)	~200ms	Same	OK	–
14	Check the files we changed	`"check the files we changed"`	CommitHistory	1 (`git status`)	~100ms	Same	OK	–

Pattern: Journeys #11-12 show the limit of keyword matching: “what changed” always maps to `git status`, never to something like `git log --since=last.week`. The planner has no concept of time qualifiers.
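
A time qualifier could be detected before the tool is chosen and turned into a `--since` argument for `git log`. The following is a minimal sketch under stated assumptions: `TIME_QUALIFIERS` and `plan_git_command` are hypothetical names, the three phrases are examples rather than a complete grammar, and the audited planner may structure tool selection quite differently.

```python
import re
from typing import List

# Hypothetical mapping from a phrase in the query to a value for git's --since
# option. The values use git's relative-date form ("1 week ago", "yesterday").
TIME_QUALIFIERS = {
    r"\blast week\b": "1 week ago",
    r"\byesterday\b": "yesterday",
    r"\blast month\b": "1 month ago",
}


def plan_git_command(query: str) -> List[str]:
    """Pick `git log --since=...` when a time qualifier is present, else `git status`."""
    text = query.lower()
    for pattern, since in TIME_QUALIFIERS.items():
        if re.search(pattern, text):
            return ["git", "log", "--oneline", f"--since={since}"]
    # No qualifier: keep the behavior recorded in journey #11.
    return ["git", "status"]
```

Returning an argument list rather than a shell string avoids quoting problems with values such as `1 week ago`. Unqualified queries keep their current `git status` behavior, so only #12-style inputs change.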

3. ARCHITECTURE & DESIGN QUESTIONS

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
15	How is this designed	`"how is this agent designed"`	CodebaseOverview	2 (discovery + read README)	~3s	`Investigating... → ✓ complete → answer`	OK	–
16	Explain architecture	`"explain the architecture"`	CodebaseOverview	2	~3s	Same	OK	–
17	Tell me how repo investigation works	`"tell me how repository investigation works"`	CodebaseQuery	2-3 (find/grep + read)	~2s	Runs code search instead of overview	Wrong path. Should be CodebaseOverview (high-level). Runs file search on implementation.	High
18	How does the build system work	`"how does the build system work"`	CodebaseQuery	2-3 (find/grep + read about “build”)	~2s	File search on “build” instead of project overview	Wrong path. Should describe build system, not grep for “build”.	High
19	Review architecture	`"review the architecture"`	ArchitectureReview	11 (full audit)	~15-30s	`Investigating...` (stalls for 15-30s with no progress change)	No progress updates during 11-tool audit. User sees frozen terminal.	Medium
20	Review codebase	`"review codebase"`	ArchitectureReview	11	~15-30s	Same as above	Same stall problem	Medium
21	Explain this codebase	`"explain this codebase"`	CodebaseQuery	2-3	~2s	Code search instead of overview	Wrong path. Should be CodebaseOverview.	High
22	What is this repository	`"what is this repository"`	CodebaseQuery	2-3	~2s	Code search instead of overview	Wrong path.	High

Pattern: Queries starting with “tell me how”, “how does the”, or “explain this” are ambiguous between architecture explanation and code search. The current classifier sends them to CodebaseQuery. A human would recognize these as overview questions.
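
One possible tie-breaker between CodebaseQuery and CodebaseOverview is to ask whether the query names a concrete target (an identifier or a file path) or only repo-level nouns. The sketch below is a heuristic built from the inputs in sections 3 and 4, not the classifier's actual logic; `OVERVIEW_NOUNS`, `CONCRETE_TARGET`, and `prefer_overview` are hypothetical names.

```python
import re

# Hypothetical repo-level nouns, chosen from the misrouted inputs in the table above.
OVERVIEW_NOUNS = re.compile(r"\b(?:codebase|repository|repo|architecture|build system)\b")

# A concrete target: a CamelCase identifier (e.g. CommandRouter) or a
# path-like token ending in a short extension (e.g. src/main.cpp).
CONCRETE_TARGET = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b|\S+\.\w{1,4}\b")


def prefer_overview(query: str) -> bool:
    """Return True when the query looks like a high-level overview question."""
    if CONCRETE_TARGET.search(query):
        return False  # a named symbol or file points to code search
    return OVERVIEW_NOUNS.search(query.lower()) is not None
```

Under these assumptions the four misrouted inputs (#17, #18, #21, #22) prefer the overview path, while `"find CommandRouter"`, `"where is ReplayService used"`, and `"how does auth work in this project"` stay on code search. The noun list is the fragile part and would need tuning against real queries.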

4. CODE SEARCH (Working Well)

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
23	Find CommandRouter	`"find CommandRouter"`	CodebaseQuery	2 (find + read)	~1s	`Investigating... → ✓ complete → answer`	OK	–
24	Grep for Agent	`"grep Agent"`	CodebaseQuery	2-3 (find+grep+read)	~2s	Same	OK	–
25	Where is ReplayService used	`"where is ReplayService used"`	CodebaseQuery	2 (find + read)	~1s	Same	OK	–
26	How does auth work (code-level)	`"how does auth work in this project"`	CodebaseQuery	2-3	~2s	Same	OK	–
27	Find binary	`"find the cursor binary"`	CodebaseQuery	2-3 (find fails, grep fallback)	~2s	Now works (was a retrieval bug, fixed)	Resolved	–
28	Read file	`"read file src/main.cpp"`	CodebaseQuery	1 (`read src/main.cpp`)	~100ms	OK	OK	–

5. CI / GITHUB INVESTIGATION

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
29	Why did CI fail	`"why did CI fail"`	CICheck	1-3 (gh + optional grep/read)	~3-10s	`Investigating... → ✓ complete → answer`	OK	–
30	Check workflow	`"check my CI workflow"`	CICheck	1 (gh list)	~3s	Same	OK	–
31	Investigate run URL	paste of `github.com/.../actions/runs/12345`	GitHubInvestigation	2 (gh run view + logs)	~5-15s	Same	OK	–
32	Check this log	`"can you check this log https://..."`	GeneralChat	0	0ms	AI answers without running `gh`	Wrong. Should be GitHubInvestigation. The URL is present but “check this log” is a GeneralChat pattern that matches before the URL check.	Critical

Pattern: Journey #32 is a priority-ordering bug. The GeneralChat exclusion check (Level 8) fires before the GitHubInvestigation URL check (Level 6) if the query starts with general-language patterns. The order matters: URL-containing queries should be checked early regardless of surrounding language.
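
The ordering fix can be pictured as a list of checks evaluated top to bottom, with the URL check placed ahead of any general-language check. This is a sketch, not the real classifier: `RUN_URL`, `CHAT_PREFIXES`, `CHECKS`, and `classify` are hypothetical, the chat prefixes are invented for illustration, and the `"Unmatched"` default is a placeholder.

```python
import re
from typing import Callable, List, Tuple

# Matches a GitHub Actions run URL of the shape shown in journey #31.
RUN_URL = re.compile(r"github\.com/[^/\s]+/[^/\s]+/actions/runs/\d+")

# Hypothetical general-language prefixes; the real GeneralChat patterns
# are not listed in this audit.
CHAT_PREFIXES = ("can you", "could you", "please")


def has_run_url(query: str) -> bool:
    return RUN_URL.search(query) is not None


def looks_like_chat(query: str) -> bool:
    return query.lower().startswith(CHAT_PREFIXES)


# Order is the whole point: the URL check sits ahead of the chat check.
CHECKS: List[Tuple[Callable[[str], bool], str]] = [
    (has_run_url, "GitHubInvestigation"),
    (looks_like_chat, "GeneralChat"),
]


def classify(query: str, default: str = "Unmatched") -> str:
    """Return the GoalType of the first check that matches, else the default."""
    for predicate, goal_type in CHECKS:
        if predicate(query):
            return goal_type
    return default
```

With this ordering, a query such as "can you check this log" followed by a run URL resolves to GitHubInvestigation, while the same opening words without a URL still resolve to GeneralChat. Swapping the two entries in `CHECKS` reproduces the bug described above.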

6. SESSION STATE (Well-Handled)

#	Task	Input	GoalType	Time	What user sees	Friction	Sev
33	What model am I on	`"what model am i on"`	SessionState	~0ms	`cursor\n\n[state answer]`	OK	–
34	Am I online	`"am i online"`	SessionState	~0ms	OK	OK	–
35	What provider	`"what provider am i using"`	SessionState	~0ms	OK	OK	–

7. CODE CHANGES (Long Running)

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
36	Add a CLI command	`"add a new CLI command"`	CodeChange	5 (discovery, grep, read, cmake, ctest)	~30s+	`Investigating...` (no change for 30s)	No progress updates for 5 tools across ~30s. Terminal appears frozen.	High
37	Fix failing test	`"fix the failing unit test"`	CodeChange	5	~30s+	Same	Same stall problem + after completion, user prompted to apply	High
38	Refactor auth	`"refactor the authentication service"`	CodeChange	5	~30s+	Same	Same stall problem	High
39	Build project	`"build the project"`	CodeChange	5	~30s+	Same	Same	High

Pattern: CodeChange is the most tool-heavy path (5 tools) with no progress differentiation. The user sees Investigating... for 30+ seconds. Build and test steps (cmake, ctest) are particularly slow. No feedback on which phase is running.
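
Per-phase feedback could be as simple as reporting each step's position and name before it starts. The sketch below assumes the execution loop can be expressed as an ordered list of named steps; `Step` and `run_with_progress` are hypothetical, and the `[i/n]` message format is an illustration, not the tool's actual output.

```python
from typing import Callable, Sequence, Tuple

# A step is a display name plus a zero-argument callable that does the work.
Step = Tuple[str, Callable[[], None]]


def run_with_progress(steps: Sequence[Step], report: Callable[[str], None] = print) -> None:
    """Run each step in order, reporting its position and name before it starts."""
    total = len(steps)
    for index, (name, action) in enumerate(steps, start=1):
        report(f"[{index}/{total}] {name}...")
        action()
    report(f"✓ complete ({total} steps)")
```

For the CodeChange path, the five step names would be the ones listed in journey #36 (discovery, grep, read, cmake, ctest), so the user would see which phase is running during the slow build and test steps. Passing `report` as a parameter keeps the function testable and lets the terminal layer decide how to render the line.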

8. GENERAL CHAT (Correctly Routed)

#	Task	Input	GoalType	Time	What user sees	Friction	Sev
40	How are you	`"how are you"`	GeneralChat	~0ms	AI chat answer	OK	–
41	What can you do	`"what can you do"`	GeneralChat	~0ms	AI chat answer	OK	–
42	How do I install python	`"how do I install python"`	GeneralChat	~0ms	AI chat answer	OK	–
43	What is the difference	`"what is the difference between X and Y"`	GeneralChat	~0ms	AI chat answer	OK (correct)	–
44	Hello	`"hello"`	GeneralChat	~0ms	AI chat answer	OK	–

9. EDGE CASES & COMMAND OVERRIDES

#	Task	Input	GoalType	Tools	Time	What user sees	Friction	Sev
45	Explicit git prefix	`"git:status"`	(direct command)	1	~100ms	Direct output	Only if user knows `git:` prefix exists. Not discoverable.	Medium
46	Typo: comit	`"last comit"`	CommitHistory	2	~200ms	OK (the misspelling “comit” is in the keyword list)	OK	–
47	Typo: codbease	`"codbease overview"`	CodebaseOverview	2	~3s	Normalized by command_router before classify_goal()	OK (if routed through command_router)	–
48	Ambiguous: plan	`"plan the implementation"`	CodeChange	5	~30s+	Full investigation	“Plan” is not in any keyword list. Might miss CodeChange path.	Medium
49	Long query	200-character multi-sentence question	CodebaseQuery	2-3	~2s	Works (keyword matching doesn’t penalize length)	OK	–
50	Multi-step: log + status	`"show me the last commit and the current branch"`	CommitHistory	2 (log + status)	~200ms	Both shown	OK (both are CommitHistory)	–

Summary of Findings

Friction Count by Severity

Severity	Count	Issues
Critical	4	#04, #05, #06 (status queries misclassified) + #32 (URL check bypassed)
High	8	#17, #18, #21, #22 (overview queries misclassified) + #36-39 (no progress during long ops)
Medium	5	#12 (time qualifier ignored), #19-20 (11-tool audit, no progress), #45 (git: prefix hidden), #48 (plan keyword missing)
Low	1	#11 (“what changed” = status not log)

Classification Failures (Wrong Path)

Wrong classification	Count	Examples
`GeneralChat` instead of `CommitHistory`	2	“show me uncommitted changes”, “did I edit anything”
`CodebaseQuery` instead of `CommitHistory`	1	“what files are modified”
`CodebaseQuery` instead of `CodebaseOverview`	4	“tell me how repository investigation works”, “how does the build system work”, “explain this codebase”, “what is this repository”
`GeneralChat` instead of `GitHubInvestigation`	1	“can you check this log https://…”
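
The misclassifications in this table can be pinned down as table-driven cases so that a classifier change is checked against all of them at once. This is a sketch: `EXPECTED` and `find_regressions` are hypothetical names, and the classifier is passed in as a plain function because the real signature of `classify_goal()` is not shown in this audit. The “check this log” case is left out because its URL is elided in the source.

```python
from typing import Callable, List, Tuple

# (input, expected GoalType) pairs taken from the table above. The expected
# values are the classifications the audit says these inputs should receive.
EXPECTED: List[Tuple[str, str]] = [
    ("show me uncommitted changes", "CommitHistory"),
    ("did I edit anything", "CommitHistory"),
    ("what files are modified", "CommitHistory"),
    ("tell me how repository investigation works", "CodebaseOverview"),
    ("how does the build system work", "CodebaseOverview"),
    ("explain this codebase", "CodebaseOverview"),
    ("what is this repository", "CodebaseOverview"),
]


def find_regressions(classify: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the classifier gets wrong."""
    failures = []
    for query, expected in EXPECTED:
        actual = classify(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures
```

An empty result means every audited input now lands on the intended path; a non-empty result lists exactly which phrasings still misroute and where they went.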

Progress Visibility Failures

Path	Tools	Typical time	Current feedback	Gap
ArchitectureReview	11	15-30s	Static `Investigating...`	No progress for 11 tools across 15-30s
CodeChange	5	30s+	Static `Investigating...`	No progress for 5 tools across 30s+
CICheck (with failure)	2-3	5-15s	Static `Investigating...`	No progress during long gh calls

Terminal Quality Failures

Issue	Affected interactions	Effort to fix	Impact
Ctrl+C kills process	All interactions	Low	Critical: lost state
No bracketed paste	Pasting code/stacktraces	Low	Medium: accidental execution
No clickable paths	All file outputs	Medium	Medium: navigation friction
No resize handling	Long sessions	Low	Low: visual glitch
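
For the Ctrl+C row, the intended behavior is that an interrupt cancels the current investigation and returns to the prompt instead of ending the process. The sketch below shows only that control flow in Python, where Ctrl+C surfaces as `KeyboardInterrupt`; `run_cancellable` and `on_cancel` are hypothetical names. The project itself appears to be C++ (it references `src/main.cpp`, cmake, and ctest), where the equivalent would involve a SIGINT handler, so treat this as an illustration of the design rather than a drop-in fix.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_cancellable(
    task: Callable[[], T],
    on_cancel: Callable[[], None] = lambda: None,
) -> Optional[T]:
    """Run task; on Ctrl+C, cancel only this task and return None instead of exiting."""
    try:
        return task()
    except KeyboardInterrupt:
        on_cancel()  # e.g. restore the terminal and redraw the prompt
        return None
```

A prompt loop that wraps each investigation in `run_cancellable` keeps session state across interrupts, which addresses the "lost state" impact noted in the table. The `on_cancel` hook is where terminal restoration would go.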

Journey Heatmap

Task type:              Status   Commit  Arch  Search  CI    Change  Chat  
--------------------------------------------------------------------------------
Classification OK?      ❌ 5/8   ✅ 6/6  ❌ 4/8  ✅ 6/6  ✅ 3/4  ✅ 4/4  ✅ 5/5
Progress visible?       ✅      ✅     ❌     ✅     ❌     ❌     ✅
Ctrl+C safe?            ❌      ❌     ❌     ❌     ❌     ❌     ❌
Result useful?          ❌      ✅     ✅     ✅     ✅     ✅     ✅

Takeaway: Status queries, architecture requests, and CI/Change operations have the most friction. Ctrl+C safety is a universal gap. ArchitectureReview and CodeChange lack progress feedback during their long execution paths.

Regression Checklist

Every UX change must be verified against these journeys before closing:

#	Journey	What to check
01	Normal code question	Classification correct, tools run, answer useful
02	Long investigation (>5s)	Progress visible, terminal not frozen
03	Review mode	Read-only enforcement visible, prompt reflects mode
04	Apply mode	Confirmation prompt shown, changes applied only on approval
05	`/inspect`	Full investigation detail shown, no planner metadata leaks
06	Shell command (`!`)	Command executes, output shown, mode toggles work
07	Ctrl+C during investigation	Terminal restored, prompt redraws, no corruption
08	Ctrl+C during answer generation	Terminal restored, no partial output
09	Terminal resize	Status line width adapts, prompt not displaced
10	Paste 100+ lines	Bracketed paste handled, no accidental execution
11	Unicode file paths	Non-ASCII paths display and navigate correctly
12	Windows terminal (conhost)	ANSI rendering, line editor, menus functional
13	Linux terminal (gnome, xterm)	All features work on common Linux terminals
14	macOS terminal (Terminal.app, iTerm2)	All features work on common macOS terminals
15	First launch	Startup hint shown, disappears after first prompt
16	Recovery – low confidence	Planner attempts recovery, user sees progress, no tool internals

Each journey must pass before the change is considered complete. If a journey produces unexpected output or terminal corruption, the change is not ready.