Reference Search: Implementation Report
Date: 2026-06-26
Phase: Complete (Reference Search capability exposed, validated via 8/8 regressions passing)
1. Problem Statement
Previously, the agent retrieval layer lacked the ability to track symbol usage/calls (Reference Search). Queries asking about caller hierarchies (e.g., "who calls ReplayService" or "where is SessionState used") had to fall back to broad keyword searches (grep) or generic searches (find). This resulted in:
- High token overhead from reading unrelated grep matches.
- Inefficiency: broad search queries could return dozens of matches across many files, exhausting the iteration budget.
- Higher rates of InsufficientEvidence outcomes.
Acceptance Queries & Verification Target
The target queries were:
- who calls ReplayService
- where is CommandRouter referenced
- who uses ToolResult
- where is SessionState used
2. Root Cause
- Symbol Search was a refinement; Reference Search was a new capability.
  - While SymbolService::find_references existed in the backend (scanning code occurrences and finding where symbols were referenced), it was never exposed to the agent (ExecutionEngine and the tool router layer).
- Missing classification keywords:
  - Classification rules in classify_goal were unaware of common referencing verbs ("calls", "uses", etc.) without explicit suffix matches, so such queries fell back to GeneralChat, which ran no tools at all.
3. Implementation
3.1 Exposing the references Tool to the Command Router
We exposed references as a tool call. It runs SymbolService::find_references(dir, symbol).
- It registers caller file paths to last_find_results.
- This enables a subsequent empty-argument read tool call to read the exact caller files.
- Files affected: src/app/command_router.cpp (and mock updates in diagnostics/tests).
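The hand-off between references and a following empty-argument read can be sketched as below. This is a minimal illustration, not the code in src/app/command_router.cpp: ReferenceHit, RouterSketch, record_references, and resolve_read_targets are hypothetical names, and de-duplicating file paths is an assumption about how caller files are registered.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one hit returned by SymbolService::find_references.
struct ReferenceHit {
    std::string file;
    int line;
};

// Hypothetical, simplified router holding only the state needed for the hand-off.
struct RouterSketch {
    std::vector<std::string> last_find_results;

    // Record each caller file once, in first-seen order (de-duplication is assumed).
    void record_references(const std::vector<ReferenceHit>& hits) {
        last_find_results.clear();
        for (const auto& hit : hits) {
            const bool seen = std::find(last_find_results.begin(),
                                        last_find_results.end(),
                                        hit.file) != last_find_results.end();
            if (!seen) last_find_results.push_back(hit.file);
        }
    }

    // An empty read argument resolves to the files recorded by the last search;
    // an explicit argument is used as-is.
    std::vector<std::string> resolve_read_targets(const std::string& arg) const {
        if (arg.empty()) return last_find_results;
        return {arg};
    }
};
```

With this shape, the agent never has to repeat file paths in the read call, so a two-step references then read sequence stays within a small iteration budget.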
3.2 Query Classification and Execution Engine Integration
- Classification: Updated classify_goal to map keywords call, calls, called, use, uses, used, using, reference, references, referenced to CodebaseQuery.
- Query Detection: Added is_reference_query using substring matches to route the query straight to the references tool rather than file search (find) or grep.
- Completion Gating: Updated check_completion for codebase queries. A reference query is complete when references:results and read are in the evidence facts, or when references:noresults is verified.
- Evidence Gating: Mapped references tool success to the EvidenceClass::FileSearch class. Bypassed required evidence check if references returns no results (ensuring correct completion outcomes for non-referenced symbols).
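The classification, detection, and completion rules above can be sketched as follows. The keyword list and the evidence-fact strings come from this report; everything else is an assumption made for illustration: the _sketch function names, whole-word matching in the classifier, the specific substrings used for detection, and representing evidence facts as a set of strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

enum class GoalKind { GeneralChat, CodebaseQuery };

// Lower-case a copy of the input so matching is case-insensitive.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Keyword list taken from the report; whole-word matching is an assumption.
GoalKind classify_goal_sketch(const std::string& query) {
    static const std::set<std::string> keywords = {
        "call", "calls", "called", "use", "uses", "used", "using",
        "reference", "references", "referenced"};
    std::istringstream words(to_lower(query));
    std::string word;
    while (words >> word) {
        if (keywords.count(word) > 0) return GoalKind::CodebaseQuery;
    }
    return GoalKind::GeneralChat;
}

// Substring detection as described in the report; the stems are assumed.
bool is_reference_query_sketch(const std::string& query) {
    static const char* const stems[] = {"call", "use", "using", "referenc"};
    const std::string q = to_lower(query);
    for (const char* stem : stems) {
        if (q.find(stem) != std::string::npos) return true;
    }
    return false;
}

// Completion rule: (references:results AND read) OR references:noresults.
bool reference_query_complete(const std::set<std::string>& facts) {
    if (facts.count("references:noresults") > 0) return true;
    return facts.count("references:results") > 0 && facts.count("read") > 0;
}
```

Plain substring matching is simple but can over-match (for example, "use" also occurs inside "pause"), which is one reason the sketch keeps classification and detection as separate steps.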
3.3 Telemetry Metrics
Added two telemetry metrics to RetrievalMetrics (include/core/metrics.h, src/services/replay_service.cpp):
- reference_tool_hits (int): Number of times the references tool was invoked.
- caller_resolution_rate (double): Set to 1.0 if a reference query is successfully resolved (check_completion passes) with references tool usage and without falling back to broad grep calls. Otherwise, 0.0.
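One way the two metrics could be derived from a single query's tool trace is sketched below. The field names match the report, but RetrievalMetricsSketch and score_reference_query are hypothetical, the trace is assumed to be a list of tool names, and any grep call is treated as a broad fallback.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of RetrievalMetrics covering only the two new fields.
struct RetrievalMetricsSketch {
    int reference_tool_hits = 0;
    double caller_resolution_rate = 0.0;
};

// Scores one reference query from the ordered tool names it invoked.
// Assumption: any "grep" entry counts as a broad fallback.
RetrievalMetricsSketch score_reference_query(const std::vector<std::string>& tool_calls,
                                             bool completion_passed) {
    RetrievalMetricsSketch metrics;
    bool used_grep = false;
    for (const auto& tool : tool_calls) {
        if (tool == "references") ++metrics.reference_tool_hits;
        if (tool == "grep") used_grep = true;
    }
    // 1.0 only when completion passed, references was used, and grep was not.
    const bool resolved =
        completion_passed && metrics.reference_tool_hits > 0 && !used_grep;
    metrics.caller_resolution_rate = resolved ? 1.0 : 0.0;
    return metrics;
}
```

Under these assumptions, a trace of references followed by read with a passing completion check scores 1.0, while the same trace with a grep call in it scores 0.0.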
4. Verification & Metrics
All 8 regression scenarios (including the 4 new regression files) passed.
Before vs. After Analysis
| Query | Pre-Fix Sequence | Pre-Fix Outcome | Post-Fix Sequence | Post-Fix Outcome | caller_resolution_rate |
|---|---|---|---|---|---|
| who calls ReplayService | None (GeneralChat) | Failure / No Tools | references ReplayService → read | Success | 1.0 |
| where is CommandRouter referenced | find CommandRouter | Detour (read header) | references CommandRouter → read | Success | 1.0 |
| who uses ToolResult | None (GeneralChat) | Failure / No Tools | references ToolResult → read | Success | 1.0 |
| where is SessionState used | find sessionstate | Detour (read cpp) | references SessionState → read | Success | 1.0 |
5. Active Constraints
- No AST-indexing, semantic search, or tree-sitter layers were added.
- No LLM-based ranking was introduced.
- Strict retrieval scope was maintained.