Retrieval (BM25 + semantic) ¶
Retrieval subsystem: intent classification, search, policies.
intent ¶
Improved intent classifier for AI task queries.
Fixes the overlapping-keyword problem of naive implementations by:
- Using weighted scoring per trigger
- Normalizing scores to [0.0, 1.0] (confidence)
- Requiring a minimum confidence gap to avoid ties
- Disambiguating by context: "test" and "fix" are not treated as stopwords when they are the primary verb of the query
Intent categories and their retrieval implications
- implement -> full signatures + docstrings, similar file context
- debug -> recent changes (git context), error-adjacent symbols
- refactor -> all references to target symbol, callers/callees
- test -> module under test, existing test patterns
- explain -> rich docstrings, class hierarchies
- review -> file-level overview, recent modifications
- search -> FTS5 keyword search, broad context
classify ¶
Classify a natural-language query into an intent.
Returns IntentResult with the winning intent, its confidence, and the list of matched trigger phrases.
When no intent reaches min_confidence, returns intent="search" as the safe default.
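The scoring steps described for the classifier can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the trigger table, the weights, the leading-verb bonus, the `min_gap` parameter name, the normalization (each intent's share of the total score), and the field name `matched_triggers` are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical trigger table: intent -> {trigger word: weight}.
TRIGGERS = {
    "debug": {"fix": 1.0, "error": 0.8, "traceback": 0.9},
    "test": {"test": 1.0, "pytest": 0.9, "coverage": 0.6},
    "explain": {"explain": 1.0, "why": 0.5},
}
# Hypothetical bonus when the query's first word is a trigger (primary verb).
LEADING_VERB_BONUS = 0.5


@dataclass
class IntentResult:
    intent: str
    confidence: float
    matched_triggers: list = field(default_factory=list)


def classify(query, min_confidence=0.5, min_gap=0.1):
    tokens = re.findall(r"[a-z_]+", query.lower())
    raw, matched = {}, {}
    for intent, triggers in TRIGGERS.items():
        hits = [t for t in tokens if t in triggers]
        score = sum(triggers[t] for t in hits)
        if tokens and tokens[0] in triggers:
            score += LEADING_VERB_BONUS
        if score > 0:
            raw[intent] = score
            matched[intent] = hits

    total = sum(raw.values())
    if total == 0:
        return IntentResult("search", 0.0, [])

    # Normalize to [0.0, 1.0]: each intent's share of the total score.
    ranked = sorted(raw, key=raw.get, reverse=True)
    best = ranked[0]
    confidence = raw[best] / total
    runner_up = raw[ranked[1]] / total if len(ranked) > 1 else 0.0

    # Too weak or too close to the runner-up: fall back to "search".
    if confidence < min_confidence or confidence - runner_up < min_gap:
        return IntentResult("search", confidence, matched[best])
    return IntentResult(best, confidence, matched[best])
```

Under these assumed weights, `classify("fix the failing test")` resolves to `debug` because "fix" leads the query, while `classify("the fix and the test")` is a tie and falls back to `search`.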
policy ¶
Per-task retrieval policies.
Maps intent -> retrieval strategy configuration. Policies control how search results are filtered, ranked, and enriched before being handed to the context builder.
RetrievalPolicy dataclass ¶
RetrievalPolicy(intent, prefer_kinds=list(), include_related=False, include_git_context=False, context_level=2, budget_fraction=0.7, max_symbols=30, include_file_tree=True)
Retrieval parameters for a specific intent.
get_policy ¶
Return the RetrievalPolicy for the given intent.
Falls back to the "search" policy for unknown intents.
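As a rough sketch of how such a policy table and its fallback could look: the field names and defaults below follow the signature shown above, but the field types, the `_POLICIES` registry, and the per-intent values are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalPolicy:
    # Field names and defaults mirror the documented signature;
    # the type annotations are inferred from the defaults.
    intent: str
    prefer_kinds: list = field(default_factory=list)
    include_related: bool = False
    include_git_context: bool = False
    context_level: int = 2
    budget_fraction: float = 0.7
    max_symbols: int = 30
    include_file_tree: bool = True


# Hypothetical registry; the real per-intent values are not shown in the docs.
_POLICIES = {
    "search": RetrievalPolicy(intent="search"),
    "debug": RetrievalPolicy(intent="debug", include_git_context=True),
    "refactor": RetrievalPolicy(intent="refactor", include_related=True),
}


def get_policy(intent):
    # Unknown intents fall back to the "search" policy.
    return _POLICIES.get(intent, _POLICIES["search"])
```

Using `field(default_factory=list)` gives each policy its own `prefer_kinds` list, which is presumably what the `prefer_kinds=list()` default in the rendered signature reflects.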
search ¶
FTS5 + optional semantic search fallback.
- Primary path: SQLite FTS5 (always available, zero deps)
- Semantic path: cosine similarity on stored embeddings (requires numpy/sentence-transformers)
The search module is stateless - all state lives in the DB connection.
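To make the semantic path concrete, here is a small sketch of ranking stored embeddings by cosine similarity. It uses plain Python rather than numpy so that it runs without extra dependencies; the function names and the dict-of-vectors storage format are assumptions, not the module's API.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def semantic_rank(query_vec, stored, top_k=5):
    # `stored` is a hypothetical mapping: symbol name -> embedding vector.
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In a real setup the query vector would come from the same embedding model that produced the stored vectors, since cosine scores are only comparable within one embedding space.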
search ¶
Search symbols matching query.
Strategy:
1. Exact name match (high priority)
2. FTS5 full-text search on name + signature + docstring
3. Prefix match fallback if FTS returns nothing
Results are deduplicated and sorted by score descending.
search_by_file ¶
Return symbols from files whose path contains the given fragment.
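A path-fragment lookup of this kind could be sketched as below. The `symbols` table with a `file_path` column is an assumed schema, and the use of SQLite's `instr()` is a choice made for this example, not necessarily what the module does.

```python
import sqlite3


def search_by_file(conn, path_fragment):
    # Assumed schema: symbols(name TEXT, file_path TEXT).
    # instr() does a literal substring test, so "%" and "_" in the
    # fragment are not treated as wildcards the way LIKE would treat them.
    rows = conn.execute(
        "SELECT name, file_path FROM symbols "
        "WHERE instr(file_path, ?) > 0 "
        "ORDER BY file_path, name",
        (path_fragment,))
    return rows.fetchall()
```

Note that `instr()` is case-sensitive, so a fragment such as `Retrieval/` would not match `src/retrieval/search.py` in this sketch.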