
Retrieval (BM25 + semantic)

Retrieval subsystem: intent classification, search, policies.

intent

Improved intent classifier for AI task queries.

Fixes the overlapping-keyword problem of naive implementations by:
  1. Using weighted scoring per trigger
  2. Normalizing scores to [0.0, 1.0] (confidence)
  3. Requiring a minimum confidence gap to avoid ties
  4. Applying context-aware disambiguation: "test" and "fix" are NOT treated as stopwords when they are the primary verb of the query
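
The first three points can be sketched as follows. This is an illustrative sketch only: the trigger table, the weights, the `min_gap` parameter, and the function names are hypothetical, and dividing each score by the total is just one possible way to normalize to [0.0, 1.0]; the module's actual method is not documented here.

```python
# Hypothetical trigger table: intent -> {trigger word: weight}.
TRIGGERS = {
    "debug": {"fix": 2.0, "error": 1.5, "traceback": 2.0},
    "test": {"test": 2.0, "pytest": 1.5, "coverage": 1.0},
    "implement": {"implement": 2.0, "add": 1.0, "create": 1.0},
}


def score_intents(query):
    """Return {intent: confidence} with confidences in [0.0, 1.0]."""
    words = query.lower().split()
    # 1. Weighted scoring: sum the weights of every trigger present in the query.
    raw = {
        intent: sum(weight for trigger, weight in triggers.items() if trigger in words)
        for intent, triggers in TRIGGERS.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {intent: 0.0 for intent in raw}
    # 2. Normalization (assumed here: each intent's share of the total score).
    return {intent: score / total for intent, score in raw.items()}


def has_clear_winner(scores, min_gap=0.1):
    """3. Confidence gap: the best score must beat the runner-up by min_gap."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1] >= min_gap
```

With these made-up weights, "fix test" scores "debug" and "test" at 0.5 each, so the gap check reports no clear winner, while "fix the traceback error" resolves to "debug" with confidence 1.0.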
Intent categories and their retrieval implications:
  • implement -> full signatures + docstrings, similar file context
  • debug -> recent changes (git context), error-adjacent symbols
  • refactor -> all references to target symbol, callers/callees
  • test -> module under test, existing test patterns
  • explain -> rich docstrings, class hierarchies
  • review -> file-level overview, recent modifications
  • search -> FTS5 keyword search, broad context

classify

classify(query, min_confidence=0.25)

Classify a natural-language query into an intent.

Returns IntentResult with the winning intent, its confidence, and the list of matched trigger phrases.

When no intent reaches min_confidence, returns intent="search" as the safe default.
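
A minimal stand-in for the behaviour described above, assuming a result object with three fields. The field names (`intent`, `confidence`, `matched_triggers`), the trigger lists, and the matched-fraction scoring are hypothetical; only the `min_confidence=0.25` default and the "search" fallback come from the documentation.

```python
from dataclasses import dataclass, field


@dataclass
class IntentResult:
    # Hypothetical field names; the real IntentResult may differ.
    intent: str
    confidence: float
    matched_triggers: list = field(default_factory=list)


# Hypothetical trigger lists, for illustration only.
_TRIGGERS = {
    "debug": ["error", "traceback", "crash"],
    "test": ["pytest", "coverage"],
}


def classify_sketch(query, min_confidence=0.25):
    """Pick the best-scoring intent, or fall back to "search"."""
    words = set(query.lower().split())
    best = IntentResult("search", 0.0)
    for intent, triggers in _TRIGGERS.items():
        matched = [t for t in triggers if t in words]
        # Simplified confidence: fraction of this intent's triggers that matched.
        confidence = len(matched) / len(triggers)
        if confidence > best.confidence:
            best = IntentResult(intent, confidence, matched)
    if best.confidence < min_confidence:
        # Safe default: no intent was confident enough.
        return IntentResult("search", best.confidence, best.matched_triggers)
    return best
```

Raising `min_confidence` makes the fallback more likely: `classify_sketch("check coverage", min_confidence=0.6)` returns the "search" intent even though one "test" trigger matched.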

classify_many

classify_many(queries)

Classify multiple queries in batch.

policy

Per-task retrieval policies.

Maps intent -> retrieval strategy configuration. Policies control how search results are filtered, ranked, and enriched before being handed to the context builder.

RetrievalPolicy dataclass

RetrievalPolicy(intent, prefer_kinds=list(), include_related=False, include_git_context=False, context_level=2, budget_fraction=0.7, max_symbols=30, include_file_tree=True)

Retrieval parameters for a specific intent.
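
The signature above suggests a plain dataclass along these lines. The field types are assumptions inferred from the default values, and the `prefer_kinds=list()` default is assumed to come from `field(default_factory=list)`, the usual way to give each instance its own list.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalPolicy:
    # Field names and defaults follow the documented signature;
    # the type annotations are assumptions.
    intent: str
    prefer_kinds: list = field(default_factory=list)  # fresh list per instance
    include_related: bool = False
    include_git_context: bool = False
    context_level: int = 2
    budget_fraction: float = 0.7
    max_symbols: int = 30
    include_file_tree: bool = True


# Example values are illustrative, not the registered policies.
debug_policy = RetrievalPolicy("debug", include_git_context=True)
refactor_policy = RetrievalPolicy("refactor", include_related=True, max_symbols=60)
```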

get_policy

get_policy(intent)

Return the RetrievalPolicy for the given intent.

Falls back to the "search" policy for unknown intents.
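
The fallback can be pictured as a dictionary lookup with a default. In this sketch, `Policy` is a reduced, hypothetical stand-in for RetrievalPolicy, and the `_POLICIES` registry and its contents are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Reduced stand-in for RetrievalPolicy (hypothetical).
    intent: str
    include_git_context: bool = False


# Hypothetical registry keyed by intent name.
_POLICIES = {
    p.intent: p
    for p in (Policy("search"), Policy("debug", include_git_context=True))
}


def get_policy_sketch(intent):
    """Return the policy for intent, or the "search" policy if unknown."""
    return _POLICIES.get(intent, _POLICIES["search"])
```

Because unknown intents resolve to "search", callers never need to handle a missing policy.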

all_policies

all_policies()

Return all registered policies.

search

FTS5 + optional semantic search fallback.

  • Primary path: SQLite FTS5 (always available, zero deps)
  • Semantic path: cosine similarity on stored embeddings (requires numpy/sentence-transformers)

The search module is stateless: all state lives in the DB connection.
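
For the semantic path, cosine similarity between a query vector and stored embeddings can be sketched in plain Python. The module itself is documented as relying on numpy/sentence-transformers; this dependency-free version, along with the `rank_by_similarity` helper and the `stored` mapping, is a hypothetical illustration of the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, stored, limit=20):
    """Rank stored {name: embedding} entries by similarity to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```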

search

search(conn, query, limit=20, prefer_exact=True)

Search symbols matching query.

Strategy:
  1. Exact name match (high priority)
  2. FTS5 full-text search on name + signature + docstring
  3. Prefix match fallback if FTS returns nothing

Results are deduplicated and sorted by score descending.
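
The three-step strategy can be sketched against an in-memory SQLite database. The table names (`symbols`, `symbols_fts`), the schema, the score constants, and the function names are all hypothetical, and the sketch assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, signature TEXT, docstring TEXT)"
    )
    conn.execute("CREATE VIRTUAL TABLE symbols_fts USING fts5(name, signature, docstring)")
    rows = [
        (1, "parse_config", "parse_config(path)", "Load settings from a TOML file."),
        (2, "parse", "parse(text)", "Parse raw text into tokens."),
        (3, "load_settings", "load_settings()", "Call parse on the default file."),
    ]
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)", rows)
    conn.executemany(
        "INSERT INTO symbols_fts (rowid, name, signature, docstring) VALUES (?, ?, ?, ?)", rows
    )
    return conn


def search_sketch(conn, query, limit=20, prefer_exact=True):
    """Return [(symbol_id, score)] sorted by score descending."""
    scores = {}  # symbol id -> best score; keying by id deduplicates results

    def add(sym_id, score):
        scores[sym_id] = max(score, scores.get(sym_id, 0.0))

    # 1. Exact name match gets a fixed high score (arbitrary constant).
    if prefer_exact:
        for (sym_id,) in conn.execute("SELECT id FROM symbols WHERE name = ?", (query,)):
            add(sym_id, 100.0)

    # 2. FTS5 search; the query is quoted so it is parsed as a phrase, not as operators.
    fts_query = '"' + query.replace('"', '""') + '"'
    fts_rows = conn.execute(
        "SELECT rowid, bm25(symbols_fts) FROM symbols_fts WHERE symbols_fts MATCH ? LIMIT ?",
        (fts_query, limit),
    ).fetchall()
    for sym_id, rank in fts_rows:
        add(sym_id, -rank)  # bm25() is lower-is-better, so negate it

    # 3. Prefix fallback only when FTS found nothing.
    #    Note: LIKE treats "_" and "%" in the query as wildcards; a real
    #    implementation may want to escape them.
    if not fts_rows:
        for (sym_id,) in conn.execute(
            "SELECT id FROM symbols WHERE name LIKE ? LIMIT ?", (query + "%", limit)
        ):
            add(sym_id, 1.0)

    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:limit]
```

In this demo, searching for "parse" ranks the exactly named symbol first, while "load_set" matches nothing in FTS5 (which indexes whole tokens) and is caught by the prefix fallback.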

search_by_file

search_by_file(conn, rel_path_fragment, limit=50)

Return symbols from files whose path contains the given fragment.
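
A path-fragment lookup of this kind might look like the following sketch. The `symbols` table, its `rel_path` column, the result shape, and the ordering are assumptions; SQLite's `instr()` is used so that the fragment is matched literally.

```python
import sqlite3


def search_by_file_sketch(conn, rel_path_fragment, limit=50):
    """Return (name, rel_path) rows whose path contains the fragment."""
    # instr() is a literal, case-sensitive substring test, so "%" and "_"
    # in the fragment are not treated as wildcards (unlike LIKE).
    return conn.execute(
        "SELECT name, rel_path FROM symbols "
        "WHERE instr(rel_path, ?) > 0 ORDER BY rel_path, name LIMIT ?",
        (rel_path_fragment, limit),
    ).fetchall()


# Hypothetical demo data.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE symbols (name TEXT, rel_path TEXT)")
demo.executemany(
    "INSERT INTO symbols VALUES (?, ?)",
    [
        ("classify", "retrieval/intent.py"),
        ("get_policy", "retrieval/policy.py"),
        ("main", "cli/app.py"),
    ],
)
```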