
Context generation

Context assembly: budget, levels, and builder.

budget

Token budget management with tiktoken (optional) or char-based estimate.

TokenBudget

TokenBudget(total)

Stateful token budget tracker.

Usage

    budget = TokenBudget(8000)
    if budget.fits(chunk):
        budget.consume(chunk)

consume

consume(text)

Add text to the budget. Returns tokens consumed.

try_consume

try_consume(text)

Consume text if it fits. Returns True on success.
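The relationship between fits, consume, and try_consume can be sketched as below. This is a minimal illustration, not the library's code: the class name SketchTokenBudget, the used and remaining attributes, and the assumption that consume does not itself enforce the limit are all hypothetical, and only the char-based estimate is used for counting.

```python
class SketchTokenBudget:
    """Illustrative stand-in for TokenBudget; internals are assumed."""

    def __init__(self, total):
        self.total = total  # maximum number of tokens allowed
        self.used = 0       # tokens consumed so far (assumed attribute)

    @staticmethod
    def _count(text):
        # Char-based estimate only; the tiktoken path is left out here.
        return int(len(text) / 3.5)

    @property
    def remaining(self):
        return self.total - self.used

    def fits(self, text):
        return self._count(text) <= self.remaining

    def consume(self, text):
        # Assumed to be unconditional: the caller checks fits() first.
        tokens = self._count(text)
        self.used += tokens
        return tokens

    def try_consume(self, text):
        if not self.fits(text):
            return False
        self.consume(text)
        return True


budget = SketchTokenBudget(10)
first = budget.try_consume("a" * 35)   # estimated at 10 tokens, fits exactly
second = budget.try_consume("a" * 7)   # 2 more tokens would exceed the total
```

In this sketch try_consume is just the fits-then-consume pattern from the usage snippet folded into one call, which is convenient inside a loop over candidate chunks.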

count_tokens

count_tokens(text)

Count tokens in text.

Uses tiktoken's cl100k_base encoding when available (exact for GPT-4; only an approximation for Claude, which uses a different tokenizer). Otherwise falls back to a character-based estimate of len(text) / 3.5.
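The optional-dependency fallback described above might look roughly like this. It is a sketch under assumptions: the names char_estimate and count_tokens_sketch are hypothetical, truncating the estimate to an integer is a guess, and the real function may handle special tokens differently.

```python
try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:
    # tiktoken is not installed, or its encoding data could not be loaded.
    _ENCODING = None


def char_estimate(text):
    # Fallback ratio from the description; rounding down is an assumption.
    return int(len(text) / 3.5)


def count_tokens_sketch(text):
    if _ENCODING is not None:
        # disallowed_special=() treats strings like "<|endoftext|>" as plain text
        # instead of raising an error.
        return len(_ENCODING.encode(text, disallowed_special=()))
    return char_estimate(text)
```

Resolving the encoding once at import time keeps each call cheap, and the broad except means a missing package or an unreachable encoding file both degrade to the estimate rather than failing.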

count_tokens_many

count_tokens_many(texts)

Count total tokens across multiple text chunks.

fits_in_budget

fits_in_budget(text, budget, used=0)

Return True if text fits within the remaining token budget.
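As an illustration of how the stateless helpers could be combined, the sketch below packs chunks greedily into a fixed budget. The function bodies are stand-ins written for this example (char-based estimate only, inclusive comparison assumed), and pack_chunks is a hypothetical helper that is not part of the documented API.

```python
def count_tokens(text):
    # Stand-in that uses only the char-based estimate.
    return int(len(text) / 3.5)


def count_tokens_many(texts):
    return sum(count_tokens(t) for t in texts)


def fits_in_budget(text, budget, used=0):
    # Assumes a chunk that exactly fills the remaining budget still fits.
    return count_tokens(text) <= budget - used


def pack_chunks(chunks, budget):
    """Hypothetical helper: keep chunks in order, skipping any that do not fit."""
    kept, used = [], 0
    for chunk in chunks:
        if fits_in_budget(chunk, budget, used):
            kept.append(chunk)
            used += count_tokens(chunk)
    return kept, used


chunks = ["a" * 35, "b" * 70, "c" * 14]   # estimated at 10, 20, and 4 tokens
kept, used = pack_chunks(chunks, budget=15)
```

Here the caller carries the running total in used, which is the bookkeeping that TokenBudget otherwise does internally.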

builder

Context assembly orchestrator.

Coordinates intent classification -> retrieval policy -> symbol search -> token-budgeted formatting into a single context string.

build_context

build_context(conn, query, token_budget=8000, root=None, force_level=None)

Build a context string for an AI assistant given a natural-language query.

Steps:

1. Classify intent
2. Select retrieval policy
3. Search for relevant symbols
4. Format within token budget
5. Append file tree if policy requests it and budget remains
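The five steps above can be sketched as a toy pipeline. Only the order of the steps comes from the description; every name and rule here (POLICIES, classify_intent, search_symbols, the symbol dict shape, the keyword matching) is hypothetical, an in-memory list replaces the conn argument, and level selection is left out.

```python
# Hypothetical policies: one per intent, controlling whether a file tree is appended.
POLICIES = {
    "overview": {"include_tree": True},
    "lookup": {"include_tree": False},
}


def estimate(text):
    return int(len(text) / 3.5)  # char-based token estimate


def classify_intent(query):
    # Step 1: toy keyword rule standing in for real intent classification.
    return "overview" if "overview" in query.lower() else "lookup"


def search_symbols(symbols, query):
    # Step 3: naive match of query words against symbol names.
    words = query.lower().split()
    return [s for s in symbols if any(w in s["name"].lower() for w in words)]


def build_context_sketch(symbols, files, query, token_budget=8000):
    policy = POLICIES[classify_intent(query)]      # steps 1 and 2
    parts, used = [], 0
    for sym in search_symbols(symbols, query):     # step 3
        line = f'{sym["file"]}: {sym["kind"]} {sym["name"]}'
        cost = estimate(line)
        if used + cost > token_budget:             # step 4: stop when full
            break
        parts.append(line)
        used += cost
    tree = "\n".join(sorted(files))
    if policy["include_tree"] and used + estimate(tree) <= token_budget:
        parts.append(tree)                         # step 5
    return "\n".join(parts)


symbols = [
    {"file": "budget.py", "kind": "class", "name": "TokenBudget"},
    {"file": "levels.py", "kind": "function", "name": "format_l1"},
]
files = ["levels.py", "budget.py"]
lookup = build_context_sketch(symbols, files, "where is TokenBudget defined")
overview = build_context_sketch(symbols, files, "overview of format")
```

The file tree is added last so that it only takes whatever budget the symbol entries left over, matching the "if budget remains" condition in step 5.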

build_context_for_file

build_context_for_file(conn, rel_path, token_budget=8000, level=2)

Build context for a specific file (used by MCP get_context tool).

levels

L0-L3 context formatters.

Each level produces progressively richer context:

L0 - File tree only: paths and stats
L1 - Symbol names + kinds per file (outline)
L2 - L1 + signatures + first line of docstring (default)
L3 - L2 + full docstrings + git context

Level selection follows the retrieval policy's context_level setting.

format_file_tree

format_file_tree(files, root)

L0: compact file tree grouped by directory.
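A directory-grouped tree could be produced along these lines. This is only a sketch: the output layout is assumed, the root argument and the per-file stats mentioned for L0 are omitted, and format_file_tree_sketch is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def format_file_tree_sketch(files):
    """Group relative POSIX-style paths by directory (layout is assumed)."""
    by_dir = defaultdict(list)
    for path in files:
        p = PurePosixPath(path)
        by_dir[str(p.parent)].append(p.name)
    lines = []
    for directory in sorted(by_dir):
        lines.append(f"{directory}/")
        # List each directory once, with its files indented beneath it.
        lines.extend(f"  {name}" for name in sorted(by_dir[directory]))
    return "\n".join(lines)


tree = format_file_tree_sketch(["src/b.py", "src/a.py", "README.md"])
```

Writing each directory once instead of repeating it on every path is what keeps the tree compact when token budget is tight.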

format_l1

format_l1(symbols)

L1: symbol names and kinds, grouped by file.

format_l2

format_l2(symbols)

L2: signatures + first line of docstring, grouped by file.

format_l3

format_l3(symbols)

L3: full docstrings + git context, grouped by file.

format_by_level

format_by_level(symbols, level, files=None, root=None)

Dispatch to the appropriate level formatter.
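One plausible shape for the dispatch is a lookup table from level to formatter. The sketch below covers only L1 and L2; the symbol dict fields (file, kind, name, signature, docstring), the output layout, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def _group_by_file(symbols):
    grouped = defaultdict(list)
    for sym in symbols:
        grouped[sym["file"]].append(sym)
    return grouped


def format_l1_sketch(symbols):
    grouped = _group_by_file(symbols)
    lines = []
    for file in sorted(grouped):
        lines.append(file)
        lines.extend(f'  {s["kind"]} {s["name"]}' for s in grouped[file])
    return "\n".join(lines)


def format_l2_sketch(symbols):
    grouped = _group_by_file(symbols)
    lines = []
    for file in sorted(grouped):
        lines.append(file)
        for s in grouped[file]:
            doc_lines = (s.get("docstring") or "").strip().splitlines()
            entry = f'  {s["signature"]}'
            if doc_lines:
                entry += f"  # {doc_lines[0]}"  # first docstring line only
            lines.append(entry)
    return "\n".join(lines)


FORMATTERS = {1: format_l1_sketch, 2: format_l2_sketch}


def format_by_level_sketch(symbols, level):
    if level not in FORMATTERS:
        raise ValueError(f"unsupported level: {level}")
    return FORMATTERS[level](symbols)


symbols = [{
    "file": "budget.py", "kind": "function", "name": "count_tokens",
    "signature": "count_tokens(text)",
    "docstring": "Count tokens in text.\n\nUses tiktoken when available.",
}]
outline = format_by_level_sketch(symbols, 1)
detailed = format_by_level_sketch(symbols, 2)
```

A table keeps the level numbers and their formatters in one place, so adding L0 or L3 would mean registering one more entry rather than extending a chain of conditionals.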