Indexer (tree-sitter)¶
Indexer subsystem: walk, hash, parse, extract.
coordinator ¶
Main indexer coordinator.
Orchestrates: walk -> hash -> parse -> extract -> git blame -> store.
Features:
- Debounce: skip re-indexing if re-run within the last 2 seconds
- Lock file: prevent concurrent writes with .ctx/indexing.lock
- Incremental: only re-process files whose SHA-1 has changed
- Progress callback for CLI display
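The incremental and lock-file features can be illustrated with a minimal sketch using only the standard library. The helper names (`sha1_of`, `changed_files`, `index_lock`) and the shape of the stored hash map are assumptions made for illustration; the coordinator's actual internals are not shown in this reference.

```python
import hashlib
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths: list[Path], known: dict[str, str], force: bool = False) -> list[Path]:
    """Return paths whose SHA-1 differs from the stored value (all paths if force)."""
    return [p for p in paths if force or known.get(str(p)) != sha1_of(p)]


@contextmanager
def index_lock(lock_path: Path):
    """Hold an exclusive lock file; raises FileExistsError if it already exists."""
    # O_EXCL makes creation atomic: a second writer fails instead of racing.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "a.py"
        source.write_text("x = 1\n")
        with index_lock(Path(tmp) / "indexing.lock"):
            # Nothing is stored yet, so the file counts as changed.
            print(changed_files([source], known={}))
```

Creating the lock with `O_EXCL` rather than checking for existence first avoids a window in which two processes both believe they hold the lock.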
run_index ¶
Run a full incremental index of the project at root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | Path | Project root directory. | required |
| config | SemtreeConfig \| None | Configuration (loaded from .ctx/semtree.json if None). | None |
| force | bool | Re-index all files even if SHA-1 matches. | False |
| progress | ProgressCallback \| None | Optional callback(rel_path, current, total) for progress reporting. | None |
Returns:
| Type | Description |
|---|---|
| IndexStats | IndexStats with counts and timing. |
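A progress callback matching the described callback(rel_path, current, total) shape might look like the sketch below. The `ProgressCallback` alias is an assumption inferred from that description, and `format_progress` and `make_collector` are hypothetical helpers, not part of the documented API.

```python
from typing import Callable

# Assumed shape of ProgressCallback, inferred from "callback(rel_path, current, total)".
ProgressCallback = Callable[[str, int, int], None]


def format_progress(rel_path: str, current: int, total: int) -> str:
    """Render one progress line; an empty run is reported as complete."""
    percent = 100 * current // total if total else 100
    return f"[{percent:3d}%] {current}/{total} {rel_path}"


def make_collector(sink: list[str]) -> ProgressCallback:
    """Build a callback that appends formatted progress lines to sink."""
    def on_progress(rel_path: str, current: int, total: int) -> None:
        sink.append(format_progress(rel_path, current, total))
    return on_progress
```

A callable built this way could be passed as the progress argument of run_index; a CLI would more likely print each line than collect it.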
docstrings ¶
Multi-language docstring extraction.
Handles:
- Python: first string literal in the function/class body
- JS/TS: JSDoc blocks (/** ... */) immediately above declarations
- Go: // comment blocks above func declarations
- Rust: /// doc comments above items
extract_python_docstring ¶
Extract docstring from a Python function or class tree-sitter node.
Looks for the first expression_statement containing a string literal in the function/class body.
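The "first string-literal statement in the body" rule can be illustrated without tree-sitter. The sketch below swaps in the standard-library `ast` module, so it shows the same rule on a different parse tree; `first_string_in_body` is a hypothetical name and this is not the tree-sitter implementation.

```python
import ast


def first_string_in_body(source: str, name: str) -> str:
    """Return the docstring of the function or class called name, or ""."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name == name:
            first = node.body[0]
            # A docstring is an expression statement whose value is a string constant.
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                return first.value.value.strip()
            return ""
    return ""
```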
extract_jsdoc_from_lines ¶
Find the JSDoc block /** ... */ immediately before decl_line.
Scans backwards from decl_line to find a closing */ and then the matching /** opener.
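A minimal sketch of such a backward scan follows. It assumes that `lines` is the file split into lines, that decl_line is a 0-based index, and that the comment must end on the line directly above the declaration; the name `jsdoc_before` and the cleanup of leading asterisks are illustrative choices, not the documented behavior.

```python
def jsdoc_before(lines: list[str], decl_line: int) -> str:
    """Return the cleaned JSDoc text ending just above decl_line, or ""."""
    end = decl_line - 1
    if end < 0 or not lines[end].rstrip().endswith("*/"):
        return ""
    # Walk upwards to the nearest comment opener.
    start = end
    while start >= 0 and "/*" not in lines[start]:
        start -= 1
    # A plain /* ... */ block is not JSDoc, so require the /** form.
    if start < 0 or "/**" not in lines[start]:
        return ""
    cleaned = []
    for raw in lines[start:end + 1]:
        text = raw.strip()
        if text.startswith("/**"):
            text = text[3:]
        if text.endswith("*/"):
            text = text[:-2]
        text = text.strip()
        if text.startswith("*"):
            text = text[1:].strip()
        if text:
            cleaned.append(text)
    return "\n".join(cleaned)
```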
extract_go_doc_from_lines ¶
Extract the // comment block immediately preceding a Go func declaration.
Consecutive // lines ending just before decl_line form the doc comment; a blank line ends the block.
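As a rough sketch of that rule, the function below collects the run of // lines directly above a declaration and stops at the first line that is not a // comment, which includes blank lines. The name `go_doc_before` and the 0-based decl_line are assumptions.

```python
def go_doc_before(lines: list[str], decl_line: int) -> str:
    """Return the // comment block directly above decl_line, or ""."""
    collected = []
    i = decl_line - 1
    # Any non-comment line, including a blank one, ends the block.
    while i >= 0 and lines[i].lstrip().startswith("//"):
        collected.append(lines[i].lstrip()[2:].strip())
        i -= 1
    collected.reverse()
    return "\n".join(collected)
```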
extract_rust_doc_from_lines ¶
Extract /// doc comments immediately preceding a Rust item declaration.
Also handles //! module-level doc comments when decl_line is 0.
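The two cases might be handled along the lines of the sketch below. Reading //! lines forward from the top of the file when decl_line is 0 is a guess at what "module-level" means here; the function name `rust_doc` and the 0-based decl_line are likewise assumptions.

```python
def rust_doc(lines: list[str], decl_line: int) -> str:
    """Return /// docs above decl_line, or leading //! docs when decl_line is 0."""
    if decl_line == 0:
        # Module-level docs: take the leading run of //! lines.
        out = []
        for line in lines:
            text = line.strip()
            if not text.startswith("//!"):
                break
            out.append(text[3:].strip())
        return "\n".join(out)
    out = []
    i = decl_line - 1
    while i >= 0:
        text = lines[i].lstrip()
        # Four or more slashes is an ordinary comment in Rust, not a doc comment.
        if not text.startswith("///") or text.startswith("////"):
            break
        out.append(text[3:].strip())
        i -= 1
    out.reverse()
    return "\n".join(out)
```

This sketch does not skip attributes such as `#[derive(...)]` that may sit between the doc comment and the item.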
extract_python_docstring_regex ¶
Fallback: regex extraction of Python docstrings when tree-sitter is unavailable.
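One way a regex fallback of this kind could work is sketched below. The pattern and the `docstrings_by_name` helper are illustrative assumptions; the sketch only handles single-line def/class headers followed directly by a triple-quoted string.

```python
import re

# Header line ending in ":" followed by a triple-quoted string on the next line.
_DOC_RE = re.compile(
    r"^[ \t]*(?:async[ \t]+)?(?:def|class)[ \t]+(?P<name>\w+)[^\n]*:[ \t]*\n"
    r"[ \t]*(?P<quote>\"{3}|'{3})(?P<doc>.*?)(?P=quote)",
    re.MULTILINE | re.DOTALL,
)


def docstrings_by_name(source: str) -> dict[str, str]:
    """Map each def/class name to its docstring; names without one are omitted."""
    return {m.group("name"): m.group("doc").strip() for m in _DOC_RE.finditer(source)}
```

A regex cannot see nesting or multi-line signatures, which is presumably why it serves only as a fallback.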
extractor ¶
Symbol extraction: tree-sitter primary path, regex fallback.
Produces a list of symbol dicts suitable for store.replace_file_symbols().
extract_symbols ¶
Extract symbols from source code.
Tries tree-sitter first and falls back to regex if it is unavailable. Returns a list of dicts with the keys name, kind, line_start, line_end, signature, and docstring.
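To make the shape of the returned dicts concrete, here is a hedged sketch of a regex-only extractor for Python sources. The kind values ("function", "class"), 1-based line numbers, the indentation-based line_end, and the empty docstring are all assumptions; the real extractor may differ on each point.

```python
import re

_SYMBOL_RE = re.compile(
    r"^(?P<indent>[ \t]*)(?:async[ \t]+)?(?P<kw>def|class)[ \t]+(?P<name>\w+)[^\n]*$",
    re.MULTILINE,
)


def extract_symbols_regex(source: str) -> list[dict]:
    """Return one dict per def/class found, using indentation to bound each body."""
    lines = source.splitlines()
    symbols = []
    for m in _SYMBOL_RE.finditer(source):
        line_start = source.count("\n", 0, m.start()) + 1  # 1-based (assumed)
        indent = len(m.group("indent"))
        line_end = line_start
        # The body runs until the next non-blank line at or below the header's indent.
        for lineno in range(line_start + 1, len(lines) + 1):
            text = lines[lineno - 1]
            if not text.strip():
                continue
            if len(text) - len(text.lstrip()) <= indent:
                break
            line_end = lineno
        symbols.append({
            "name": m.group("name"),
            "kind": "class" if m.group("kw") == "class" else "function",
            "line_start": line_start,
            "line_end": line_end,
            "signature": m.group(0).strip().rstrip(":"),
            "docstring": "",  # left empty in this sketch
        })
    return symbols
```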
gitblame ¶
Git blame / log integration for per-symbol last-modified metadata.
Provides author name and ISO date for a line range in a file. Gracefully no-ops when git is unavailable or the file is not tracked.
blame_line ¶
Return (author, iso_date) for the given line in the file.
Returns ("", "") when git blame fails or git is not available.
annotate_symbols ¶
Add git_author and git_date to each symbol dict in-place.
When enabled=False or git is unavailable, leaves fields as empty strings. Only fetches blame for the first line of each symbol to keep it fast.
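The behavior described for blame_line and annotate_symbols could be approximated as follows, using `git blame --porcelain` on a single line. The signatures, the `parse_porcelain` helper, the UTC ISO format, and the use of each symbol's `line_start` key are assumptions for illustration.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def parse_porcelain(output: str) -> tuple[str, str]:
    """Pull (author, iso_date) out of `git blame --porcelain` output."""
    author, iso_date = "", ""
    for line in output.splitlines():
        if line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("author-time "):
            try:
                epoch = int(line.split()[1])
            except (IndexError, ValueError):
                continue
            iso_date = datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()
    return author, iso_date


def blame_line(path: Path, line: int) -> tuple[str, str]:
    """Return (author, iso_date) for one line, or ("", "") on any failure."""
    try:
        result = subprocess.run(
            ["git", "blame", "--porcelain", "-L", f"{line},{line}", "--", path.name],
            cwd=path.parent, capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # git missing, file untracked, not a repository, or timed out.
        return "", ""
    return parse_porcelain(result.stdout)


def annotate_symbols(symbols: list[dict], path: Path, enabled: bool = True) -> None:
    """Set git_author and git_date on each symbol dict in place."""
    for sym in symbols:
        # Only the symbol's first line is blamed, to keep the cost low.
        author, date = blame_line(path, sym["line_start"]) if enabled else ("", "")
        sym["git_author"], sym["git_date"] = author, date
```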
hasher ¶
parser ¶
Tree-sitter parser pool.
Caches one parser instance per language to avoid repeated Language object construction overhead on large codebases.
Falls back gracefully when tree-sitter or a language grammar is not installed.
get_parser ¶
Return a cached tree-sitter Parser for the given language, or None.
Thread-safe. Returns None when tree-sitter or the grammar is unavailable.
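The caching pattern behind a thread-safe get_parser can be sketched independently of tree-sitter. In the sketch below, `ParserPool` and its `factory` argument are hypothetical: the factory stands in for whatever code builds a tree-sitter Parser for a language, and any exception it raises is treated as "unavailable".

```python
import threading
from typing import Any, Callable, Optional


class ParserPool:
    """Cache one parser per language; failures are cached as None."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory
        self._cache: dict[str, Optional[Any]] = {}
        self._lock = threading.Lock()

    def get(self, language: str) -> Optional[Any]:
        # The lock ensures two threads never build the same parser twice.
        with self._lock:
            if language not in self._cache:
                try:
                    self._cache[language] = self._factory(language)
                except Exception:
                    # Missing tree-sitter or grammar: remember the miss.
                    self._cache[language] = None
            return self._cache[language]
```

Caching the `None` result as well means a missing grammar is probed once rather than on every file.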
parse_source ¶
Parse source code and return the tree-sitter Tree, or None.
available_languages ¶
Return language ids for which a tree-sitter grammar is installed.
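One plausible way to detect installed grammars is to probe for their Python packages without importing them. The mapping below uses the module names of the `tree-sitter-python`, `tree-sitter-javascript`, `tree-sitter-go`, and `tree-sitter-rust` packages; whether this project discovers grammars this way, and which language ids it uses, is an assumption.

```python
from importlib.util import find_spec

# Language id -> grammar module name (ids are assumed for illustration).
GRAMMAR_MODULES = {
    "python": "tree_sitter_python",
    "javascript": "tree_sitter_javascript",
    "go": "tree_sitter_go",
    "rust": "tree_sitter_rust",
}


def available_languages(grammar_modules: dict[str, str] = GRAMMAR_MODULES) -> list[str]:
    """Return sorted language ids whose grammar module can be found."""
    # Without the core tree_sitter package no grammar is usable.
    if find_spec("tree_sitter") is None:
        return []
    return sorted(
        lang for lang, module in grammar_modules.items() if find_spec(module) is not None
    )
```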