
Indexer (tree-sitter)

Indexer subsystem: walk, hash, parse, extract.

coordinator

Main indexer coordinator.

Orchestrates: walk -> hash -> parse -> extract -> git blame -> store.

Features:

- Debounce: skip re-indexing if the previous run was within the last 2 seconds
- Lock file: prevent concurrent writes with .ctx/indexing.lock
- Incremental: only re-process files whose SHA-1 has changed
- Progress callback for CLI display
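The debounce and lock-file features can be sketched as follows. This is a minimal illustration, not the coordinator's actual code: should_skip, acquire_lock, and release_lock are hypothetical names, and writing the PID into the lock file is an assumption.

```python
import os
from pathlib import Path
from typing import Optional

DEBOUNCE_SECONDS = 2.0  # window taken from the feature list


def should_skip(last_run: Optional[float], now: float,
                window: float = DEBOUNCE_SECONDS) -> bool:
    """Return True when the previous run was less than `window` seconds ago."""
    return last_run is not None and (now - last_run) < window


def acquire_lock(lock_path: Path) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail when another process holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # assumption: PID stored for debugging
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file if it is present."""
    lock_path.unlink(missing_ok=True)
```

A caller would typically wrap the indexing work in try/finally so the lock is released even when a step fails.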

run_index

run_index(root, config=None, force=False, progress=None)

Run a full incremental index of the project at root.

Parameters:

- root (Path, required): Project root directory.
- config (SemtreeConfig | None, default None): Configuration (loaded from .ctx/semtree.json if None).
- force (bool, default False): Re-index all files even if SHA-1 matches.
- progress (ProgressCallback | None, default None): Optional callback(rel_path, current, total) for progress reporting.

Returns:

- IndexStats: counts and timing for the indexing run.
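A progress callback with the callback(rel_path, current, total) shape might look like the sketch below. The ProgressCallback alias, the drive helper (a stand-in for the indexing loop), and the 1-based current counter are assumptions made for illustration; the source does not specify them.

```python
from typing import Callable, List, Optional

# Assumed alias: the source only documents the (rel_path, current, total) shape.
ProgressCallback = Callable[[str, int, int], None]


def make_progress_recorder(lines: List[str]) -> ProgressCallback:
    """Build a callback that appends one formatted line per processed file."""
    def report(rel_path: str, current: int, total: int) -> None:
        lines.append(f"[{current}/{total}] {rel_path}")
    return report


def drive(files: List[str], progress: Optional[ProgressCallback] = None) -> int:
    """Hypothetical stand-in for the indexing loop; returns the file count."""
    total = len(files)
    for position, rel_path in enumerate(files, start=1):
        if progress is not None:
            progress(rel_path, position, total)  # assumption: 1-based counter
    return total
```

In a CLI, the recorder could be swapped for a function that prints or updates a progress bar.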

docstrings

Multi-language docstring extraction.

Handles

- Python: first string literal in the function/class body
- JS/TS: JSDoc blocks (/** ... */) immediately above declarations
- Go: // comment blocks above func declarations
- Rust: /// doc comments above items

extract_python_docstring

extract_python_docstring(node)

Extract docstring from a Python function or class tree-sitter node.

Looks for the first expression_statement containing a string literal in the function/class body.

extract_jsdoc_from_lines

extract_jsdoc_from_lines(source_lines, decl_line)

Find the JSDoc block /** ... */ immediately before decl_line.

Scans backwards from decl_line to find a closing */ and then the matching /** opener.
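The backward scan can be sketched as below. find_jsdoc and _clean are hypothetical names, decl_line is assumed to be a 0-based index into source_lines, and stripping the leading asterisks is an assumption about the returned text.

```python
from typing import List


def find_jsdoc(source_lines: List[str], decl_line: int) -> str:
    """Return the text of the /** ... */ block ending right above decl_line, or ""."""
    end = decl_line - 1  # assumption: decl_line is a 0-based index
    if end < 0 or not source_lines[end].rstrip().endswith("*/"):
        return ""
    # Walk upwards from the closing */ until the opener is found.
    for start in range(end, -1, -1):
        stripped = source_lines[start].lstrip()
        if stripped.startswith("/**"):
            return _clean(source_lines[start:end + 1])
        if stripped.startswith("/*"):
            return ""  # plain block comment, not JSDoc
    return ""


def _clean(block: List[str]) -> str:
    """Drop the comment delimiters and leading asterisks from each line."""
    cleaned = []
    for raw in block:
        text = raw.strip().removeprefix("/**").removesuffix("*/").strip()
        text = text.lstrip("*").strip()
        if text:
            cleaned.append(text)
    return "\n".join(cleaned)
```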

extract_go_doc_from_lines

extract_go_doc_from_lines(source_lines, decl_line)

Extract the // comment block immediately preceding a Go func declaration.

Consecutive // lines ending just before decl_line form the doc comment. A blank line ends the block.
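A minimal sketch of this rule, assuming decl_line is a 0-based index; find_go_doc is a hypothetical name, not the documented function.

```python
from typing import List


def find_go_doc(source_lines: List[str], decl_line: int) -> str:
    """Collect the run of // lines that ends directly above decl_line."""
    collected = []
    index = decl_line - 1
    while index >= 0:
        stripped = source_lines[index].strip()
        if not stripped.startswith("//"):
            break  # a blank line or a line of code ends the block
        collected.append(stripped[2:].strip())
        index -= 1
    # Lines were gathered bottom-up, so restore source order.
    return "\n".join(reversed(collected))
```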

extract_rust_doc_from_lines

extract_rust_doc_from_lines(source_lines, decl_line)

Extract /// doc comments immediately preceding a Rust item declaration.

Also handles //! module-level doc comments when decl_line is 0.
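The two cases might be handled as in the sketch below. find_rust_doc is a hypothetical name, decl_line is assumed to be 0-based, and reading //! lines from the top of the file when decl_line is 0 is an assumption about how the module-level case works.

```python
from typing import List


def _is_outer_doc(stripped: str) -> bool:
    # In Rust, //// (four slashes) is an ordinary comment, not a doc comment.
    return stripped.startswith("///") and not stripped.startswith("////")


def find_rust_doc(source_lines: List[str], decl_line: int) -> str:
    """Return /// docs above decl_line, or leading //! docs when decl_line is 0."""
    collected = []
    if decl_line == 0:
        # Assumption: module docs are the leading run of //! lines.
        for raw in source_lines:
            stripped = raw.strip()
            if not stripped.startswith("//!"):
                break
            collected.append(stripped[3:].strip())
        return "\n".join(collected)
    index = decl_line - 1
    while index >= 0 and _is_outer_doc(source_lines[index].strip()):
        collected.append(source_lines[index].strip()[3:].strip())
        index -= 1
    return "\n".join(reversed(collected))
```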

extract_python_docstring_regex

extract_python_docstring_regex(source, func_line)

Fallback: regex extraction of Python docstrings when tree-sitter is unavailable.
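One way such a regex fallback could work is sketched here. docstring_after is a hypothetical name and func_line is assumed to be a 0-based line index. The sketch only recognises triple-quoted docstrings and does not handle a trailing comment on the def line.

```python
import re

# Optional string prefix, then a triple-quoted literal (either quote style).
_DOCSTRING_RE = re.compile(r"""\s*[rRuU]?('{3}|"{3})(.*?)\1""", re.DOTALL)


def docstring_after(source: str, func_line: int) -> str:
    """Return the docstring following the def/class header at func_line, or ""."""
    lines = source.splitlines(keepends=True)
    index = func_line
    # The header may span several lines; it ends on the line closing with ":".
    while index < len(lines) and not lines[index].rstrip().endswith(":"):
        index += 1
    if index >= len(lines):
        return ""
    body = "".join(lines[index + 1:])
    match = _DOCSTRING_RE.match(body)
    return match.group(2).strip() if match else ""
```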

extractor

Symbol extraction: tree-sitter primary path, regex fallback.

Produces a list of symbol dicts suitable for store.replace_file_symbols().

extract_symbols

extract_symbols(path, source, language)

Extract symbols from source code.

Tries tree-sitter first and falls back to regex if it is unavailable. Returns a list of dicts with keys: name, kind, line_start, line_end, signature, docstring.
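To illustrate the shape of the returned dicts, here is a sketch of a regex path for Python only. extract_python_symbols_regex is a hypothetical name, and the kind values, 1-based line numbers, and indentation-based line_end are assumptions rather than documented behaviour.

```python
import re
from typing import Dict, List

_PY_DECL = re.compile(r"^(?P<indent>\s*)(?P<kw>def|class)\s+(?P<name>\w+)")


def extract_python_symbols_regex(source: str) -> List[Dict[str, object]]:
    """Find def/class lines and estimate each symbol's extent from indentation."""
    lines = source.splitlines()
    symbols: List[Dict[str, object]] = []
    for i, line in enumerate(lines):
        match = _PY_DECL.match(line)
        if not match:
            continue
        indent = len(match.group("indent"))
        end = i
        for j in range(i + 1, len(lines)):
            text = lines[j]
            if not text.strip():
                continue  # blank lines do not end or extend the body
            if len(text) - len(text.lstrip()) <= indent:
                break  # dedent: the symbol's body is over
            end = j
        symbols.append({
            "name": match.group("name"),
            "kind": "function" if match.group("kw") == "def" else "class",
            "line_start": i + 1,  # assumption: 1-based line numbers
            "line_end": end + 1,
            "signature": line.strip().rstrip(":"),
            "docstring": "",  # left empty here; filled by a docstring helper
        })
    return symbols
```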

gitblame

Git blame / log integration for per-symbol last-modified metadata.

Provides author name and ISO date for a line range in a file. Gracefully no-ops when git is unavailable or the file is not tracked.

blame_line

blame_line(repo_root, rel_path, line)

Return (author, iso_date) for the given line in the file.

Returns ("", "") when git blame fails or git is not available.
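A single-line blame with the same failure behaviour could be sketched as below. Using git blame --porcelain and reporting the author time in UTC are assumptions; blame_line_sketch and parse_porcelain are hypothetical names.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Tuple


def parse_porcelain(output: str) -> Tuple[str, str]:
    """Pull the author and author-time fields out of porcelain blame output."""
    author, stamp = "", ""
    for row in output.splitlines():
        if row.startswith("author "):
            author = row[len("author "):]
        elif row.startswith("author-time "):
            stamp = row[len("author-time "):]
    if not author or not stamp.isdigit():
        return "", ""
    # Assumption: dates are normalised to UTC before formatting as ISO 8601.
    iso = datetime.fromtimestamp(int(stamp), tz=timezone.utc).isoformat()
    return author, iso


def blame_line_sketch(repo_root: Path, rel_path: str, line: int) -> Tuple[str, str]:
    """Return (author, iso_date) for one line, or ("", "") on any failure."""
    try:
        result = subprocess.run(
            ["git", "blame", "--porcelain", "-L", f"{line},{line}", "--", rel_path],
            cwd=repo_root, capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return "", ""  # git missing, untracked file, bad path, or timeout
    return parse_porcelain(result.stdout)
```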

annotate_symbols

annotate_symbols(symbols, repo_root, rel_path, enabled=True)

Add git_author and git_date to each symbol dict in-place.

When enabled=False or git is unavailable, leaves fields as empty strings. Only fetches blame for the first line of each symbol to keep it fast.
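The in-place annotation can be illustrated with an injected blame function, which keeps the sketch independent of git. annotate and its blame parameter are hypothetical, and reading the first line from a line_start key is an assumption about the symbol dicts.

```python
from typing import Callable, Dict, List, Tuple

BlameFn = Callable[[int], Tuple[str, str]]  # hypothetical: line -> (author, iso_date)


def annotate(symbols: List[Dict[str, object]], blame: BlameFn,
             enabled: bool = True) -> None:
    """Set git_author and git_date on every symbol dict, mutating it in place."""
    for symbol in symbols:
        author, date = "", ""
        if enabled:
            # Only the symbol's first line is blamed, one lookup per symbol.
            author, date = blame(int(symbol["line_start"]))
        symbol["git_author"] = author
        symbol["git_date"] = date
```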

hasher

SHA-1 incremental hashing for change detection.

sha1_file

sha1_file(path)

Return the SHA-1 hex digest of a file's contents.

Reads in 64 KB chunks to handle large files without loading them entirely into memory.

sha1_text

sha1_text(text)

Return the SHA-1 hex digest of a UTF-8 string.

is_changed

is_changed(path, stored_sha1)

Return True when the file on disk differs from the stored hash.
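The three hasher functions can be sketched with hashlib as follows. The names carry a suffix to mark them as illustrative, and treating a missing stored hash as "changed" is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Optional

CHUNK_SIZE = 64 * 1024  # 64 KB, as described for sha1_file


def sha1_file_sketch(path: Path) -> str:
    """Hash a file chunk by chunk so large files are never fully in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha1_text_sketch(text: str) -> str:
    """Hash a string after encoding it as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def is_changed_sketch(path: Path, stored_sha1: Optional[str]) -> bool:
    """Return True when the file's current digest differs from the stored one."""
    # Assumption: a file with no stored hash counts as changed.
    return stored_sha1 is None or sha1_file_sketch(path) != stored_sha1
```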

parser

Tree-sitter parser pool.

Caches one parser instance per language to avoid the overhead of repeatedly constructing Language objects on large codebases.

Falls back gracefully when tree-sitter or a language grammar is not installed.

get_parser

get_parser(language)

Return a cached tree-sitter Parser for the given language, or None.

Thread-safe. Returns None when tree-sitter or the grammar is unavailable.
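The caching pattern can be sketched without depending on tree-sitter by injecting the loader. ParserPool and its loader argument are hypothetical, and caching failures as None (so a missing grammar is not retried) is an assumption.

```python
import threading
from typing import Callable, Dict, Optional


class ParserPool:
    """Thread-safe cache holding at most one parser object per language."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader  # hypothetical factory that builds a parser
        self._lock = threading.Lock()
        self._cache: Dict[str, Optional[object]] = {}

    def get(self, language: str) -> Optional[object]:
        """Return the cached parser, building it on first use; None on failure."""
        with self._lock:
            if language not in self._cache:
                try:
                    self._cache[language] = self._loader(language)
                except Exception:
                    # Assumption: failures are cached so they are not retried.
                    self._cache[language] = None
            return self._cache[language]
```

Holding the lock during construction keeps the sketch simple; it serialises first-time loads, which is usually acceptable because each language is built only once.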

parse_source

parse_source(language, source)

Parse source code and return the tree-sitter Tree, or None.

available_languages

available_languages()

Return language ids for which a tree-sitter grammar is installed.

walker

File walker with .gitignore and semtree exclude support.

walk_project

walk_project(root, include_extensions, exclude_dirs, max_file_size_kb=512, use_gitignore=True)

Yield absolute paths of indexable source files under root.

Respects .gitignore patterns and explicit exclude_dirs. Files larger than max_file_size_kb are silently skipped.
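A simplified walker is sketched below. It deliberately omits .gitignore handling, assumes extensions are given with a leading dot, and uses the hypothetical name walk_sketch.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator


def walk_sketch(root: Path, include_extensions: Iterable[str],
                exclude_dirs: Iterable[str],
                max_file_size_kb: int = 512) -> Iterator[Path]:
    """Yield absolute paths of matching files; .gitignore is NOT consulted here."""
    extensions = {ext.lower() for ext in include_extensions}  # e.g. {".py"}
    excluded = set(exclude_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() not in extensions:
                continue
            try:
                if path.stat().st_size > max_file_size_kb * 1024:
                    continue  # oversized files are skipped silently
            except OSError:
                continue  # unreadable entry, e.g. a broken symlink
            yield path.resolve()
```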

detect_language

detect_language(path)

Map file extension to a language identifier.
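An extension lookup of this kind is typically a small table. The entries, the language identifiers, and the None result for unknown extensions below are illustrative assumptions; the source does not list the real mapping.

```python
from pathlib import Path
from typing import Optional, Union

# Illustrative table only: the actual mapping is not given in the source.
_EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".go": "go",
    ".rs": "rust",
}


def language_for(path: Union[str, Path]) -> Optional[str]:
    """Return a language id for the path's extension, or None if unknown."""
    return _EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
```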