Modern AI agents often struggle with a fundamental bottleneck: the time required to parse and index massive codebases. Developers frequently face a choice between slow, brute-force grep searches or expensive, latency-heavy API-based embedding services. This week, the release of Semble offers a third path, demonstrating that high-performance code retrieval can be achieved locally without the overhead of heavy transformer operations.

Performance Metrics and Indexing Efficiency

Semble replaces resource-intensive transformer indexing with a streamlined, static approach. Compared to the 137M-parameter CodeRankEmbed Hybrid model, Semble indexes 218x faster while retaining 99% of the search quality, which sharply cuts the wall-clock time an agent spends scanning files. Compared with the standard grep-and-read methodology, Semble reduces token consumption by approximately 98%, because it retrieves only the relevant code chunks rather than entire files.

In practical terms, the system achieves an average indexing time of about 250ms, with query response times of around 1.5ms. Because the architecture is designed to run entirely on a CPU, it eliminates the need for GPU acceleration or external API calls. This local-first execution model removes network latency and recurring API costs, providing a self-contained environment for enterprise codebases where data privacy is non-negotiable.

Static Embeddings and Hybrid Search

The core of Semble’s performance lies in avoiding transformer forward passes at query time. Before any embedding happens, it uses tree-sitter for logical code chunking, which keeps functions, classes, and methods together as whole units. Unlike simple line-based or character-based splitting, this approach leaves the semantic boundaries of the code intact.
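The idea of chunking along definition boundaries can be illustrated with a minimal sketch. It is not Semble's implementation: it swaps tree-sitter for Python's standard `ast` module, so it only handles Python source and only top-level definitions. The names `SOURCE` and `chunk_by_definition` are hypothetical and exist purely for illustration.

```python
import ast

# Hypothetical sample file used only for this illustration.
SOURCE = '''
import os

def load(path):
    with open(path) as fh:
        return fh.read()

class Cache:
    def __init__(self):
        self.items = {}

    def get(self, key):
        return self.items.get(key)
'''


def chunk_by_definition(source):
    """Return one chunk per top-level function or class in `source`.

    Stand-in for tree-sitter: the standard-library parser gives the same
    kind of structural boundaries, but for Python only.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-indexed and inclusive.
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append(
                {"name": node.name, "start": node.lineno, "end": node.end_lineno, "text": text}
            )
    return chunks


chunks = chunk_by_definition(SOURCE)
for chunk in chunks:
    print(chunk["name"], chunk["start"], chunk["end"])
```

Each chunk holds a complete function or class, so a retrieved result never begins or ends in the middle of a definition, which is the property that fixed-size line or character windows cannot guarantee.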

To handle the retrieval logic, Semble employs Model2Vec’s potion-code-16M, a static embedding model. By mapping tokens to fixed vectors, the system avoids the heavy matrix operations typically required by modern transformer models. This is paired with BM25 lexical matching so that specific variable names and API calls are retrieved with high precision. The results of the semantic and lexical searches are merged using Reciprocal Rank Fusion (RRF), which combines the two result lists by rank position, so the raw scores never need to be normalized against each other. This hybrid approach gives the agent both the functional context of the code and the exactness of keyword matching.
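The three pieces described above (static token vectors, BM25, and RRF) can be sketched in a few dozen lines of plain Python. This is a toy under stated assumptions, not Semble's code: the corpus `CHUNKS`, the two-dimensional `TOKEN_VECTORS` table, and all function names are hypothetical, the BM25 parameters `k1=1.5` and `b=0.75` are common defaults, and `k=60` is the constant conventionally used for RRF rather than a value confirmed for Semble.

```python
import math
from collections import Counter

# Hypothetical toy corpus: chunk id -> chunk text.
CHUNKS = {
    "auth.py::login": "def login user password check token",
    "auth.py::logout": "def logout user session clear",
    "db.py::connect": "def connect database url retry",
}

# Hypothetical static table: every token maps to one fixed vector.
# A real static model ships a learned table with many more dimensions.
TOKEN_VECTORS = {
    "login": (1.0, 0.0), "signin": (1.0, 0.05), "logout": (0.9, 0.1),
    "user": (0.7, 0.3), "password": (0.9, 0.2), "token": (0.8, 0.2),
    "session": (0.6, 0.4), "database": (0.0, 1.0), "connect": (0.1, 0.9),
    "url": (0.2, 0.8), "retry": (0.1, 0.7),
}


def tokenize(text):
    return text.lower().split()


def embed(text):
    """Static embedding: look up each known token and average the vectors."""
    vectors = [TOKEN_VECTORS[t] for t in tokenize(text) if t in TOKEN_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def semantic_ranking(query):
    """Rank all chunks by cosine similarity to the query embedding."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda cid: -cosine(q, embed(CHUNKS[cid])))


def bm25_ranking(query, k1=1.5, b=0.75):
    """Rank chunks that share at least one term with the query, using BM25."""
    docs = {cid: Counter(tokenize(text)) for cid, text in CHUNKS.items()}
    avg_len = sum(sum(tf.values()) for tf in docs.values()) / len(docs)
    scores = {}
    for cid, tf in docs.items():
        length = sum(tf.values())
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            n = sum(1 for other in docs.values() if term in other)
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * length / avg_len)
            )
        if score > 0:
            scores[cid] = score
    return sorted(scores, key=lambda cid: -scores[cid])


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) to a chunk's score."""
    fused = {}
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] = fused.get(cid, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda cid: -fused[cid])


def hybrid_search(query):
    return reciprocal_rank_fusion([semantic_ranking(query), bm25_ranking(query)])


print(hybrid_search("signin password"))
```

In the sample query, `signin` appears in no chunk, so only the vector lookup can relate it to the login code, while `password` is an exact lexical hit for BM25. RRF only looks at positions in each list, which is why the cosine scores and BM25 scores never have to be brought onto a common scale.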

Claude Code and MCP Integration

Semble is built to integrate seamlessly into existing developer workflows through support for the Model Context Protocol (MCP). This allows developers to connect Semble to Claude Code, Cursor, Codex, and other MCP-compatible agents without custom middleware. The system automatically caches repositories during a session, significantly reducing the overhead of switching between different tools.

Developers can define agent behavior by inserting search commands directly into configuration files like AGENTS.md or CLAUDE.md:

```bash
semble search
semble find-related
```

The CLI supports both local paths and direct Git URLs, defaulting to the current directory if no path is specified. For teams building custom internal toolchains, Semble provides a Python library that allows for programmatic integration:

```python
SembleIndex.from_path
SembleIndex.from_git
```

By utilizing the `search` and `find_related` methods, enterprises can embed high-speed retrieval directly into their proprietary AI agents. Released under an MIT license, Semble removes the legal and financial barriers to commercial adoption, positioning itself as a foundational component for local, high-performance code analysis.

By prioritizing structural efficiency over raw computational power, Semble makes retrieval fast enough that it no longer interrupts an AI agent's reasoning loop.