by ViperJuice
Provides fast, local‑first code indexing and search across dozens of programming languages, enabling LLMs and developers to retrieve symbols, definitions, and code snippets with sub‑100 ms latency.
Code Index MCP is a modular, extensible code indexer that runs entirely on the developer's machine. It builds searchable SQLite/FTS5 indexes for up to 48 languages, supports real‑time file‑system updates, and optionally adds AI‑powered semantic search via Voyage AI embeddings. The service speaks the Model Context Protocol, allowing tools like Claude Code to query symbols, definitions, and references instantly.
pip install code-index-mcp # or pip install "code-index-mcp[dev]" for testing tools
Add a .mcp.json configuration (the provided setup scripts generate the appropriate file for your environment).
mcp-index index rebuild # scans the workspace and creates `.indexes/` files
uvicorn mcp_server.gateway:app --host 0.0.0.0 --port 8000
curl -X POST http://localhost:8000/search -H "Content-Type: application/json" -d '{"query": "def parse"}'
- Set VOYAGE_AI_API_KEY in a .env file or the environment and enable it in the server config.
- Respect .gitignore and exclude secret files when sharing indexes.
- To add a language, implement the PluginBase interface and register it in the dispatcher.
- To rotate API keys, update VOYAGE_AI_API_KEY in the .env file and restart the server; new semantic queries will use the new key.
- Indexes can be removed by deleting the .indexes/ directory if needed.
Modular, extensible local-first code indexer designed to enhance Claude Code and other LLMs with deep code understanding capabilities. Built on the Model Context Protocol (MCP) for seamless integration with AI assistants.
Version: 1.0.0 (MVP Release)
Core Features: Stable - Local indexing, symbol/text search, 48-language support
Optional Features: Semantic search (requires Voyage AI), Index sync (beta)
Performance: Sub-100ms queries, <10s indexing for cached repositories
New to Code-Index-MCP? Check out our Getting Started Guide for a quick walkthrough.
Index location: .indexes/ (relative to the MCP server)
The Code-Index-MCP follows a modular, plugin-based architecture designed for extensibility and performance:
🌐 System Context (Level 1)
📦 Container Architecture (Level 2)
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ API Gateway │────▶│ Dispatcher │────▶│ Plugins │
│ (FastAPI) │ │ │ │ (Language) │
└─────────────────┘ └──────────────┘ └─────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐
│ Local Index │ │ File Watcher │ │ Embedding │
│ (SQLite+FTS5) │ │ (Watchdog) │ │ Service │
└─────────────────┘ └──────────────┘ └─────────────┘
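To make the request path concrete, here is a minimal, illustrative sketch of how a query flows from the gateway through the dispatcher to a language plugin. All class and method names below are hypothetical stand-ins for the containers in the diagram, not the actual mcp_server modules.

```python
# Illustrative sketch only: hypothetical stand-ins for the containers above
# (gateway -> dispatcher -> plugins), not the real mcp_server module layout.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchResult:
    file_path: str
    line: int
    snippet: str


class LanguagePlugin:
    """One plugin per language; answers search queries for its files."""

    def search(self, query: str) -> List[SearchResult]:
        raise NotImplementedError


class Dispatcher:
    """Routes a query to every registered language plugin and merges results."""

    def __init__(self, plugins: Dict[str, LanguagePlugin]):
        self.plugins = plugins

    def search(self, query: str) -> List[SearchResult]:
        results: List[SearchResult] = []
        for plugin in self.plugins.values():
            results.extend(plugin.search(query))
        return results


# A FastAPI gateway would then wrap the dispatcher, e.g.:
#   @app.post("/search")
#   async def search(body: dict) -> list:
#       return dispatcher.search(body["query"])
```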
🔧 Component Details (Level 3)
The project follows a clean, organized structure. See docs/PROJECT_STRUCTURE.md for detailed layout.
Key directories:
- mcp_server/ - Core MCP server implementation
- scripts/ - Development and utility scripts
- tests/ - Comprehensive test suite with fixtures
- docs/ - Documentation and guides
- architecture/ - System design and diagrams
- docker/ - Docker configurations and compose files
- data/ - Database files and indexes
- logs/ - Application and test logs
- reports/ - Generated performance reports and analysis
- analysis_archive/ - Historical analysis and archived research
Production-Ready Features:
Language Categories:
| Category | Languages | Features |
|---|---|---|
| Dedicated Plugins | Python, JavaScript, TypeScript, C, C++, Dart, HTML/CSS | Enhanced analysis, framework support |
| Systems Languages | Go, Rust, C, C++, Zig, Nim, D, V | Memory safety, performance analysis |
| JVM Languages | Java, Kotlin, Scala, Clojure | Package analysis, build tool integration |
| Web Technologies | JavaScript, TypeScript, HTML, CSS, SCSS, PHP | Framework detection, bundler support |
| Scripting Languages | Python, Ruby, Perl, Lua, R, Julia | Dynamic typing, REPL integration |
| Functional Languages | Haskell, Elixir, Erlang, F#, OCaml | Pattern matching, type inference |
| Mobile Development | Swift, Kotlin, Dart, Objective-C | Platform-specific APIs |
| Infrastructure | Dockerfile, Bash, PowerShell, Makefile, CMake | Build automation, CI/CD |
| Data Formats | JSON, YAML, TOML, XML, GraphQL, SQL | Schema validation, query optimization |
| Documentation | Markdown, LaTeX, reStructuredText | Cross-references, formatting |
Implementation Status: Production-Ready - all languages are supported via the enhanced dispatcher.
# Auto-configures MCP for your environment
./scripts/setup-mcp-json.sh
# Or interactive mode
./scripts/setup-mcp-json.sh --interactive
This automatically detects your environment and creates the appropriate .mcp.json configuration.
# Install MCP Index with Docker
curl -sSL https://raw.githubusercontent.com/ViperJuice/Code-Index-MCP/main/scripts/install-mcp-docker.sh | bash
# Index your current directory
docker run -it -v $(pwd):/workspace ghcr.io/code-index-mcp/mcp-index:minimal
# Set your API key (get one at https://voyageai.com)
export VOYAGE_AI_API_KEY=your-key
# Run with semantic search
docker run -it -v $(pwd):/workspace -e VOYAGE_AI_API_KEY ghcr.io/code-index-mcp/mcp-index:standard
# PowerShell
.\scripts\setup-mcp-json.ps1
# Or manually with Docker Desktop
docker run -it -v ${PWD}:/workspace ghcr.io/code-index-mcp/mcp-index:minimal
# Install Docker Desktop or use Homebrew
brew install --cask docker
# Run setup
./scripts/setup-mcp-json.sh
# Install Docker (no Desktop needed)
curl -fsSL https://get.docker.com | sh
# Run setup
./scripts/setup-mcp-json.sh
# With Docker Desktop integration
./scripts/setup-mcp-json.sh # Auto-detects WSL+Docker
# Without Docker Desktop
cp .mcp.json.templates/native.json .mcp.json
pip install -e .
# For VS Code/Cursor dev containers
# Option 1: Use native Python (already in container)
cp .mcp.json.templates/native.json .mcp.json
# Option 2: Use Docker sidecar (avoids dependency conflicts)
docker-compose -f docker/compose/development/docker-compose.mcp-sidecar.yml up -d
cp .mcp.json.templates/docker-sidecar.json .mcp.json
The setup script creates the appropriate .mcp.json for your environment. Manual examples:
{
"mcpServers": {
"code-index-native": {
"command": "python",
"args": ["scripts/cli/mcp_server_cli.py"],
"cwd": "${workspace}"
}
}
}
{
"mcpServers": {
"code-index-docker": {
"command": "docker",
"args": [
"run", "-i", "--rm",
"-v", "${workspace}:/workspace",
"ghcr.io/code-index-mcp/mcp-index:minimal"
]
}
}
}
| Feature | Minimal | Standard | Full | Cost |
|---|---|---|---|---|
| Code Search | ✅ | ✅ | ✅ | Free |
| 48 Languages | ✅ | ✅ | ✅ | Free |
| Semantic Search | ❌ | ✅ | ✅ | ~$0.05/1M tokens |
| GitHub Sync | ❌ | ✅ | ✅ | Free |
| Monitoring | ❌ | ❌ | ✅ | Free |
# Install the package
pip install code-index-mcp
# Or install with dev tools for testing
pip install "code-index-mcp[dev]"
# Clone the repository
git clone https://github.com/ViperJuice/Code-Index-MCP.git
cd Code-Index-MCP
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in editable mode
pip install -e .
# Build index for your project (from project root)
mcp-index index rebuild
# Check index status
mcp-index index status
# Start the API server
uvicorn mcp_server.gateway:app --host 0.0.0.0 --port 8000
# Test the API
curl http://localhost:8000/status
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "def parse"}'
Create a .env file for configuration:
# Optional: Voyage AI for semantic search
VOYAGE_AI_API_KEY=your_api_key_here
# Server settings
MCP_SERVER_HOST=0.0.0.0
MCP_SERVER_PORT=8000
MCP_LOG_LEVEL=INFO
# Workspace settings
MCP_WORKSPACE_ROOT=.
MCP_MAX_FILE_SIZE=10485760 # 10MB
# GitHub Artifact Sync (privacy settings)
MCP_ARTIFACT_SYNC=false # Set to true to enable
AUTO_UPLOAD=false # Auto-upload on changes
AUTO_DOWNLOAD=true # Auto-download on clone
Control how your code index is shared:
// .mcp-index.json
{
"github_artifacts": {
"enabled": false, // Disable sync entirely
"auto_upload": false, // Manual upload only
"auto_download": true, // Still get team indexes
"exclude_patterns": [ // Additional exclusions
"internal/*",
"proprietary/*"
]
}
}
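To illustrate how exclude_patterns might be applied, the sketch below filters candidate files with simple glob matching against the configuration above. The real filtering lives in the export tooling; this is only a conceptual example.

```python
# Conceptual sketch: applying .mcp-index.json exclude_patterns when deciding
# which files belong in a shareable index. The actual filtering is done by
# the export tooling; this only illustrates the pattern-matching idea.
import fnmatch
import json
from pathlib import Path
from typing import List


def load_exclude_patterns(config_path: str = ".mcp-index.json") -> List[str]:
    config = json.loads(Path(config_path).read_text())
    return config.get("github_artifacts", {}).get("exclude_patterns", [])


def is_shareable(path: str, patterns: List[str]) -> bool:
    return not any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


patterns = ["internal/*", "proprietary/*"]
print(is_shareable("internal/secrets.py", patterns))  # False
print(is_shareable("src/main.py", patterns))          # True
```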
Privacy Features:
The system includes multiple reranking strategies to improve search relevance:
# Configure reranking in your searches
from mcp_server.indexer.reranker import RerankConfig, TFIDFReranker
config = RerankConfig(
enabled=True,
reranker=TFIDFReranker(), # Or CohereReranker(), CrossEncoderReranker()
top_k=20
)
# Search with reranking
results = await search_engine.search(query, rerank_config=config)
Available rerankers: TFIDFReranker, CohereReranker, and CrossEncoderReranker (see mcp_server.indexer.reranker).
Prevent accidental sharing of sensitive files:
# Analyze current index for security issues
python scripts/utilities/analyze_gitignore_security.py
# Create secure index export (filters gitignored files)
python scripts/utilities/secure_index_export.py
# The secure export will:
# - Exclude all gitignored files
# - Remove sensitive patterns (*.env, *.key, etc.)
# - Create audit logs of excluded files
Combines traditional full-text search with semantic search:
# The system automatically uses hybrid search when available
# Configure weights in settings:
HYBRID_SEARCH_BM25_WEIGHT=0.3
HYBRID_SEARCH_SEMANTIC_WEIGHT=0.5
HYBRID_SEARCH_FUZZY_WEIGHT=0.2
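As a rough illustration of how these weights combine, the sketch below blends normalized per-backend scores into a single ranking score. This is a conceptual example, not the project's actual scoring code.

```python
# Illustrative only: a weighted blend of per-backend scores using the
# HYBRID_SEARCH_* weights from the environment. The real scoring internals
# are not documented here, so treat this as a conceptual sketch.
import os


def hybrid_score(bm25: float, semantic: float, fuzzy: float) -> float:
    w_bm25 = float(os.getenv("HYBRID_SEARCH_BM25_WEIGHT", "0.3"))
    w_sem = float(os.getenv("HYBRID_SEARCH_SEMANTIC_WEIGHT", "0.5"))
    w_fuzzy = float(os.getenv("HYBRID_SEARCH_FUZZY_WEIGHT", "0.2"))
    # Each input is assumed to be normalized to [0, 1] before blending.
    return w_bm25 * bm25 + w_sem * semantic + w_fuzzy * fuzzy


print(hybrid_score(bm25=0.8, semantic=0.6, fuzzy=0.1))  # 0.56
```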
The enhanced dispatcher includes timeout protection and automatic fallback:
from mcp_server.dispatcher.dispatcher_enhanced import EnhancedDispatcher
from mcp_server.storage.sqlite_store import SQLiteStore
store = SQLiteStore(".indexes/YOUR_REPO_ID/current.db")
dispatcher = EnhancedDispatcher(
sqlite_store=store,
semantic_search_enabled=True, # Enable if Qdrant available
lazy_load=True, # Load plugins on-demand
use_plugin_factory=True # Use dynamic plugin loading
)
# Search with automatic optimization
results = list(dispatcher.search("your query", limit=10))
For maximum performance with BM25-only search:
from mcp_server.dispatcher.simple_dispatcher import create_simple_dispatcher
# Ultra-fast BM25 search without plugin overhead
dispatcher = create_simple_dispatcher(".indexes/YOUR_REPO_ID/current.db")
results = list(dispatcher.search("your query", limit=10))
Configure dispatcher behavior via environment variables:
# Dispatcher settings
MCP_DISPATCHER_TIMEOUT=5 # Plugin loading timeout (seconds)
MCP_USE_SIMPLE_DISPATCHER=false # Use simple dispatcher
MCP_PLUGIN_LAZY_LOAD=true # Load plugins on-demand
# Performance tuning
MCP_BM25_BYPASS_ENABLED=true # Enable direct BM25 bypass
MCP_MAX_PLUGIN_MEMORY=1024 # Max memory for plugins (MB)
All indexes are now stored centrally at .indexes/ (relative to the MCP project) for better organization and to prevent accidental commits:
.indexes/
├── {repo_hash}/ # Unique hash for each repository
│ ├── main_abc123.db # Index for main branch at commit abc123
│ ├── main_abc123.metadata.json
│ └── current.db -> main_abc123.db # Symlink to active index
├── qdrant/ # Semantic search embeddings
│ └── main.qdrant/ # Centralized Qdrant database
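Given this layout, resolving the active index for a repository is a matter of locating its {repo_hash} directory and following the current.db symlink. The helper below is only a sketch; in particular, the hash derivation (SHA-256 of the repository path) is an assumption and may differ from the scheme the server actually uses.

```python
# Sketch of resolving the active index under the central .indexes/ layout.
# Assumption: {repo_hash} is derived from the repository path; the real
# hashing scheme used by Code-Index-MCP may differ.
import hashlib
from pathlib import Path


def current_index_path(indexes_root: Path, repo_path: Path) -> Path:
    repo_hash = hashlib.sha256(str(repo_path.resolve()).encode()).hexdigest()[:12]
    # current.db is a symlink to the index for the active branch/commit,
    # e.g. main_abc123.db
    return (indexes_root / repo_hash / "current.db").resolve()


print(current_index_path(Path(".indexes"), Path(".")))
```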
Benefits:
Migration: For existing repositories with local indexes:
python scripts/move_indexes_to_central.py
This project uses GitHub Actions Artifacts for efficient index sharing, eliminating reindexing time while keeping the repository lean.
# First time setup - pull latest indexes
python scripts/cli/mcp_cli.py artifact pull --latest
# After making changes - rebuild locally
python scripts/cli/mcp_cli.py index rebuild
# Share your indexes with the team
python scripts/cli/mcp_cli.py artifact push
# Check sync status
python scripts/cli/mcp_cli.py artifact sync
# Optional: Install git hooks for automatic sync
mcp-index hooks install
# Now indexes upload automatically on git push
# and download automatically on git pull
Enable portable index management in any repository with zero GitHub compute costs:
# One-line install
curl -sSL https://raw.githubusercontent.com/ViperJuice/Code-Index-MCP/main/scripts/install-mcp.sh | bash
# Or via npm
npm install -g mcp-index-kit
mcp-index init
Zero-Cost Architecture:
Portable Design:
Usage:
# Initialize in your repo
cd your-repo
mcp-index init
# Build index locally
mcp-index build
# Push to GitHub Artifacts
mcp-index push
# Pull latest index
mcp-index pull
# Auto sync
mcp-index sync
Semantic Search Configuration
To enable semantic search capabilities, you need a Voyage AI API key. Get one from https://www.voyageai.com/.
Method 1: Claude Code Configuration (Recommended)
Create or edit .mcp.json in your project root:
{
"mcpServers": {
"code-index-mcp": {
"command": "uvicorn",
"args": ["mcp_server.gateway:app", "--host", "0.0.0.0", "--port", "8000"],
"env": {
"VOYAGE_AI_API_KEY": "your-voyage-ai-api-key-here",
"SEMANTIC_SEARCH_ENABLED": "true"
}
}
}
}
Method 2: Claude Code CLI
claude mcp add code-index-mcp -e VOYAGE_AI_API_KEY=your_key -e SEMANTIC_SEARCH_ENABLED=true -- uvicorn mcp_server.gateway:app
Method 3: Environment Variables
export VOYAGE_AI_API_KEY=your_key
export SEMANTIC_SEARCH_ENABLED=true
Method 4: .env File
Create a .env file in your project root:
VOYAGE_AI_API_KEY=your_key
SEMANTIC_SEARCH_ENABLED=true
Check Configuration
Verify your semantic search setup:
python scripts/cli/mcp_cli.py index check-semantic
Index Configuration
Edit .mcp-index.json in your repository:
{
"enabled": true,
"auto_download": true,
"artifact_retention_days": 30,
"github_artifacts": {
"enabled": true,
"max_size_mb": 100
}
}
See mcp-index-kit for full documentation
# Get details for a specific artifact
python scripts/cli/mcp_cli.py artifact info 12345
#### Index Management
```bash
# Check index status
python scripts/cli/mcp_cli.py index status
# Check compatibility
python scripts/cli/mcp_cli.py index check-compatibility
# Rebuild indexes locally
python scripts/cli/mcp_cli.py index rebuild
# Create backup
python scripts/cli/mcp_cli.py index backup my_backup
# Restore from backup
python scripts/cli/mcp_cli.py index restore my_backup
```
Clone Repository
git clone https://github.com/yourusername/Code-Index-MCP.git
cd Code-Index-MCP
Get Latest Indexes
python scripts/cli/mcp_cli.py artifact pull --latest
Make Your Changes
Share Updates
# Your indexes are already updated locally
python scripts/cli/mcp_cli.py artifact push
The system tracks embedding model versions to ensure compatibility:
- voyage-code-3 (1024 dimensions)
If you use a different embedding model, the system will detect the incompatibility and rebuild locally with your configuration.
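As an illustration, a compatibility check could compare the model recorded in an index's *.metadata.json against the locally configured model. The field names used here are assumptions for the sketch, not a documented schema.

```python
# Illustrative compatibility check against an index's metadata file.
# Field names ("embedding_model", "embedding_dimensions") are assumed for
# this sketch; the actual metadata schema may differ.
import json
from pathlib import Path

EXPECTED_MODEL = "voyage-code-3"
EXPECTED_DIMENSIONS = 1024


def is_compatible(metadata_path: Path) -> bool:
    meta = json.loads(metadata_path.read_text())
    return (
        meta.get("embedding_model") == EXPECTED_MODEL
        and meta.get("embedding_dimensions") == EXPECTED_DIMENSIONS
    )


# Example: is_compatible(Path(".indexes/<repo_hash>/main_abc123.metadata.json"))
# If this returns False, the semantic index is rebuilt locally with the
# locally configured embedding model.
```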
Create plugin structure
mkdir -p mcp_server/plugins/my_language_plugin
cd mcp_server/plugins/my_language_plugin
touch __init__.py plugin.py
Implement the plugin interface
from typing import Dict, List

from mcp_server.plugin_base import PluginBase


class MyLanguagePlugin(PluginBase):
    def __init__(self):
        self.tree_sitter_language = "my_language"

    def index(self, file_path: str) -> Dict:
        # Parse and index the file
        pass

    def getDefinition(self, symbol: str, context: Dict) -> Dict:
        # Find symbol definition
        pass

    def getReferences(self, symbol: str, context: Dict) -> List[Dict]:
        # Find symbol references
        pass
Register the plugin
# In dispatcher.py
from .plugins.my_language_plugin import MyLanguagePlugin
self.plugins['my_language'] = MyLanguagePlugin()
# Run all tests
pytest
# Run specific test
pytest test_python_plugin.py
# Run with coverage
pytest --cov=mcp_server --cov-report=html
# View C4 architecture diagrams
docker run --rm -p 8080:8080 \
-v "$(pwd)/architecture":/usr/local/structurizr \
structurizr/lite
# Open http://localhost:8080 in your browser
GET /symbol - Get symbol definition
GET /symbol?symbol_name=parseFile&file_path=/path/to/file.py
Query parameters:
- symbol_name (required): Name of the symbol to find
- file_path (optional): Specific file to search in
GET /search - Search for code patterns
GET /search?query=async+def.*parse&file_extensions=.py,.js
Query parameters:
- query (required): Search pattern (regex supported)
- file_extensions (optional): Comma-separated list of extensions
All API responses follow a consistent JSON structure:
Success Response:
{
"status": "success",
"data": { ... },
"timestamp": "2024-01-01T00:00:00Z"
}
Error Response:
{
"status": "error",
"error": "Error message",
"code": "ERROR_CODE",
"timestamp": "2024-01-01T00:00:00Z"
}
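For reference, the GET /symbol and GET /search endpoints and the response envelope shown above can be exercised with a small client. This sketch assumes the server is running locally on port 8000 and uses only the query parameters documented above.

```python
# Minimal client sketch for the GET /symbol and GET /search endpoints shown
# above, assuming a local server on port 8000. Standard library only.
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"


def _get(path: str, params: dict) -> dict:
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    with urllib.request.urlopen(f"{BASE_URL}{path}?{query}") as resp:
        return json.load(resp)


def get_symbol(symbol_name: str, file_path: Optional[str] = None) -> dict:
    return _get("/symbol", {"symbol_name": symbol_name, "file_path": file_path})


def search(query: str, file_extensions: Optional[str] = None) -> dict:
    return _get("/search", {"query": query, "file_extensions": file_extensions})


result = search("async def.*parse", file_extensions=".py,.js")
if result.get("status") == "success":
    print(result["data"])
else:
    print(result.get("error"), result.get("code"))
```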
The project includes multiple Docker configurations for different environments:
Development (Default):
# Uses docker-compose.yml + Dockerfile
docker-compose up -d
# - SQLite database
# - Uvicorn development server
# - Volume mounts for code changes
# - Debug logging enabled
Production:
# Uses docker-compose.production.yml + Dockerfile.production
docker-compose -f docker-compose.production.yml up -d
# - PostgreSQL database
# - Gunicorn + Uvicorn workers
# - Multi-stage optimized builds
# - Security hardening (non-root user)
# - Production logging
Enhanced Development:
# Uses both compose files with development overrides
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# - Development base + enhanced debugging
# - Source code volume mounting
# - Read-write code access
Important: By default, docker-compose restart uses the DEVELOPMENT configuration:
- docker-compose restart → uses docker-compose.yml (Development)
- docker-compose -f docker-compose.production.yml restart → uses Production
For production environments, we provide Kubernetes manifests in the k8s/ directory.
See our Deployment Guide for detailed instructions.
For quick setup, download pre-built indexes from our GitHub releases:
# List available releases
python scripts/download-release.py --list
# Download latest release
python scripts/download-release.py --latest
# Download specific version
python scripts/download-release.py --tag v2024.01.15 --output ./my-index
Maintainers can create new releases with pre-built indexes:
# Create a new release (as draft)
python scripts/create-release.py --version 1.0.0
# Create and publish immediately
python scripts/create-release.py --version 1.0.0 --publish
The project includes Git hooks for automatic index synchronization:
Install hooks with: mcp-index hooks install
We welcome contributions! Please see our Contributing Guide for details.
Create a feature branch (git checkout -b feature/amazing-feature) and open a pull request.
| Operation | Performance Target | Current Status |
|---|---|---|
| Symbol Lookup | <100ms (p95) | ✅ Achieved - All queries < 100ms |
| Code Search | <500ms (p95) | ✅ Achieved - BM25 search < 50ms |
| File Indexing | 10K files/min | ✅ Achieved - 152K files indexed |
The system follows C4 model architecture patterns:
For detailed architectural documentation, see the architecture/ directory.
See ROADMAP.md for detailed development plans and current progress.
Current Status: v1.0.0 MVP Release
Recent Improvements:
- mcp-index command for index management
Performance optimization features are implemented and available:
- INDEXING_BATCH_SIZE environment variable
- INDEXING_MAX_FILE_SIZE environment variable
- INDEXING_MAX_WORKERS
This project is licensed under the MIT License - see the LICENSE file for details.
Explore related MCPs that share similar capabilities and solve comparable challenges
by modelcontextprotocol
A Model Context Protocol server for Git repository interaction and automation.
by zed-industries
A high‑performance, multiplayer code editor designed for speed and collaboration.
by modelcontextprotocol
Model Context Protocol Servers
by modelcontextprotocol
A Model Context Protocol server that provides time and timezone conversion capabilities.
by cline
An autonomous coding assistant that can create and edit files, execute terminal commands, and interact with a browser directly from your IDE, operating step‑by‑step with explicit user permission.
by upstash
Provides up-to-date, version‑specific library documentation and code examples directly inside LLM prompts, eliminating outdated information and hallucinated APIs.
by daytonaio
Provides a secure, elastic infrastructure that creates isolated sandboxes for running AI‑generated code with sub‑90 ms startup, unlimited persistence, and OCI/Docker compatibility.
by continuedev
Enables faster shipping of code by integrating continuous AI agents across IDEs, terminals, and CI pipelines, offering chat, edit, autocomplete, and customizable agent workflows.
by github
Connects AI tools directly to GitHub, enabling natural‑language interactions for repository browsing, issue and pull‑request management, CI/CD monitoring, code‑security analysis, and team collaboration.