by WonderMr
Provides a semantic router that dynamically loads specialized agent personas, domain‑specific skills, and cognitive reasoning implants to fulfill user queries via any MCP‑compatible client.
The project implements a universal MCP server that acts as a semantic router. It discovers the most suitable agent persona, enriches the prompt with relevant skill chunks and cognitive implants, and returns a ready‑to‑use context for the client. The system is self‑contained: embeddings are generated locally with FastEmbed (ONNX), vector stores are cached on disk, and optional observability is handled through LangFuse.
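To make the routing idea concrete, here is a minimal sketch of embedding-based agent selection. It is not the project's router: the toy `embed` function (a hashed bag-of-words) stands in for FastEmbed, and the agent names and descriptions in `AGENTS` are hypothetical.

```python
import hashlib
import math

DIM = 256  # size of the toy embedding space (assumption)

# Hypothetical agents with short routing descriptions.
AGENTS = {
    "software_engineer": "python code bug refactor test api",
    "technical_writer": "documentation readme tutorial writing style",
}

def embed(text):
    """Toy stand-in for a real embedder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # md5 gives a hash that is stable across runs, unlike built-in hash().
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def route(query):
    """Return the agent whose description is most similar to the query."""
    q = embed(query)
    return max(AGENTS, key=lambda name: cosine(embed(AGENTS[name]), q))
```

A real router would presumably embed the agent descriptions once and cache them rather than recomputing them per query, as this sketch does.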
git clone https://github.com/WonderMr/Agents.git
cd Agents
./scripts/init_repo.sh
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp env.example .env # then add your API keys
.venv/bin/python src/server.py
mcp.json that points to the command above.
route_and_load(query), list_agents(), load_implants(...), log_interaction(...), etc., from the client side.
main branch.
describe_repo, log_interaction, read_history).
AGENTS_DEBUG.
Q: Do I need an API key for the routing engine?
A: No external model API is required for routing; embeddings are generated locally. API keys are only needed for optional services such as LangFuse or Anthropic OCR.
Q: Can the server run on Windows?
A: Yes, as long as Python 3.9+ and ONNX Runtime are available. The init script is POSIX‑sh, but the manual setup steps work on Windows PowerShell.
Q: How does auto‑update avoid breaking my local work?
A: It only fast‑forwards the checked‑out main branch when the working tree is clean, never merges or rebases. Failures roll back to the previous commit.
Q: Where are interaction logs stored?
A: In history.md at the repository root, rotated into history/YYYY-MM.md when the file exceeds 512 KB. The file is git‑ignored by default.
Q: How can I add a new agent?
A: Create agents/<agent_name>/system_prompt.mdc with the required front‑matter (identity, routing keywords, etc.). The server discovers it on the next startup.
Universal MCP Server for AI Agent Roles, Skills & Cognitive Implants
A semantic router that dynamically loads specialized agent personas, domain skills, and cognitive reasoning implants based on user queries. Works with any MCP-compatible client (Claude Code, Cursor, Windsurf, and others).
git clone <repository-url>
cd Agents
# Run initialization script
./scripts/init_repo.sh
The script will:
.venv/)
.env configuration file
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp env.example .env
# Edit .env with your API keys
Create .env file with:
LANGFUSE_PUBLIC_KEY=pk-lf-... # Optional: observability
LANGFUSE_SECRET_KEY=sk-lf-... # Optional: observability
LANGFUSE_HOST=https://cloud.langfuse.com
ANTHROPIC_API_KEY=sk-ant-... # Optional: for document OCR
AGENTS_DEBUG=0 # Set to 1 for JSON debug logging in logs/
Note: Embeddings are handled locally by fastembed (ONNX Runtime). The model is selected during setup; no external API key is required for core routing.
The server can keep itself current. On startup a daemon thread (non-blocking, so it never delays serving) fast-forwards the install's own git repo and rebuilds the vector stores; the pulled code takes effect on the next start (for per-session stdio servers, the next spawn). The heavy reindex runs in the background of the current session so the next one starts fast.
It is safe by default:
AGENTS_AUTO_UPDATE_BRANCH (default main): a no-op on feature branches, so local development is never touched;
AGENTS_AUTO_UPDATE=1 # 0 to disable
AGENTS_AUTO_UPDATE_REMOTE=origin
AGENTS_AUTO_UPDATE_BRANCH=main # only updates when this branch is checked out
AGENTS_AUTO_UPDATE_TIMEOUT=30 # seconds per git op
AGENTS_AUTO_UPDATE_INTERVAL=900 # throttle network checks (0 = every start)
AGENTS_AUTO_UPDATE_REINDEX_TIMEOUT=600
Run a manual rebuild any time with python -m src.reindex.
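As a rough illustration of how these settings could gate an update check, the sketch below reads the variables and applies the branch and throttle rules described above. The names `UpdateSettings`, `load_update_settings`, and `should_check` are hypothetical, and treating the values shown above as defaults is an assumption (only the `main` branch default is stated explicitly).

```python
import os
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class UpdateSettings:
    enabled: bool
    remote: str
    branch: str
    timeout: int          # seconds per git operation
    interval: int         # minimum seconds between network checks
    reindex_timeout: int  # seconds allowed for the reindex

def load_update_settings(env=os.environ):
    """Read the AGENTS_AUTO_UPDATE_* variables with assumed defaults."""
    return UpdateSettings(
        enabled=env.get("AGENTS_AUTO_UPDATE", "1") != "0",
        remote=env.get("AGENTS_AUTO_UPDATE_REMOTE", "origin"),
        branch=env.get("AGENTS_AUTO_UPDATE_BRANCH", "main"),
        timeout=int(env.get("AGENTS_AUTO_UPDATE_TIMEOUT", "30")),
        interval=int(env.get("AGENTS_AUTO_UPDATE_INTERVAL", "900")),
        reindex_timeout=int(env.get("AGENTS_AUTO_UPDATE_REINDEX_TIMEOUT", "600")),
    )

def should_check(settings, current_branch, last_check, now=None):
    """Decide whether a network check is due for the checked-out branch."""
    now = time.time() if now is None else now
    if not settings.enabled or current_branch != settings.branch:
        return False  # disabled, or on a branch the updater must not touch
    # An interval of 0 means "check on every start".
    return settings.interval == 0 or (now - last_check) >= settings.interval
```

Passing a plain dict as `env` keeps the logic easy to exercise without touching the real environment.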
The server exposes MCP tools that any compatible client can call:
| Tool | Purpose |
|---|---|
| route_and_load(query) | Semantic routing: finds the best agent, enriches its prompt with relevant skills & implants |
| get_agent_context(agent_name, query) | Direct agent loading when the target is already known |
| load_implants(query\|task_type) | Load cognitive reasoning strategies by semantic query or preset bundle |
| list_agents() | Enumerate all available agents with metadata |
| log_interaction(agent_name, query, response_content, intent?, action?, outcome?, files?, tags?) | End-of-turn logger: appends to history.md (deduped by content hash) and, if configured, sends a Langfuse generation trace |
| clear_session_cache() | Reset session cache |
| describe_repo(force_refresh=False) | One-shot repo bootstrap: writes a structured summary into the managed Repository Memory section of CLAUDE.md |
| read_history(limit?, since?, query?) | Recent entries or lazy semantic recall over the action log |
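MCP clients invoke tools like these with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for route_and_load. The helper name `build_tool_call` and the example query are illustrative; in practice the MCP client library constructs and sends this message for you, after the usual initialization handshake.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session

def build_tool_call(name, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)

# Example: ask the router to pick an agent for a (made-up) query.
request = build_tool_call(
    "route_and_load",
    {"query": "Review this pull request for security issues"},
)
```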
route_and_load(query) → Single-hop routing via semantic cache
universal_agent
context_hash enables delta optimization on follow-up queries
Agents/
├── agents/ # Agent personas (system prompts, 38 agents)
│ ├── software_engineer/
│ │ └── system_prompt.mdc
│ ├── common/ # Shared agent resources
│ ├── capabilities/ # Capability compositions (registry.yaml)
│ └── schemas/ # Validation schemas
├── skills/ # Reusable knowledge chunks (RAG)
│ └── skill-*.mdc
├── implants/ # Cognitive reasoning strategies (RAG)
│ └── implant-*.mdc
├── src/
│ ├── server.py # MCP Server entrypoint (FastMCP)
│ ├── engine/
│ │ ├── router.py # Semantic routing (cache-first)
│ │ ├── skills.py # Skill retrieval (vector search)
│ │ ├── implants.py # Implant retrieval (vector search)
│ │ ├── config.py # Centralized configuration
│ │ ├── embedder.py # FastEmbed wrapper (ONNX Runtime)
│ │ ├── vector_store.py # NumPy-based vector store
│ │ ├── enrichment.py # Tier-based context enrichment
│ │ ├── capabilities.py # Capability registry resolution
│ │ ├── context.py # Context retrieval (history formatting)
│ │ └── language.py # Language detection
│ └── utils/
│   ├── prompt_loader.py
│   ├── debug_logger.py # Optional JSON debug logging
│   ├── langfuse_compat.py # Optional Langfuse layer
├── data/ # Vector store cache (auto-initialized)
├── mcp.json # MCP server configuration
├── pyproject.toml # Python project metadata
└── requirements.txt
| Component | Description |
|---|---|
| Agents | Specialized personas with unique system prompts |
| Skills | Domain-specific knowledge chunks (retrieved via RAG) |
| Implants | Cognitive patterns & reasoning strategies |
| Router | Semantic matching + caching for fast agent selection |
.mcp.json in project root:
{
  "mcpServers": {
    "Agents-Core": {
      "command": ".venv/bin/python",
      "args": ["src/server.py"]
    }
  }
}
mcp.json in project root:
{
  "mcpServers": {
    "Agents-Core": {
      "command": ".venv/bin/python",
      "args": ["src/server.py"]
    }
  }
}
source .venv/bin/activate
python src/server.py
# Server communicates via stdin/stdout using MCP protocol
Create agents/<agent_name>/system_prompt.mdc with frontmatter:
---
identity:
  name: "my_agent"
  display_name: "My Agent"
  role: "Expert in X"
  tone: "Professional, Clear"
routing:
  domain_keywords: ["keyword1", "keyword2"]
  trigger_command: "/my_command"
---
# My Agent System Prompt
## Identity
You are an expert in X...
The agent will be auto-discovered by the MCP server on next startup.
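Before restarting the server, it can help to sanity-check a new prompt file. The sketch below separates the front matter from the body and checks that the `identity` and `routing` sections are present. It uses only the standard library; `split_front_matter` and `top_level_keys` are hypothetical helpers, and the server's own loader may parse and validate the file differently.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split a document into (front_matter, body) at the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no front matter present
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("front matter opened with '---' but never closed")

def top_level_keys(front_matter: str) -> set[str]:
    """Collect unindented 'key:' names, enough for a required-section check."""
    keys = set()
    for line in front_matter.splitlines():
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys

SAMPLE = """---
identity:
  name: "my_agent"
routing:
  domain_keywords: ["keyword1", "keyword2"]
---
# My Agent System Prompt
"""

front, body = split_front_matter(SAMPLE)
missing = {"identity", "routing"} - top_level_keys(front)  # empty when both exist
```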
Instead of listing skills per agent, you can declare high-level capabilities:
capabilities: [development, dev-security]
The enrichment pipeline resolves capabilities to skill bundles via agents/capabilities/registry.yaml. Available capabilities: critical-analysis, content-structure, development, dense-summary, trust-weighted-research, bio-health, tech-documentation, dev-security, consultative-intake, creative-writing, psychology, 3d-printing, data-investigation, epistemic-analysis, code-review, decision-making, product-thinking, temporal-research, performance-engineering, prompt-design, prompt-security, roblox-development, dev-tools, blender-scripting, health-optimization, consumer-research, visualization, child-psychology.
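Conceptually, resolving capabilities is a lookup that expands each capability name into its skill bundle and merges the results. The sketch below shows one way this could work. The `REGISTRY` contents and skill names are made up for illustration, since the schema of registry.yaml is not shown here.

```python
# Hypothetical registry contents; the real schema of registry.yaml may differ.
REGISTRY = {
    "development": ["skill-clean-code", "skill-testing"],
    "dev-security": ["skill-secure-coding", "skill-testing"],
}

def resolve_capabilities(capabilities, registry=REGISTRY):
    """Expand capability names into an ordered, de-duplicated skill list."""
    skills, unknown = [], []
    for cap in capabilities:
        if cap not in registry:
            unknown.append(cap)
            continue
        for skill in registry[cap]:
            if skill not in skills:  # keep first occurrence, drop repeats
                skills.append(skill)
    if unknown:
        raise KeyError("unknown capabilities: " + ", ".join(unknown))
    return skills

skills = resolve_capabilities(["development", "dev-security"])
```

Failing loudly on an unknown capability is a design choice of this sketch; a typo in an agent's front matter would otherwise silently drop a skill bundle.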
The server ships with a per-repo memory subsystem so each new Claude session does not have to re-explore the codebase from scratch:
describe_repo: generates a compressed, LLM-consumable repo overview via MCP sampling and writes it into the managed Repository Memory section of CLAUDE.md. Idempotent: re-runs are no-ops unless the repo manifest changes or force_refresh=True.
log_interaction: end-of-turn logger. Appends intent / action / outcome entries (with optional files and tags) to history.md at the repo root; deduplicated by content hash; rotated to history/YYYY-MM.md when the file exceeds 512 KB. Also sends a Langfuse generation trace if keys are configured.
read_history: returns recent entries by recency/since filter, or runs a lazy semantic search backed by the same NumpyVectorStore used for routing.
The full design and step-by-step rationale lives in docs/memory-subsystem-spec.md.
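The deduplication and rotation behaviour of the action log can be sketched as follows. This is an illustration under assumptions, not the server's code: the SHA-256 hash, the in-memory `seen` set, and the function names are all hypothetical, and only the 512 KB threshold and the history/YYYY-MM.md target come from the description above.

```python
import hashlib
from datetime import datetime
from pathlib import Path

MAX_BYTES = 512 * 1024  # rotation threshold described above

def entry_hash(entry: str) -> str:
    """Content hash used to detect duplicate entries (SHA-256 is an assumption)."""
    return hashlib.sha256(entry.strip().encode("utf-8")).hexdigest()

def append_entry(repo_root: Path, entry: str, seen: set, now=None) -> bool:
    """Append an entry unless its hash was already seen; rotate large logs first.

    Returns True if the entry was written.
    """
    digest = entry_hash(entry)
    if digest in seen:
        return False
    log = repo_root / "history.md"
    if log.exists() and log.stat().st_size > MAX_BYTES:
        now = now or datetime.now()
        archive = repo_root / "history" / f"{now:%Y-%m}.md"
        archive.parent.mkdir(exist_ok=True)
        # Append to the month's archive so earlier rotations are preserved.
        with archive.open("a", encoding="utf-8") as out:
            out.write(log.read_text(encoding="utf-8"))
        log.unlink()
    with log.open("a", encoding="utf-8") as out:
        out.write(entry.rstrip() + "\n\n")
    seen.add(digest)
    return True
```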
⚠️ Privacy warning: history.md captures raw prompts and responses. If you paste secrets (API keys, tokens, credentials) into Claude, they will land in this file. It is gitignored by default to keep them out of git history; if you want the action log visible in PRs, remove history.md / history/ from .gitignore and review entries before pushing.
The framework integrates with LangFuse for tracing.
Configure LangFuse in .env or leave blank for local-only operation.
source .venv/bin/activate
python src/server.py
Enable detailed per-call JSON logging:
AGENTS_DEBUG=1 python src/server.py
Logs are written to logs/{YYYY-MM-DD}/{HH-MM-SS.fff}_{tool}_{direction}.json. Zero overhead when disabled.
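To locate a particular log file, it helps to see how a path of this shape is assembled. The sketch below builds one from a timestamp. The helper name `debug_log_path` and the direction value `request` are assumptions, since the source shows only the path template.

```python
from datetime import datetime
from pathlib import Path

def debug_log_path(tool, direction, when, root=Path("logs")):
    """Build logs/{YYYY-MM-DD}/{HH-MM-SS.fff}_{tool}_{direction}.json."""
    day = when.strftime("%Y-%m-%d")
    millis = when.microsecond // 1000  # .fff is milliseconds
    stamp = f"{when:%H-%M-%S}.{millis:03d}"
    return root / day / f"{stamp}_{tool}_{direction}.json"

path = debug_log_path(
    "route_and_load",
    "request",  # hypothetical direction label
    datetime(2025, 1, 31, 9, 5, 7, 42_000),
)
```

Because both the directory and the file name sort lexicographically by time, a plain sorted directory listing shows calls in chronological order.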
MIT