by gefsikatsinelou
Offers a FastAPI‑based metasearch backend that aggregates results from multiple web, academic, and developer providers, normalizes them into a stable JSON schema, and exposes both an HTTP API and an MCP server for LLM agents.
MetaSearchMCP aggregates search results from dozens of providers (Google, DuckDuckGo, Bing, Wikipedia, GitHub, arXiv, finance APIs, etc.), deduplicates and normalizes the data, and delivers a predictable JSON contract that LLM workflows can consume directly. It also implements an MCP server to enable seamless tool calls from agents such as Claude Desktop, Cline, or Continue.
Installation
python scripts/install.py (one-command local install)
python scripts/install.py --dev --test --run (install, run tests, and start the HTTP API)
python scripts/install.py --mode docker (or use the provided Dockerfile/Compose)
pip install -e ".[dev]" (or uv pip install -e ".[dev]") for a manual install
Configuration
Copy .env.example to .env and fill in API keys for enabled providers (e.g., SERPBASE_API_KEY, SERPER_API_KEY, BRAVE_API_KEY).
Set ALLOW_UNSTABLE_PROVIDERS=true to enable direct Google scraping; tune MAX_RESULTS_PER_PROVIDER, timeout settings, etc.
Running the services
python -m metasearchmcp.server (exposes http://localhost:8000)
python -m metasearchmcp.broker (or the shortcut metasearchmcp-mcp)
docker run --rm -p 8000:8000 --env-file .env metasearchmcp
API examples
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query":"rust async runtime","providers":["duckduckgo","wikipedia"],"params":{"num_results":5,"max_total_results":8}}'
Specific Google route:
curl -X POST http://localhost:8000/search/google \
-H "Content-Type: application/json" \
-d '{"query":"site:github.com rust tokio"}'
Key capabilities:
Stable response schema: engine, results, related_searches, answer_box, and per-provider metadata.
Tag-based provider selection (web, academic, code, google, etc.) with any or all matching semantics.
Result limits (num_results per provider, max_total_results overall).
Dedicated /search/google or tag-filtered searches without dealing with HTML parsing or API quirks.
FAQ
Q: Do I need to obtain API keys for every provider?
A: Only for providers that require authentication (e.g., SerpBase, Serper, Brave, GitHub, Alpha Vantage, Finnhub). All other providers work via public HTML or unauthenticated APIs.
Q: What happens if one provider fails or times out?
A: The orchestrator isolates each provider; failures are reported in the errors array and do not abort the whole request.
Q: Can I limit the search to a specific category without listing every provider?
A: Yes, use the tags field (e.g., "tags": ["academic", "knowledge"]). Tag matching can be any (default) or all.
Q: How do I enable direct Google scraping?
A: Set ALLOW_UNSTABLE_PROVIDERS=true in the .env file. Note that this method may be blocked from some datacenter IPs.
Q: Is there a way to see which providers are currently available?
A: Call GET /providers; you can filter by tag and view the provider catalog with descriptions and tag mappings.
Open-source metasearch backend for MCP, AI agents, and LLM workflows.
MetaSearchMCP aggregates results from multiple search providers, normalizes them into a stable JSON schema, and exposes both an HTTP API and an MCP server for agent tooling.
Most search aggregators are designed around browser UX: HTML pages, pagination, and interactive result cards. Agents and LLM workflows need a different contract: predictable JSON, stable field names, partial-failure tolerance, and provider-level execution metadata.
MetaSearchMCP is built for that machine-consumable workflow. It is not a SearXNG clone. The design is centered on search orchestration, normalized contracts, and MCP integration.
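The partial-failure tolerance mentioned above can be sketched with asyncio. This is an illustration only, not the project's actual orchestrator; the provider functions are made up:

```python
import asyncio

async def good_provider(query: str) -> list[dict]:
    # Stand-in for a provider that responds normally.
    return [{"title": f"result for {query}", "provider": "good"}]

async def bad_provider(query: str) -> list[dict]:
    # Stand-in for a provider that times out.
    raise TimeoutError("provider timed out")

async def aggregate(query: str) -> dict:
    providers = {"good": good_provider, "bad": bad_provider}
    # return_exceptions=True isolates each provider: one failure
    # does not cancel the others or abort the whole request.
    outcomes = await asyncio.gather(
        *(fn(query) for fn in providers.values()), return_exceptions=True
    )
    results, errors = [], []
    for name, outcome in zip(providers, outcomes):
        if isinstance(outcome, BaseException):
            errors.append({"provider": name, "error": str(outcome)})
        else:
            results.extend(outcome)
    return {"results": results, "errors": errors}

response = asyncio.run(aggregate("rust async runtime"))
```

Failures end up in an errors array alongside whatever the healthy providers returned, which is the contract the HTTP API exposes as well.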
Providers are grouped by tags such as web, academic, code, and google. Google support now includes a direct scraper provider implemented inside this project.
The implementation direction is based on the same broad strategy used by SearXNG's Google engine: browser-like requests, consent cookie handling, locale-aware query parameters, and HTML result parsing. It is implemented locally in this repository rather than proxying through a SearXNG instance.
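A rough illustration of that request-building strategy follows. The parameter names (hl, num), the consent cookie value, and the user-agent string are assumptions based on publicly known Google URL conventions, not code from this repository:

```python
from urllib.parse import urlencode

def build_google_request(query: str, lang: str = "en", num: int = 10) -> tuple[str, dict]:
    # Locale-aware query parameters, in the style of SearXNG's Google engine.
    params = {"q": query, "hl": lang, "num": num}
    url = "https://www.google.com/search?" + urlencode(params)
    headers = {
        # Browser-like headers so the response resembles a normal page load.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",  # placeholder UA string
        "Accept-Language": lang,
        # Pre-set consent cookie to skip the consent interstitial (assumed value).
        "Cookie": "CONSENT=YES+",
    }
    return url, headers

url, headers = build_google_request("rust tokio", num=5)
```

The response HTML would then be parsed for result cards; that parsing step is the fragile, best-effort part flagged in the provider table below.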
Currently supported Google providers:
| Provider | Env var | Notes |
|---|---|---|
| Direct Google | ALLOW_UNSTABLE_PROVIDERS=true | Primary path; HTML scraping, best effort, may be blocked from datacenter IPs |
| serpbase.dev | SERPBASE_API_KEY | Pay-per-use; typically cheaper for low-volume usage |
| serper.dev | SERPER_API_KEY | Includes a free tier, then pay-per-use |
Provider priority for /search/google is now google first, then google_serpbase, then google_serper.
| Provider | Name | Method |
|---|---|---|
| Direct Google | google | HTML scraping modeled after SearXNG's approach |
| SerpBase | google_serpbase | Hosted Google SERP API |
| Serper | google_serper | Hosted Google SERP API |
| Provider | Name | Method |
|---|---|---|
| DuckDuckGo | duckduckgo | HTML scraping |
| Bing | bing | RSS feed |
| Yahoo | yahoo | HTML scraping, best effort |
| Brave | brave | Official Search API |
| Mwmbl | mwmbl | Public JSON API |
| Ecosia | ecosia | HTML scraping |
| Mojeek | mojeek | HTML scraping |
| Startpage | startpage | HTML scraping, best effort |
| Qwant | qwant | Internal JSON API, best effort |
| Yandex | yandex | HTML scraping, best effort |
| Baidu | baidu | JSON endpoint, best effort |
| Provider | Name | Method |
|---|---|---|
| Wikipedia | wikipedia | MediaWiki API |
| Wikidata | wikidata | Wikidata API |
| Internet Archive | internet_archive | Advanced Search API |
| Open Library | openlibrary | Open Library search API |
| Provider | Name | Method |
|---|---|---|
| GitHub | github | GitHub REST API |
| GitLab | gitlab | GitLab REST API |
| Stack Overflow | stackoverflow | Stack Exchange API |
| Hacker News | hackernews | Algolia HN API |
| Reddit | reddit | Reddit API |
| npm | npm | npm registry API |
| PyPI | pypi | HTML scraping |
| RubyGems | rubygems | RubyGems search API |
| crates.io | crates | crates.io API |
| lib.rs | lib_rs | HTML scraping |
| Docker Hub | dockerhub | Docker Hub search API |
| pkg.go.dev | pkg_go_dev | HTML scraping |
| MetaCPAN | metacpan | MetaCPAN REST API |
| Provider | Name | Method |
|---|---|---|
| arXiv | arxiv | Atom API |
| PubMed | pubmed | NCBI E-utilities |
| Semantic Scholar | semanticscholar | Graph API |
| CrossRef | crossref | REST API |
| Provider | Name | Key Required | Free Tier |
|---|---|---|---|
| Yahoo Finance | yahoo_finance | No | Unofficial endpoint, no key needed |
| Alpha Vantage | alpha_vantage | ALPHA_VANTAGE_API_KEY | 25 req/day; get a key |
| Finnhub | finnhub | FINNHUB_API_KEY | 60 req/min; get a key |
One-command local install:
python scripts/install.py
Install, run tests, and start the HTTP API:
python scripts/install.py --dev --test --run
Deploy with Docker Compose:
python scripts/install.py --mode docker
The installer creates .env from .env.example when .env does not already exist. Existing .env files are kept unless --force-env is passed.
Manual install:
git clone https://github.com/gefsikatsinelou/MetaSearchMCP
cd MetaSearchMCP
pip install -e ".[dev]"
Or with uv:
uv pip install -e ".[dev]"
Copy .env.example to .env and configure any providers you want to enable.
cp .env.example .env
Key settings:
HOST=0.0.0.0
PORT=8000
DEFAULT_TIMEOUT=10
AGGREGATOR_TIMEOUT=15
SERPBASE_API_KEY=
SERPER_API_KEY=
BRAVE_API_KEY=
GITHUB_TOKEN=
STACKEXCHANGE_API_KEY=
REDDIT_CLIENT_ID=
REDDIT_CLIENT_SECRET=
NCBI_API_KEY=
SEMANTIC_SCHOLAR_API_KEY=
ALPHA_VANTAGE_API_KEY=
FINNHUB_API_KEY=
ENABLED_PROVIDERS=
ALLOW_UNSTABLE_PROVIDERS=false
MAX_RESULTS_PER_PROVIDER=10
python -m metasearchmcp.server
# or
metasearchmcp
The API starts on http://localhost:8000.
python -m metasearchmcp.broker
# or
metasearchmcp-mcp
The MCP server communicates over stdio.
docker build -t metasearchmcp .
docker run --rm -p 8000:8000 --env-file .env metasearchmcp
Or with Compose:
docker compose up --build
POST /search
Aggregate across all enabled providers or a selected provider subset.
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{
"query": "rust async runtime",
"providers": ["duckduckgo", "wikipedia"],
"params": {"num_results": 5, "max_total_results": 8, "language": "en"}
}'
You can also narrow providers by tags:
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{
"query": "transformer attention",
"tags": ["academic", "knowledge"],
"params": {"num_results": 5, "max_total_results": 6}
}'
When multiple tags are provided, the default behavior is tag_match="any".
Set tag_match to "all" when you want providers that satisfy every requested tag:
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{
"query": "npm cli argument parser",
"tags": ["code", "packages"],
"tag_match": "all",
"params": {"num_results": 5, "max_total_results": 6}
}'
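The any/all semantics can be sketched in a few lines of Python. The catalog data here is hypothetical; the real provider-to-tag mapping lives in the project's catalog module:

```python
def select_providers(catalog: dict[str, set[str]], tags: list[str],
                     tag_match: str = "any") -> list[str]:
    wanted = set(tags)
    if tag_match == "all":
        # Provider must carry every requested tag.
        return [name for name, ptags in catalog.items() if wanted <= ptags]
    # Default "any": one shared tag is enough.
    return [name for name, ptags in catalog.items() if wanted & ptags]

catalog = {
    "npm": {"code", "packages"},
    "github": {"code"},
    "arxiv": {"academic"},
}
any_match = select_providers(catalog, ["code", "packages"])
all_match = select_providers(catalog, ["code", "packages"], tag_match="all")
```

With the sample catalog, "any" selects npm and github, while "all" narrows the set to npm alone.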
num_results controls how many results each provider can contribute. max_total_results caps the final merged response after deduplication.
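A minimal sketch of that merge step, assuming dedup keys on a normalized URL; the normalization here is simplified and the project's merge module may differ:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    # Lowercase the host, drop fragments, strip a trailing slash.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))

def merge(results: list[dict], max_total_results: int) -> list[dict]:
    seen, merged = set(), []
    for item in results:  # assumed already ordered by rank
        key = normalize_url(item["url"])
        if key in seen:
            continue  # duplicate URL from another provider
        seen.add(key)
        merged.append(item)
        if len(merged) >= max_total_results:
            break  # cap the final merged response
    return merged

merged = merge(
    [
        {"url": "https://Tokio.rs/", "provider": "duckduckgo"},
        {"url": "https://tokio.rs", "provider": "wikipedia"},  # dup after normalization
        {"url": "https://docs.rs/tokio", "provider": "duckduckgo"},
    ],
    max_total_results=8,
)
```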
POST /search/google
Search Google through the configured Google provider chain. If ALLOW_UNSTABLE_PROVIDERS=true, MetaSearchMCP will prefer the direct google provider automatically.
curl -X POST http://localhost:8000/search/google \
-H "Content-Type: application/json" \
-d '{"query": "site:github.com rust tokio"}'
To force the direct Google route explicitly:
curl -X POST http://localhost:8000/search/google \
-H "Content-Type: application/json" \
-d '{"query": "site:github.com rust tokio", "provider": "google"}'
GET /providers
Return the currently available provider catalog.
The response includes provider descriptions and a tag-to-provider index for quick discovery.
You can filter the catalog by tag:
curl "http://localhost:8000/providers?tag=academic&tag=web"
Use tag_match=all to require every tag instead of the default any-match behavior:
curl "http://localhost:8000/providers?tag=code&tag=packages&tag_match=all"
GET /health
Simple health check endpoint. Returns service status, version, provider count, and the current provider name list.
Every aggregated response includes:
engine, query, results, related_searches, suggestions, answer_box, timing_ms, providers, errors

Every result item includes:
title, url, snippet, source, rank, provider, published_date, extra

Example response:
{
"engine": "metasearchmcp",
"query": "rust async runtime",
"results": [
{
"title": "Tokio - An asynchronous Rust runtime",
"url": "https://tokio.rs",
"snippet": "Tokio is an event-driven, non-blocking I/O platform...",
"source": "tokio.rs",
"rank": 1,
"provider": "duckduckgo",
"published_date": null,
"extra": {}
}
],
"related_searches": [],
"suggestions": [],
"answer_box": null,
"timing_ms": 843.2,
"providers": [
{
"name": "duckduckgo",
"success": true,
"result_count": 10,
"latency_ms": 840.1,
"error": null
}
],
"errors": []
}
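Consuming that contract from Python is a plain JSON parse; a small sketch using a trimmed copy of the example response above (in practice the text would come from an HTTP client call to /search):

```python
import json

# Trimmed version of the example response above.
response_text = """
{
  "engine": "metasearchmcp",
  "query": "rust async runtime",
  "results": [{"title": "Tokio - An asynchronous Rust runtime",
               "url": "https://tokio.rs", "snippet": "Tokio is an event-driven runtime",
               "source": "tokio.rs", "rank": 1, "provider": "duckduckgo",
               "published_date": null, "extra": {}}],
  "providers": [{"name": "duckduckgo", "success": true, "result_count": 10,
                 "latency_ms": 840.1, "error": null}],
  "errors": []
}
"""

data = json.loads(response_text)
# Check per-provider status before trusting the merged results.
failed = [p["name"] for p in data["providers"] if not p["success"]]
urls = [r["url"] for r in data["results"]]
```

Because field names are stable, agents can index into results and providers without defensive key probing.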
MetaSearchMCP exposes these MCP tools:
search_web, search_google, search_academic, search_github, compare_engines

search_web also accepts optional tags so agents can limit search to categories such as web, academic, code, or google. When multiple tags are present, tag_match="all" requires a provider to satisfy the full set.
All search tools accept max_total_results to keep the final payload compact.
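As a concrete example, a search_web tool call from an agent could carry arguments like these (a hypothetical payload; the argument names follow the parameters described above):

```json
{
  "query": "transformer attention",
  "tags": ["academic", "knowledge"],
  "tag_match": "any",
  "max_total_results": 6
}
```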
Example Claude Desktop config:
{
"mcpServers": {
"MetaSearchMCP": {
"command": "metasearchmcp-mcp",
"env": {
"ALLOW_UNSTABLE_PROVIDERS": "true",
"SERPBASE_API_KEY": "your_key",
"SERPER_API_KEY": "your_key"
}
}
}
}
pip install -e ".[dev]"
pytest
uvicorn metasearchmcp.server:app --reload
The public package is organized around these modules:
contracts.py: request and response models
catalog.py: provider discovery and selection
orchestrator.py: concurrent search execution and response assembly
merge.py: URL normalization and deduplication
server.py: FastAPI entrypoint
broker.py: MCP entrypoint

Legacy module names are kept as compatibility shims for earlier imports.
MIT