by renl
Provides an API to store and retrieve text passages based on semantic similarity, leveraging Ollama for embeddings and ChromaDB for vector storage and similarity search.
Memory Server offers a simple local RAG (Retrieval‑Augmented Generation) service that lets you memorize arbitrary text – single sentences, multiple entries, or full PDF documents – and later retrieve the most relevant fragments through semantic similarity.
Start the backing services with Docker Compose (docker-compose up). This launches ChromaDB and Ollama.
Pull the embedding model (docker exec -it ollama ollama pull all-minilm:l6-v2).
Add the server to your MCP client configuration (serverConfig).
Use memorize_text for a single passage, memorize_multiple_texts for a list, and memorize_pdf_file to ingest PDFs in 20-page chunks.
A ChromaDB admin GUI is available at http://localhost:8322 for browsing and managing the vector store.
Q: Do I need an internet connection?
A: Only for the initial ollama pull of the embedding model. After that the service runs entirely offline.
Q: Which embedding model is used?
A: all-minilm:l6-v2
from Ollama, a lightweight sentence‑embedding model.
Q: Can I change the storage port?
A: Yes, adjust CHROMADB_PORT
and OLLAMA_PORT
in the server configuration.
Q: How large a PDF can be processed?
A: Any size; the tool processes it in 20-page increments, looping until the end.
Q: Is there a way to delete memorized texts?
A: Use the ChromaDB admin GUI or invoke ChromaDB's collection deletion APIs.
This MCP server provides a simple API for storing and retrieving text passages based on their semantic meaning, not just keywords. It uses Ollama for generating text embeddings and ChromaDB for vector storage and similarity search. You can "memorize" any text and later retrieve the most relevant stored texts for a given query.
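To make this concrete, the sketch below shows the basic embed-and-store cycle, assuming ChromaDB on port 8321 and Ollama on port 11434 (the defaults used in the configuration further down). The helper names memorize and recall are illustrative, not the server's actual internals.

import requests
import chromadb

OLLAMA_URL = "http://localhost:11434/api/embeddings"
client = chromadb.HttpClient(host="localhost", port=8321)
collection = client.get_or_create_collection("memory")

def embed(text: str) -> list[float]:
    # Ask Ollama's all-minilm:l6-v2 model for a sentence embedding.
    resp = requests.post(OLLAMA_URL, json={"model": "all-minilm:l6-v2", "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def memorize(text: str, doc_id: str) -> None:
    # Store the passage together with its embedding for semantic lookup.
    collection.add(ids=[doc_id], embeddings=[embed(text)], documents=[text])

def recall(query: str, n: int = 3) -> list[str]:
    # Return the n stored passages closest in meaning to the query.
    result = collection.query(query_embeddings=[embed(query)], n_results=n)
    return result["documents"][0]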
You can simply ask the LLM to memorize a text for you in natural language:
User: Memorize this text: "Singapore is an island country in Southeast Asia."
LLM: Text memorized successfully.
You can also ask the LLM to memorize several texts at once:
User: Memorize these texts:
LLM: All texts memorized successfully.
This will store all provided texts for later semantic retrieval.
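Behind the conversation, these requests map to MCP tool calls. The sketch below shows how such tools are commonly declared with FastMCP from the official Python SDK, reusing the memorize() helper from the previous sketch; the parameter names (text, texts) are assumptions and may not match mcp-rag-local's actual signatures.

import uuid
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("mcp-rag-local")

@mcp.tool()
def memorize_text(text: str) -> str:
    # Store a single passage (memorize() is the helper sketched earlier).
    memorize(text, doc_id=str(uuid.uuid4()))
    return "Text memorized successfully."

@mcp.tool()
def memorize_multiple_texts(texts: list[str]) -> str:
    # Store several passages in one call.
    for t in texts:
        memorize(t, doc_id=str(uuid.uuid4()))
    return "All texts memorized successfully."

if __name__ == "__main__":
    mcp.run()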
You can also ask the LLM to memorize the contents of a PDF file via memorize_pdf_file. The MCP tool will read up to 20 pages at a time from the PDF, return the extracted text, and have the LLM chunk it into meaningful segments. The LLM then uses the memorize_multiple_texts tool to store these chunks.
This process is repeated: the MCP tool continues to read the next 20 pages, the LLM chunks and memorizes them, and so on, until the entire PDF is processed and memorized.
User:
Memorize this PDF file: C:\path\to\document.pdf
LLM: Reads the first 20 pages, chunks the text, stores the chunks, and continues with the next 20 pages until the whole document is memorized.
You can also specify a starting page if you want to begin from a specific page:
User:
Memorize this PDF file starting from page 40: C:\path\to\document.pdf
LLM: Reads pages 40–59, chunks and stores the text, then continues with the next set of pages until the end of the document.
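The 20-page window described above can be pictured with a small sketch, assuming the PDF is parsed with pypdf; the real tool's internals may differ.

from pypdf import PdfReader

PAGES_PER_CALL = 20

def read_pdf_window(path: str, start_page: int = 1):
    # Extract text from up to 20 pages, returning the text and the next
    # start page (None once the document is exhausted).
    reader = PdfReader(path)
    total = len(reader.pages)
    first = start_page - 1  # pypdf pages are 0-indexed
    last = min(first + PAGES_PER_CALL, total)
    text = "\n".join(reader.pages[i].extract_text() or "" for i in range(first, last))
    next_start = last + 1 if last < total else None
    return text, next_start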
If you have a long text, you can ask the LLM to help you split it into short, meaningful chunks and store them. For example:
User: Please chunk the following long text and memorize all the chunks.
{large body of text}
LLM:
Splits the text into short, relevant segments and calls memorize_multiple_texts to store them. If the text is too long to store in one go, the LLM will continue chunking and storing until the entire text is memorized.
User: Are all the text chunks stored?
LLM: Checks and, if not all are stored, continues until the process is complete.
This conversational approach ensures that even very large texts are fully chunked and memorized, with the LLM handling the process interactively.
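As an illustration of what the LLM is doing in that exchange, a simple paragraph-based chunker might look like the following; the 500-character limit is an arbitrary example, and the resulting list is what would be passed to memorize_multiple_texts.

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    # Split on blank lines, packing paragraphs into chunks of up to max_chars.
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

# The chunks would then be stored in one call, e.g.
# memorize_multiple_texts(chunk_text(large_body_of_text))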
To recall information, just ask the LLM a question:
User: What is Singapore?
LLM: Returns the most relevant stored texts along with a human-readable description of their relevance.
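Retrieval is a nearest-neighbour query over the stored embeddings. Reusing the collection and embed() helpers from the first sketch, it might look like this, with ChromaDB's distances serving as the relevance signal:

result = collection.query(
    query_embeddings=[embed("What is Singapore?")],
    n_results=3,
    include=["documents", "distances"],
)
for doc, dist in zip(result["documents"][0], result["distances"][0]):
    # Smaller distance means a closer semantic match.
    print(f"(distance {dist:.3f}) {doc}")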
First, clone this git repository and change into the cloned directory:
git clone <repository-url>
cd mcp-rag-local
Install uv (a fast Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
If you are on Windows, install uv using PowerShell:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Run the following command to start ChromaDB and Ollama using Docker Compose:
docker-compose up
After the containers are running, pull the embedding model for Ollama:
docker exec -it ollama ollama pull all-minilm:l6-v2
Add the following to your MCP server configuration:
"mcp-rag-local": {
"command": "uv",
"args": [
"--directory",
"path\\to\\mcp-rag-local",
"run",
"main.py"
],
"env": {
"CHROMADB_PORT": "8321",
"OLLAMA_PORT": "11434"
}
}
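Inside the server, these environment variables would typically be read with standard library calls, falling back to the defaults shown above; this is a sketch, not necessarily how main.py does it.

import os

CHROMADB_PORT = int(os.environ.get("CHROMADB_PORT", "8321"))
OLLAMA_PORT = int(os.environ.get("OLLAMA_PORT", "11434"))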
A web-based GUI for ChromaDB (the Memory Server's database) is included for easy inspection and management of stored memories.
{ "mcpServers": { "mcp-rag-local": { "command": "uv", "args": [ "--directory", "path\\to\\mcp-rag-local", "run", "main.py" ], "env": { "CHROMADB_PORT": "8321", "OLLAMA_PORT": "11434" } } } }
Explore related MCPs that share similar capabilities and solve comparable challenges
by modelcontextprotocol
A basic implementation of persistent memory using a local knowledge graph. This lets Claude remember information about the user across chats.
by topoteretes
Provides dynamic memory for AI agents through modular ECL (Extract, Cognify, Load) pipelines, enabling seamless integration with graph and vector stores using minimal code.
by basicmachines-co
Enables persistent, local‑first knowledge management by allowing LLMs to read and write Markdown files during natural conversations, building a traversable knowledge graph that stays under the user’s control.
by smithery-ai
Provides read and search capabilities for Markdown notes in an Obsidian vault for Claude Desktop and other MCP clients.
by chatmcp
Summarize chat messages by querying a local chat database and returning concise overviews.
by dmayboroda
Provides on‑premises conversational retrieval‑augmented generation (RAG) with configurable Docker containers, supporting fully local execution, ChatGPT‑based custom GPTs, and Anthropic Claude integration.
by GreatScottyMac
Provides a project‑specific memory bank that stores decisions, progress, architecture, and custom data, exposing a structured knowledge graph via MCP for AI assistants and IDE tools.
by andrea9293
Provides document management and AI-powered semantic search for storing, retrieving, and querying text, markdown, and PDF files locally without external databases.
by scorzeth
Provides a local MCP server that interfaces with a running Anki instance to retrieve, create, and update flashcards through standard MCP calls.