by mberg
Generates spoken audio from text, outputting MP3 files locally and optionally uploading them to Amazon S3.
Kokoro Text to Speech provides a Model Context Protocol (MCP) server that converts supplied text into spoken audio using the Kokoro TTS model. The server produces MP3 files and can store them locally or push them to an S3 bucket for persistent access.
To get started:

- Place the Kokoro model files (kokoro-v1.0.onnx and voices-v1.0.bin) in the repository root.
- Install ffmpeg (brew install ffmpeg on macOS).
- Configure the MCP server (see serverConfig below) and set environment variables in a .env file or directly in the config.
- Start the server with uv run mcp-tts.py.
- Send requests with the mcp_client.py script, e.g., python mcp_client.py --text "Hello, world!".

Q: Do I need a GPU to run the server?
A: The provided ONNX model can run on CPU; a GPU will speed up inference but is not required (see the quick provider check after this FAQ).
Q: How do I change the default voice?
A: Set TTS_VOICE in the .env file or pass --voice when using mcp_client.py.
Q: Can I disable S3 uploads?
A: Yes, set S3_ENABLED=false in the environment or use the client flag --no-s3.
Q: Where are MP3 files stored locally?
A: By default in an mp3 folder next to the script, configurable via MP3_FOLDER.
Q: How are old files cleaned up?
A: Set MP3_RETENTION_DAYS to the number of days after which files are automatically deleted.
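Regarding the GPU question above: assuming the .onnx model is executed with ONNX Runtime, you can check which execution providers are available on your machine with a quick diagnostic sketch (not part of this server):

import onnxruntime as ort

# CPUExecutionProvider is always present; CUDAExecutionProvider (or similar)
# only appears if a GPU-enabled ONNX Runtime build is installed.
print(ort.get_available_providers())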
Kokoro Text to Speech MCP server that generates .mp3 files, with an option to upload them to S3.
Uses: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
Add the following to your MCP config, updating it with your own values.
"kokoro-tts-mcp": {
"command": "uv",
"args": [
"--directory",
"/path/toyourlocal/kokoro-tts-mcp",
"run",
"mcp-tts.py"
],
"env": {
"TTS_VOICE": "af_heart",
"TTS_SPEED": "1.0",
"TTS_LANGUAGE": "en-us",
"AWS_ACCESS_KEY_ID": "",
"AWS_SECRET_ACCESS_KEY": "",
"AWS_REGION": "us-east-1",
"AWS_S3_FOLDER": "mp3",
"S3_ENABLED": "true",
"MP3_FOLDER": "/path/to/mp3"
}
}
ffmpeg is needed to convert .wav files to .mp3. On macOS:
brew install ffmpeg
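You can confirm it is on your PATH with:

ffmpeg -version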
To run the server locally, add these to your .env file. Copy env.example to .env and update it with your own values:

- AWS_ACCESS_KEY_ID: Your AWS access key ID
- AWS_SECRET_ACCESS_KEY: Your AWS secret access key
- AWS_S3_BUCKET_NAME: S3 bucket name
- AWS_S3_REGION: S3 region (e.g., us-east-1)
- AWS_S3_FOLDER: Folder path within the S3 bucket
- AWS_S3_ENDPOINT_URL: Optional custom endpoint URL for S3-compatible storage
- MCP_HOST: Host to bind the server to (default: 0.0.0.0)
- MCP_PORT: Port to listen on (default: 9876)
- MCP_CLIENT_HOST: Hostname for client connections to the server (default: localhost)
- DEBUG: Enable debug mode (set to "true" or "1")
- S3_ENABLED: Enable S3 uploads (set to "true" or "1")
- MP3_FOLDER: Path to store MP3 files (default is the 'mp3' folder in the script directory)
- MP3_RETENTION_DAYS: Number of days to keep MP3 files before automatic deletion
- DELETE_LOCAL_AFTER_S3_UPLOAD: Whether to delete local MP3 files after successful S3 upload (set to "true" or "1")
- TTS_VOICE: Default voice for the TTS client (default: af_heart)
- TTS_SPEED: Default speed for the TTS client (default: 1.0)
- TTS_LANGUAGE: Default language for the TTS client (default: en-us)
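For reference, a minimal .env for a local, S3-disabled setup might look like this (placeholder values based on the variables listed above; adjust to your environment):

MCP_HOST=0.0.0.0
MCP_PORT=9876
MCP_CLIENT_HOST=localhost
S3_ENABLED=false
MP3_FOLDER=./mp3
MP3_RETENTION_DAYS=30
TTS_VOICE=af_heart
TTS_SPEED=1.0
TTS_LANGUAGE=en-us
# Only needed when S3_ENABLED=true:
# AWS_ACCESS_KEY_ID=...
# AWS_SECRET_ACCESS_KEY=...
# AWS_S3_BUCKET_NAME=...
# AWS_S3_REGION=us-east-1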
The preferred way to start the server is with UV:

uv run mcp-tts.py
The mcp_client.py script allows you to send TTS requests to the server. It can be used as follows:
When running the server and client on the same machine, set MCP_HOST to 0.0.0.0 (all interfaces) or 127.0.0.1 (localhost only), and MCP_CLIENT_HOST to localhost or 127.0.0.1.

python mcp_client.py --text "Hello, world!"
python mcp_client.py --file my_text.txt
python mcp_client.py --text "Hello, world!" --voice "en_female" --speed 1.2
python mcp_client.py --text "Hello, world!" --no-s3
python mcp_client.py --help
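The commands above can also be scripted. For example, a small batch-conversion loop over text files (a sketch using only the --file flag shown above; the texts/ directory is hypothetical):

import pathlib
import subprocess

# Send each .txt file in a (hypothetical) texts/ directory to the server via the bundled client.
for path in sorted(pathlib.Path("texts").glob("*.txt")):
    subprocess.run(["python", "mcp_client.py", "--file", str(path)], check=True)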
The TTS server generates MP3 files that are stored locally and optionally uploaded to S3. You can configure how these files are managed:
- Set MP3_FOLDER in your .env file to specify where MP3 files are stored.
- Set MP3_RETENTION_DAYS=30 (or any number) to automatically delete files older than that number of days.
- Set DELETE_LOCAL_AFTER_S3_UPLOAD=true to delete local files immediately after a successful S3 upload.
- Enable or disable S3 uploads with S3_ENABLED=true or DISABLE_S3=true in your .env file.
- Skip the S3 upload for an individual request with the client's --no-s3 option.
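For illustration, the retention behavior described above amounts to something like the following (a sketch of the idea, not the server's actual implementation):

import os
import pathlib
import time

# Delete MP3 files older than MP3_RETENTION_DAYS from MP3_FOLDER.
folder = pathlib.Path(os.environ.get("MP3_FOLDER", "mp3"))
retention_days = int(os.environ.get("MP3_RETENTION_DAYS", "30"))
cutoff = time.time() - retention_days * 86400

for mp3 in folder.glob("*.mp3"):
    if mp3.stat().st_mtime < cutoff:
        mp3.unlink()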
{
  "mcpServers": {
    "kokoro-tts-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/toyourlocal/kokoro-tts-mcp",
        "run",
        "mcp-tts.py"
      ],
      "env": {
        "TTS_VOICE": "<YOUR_VOICE>",
        "TTS_SPEED": "<YOUR_SPEED>",
        "TTS_LANGUAGE": "<YOUR_LANGUAGE>",
        "AWS_ACCESS_KEY_ID": "<YOUR_AWS_ACCESS_KEY_ID>",
        "AWS_SECRET_ACCESS_KEY": "<YOUR_AWS_SECRET_ACCESS_KEY>",
        "AWS_REGION": "<YOUR_AWS_REGION>",
        "AWS_S3_FOLDER": "<YOUR_S3_FOLDER>",
        "S3_ENABLED": "<true|false>",
        "MP3_FOLDER": "<PATH_TO_LOCAL_MP3>"
      }
    }
  }
}

Or add the server via the Claude CLI:

claude mcp add kokoro-tts-mcp uv --directory /path/toyourlocal/kokoro-tts-mcp run mcp-tts.py