by pipecat-ai
Provides voice interaction tools for AI agents via the Model Context Protocol, exposing start, listen, speak, and stop capabilities.
Provides a set of voice‑related tools (start, listen, speak, stop) that can be invoked by any MCP‑compatible client. The server itself does not handle microphone or speaker hardware; instead it delegates audio I/O to separate transports such as the Pipecat Playground, Daily WebRTC rooms, or phone providers.
Install with uv:
uv tool install pipecat-ai-mcp-server
For custom transports or screen capture, install the extra dependencies, e.g. uv tool install pipecat-ai-mcp-server[daily]. This provides the pipecat-mcp-server command.
The server listens at http://localhost:9090/mcp and exposes the voice tools (start, listen, speak, stop) via MCP.
Q: Do I need any API keys to get started?
A: No. The default configuration uses local models, so the server runs without external keys. Keys are only required if you switch to a cloud STT/TTS provider or a telephony service.
Q: How do I enable Daily WebRTC transport?
A: Install the [daily] extra, set DAILY_API_KEY and DAILY_SAMPLE_ROOM_URL environment variables, then launch the server with the -d flag.
Q: Can I use a phone call to talk to the agent?
A: Yes. Run the server with -t <provider> -x <ngrok-subdomain> and provide the provider‑specific environment variables (e.g., TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN for Twilio). Configure your phone number to forward to the ngrok URL.
Q: What is the purpose of the screen‑capture option? A: It streams your desktop or a named window to the audio transport, letting remote participants see what’s on your screen while conversing.
Q: How do I grant the voice tools permission without prompts?
A: Add a client‑specific settings file (e.g., .claude/settings.local.json) that lists mcp__pipecat__* under the allow array.
Pipecat MCP Server gives your AI agents a voice using Pipecat. It should work with any MCP-compatible client:
The Pipecat MCP Server exposes voice-related tools (start, listen, speak, stop) to MCP-compatible clients, but it does not itself provide microphone or speaker access.
Audio input/output is handled by a separate audio transport, such as:
MCP clients like Cursor, Claude Code, and Codex control the agent, but they are not audio devices. To hear or speak, you must also connect via one of the audio transports.
By default, the voice agent uses local models (no API keys required): Faster Whisper for speech-to-text and Kokoro for text-to-speech. The Whisper models are approximately 1.5 GB and are downloaded automatically on the first connection, so the initial startup may take a moment.
uv tool install pipecat-ai-mcp-server
This will install the pipecat-mcp-server tool.
If you want to use different services or modify the Pipecat pipeline, you will need to clone the repository:
git clone https://github.com/pipecat-ai/pipecat-mcp-server.git
and install your local version with:
uv tool install -e /path/to/repo/pipecat-mcp-server
Start the server:
pipecat-mcp-server
This will make the Pipecat MCP Server available at http://localhost:9090/mcp.
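As an optional sanity check, you can probe the endpoint with curl. An MCP streamable-HTTP endpoint may reject a plain GET, so any three-digit HTTP status (even 405 or 406) is enough to confirm the server is listening:

```shell
# Probe the MCP endpoint. A three-digit HTTP status means the server is
# listening; curl reports "000" if nothing answers on port 9090.
curl -s -o /dev/null -m 2 -w "%{http_code}\n" http://localhost:9090/mcp || true
```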
For hands-free voice conversations, you will need to auto-approve tool permissions. Otherwise, your agent will prompt for confirmation, which interrupts the conversation flow.
⚠️ Warning: Enabling broad permissions is at your own risk.
The Pipecat skill provides a better voice conversation experience. It asks for verbal confirmation before making changes to files, adding a layer of safety when using broad permissions.
Alternatively, just tell your agent something like "Let's have a voice conversation." In this case, the agent won't ask for verbal confirmation before making changes.
Register the MCP server:
claude mcp add pipecat --transport http http://localhost:9090/mcp --scope user
Scope options:
- local: Stored in ~/.claude.json, applies only to your project
- user: Stored in ~/.claude.json, applies to all projects
- project: Stored in .mcp.json in your project directory

Create .claude/settings.local.json in your project directory:
{
"permissions": {
"allow": [
"Bash",
"Read",
"Edit",
"Write",
"WebFetch",
"WebSearch",
"mcp__pipecat__*"
]
}
}
This grants permissions for bash commands, file operations, web fetching and searching, and all Pipecat MCP tools without prompting. See available tools if you need to grant more permissions.
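The mcp__pipecat__* entry is a glob over fully qualified tool names. As an illustration (the exact tool names below are assumptions based on the start/listen/speak/stop capabilities described above), shell-style pattern matching behaves like this:

```shell
# Hypothetical fully qualified names for the four voice tools.
for tool in mcp__pipecat__start mcp__pipecat__listen mcp__pipecat__speak mcp__pipecat__stop; do
  case "$tool" in
    mcp__pipecat__*) echo "allowed: $tool" ;;  # matched by the wildcard entry
    *)               echo "denied:  $tool" ;;
  esac
done
```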
Install the Pipecat skill at .claude/skills/pipecat/SKILL.md and invoke it with /pipecat.
Register the MCP server by editing ~/.cursor/mcp.json:
{
"mcpServers": {
"pipecat": {
"url": "http://localhost:9090/mcp"
}
}
}
Go to the Auto-Run agent settings and configure it to Run Everything.
Install the skill at .claude/skills/pipecat/SKILL.md (Cursor supports the Claude skills location) and invoke it with /pipecat.
Register the MCP server:
codex mcp add pipecat --url http://localhost:9090/mcp
If you start codex inside a version-controlled project, you will be asked whether you allow Codex to work in the folder without approval. Answer Yes, which adds the following to ~/.codex/config.toml:
[projects."/path/to/your/project"]
trust_level = "trusted"
Install the skill at .codex/skills/pipecat/SKILL.md and invoke it with $pipecat.
Once the voice agent starts, you can connect using different methods depending on how the server is configured.
When no arguments are specified to the pipecat-mcp-server command, the server uses Pipecat's local playground. Connect by opening http://localhost:7860 in your browser.
You can also run an ngrok tunnel that you can connect to remotely:
ngrok http --url=your-proxy.ngrok.app 7860
You can also use Daily and access your agent through a Daily room, which is convenient because you can then connect from anywhere without tunnels.
First, install the server with the Daily dependency:
uv tool install pipecat-ai-mcp-server[daily]
Then set DAILY_API_KEY to your Daily API key, set DAILY_SAMPLE_ROOM_URL to your desired Daily room URL, and pass the -d flag to pipecat-mcp-server:
export DAILY_API_KEY=your-daily-api-key
export DAILY_SAMPLE_ROOM_URL=your-daily-room
pipecat-mcp-server -d
Connect by opening your Daily room URL (e.g., https://yourdomain.daily.co/room) in your browser. Daily Prebuilt provides a ready-to-use video/audio interface.
To connect via phone call, pass -t <provider> -x <your-proxy> where <provider> is one of twilio, telnyx, exotel, or plivo, and <your-proxy> is your ngrok tunnel domain (e.g., your-proxy.ngrok.app).
First, start your ngrok tunnel:
ngrok http --url=your-proxy.ngrok.app 7860
Then, run the Pipecat MCP server with your ngrok URL and the required environment variables for your chosen telephony provider.
| Provider | Environment variables |
|---|---|
| Twilio | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN |
| Telnyx | TELNYX_API_KEY |
| Exotel | EXOTEL_API_KEY, EXOTEL_API_TOKEN |
| Plivo | PLIVO_AUTH_ID, PLIVO_AUTH_TOKEN |
export TWILIO_ACCOUNT_SID=your-twilio-account-sid
export TWILIO_AUTH_TOKEN=your-twilio-auth-token
pipecat-mcp-server -t twilio -x your-proxy.ngrok.app
Configure your provider's phone number to point to your ngrok URL, then call your number to connect.
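For Twilio, for example, the number's incoming-voice webhook can be pointed at the tunnel from the Twilio Console or with the Twilio CLI. The phone number and domain below are placeholders; check your provider's documentation for the exact configuration it expects:

```shell
# Point the number's incoming-voice webhook at the ngrok tunnel
# (placeholder phone number and domain).
twilio phone-numbers:update "+15551234567" \
  --voice-url="https://your-proxy.ngrok.app"
```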
You can enable screen capture to stream your screen (or a specific window) to the Pipecat Playground or Daily room. This lets you see what's happening on your computer remotely while having a voice conversation with the agent.
First, install the server with the screen capture dependency:
uv tool install "pipecat-ai-mcp-server[screen]"
Then, define the following environment variables:
| Variable | Description |
|---|---|
| PIPECAT_MCP_SERVER_SCREEN_CAPTURE | Set to any value (e.g., 1) to enable screen capture |
| PIPECAT_MCP_SERVER_SCREEN_WINDOW | Optional. Window name to capture (partial match, case-insensitive) |
For example, to capture your entire primary monitor:
export PIPECAT_MCP_SERVER_SCREEN_CAPTURE=1
pipecat-mcp-server
And to capture a specific window:
export PIPECAT_MCP_SERVER_SCREEN_CAPTURE=1
export PIPECAT_MCP_SERVER_SCREEN_WINDOW="claude"
pipecat-mcp-server
ℹ️ Note: Window capture is based on window coordinates, not content. If another window overlaps the target, the overlapping content will be captured. The capture region updates dynamically if the window is moved. If the specified window is not found, capture falls back to the full screen.
You can modify agent.py to use different STT/TTS providers.