voicemode

Voice Mode

What is Voice Mode?

Voice Mode is a Python project that enables natural voice conversations with AI assistants like Claude and ChatGPT. It facilitates human-like voice interactions through the Model Context Protocol (MCP), allowing users to speak their prompts and hear responses in real-time. It supports various AI coding assistants and offers flexibility with both cloud and local speech-to-text (STT) and text-to-speech (TTS) services.

How to Use Voice Mode?

Installation: Install via uvx voice-mode, pip install voice-mode, or npm install -g @anthropic-ai/claude-code followed by uvx voice-mode.
Configuration: Set your OpenAI API key using export OPENAI_API_KEY="your-key". For local services, you can set STT_BASE_URL and TTS_BASE_URL.
Integration: Configure your preferred AI coding assistant (e.g., Claude Code, Cursor, VS Code) to use Voice Mode via MCP. This typically involves adding a configuration snippet to the tool's settings.
Usage: Use the converse tool for voice conversations. For example, claude converse to start a conversation. You can also use listen_for_speech to convert speech to text.

Key Features of Voice Mode

🎙️ Voice Conversations: Engage in spoken dialogues with AI assistants.
🔄 Multiple Transports: Supports local microphone or LiveKit room-based communication.
🗣️ OpenAI-compatible: Works with any STT/TTS service, local or cloud.
⚡ Real-time: Low-latency voice interactions with automatic transport selection.
🔧 MCP Integration: Seamless integration with Claude Desktop and other MCP clients.
🎯 Silence Detection: Automatically stops recording when speaking stops.
Local STT/TTS: Supports local Whisper.cpp for STT and Kokoro for TTS for privacy.
Audio Format Flexibility: Configurable audio formats (PCM, MP3, WAV, FLAC, AAC, Opus) with quality settings.

Use Cases of Voice Mode

Programming & Development: Debugging code, explaining code, brainstorming architecture, writing tests through voice commands.
General Productivity: Practicing presentations, preparing for interviews, using the AI as a rubber duck for problem-solving.
Voice Control: Interacting with AI assistants using voice for commands like "Read this error message" or "Give me a quick summary."

FAQ about Voice Mode

What are the system requirements? Python 3.10+, Linux, macOS, Windows (WSL), NixOS. Requires a microphone and speakers, or a LiveKit server. An OpenAI API key or compatible service is needed.
How do I set up local STT/TTS? Use the provided installation tools like install_whisper_cpp and install_kokoro_fastapi or follow the documentation for Whisper.cpp and Kokoro setup.
Can I use Voice Mode offline? Yes, by setting up local STT/TTS services like Whisper.cpp and Kokoro.
How does Voice Mode handle audio formats? It defaults to PCM for TTS streaming but supports other formats like MP3, WAV, FLAC, AAC, and Opus, with configurable quality settings.
What AI assistants are supported? Claude Code, Claude Desktop, Gemini CLI, Cursor, VS Code, Roo Code, Cline, Zed, Windsurf, Continue, and others that support MCP.
How do I troubleshoot microphone issues in WSL2? Refer to the WSL2 Microphone Access Guide in the documentation.
How can I save audio files? Set the environment variable VOICEMODE_SAVE_AUDIO="true" to save all audio input and output.

Installation Tools

Voice Mode provides tools to help set up local services:

install_whisper_cpp: Installs Whisper.cpp for local STT.
install_kokoro_fastapi: Installs Kokoro for local TTS.

These tools facilitate the setup of free, private, open-source voice services locally, offering an OpenAI-compatible API interface for seamless switching between cloud and local processing.

Install via: uv tool install voice-mode | getvoicemode.com

Natural voice conversations for AI assistants. VoiceMode brings human-like voice interactions to Claude Code, AI code editors through the Model Context Protocol (MCP).

🖥️ Compatibility

Runs on: Linux • macOS • Windows (WSL) • NixOS | Python: 3.10-3.14

✨ Features

🎙️ Natural Voice Conversations with Claude Code - ask questions and hear responses
🗣️ Supports local Voice Models - works with any OpenAI API compatible STT/TTS services
⚡ Real-time - low-latency voice interactions with automatic transport selection
🔧 MCP Integration - seamless with Claude Code (and other MCP clients)
🎯 Silence detection - automatically stops recording when you stop speaking (no more waiting!)
🔄 Multiple transports - local microphone or LiveKit room-based communication (optional)

🎯 Simple Requirements

All you need to get started:

🎤 Computer with microphone and speakers
🔑 OpenAI API Key (Recommended, if only as a backup for local services)

Quick Start

Install VoiceMode and dependencies with UV (Recommended)

Linux (fedora, debian/ubuntu)
macOS
Windows WSL

# Install VoiceMode MCP python package and dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh 
uvx voice-mode-install

# While local voice services can be installed automatically, we recommend
# providing an OpenAI API key as a fallback in case local services are unavailable
export OPENAI_API_KEY=your-openai-key  # Optional but recommended

# Add VoiceMode to Claude
claude mcp add --scope user voicemode -- uvx --refresh voice-mode

# Start a voice conversation
claude converse

Manual Installation

For manual setup steps, see the Getting Started Guide.

🎬 Demo

Watch VoiceMode in action with Claude Code:

The converse function makes voice interactions natural - it automatically waits for your response by default, creating a real conversation flow.

Installation

Prerequisites

Python 3.10-3.14
Astral UV - Package manager (install with curl -LsSf https://astral.sh/uv/install.sh | sh)
OpenAI API Key (or compatible service)

Note on LiveKit: LiveKit integration is optional and requires Python 3.10-3.13 (Python 3.14 support pending upstream dependencies). Install with: uv tool install voice-mode[livekit]. See LiveKit Integration Guide for details.

System Dependencies

sudo apt update
sudo apt install -y ffmpeg gcc libasound2-dev libasound2-plugins libportaudio2 portaudio19-dev pulseaudio pulseaudio-utils python3-dev

Note for WSL2 users: WSL2 requires additional audio packages (pulseaudio, libasound2-plugins) for microphone access.

sudo dnf install alsa-lib-devel ffmpeg gcc portaudio portaudio-devel python3-devel

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install dependencies
brew install ffmpeg node portaudio

Follow the Ubuntu/Debian instructions above within WSL.

VoiceMode includes a flake.nix with all required dependencies. You can either:

Use the development shell (temporary):

nix develop github:mbailey/voicemode

Install system-wide (see Installation section below)

Quick Install

# Using Claude Code (recommended)
claude mcp add --scope user voicemode uvx --refresh voice-mode

Configuration for AI Coding Assistants

📖 Looking for detailed setup instructions? Check our comprehensive Getting Started Guide for step-by-step instructions!

Below are quick configuration snippets. For full installation and setup instructions, see the integration guides above.

claude mcp add --scope user voicemode -- uvx --refresh voice-mode

Or with environment variables:

claude mcp add --scope user --env OPENAI_API_KEY=your-openai-key voicemode -- uvx --refresh voice-mode

Alternative Installation Options

git clone https://github.com/mbailey/voicemode.git
cd voicemode
uv tool install -e .

1. Install with nix profile (user-wide):

nix profile install github:mbailey/voicemode

2. Add to NixOS configuration (system-wide):

# In /etc/nixos/configuration.nix
environment.systemPackages = [
  (builtins.getFlake "github:mbailey/voicemode").packages.${pkgs.system}.default
];

3. Add to home-manager:

# In home-manager configuration
home.packages = [
  (builtins.getFlake "github:mbailey/voicemode").packages.${pkgs.system}.default
];

4. Run without installing:

nix run github:mbailey/voicemode

Configuration

📖 Getting Started - Step-by-step setup guide
🔧 Configuration Reference - All environment variables

Quick Setup

The only required configuration is your OpenAI API key:

export OPENAI_API_KEY="your-key"

Local STT/TTS Services

For privacy-focused or offline usage, VoiceMode supports local speech services:

Whisper.cpp - Local speech-to-text with OpenAI-compatible API
Kokoro - Local text-to-speech with multiple voice options

These services provide the same API interface as OpenAI, allowing seamless switching between cloud and local processing.

Troubleshooting

Common Issues

No microphone access: Check system permissions for terminal/application
- WSL2 Users: Additional audio packages (pulseaudio, libasound2-plugins) required for microphone access
UV not found: Install with curl -LsSf https://astral.sh/uv/install.sh | sh
OpenAI API error: Verify your OPENAI_API_KEY is set correctly
No audio output: Check system audio settings and available devices

Audio Saving

To save all audio files (both TTS output and STT input):

export VOICEMODE_SAVE_AUDIO=true

Audio files are saved to: ~/.voicemode/audio/YYYY/MM/ with timestamps in the filename.

Documentation

📚 Read the full documentation at voice-mode.readthedocs.io

Getting Started

Getting Started - Step-by-step setup for all supported tools
Configuration Guide - Complete environment variable reference

Development

Development Setup - Local development guide

Service Guides

Whisper.cpp Setup - Local speech-to-text configuration
Kokoro Setup - Local text-to-speech configuration
LiveKit Integration - Real-time voice communication

License

MIT - A Failmode Project

mcp-name: com.failmode/voicemode

voicemode

voicemode Overview

Voice Mode

What is Voice Mode?

How to Use Voice Mode?

Key Features of Voice Mode

Use Cases of Voice Mode

FAQ about Voice Mode

Installation Tools

voicemode's README

VoiceMode

🖥️ Compatibility

✨ Features

🎯 Simple Requirements

Quick Start

Install VoiceMode and dependencies with UV (Recommended)

Manual Installation

🎬 Demo

Installation

Prerequisites

System Dependencies

Quick Install

Configuration for AI Coding Assistants

Alternative Installation Options

Configuration

Quick Setup

Local STT/TTS Services

Troubleshooting

Common Issues

Audio Saving

Documentation

Getting Started

Development

Service Guides

Links

Community

See Also

License

voicemode Reviews

Login Required

Similar MCP Servers like voicemode

Sequential Thinking

LibreChat

Goose

Roo Code

Pydantic AI

Mcp Use

Mcp Agent

Klavis AI

5ire

Actions

voicemode's Information

Configuration

Claude Code (Terminal)

Configure Clients