by nmbrthirteen
Generate vertical podcast clips with AI‑driven face tracking, burned‑in captions, and a fully integrated content workflow that stays on‑device.
Podcli transforms long‑form podcast recordings into short, 9:16 vertical clips ready for TikTok, Shorts, or Reels. It transcribes audio with Whisper, uses Claude (or Codex) to surface viral moments, applies face‑tracking cropping, adds styled captions, and outputs polished MP4s. The tool also bundles a content workflow (PodStack) that creates titles, descriptions, thumbnails, and publishing checklists.
podcli → Open Web UI → drag‑and‑drop a video, add or generate a transcript, let the engine suggest clips, tweak settings, and export.
podcli process episode.mp4 performs transcription, moment selection, rendering, and export in one command.
podcli presets save myshow …) and reuse them with --preset.
podcli mcp install) so Claude can invoke tools like suggest_clips, create_clip, manage_assets, etc.
/produce-shorts, /generate-titles, /publish-checklist.
Q: Do I need an internet connection?
A: All processing (transcription, face detection, rendering) runs locally. Only the Claude/Codex AI calls for clip suggestion and content generation require internet access.
Q: Which operating systems are supported?
A: macOS (Apple Silicon), Linux (x64/arm64), and Windows (x64). Intel‑Mac support is planned.
Q: How much hardware do I need?
A: A modern CPU can handle the full pipeline; GPU with NVENC/VAAPI speeds up encoding. Whisper model size is configurable via WHISPER_MODEL.
Q: Can I use my own transcript?
A: Yes – provide a .txt, .srt, .vtt, or speaker‑labeled plain‑text file with --transcript.
Q: Is the software open‑source?
A: Yes, licensed under AGPL‑3.0. A commercial license is available for closed‑source use.
podcli process episode.mp4
One command transcribes, picks the best moments, crops to the face, and burns captions in. Nothing leaves your machine.
podcli takes a long-form podcast and turns it into a complete content operation:
Record episode
↓
Transcribe (Whisper, speaker detection)
↓
Find viral moments (Claude AI + audio energy + knowledge base)
↓
Render clips (9:16, captions, smart crop, normalized audio)
↓
Generate content package (titles, descriptions, thumbnails, SEO) ← PodStack
↓
Publish with optimization checklist ← PodStack
↓
Review performance ← PodStack
The first half is video processing — podcli's core engine. The second half is content workflow — powered by PodStack, a set of Claude Code slash commands that ship with podcli. Both halves are deeply integrated: the clip suggestion engine reads from your PodStack knowledge base, uses your title formulas and voice rules, checks the episode database for duplicates, and outputs MCP-aligned fields that flow through to export.
podcli # then choose "Open Web UI"
# → http://localhost:3847
Drag your video into the Web UI, or use the CLI:
podcli process episode.mp4
podcli uses Claude to analyze your transcript against your show's knowledge base, finding the most viral moments. It scores each one on 4 dimensions, suggests clips with multi-cut segments (cutting out filler), and lets you toggle them on/off before rendering.
Clips come out as upload-ready Shorts: 1080x1920, 9:16 vertical, with burned-in captions, normalized audio, and your logo.
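The 9:16 framing can be illustrated with a little arithmetic. The following is a minimal sketch, not podcli's actual cropping code: `crop_window` is a hypothetical helper that assumes a face detector has already produced a horizontal face centre (`face_cx`) in source pixels, and it rounds dimensions down to even numbers because common H.264 pixel formats expect that.

```python
def crop_window(src_w: int, src_h: int, face_cx: float,
                aspect_w: int = 9, aspect_h: int = 16) -> tuple:
    """Return (x, y, w, h) of the largest aspect_w:aspect_h window.

    The window is centred horizontally on face_cx and clamped so it
    never leaves the source frame. Hypothetical helper for illustration.
    """
    # Start from a full-height window, which fits landscape sources.
    crop_h = src_h
    crop_w = int(src_h * aspect_w / aspect_h)
    if crop_w > src_w:
        # Source is narrower than the target aspect: use full width instead.
        crop_w = src_w
        crop_h = int(src_w * aspect_h / aspect_w)

    # Round down to even dimensions (assumption: encoder wants even sizes).
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2

    # Centre on the face, then clamp to the frame edges.
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))
    y = (src_h - crop_h) // 2
    return x, y, crop_w, crop_h


if __name__ == "__main__":
    # A 1920x1080 landscape frame with the face at the horizontal centre.
    print(crop_window(1920, 1080, face_cx=960))
```

The cropped region would then be scaled to 1080x1920 by the renderer; that scaling step is outside this sketch.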
Open the project in Claude Code and run:
/produce-shorts
This runs the PodStack pipeline, a gstack-style workflow that takes a transcript through to a publish-ready content package.
Run /publish-checklist when uploading. A week later, run /retro-episode with your YouTube Studio stats to see what worked and what to improve.
| | Video Engine (podcli core) | Content Workflow (PodStack) |
|---|---|---|
| What | Transcription, clip detection, rendering | Titles, descriptions, thumbnails, publishing |
| How | Python + FFmpeg + Whisper + OpenCV + Claude/Codex | Claude Code slash commands |
| Interface | Web UI, CLI, MCP tools | /slash-commands in Claude Code |
| Output | .mp4 files ready to upload | Content packages ready to paste into YouTube |
Both halves share the same knowledge base (.podcli/knowledge/) — your show's brand, voice, title formulas, episode database, and style guide. Set it up once, everything stays on-brand.
Speaker (MM:SS), JSON, drag-drop .txt / .srt / .vtt
/process-transcript: extract and score best moments from any transcript
/generate-titles: 8 titles per clip with 6-point verification checklist
/generate-descriptions: descriptions + hashtags + SEO keywords
/plan-thumbnails: thumbnail text + designer briefs for both formats
/review-content: paranoid brand check (banned words, voice, title rules)
/produce-shorts: full pipeline, transcript → publish-ready package
/publish-checklist: pre/post-publish optimization
/retro-episode: performance analysis after publishing
.md files that teach the AI your brand, voice, and style
localhost:3847
podcli process episode.mp4
No prerequisites: the install fetches a self-contained binary, and the first run provisions everything it needs (Python, Node, FFmpeg, whisper.cpp, models) into a managed directory. You don't need Go, Node, Python, or FFmpeg installed.
macOS / Linux
curl -fsSL https://podcli.com/install.sh | sh
Windows (PowerShell)
irm https://podcli.com/install.ps1 | iex
Then just run it — the first launch sets itself up:
podcli # interactive menu (and Web UI)
podcli process episode.mp4 # transcribe + export clips
Supported platforms: macOS (Apple Silicon), Linux (x64 / arm64), Windows (x64). Intel Macs are coming in a follow-up release.
To uninstall the app files while keeping your config, knowledge, presets, assets, history, and cache:
podcli uninstall
Add --purge if you want to remove the entire managed podcli folder, including user data.
Optional, for AI clip suggestion and the PodStack slash commands: install Claude Code or Codex (auto-detected).
Building from source needs Go 1.23+ (and Node for the studio bundle); see plans/native-cli.md.
podcli # then choose "Open Web UI"
# → http://localhost:3847
.txt file, paste Speaker (MM:SS) text, or auto-transcribe with Whisper
# One command. Auto-transcribes, picks moments, renders clips.
podcli process episode.mp4
With more control:
# Use an existing transcript instead of transcribing
podcli process episode.mp4 --transcript transcript.txt --top 5
# Full options
podcli process episode.mp4 \
--transcript transcript.txt \
--top 8 \
--caption-style branded \
--crop center \
--logo logo.png
podcli presets save myshow --caption-style branded --logo logo.png --top 5
podcli presets list
podcli process video.mp4 --preset myshow
Open the project in Claude Code, then use slash commands:
# Full pipeline — transcript to publish-ready package
/produce-shorts
# Individual steps
/process-transcript # extract moments from a transcript
/generate-titles # get 8 title options for a clip
/generate-descriptions # get descriptions + hashtags
/plan-thumbnails # get thumbnail briefs for your designer
/review-content # brand and quality review
/publish-checklist # pre/post-publish ops
/retro-episode # performance analysis
Or just paste a transcript — Claude auto-detects the input and runs the right command.
The knowledge base is what makes podcli understand your show. Drop .md files into .podcli/knowledge/ and both the video engine and content workflow use them. The clip suggestion engine reads 8 of these files (prioritized by relevance), checks the episode database for duplicate avoidance, and applies your voice rules and title formulas when generating suggestions.
PodStack ships with 13 starter templates that you fill in with your show's details:
| File | What It Teaches The AI |
|---|---|
| 00-master-instructions.md | Auto-detection rules, decision tree, quality gates |
| 01-brand-identity.md | Show name, positioning, tagline, hosts, format |
| 02-voice-and-tone.md | Voice fingerprint, banned words, the Coffee Test |
| 03-episodes-database.md | Episode tracking, existing shorts (for dedup) |
| 04-shorts-creation-guide.md | Moment types, selection criteria, extraction process |
| 05-title-formulas.md | Title shapes, rules, templates by content type |
| 06-descriptions-template.md | Description formulas, hashtag library, SEO keywords |
| 07-thumbnail-guide.md | Layouts, brand colors, typography, visual specs |
| 08-topics-themes.md | Core topics, cross-cutting themes, audience map |
| 09-content-workflow.md | End-to-end workflow phases, handoff specs |
| 10-internal-processing.md | Auto-execution rules, internal quality gates |
| 11-inspiration-channels.md | Reference channels, viral hooks, hybrid formulas |
| 12-quick-reference.md | Copy-paste hooks, hashtags, CTAs, checklists |
Manage via the web UI at /knowledge.html (drag & drop, inline editor) or through the knowledge_base MCP tool.
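Before relying on the knowledge base, it can help to check which starter templates are actually present. The sketch below is an illustration, not part of podcli: `missing_templates` is a hypothetical helper, and the only facts it takes from this README are the 13 file names and the `.podcli/knowledge/` location.

```python
from pathlib import Path

# The 13 starter templates listed in the table above.
STARTER_TEMPLATES = [
    "00-master-instructions.md",
    "01-brand-identity.md",
    "02-voice-and-tone.md",
    "03-episodes-database.md",
    "04-shorts-creation-guide.md",
    "05-title-formulas.md",
    "06-descriptions-template.md",
    "07-thumbnail-guide.md",
    "08-topics-themes.md",
    "09-content-workflow.md",
    "10-internal-processing.md",
    "11-inspiration-channels.md",
    "12-quick-reference.md",
]


def missing_templates(knowledge_dir: Path) -> list:
    """Return starter template names that are absent from knowledge_dir."""
    present = {p.name for p in knowledge_dir.glob("*.md")}
    return [name for name in STARTER_TEMPLATES if name not in present]


if __name__ == "__main__":
    # Default location named in this README; adjust if PODCLI_HOME is set.
    for name in missing_templates(Path(".podcli/knowledge")):
        print("missing:", name)
```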
podcli is a Model Context Protocol server — Claude can use it as a tool to create clips through conversation.
Claude Code — register the bundled MCP server in one command:
podcli mcp install
Claude Desktop — add to claude_desktop_config.json:
{
"mcpServers": {
"podcli": {
"command": "podcli",
"args": ["mcp"]
}
}
}
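If the config file already lists other MCP servers, the entry has to be merged rather than pasted over them. A minimal sketch of that merge follows. `add_podcli_server` is a hypothetical helper, and the caller supplies the path because the location of claude_desktop_config.json depends on the operating system.

```python
import json
from pathlib import Path


def add_podcli_server(config_path: Path) -> dict:
    """Add the podcli entry to mcpServers, keeping any existing servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file the same as an empty JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")

    servers = config.setdefault("mcpServers", {})
    # Same entry as shown in the JSON snippet above.
    servers["podcli"] = {"command": "podcli", "args": ["mcp"]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Restart Claude Desktop after editing the file so it picks up the new server.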
| Tool | Description |
|---|---|
| transcribe_podcast | Transcribe audio/video with Whisper + speaker detection |
| suggest_clips | Submit clip suggestions (includes duplicate check) |
| create_clip | Render a single short-form clip as a vertical short |
| batch_create_clips | Render multiple clips in one batch |
| knowledge_base | Read/manage podcast context files (hosts, style, audience, etc.) |
| manage_assets | Register/list reusable assets (logos, videos) |
| clip_history | View previously created clips, check for duplicates |
| get_ui_state | Read current session state and get workflow next-step guidance |
| modify_clip | Adjust a suggested clip's timing, title, or caption style (or delete it) |
| toggle_clip | Select or deselect a suggested clip for export |
| update_settings | Update rendering settings (caption style, crop strategy, logo, outro) |
| list_outputs | List all rendered clip files in the output directory |
| manage_presets | Save, load, list, or delete rendering presets |
| analyze_energy | Analyze audio energy levels to find high-energy moments |
| set_video | Set the working video file without transcribing |
| import_transcript | Import an external transcript with word-level timestamps (skips Whisper) |
| parse_transcript | Parse raw speaker-labeled plain text into word-level timestamps |
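To give a feel for what "audio energy" means in analyze_energy, here is one common way to measure it: root-mean-square (RMS) amplitude over fixed windows. This is a generic sketch under that assumption; the README does not say which measure podcli uses, and `rms_windows` and `loudest` are hypothetical names.

```python
import math


def rms_windows(samples, sample_rate, window_s=1.0):
    """Return a list of (start_time_seconds, rms) for consecutive windows."""
    size = max(1, int(sample_rate * window_s))
    windows = []
    for start in range(0, len(samples), size):
        chunk = samples[start:start + size]
        # RMS: square root of the mean of squared sample values.
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        windows.append((start / sample_rate, rms))
    return windows


def loudest(windows, top=3):
    """Return the top windows ordered from highest to lowest energy."""
    return sorted(windows, key=lambda w: w[1], reverse=True)[:top]


if __name__ == "__main__":
    # Tiny synthetic signal: silence, a loud second, then a quieter second.
    signal = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
    print(loudest(rms_windows(signal, sample_rate=4), top=1))
```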
| Style | Look |
|---|---|
| branded | Large bold text, dark box highlight on active word, gradient overlay, optional logo |
| hormozi | Bold uppercase pop-on text, yellow active word (Alex Hormozi style) |
| karaoke | Full sentence visible, words highlight progressively |
| subtle | Clean minimal white text at bottom |
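The karaoke style depends on word-level timestamps: at any playback time, the words already started are highlighted and the rest are not. The sketch below shows that split in isolation. It assumes words arrive as `(text, start, end)` tuples sorted by start time, which is an assumed shape and not podcli's documented data model.

```python
from bisect import bisect_right


def karaoke_split(words, t):
    """Split a caption line into (highlighted, pending) word lists at time t.

    words: list of (text, start_seconds, end_seconds), sorted by start.
    A word counts as highlighted once its start time has been reached.
    """
    starts = [start for _, start, _ in words]
    # Number of words whose start time is <= t.
    n = bisect_right(starts, t)
    highlighted = [text for text, _, _ in words[:n]]
    pending = [text for text, _, _ in words[n:]]
    return highlighted, pending


if __name__ == "__main__":
    line = [("Nothing", 0.0, 0.4), ("leaves", 0.4, 0.8),
            ("your", 0.8, 1.0), ("machine", 1.0, 1.6)]
    print(karaoke_split(line, 0.5))
```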
podcli/
├── cli/  # Go launcher (native binary, provisioning, self-update)
├── install.sh / install.ps1  # node-less installers
├── setup.sh  # dev environment setup (venv + npm)
├── package.json
├── CLAUDE.md  # PodStack master config
│
├── .claude/commands/  # PodStack slash commands
│   ├── process-transcript.md
│   ├── generate-titles.md
│   ├── generate-descriptions.md
│   ├── plan-thumbnails.md
│   ├── review-content.md
│   ├── produce-shorts.md
│   ├── publish-checklist.md
│   └── retro-episode.md
│
├── src/  # TypeScript
│   ├── index.ts  # MCP server entry (stdio)
│   ├── server.ts  # MCP tool definitions
│   ├── config/paths.ts
│   ├── models/index.ts
│   ├── handlers/  # MCP tool handlers
│   ├── services/
│   │   ├── python-executor.ts
│   │   ├── file-manager.ts
│   │   ├── asset-manager.ts
│   │   ├── clips-history.ts
│   │   ├── knowledge-base.ts
│   │   └── transcript-cache.ts
│   └── ui/
│       ├── web-server.ts  # Express server + API
│       └── public/  # Frontend (React SPA)
│
├── backend/  # Python
│   ├── main.py  # stdin/stdout JSON dispatcher
│   ├── cli.py  # CLI entry point
│   ├── presets.py
│   ├── requirements.txt
│   ├── models/  # ML model files
│   │   └── face_detection_yunet_2023mar.onnx
│   ├── services/  # Whisper, FFmpeg, captions, face tracking, etc.
│   │   ├── face_detector.py  # shared YuNet face detector
│   │   └── ...
│   └── config/
│       └── caption_styles.py
│
├── .podcli/  # config home (gitignored): knowledge, presets, assets
│   ├── knowledge/
│   ├── assets/
│   ├── presets/
│   └── history/
└── data/  # runtime data (gitignored): cache, output, working
    ├── cache/  # CLI transcription cache + remotion bundle
    │   └── transcripts/  # MCP/UI transcript cache
    ├── output/  # rendered clips
    └── working/  # temp uploads and task dirs
Copy .env.example to .env (setup.sh does this automatically):
| Variable | Default | Description |
|---|---|---|
| WHISPER_MODEL | base | Whisper model size (tiny, base, small, medium, large) |
| WHISPER_DEVICE | auto | cpu, cuda, or auto |
| PYTHON_PATH | (venv) | Path to Python binary |
| PODCLI_HOME | .podcli/ | Config home (knowledge, presets, assets, settings) |
| PODCLI_DATA | data/ | Runtime data (cache, output, working, logs) |
| FFMPEG_PATH | ffmpeg | Custom FFmpeg path |
| LOG_LEVEL | info | Logging verbosity |
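The table above maps naturally onto a small settings loader. The following sketch reads those variables with the documented defaults and rejects values outside the documented choices. It is an illustration only: `Settings` and `load_settings` are hypothetical names, the validation is an assumption about sensible behaviour, and PYTHON_PATH is left out because its default is the project venv and not a fixed string.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}
WHISPER_DEVICES = {"cpu", "cuda", "auto"}


@dataclass(frozen=True)
class Settings:
    whisper_model: str
    whisper_device: str
    podcli_home: str
    podcli_data: str
    ffmpeg_path: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, using documented defaults."""
    env = os.environ if env is None else env
    settings = Settings(
        whisper_model=env.get("WHISPER_MODEL", "base"),
        whisper_device=env.get("WHISPER_DEVICE", "auto"),
        podcli_home=env.get("PODCLI_HOME", ".podcli/"),
        podcli_data=env.get("PODCLI_DATA", "data/"),
        ffmpeg_path=env.get("FFMPEG_PATH", "ffmpeg"),
        log_level=env.get("LOG_LEVEL", "info"),
    )
    # Assumed validation: fail early on values the table does not list.
    if settings.whisper_model not in WHISPER_MODELS:
        raise ValueError(f"unknown WHISPER_MODEL: {settings.whisper_model}")
    if settings.whisper_device not in WHISPER_DEVICES:
        raise ValueError(f"unknown WHISPER_DEVICE: {settings.whisper_device}")
    return settings
```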
Portable bundles zip your config home (not cache or rendered clips):
podcli config export ~/backups/myshow.zip
podcli config import ~/backups/myshow.zip --home ~/.podcli-myshow --activate
podcli config status
Activate a config root without importing: podcli config use ~/.podcli-myshow (writes .podcli-home in the project).
Older releases stored transcription cache under project/.podcli/cache/ (now data/cache/) and presets under project/presets/ (now .podcli/presets/). After upgrading, migration runs automatically when legacy files are still present (CLI, Web UI, MCP). To preview or run manually:
podcli config migrate --dry-run # preview only
podcli config migrate # apply (same as auto when legacy cache exists)
One source of truth: settings live in config home (PODCLI_HOME or .podcli/, tracked by .podcli-home); heavy/runtime files live under data (PODCLI_DATA or data/). The marker file only points at which config home is active — it does not replace either root.
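The relationship between the environment variable, the marker file, and the default directory can be sketched as a lookup. Two things here are assumptions and not documented behaviour: the precedence order (PODCLI_HOME first, then the .podcli-home marker, then .podcli/ in the project) and the marker file's format (a single line holding a path). `resolve_config_home` is a hypothetical name.

```python
from pathlib import Path
from typing import Mapping


def resolve_config_home(project: Path, env: Mapping[str, str]) -> Path:
    """Pick the active config home for a project (assumed precedence)."""
    # 1. An explicit environment variable wins (assumption).
    explicit = env.get("PODCLI_HOME")
    if explicit:
        return Path(explicit).expanduser()

    # 2. Otherwise follow the marker file, assumed to contain one path.
    marker = project / ".podcli-home"
    if marker.is_file():
        target = marker.read_text(encoding="utf-8").strip()
        if target:
            return Path(target).expanduser()

    # 3. Fall back to the default config home inside the project.
    return project / ".podcli"
```

Runtime data would be resolved separately from PODCLI_DATA or data/, since the marker file only selects the config home.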
MCP: manage_config(action=migrate).
Web UI: Config profiles (when npm run ui is running).
See CONTRIBUTING.md for development conventions.
Speaker Name (00:00)
What they said goes here as plain text.
Another Speaker (00:45)
Their response text here.
The time offset field (default: -1s) shifts all timestamps to sync with audio.
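The format above is simple enough to parse by hand, which is useful for checking a transcript before importing it. This is a minimal sketch and not podcli's parser: `parse_transcript` and `Segment` are hypothetical names, the default offset mirrors the -1s mentioned above, and clamping shifted times at zero is an assumption.

```python
import re
from dataclasses import dataclass

# A header line looks like "Speaker Name (MM:SS)" or "Speaker Name (HH:MM:SS)".
HEADER = re.compile(r"^(?P<speaker>.+?)\s*\((?P<ts>\d{1,2}:\d{2}(?::\d{2})?)\)\s*$")


@dataclass
class Segment:
    speaker: str
    start: float  # seconds, after applying the offset
    text: str


def to_seconds(ts: str) -> int:
    """Convert MM:SS or HH:MM:SS into whole seconds."""
    total = 0
    for part in ts.split(":"):
        total = total * 60 + int(part)
    return total


def parse_transcript(raw: str, offset_s: float = -1.0) -> list:
    """Parse speaker-labeled text into Segment objects."""
    segments = []
    current = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # Shift by the offset, never going below zero (assumption).
            start = max(0.0, to_seconds(match["ts"]) + offset_s)
            current = Segment(match["speaker"], start, "")
            segments.append(current)
        elif current is not None:
            # Body lines are appended to the most recent speaker.
            current.text = f"{current.text} {line}".strip()
    return segments
```

One limitation of this sketch: a spoken line that happens to end in a parenthesised time such as "(12:30)" would be read as a speaker header.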
Content workflow powered by PodStack — inspired by gstack by Garry Tan.
AGPL-3.0. See LICENSE.
Need to use Podcli without AGPL terms? A commercial license is available — email siradze@nikusha.me with a one-line description of your use case.