by landing-ai
Provides a lightweight side‑car that translates MCP tool calls into authenticated requests to Landing AI’s VisionAgent REST APIs, returning JSON responses and visual outputs for computer‑vision tasks.
It acts as a local bridge that receives Model Context Protocol (MCP) tool calls from compatible clients (e.g., Claude Desktop, Cursor, Cline) and forwards them to Landing AI’s VisionAgent cloud APIs. The server handles argument validation, authentication, and optional rendering of images, masks, or depth maps, then streams the results back to the client.
Quick start:
1. Install: npm install -g vision-tools-mcp (or use the built package directly with npx).
2. Set VISION_AGENT_API_KEY. Optional: OUTPUT_DIRECTORY for saved visual assets and IMAGE_DISPLAY_ENABLED (true/false).
3. Configure your MCP client so it runs npx vision-tools-mcp and passes the env vars.
4. Prompt away (e.g., detect all traffic lights in street.png).
5. When new endpoints ship, npm run generate-tools pulls the latest OpenAPI spec and creates up‑to‑date MCP tool definitions.
6. The server attaches your VISION_AGENT_API_KEY as a Bearer token to all outgoing calls.

Q: Which clients are compatible?
A: Any MCP‑aware client such as Claude Desktop, Cursor, Cline, or custom implementations using the Model Context Protocol SDK.
Q: Do I need to write any code to call VisionAgent APIs?
A: No. The server handles the HTTP request/response cycle; you only issue natural‑language prompts.
Q: How are large files handled?
A: Files are read, base64‑encoded (or streamed for multipart uploads) by the server before being sent to the VisionAgent endpoint.
Q: Can I disable image rendering?
A: Set IMAGE_DISPLAY_ENABLED to false in the environment; the server will return only JSON data and file paths.
Q: What if the tool list is outdated?
A: Run npm run generate-tools to refresh the tool map from the latest OpenAPI spec.
Q: What Node.js version is required?
A: Node 20 LTS or newer, as the code relies on native Blob and FormData APIs (see the sketch below).
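To make the Node 20 requirement concrete, here is a minimal sketch of the multipart path using only the built-in fetch, Blob, and FormData APIs. The endpoint path and the "image" field name are illustrative assumptions, not the server's actual values:

```typescript
// Minimal multipart upload sketch using Node 20 built-ins (no SDK needed).
// NOTE: the endpoint path and "image" field name below are hypothetical.
import { readFile } from "node:fs/promises";

async function uploadImage(path: string, apiKey: string): Promise<unknown> {
  const form = new FormData();
  // Wrap the raw bytes in a Blob so FormData can send them as a file part.
  form.append("image", new Blob([await readFile(path)]), "image.png");

  const res = await fetch("https://api.va.landing.ai/v1/tools/text-to-object-detection", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // fetch adds the multipart boundary itself
    body: form,
  });
  if (!res.ok) throw new Error(`VisionAgent API error: ${res.status} ${res.statusText}`);
  return res.json();
}
```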
Beta – v0.1
This project is early access and subject to breaking changes until v1.0.
Modern LLM “agents” call external tools through the Model Context Protocol (MCP). VisionAgent MCP is a lightweight, side-car MCP server that runs locally on STDIN/STDOUT, translating each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs. The response JSON, plus any images or masks, is streamed back to the model so that you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
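Concretely, each tool invocation arrives over stdio as a standard MCP tools/call request. A representative payload (the argument names here are illustrative) looks like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "text-to-object-detection",
    "arguments": {
      "prompt": "all traffic lights",
      "imagePath": "street.png"
    }
  }
}
```

The server validates the arguments, forwards them to the matching VisionAgent endpoint, and returns the JSON (plus any rendered files) as the tool result.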
Demo video: https://github.com/user-attachments/assets/2017fa01-0e7f-411c-a417-9f79562627b7
| Capability | Description |
|---|---|
| agentic-document-analysis | Parse PDFs / images to extract text, tables, charts, and diagrams, taking layout and other visual cues into account. A web version is also available. |
| text-to-object-detection | Detect objects from free-form prompts (“all traffic lights”) using OWLv2 / CountGD / Florence-2 / Agentic Object Detection (also available on the web); outputs bounding boxes. |
| text-to-instance-segmentation | Pixel-perfect masks via Florence-2 + Segment-Anything-v2 (SAM-2). |
| activity-recognition | Recognise multiple activities in video with start/end timestamps. |
| depth-pro | High-resolution monocular depth estimation for single images. |
Run npm run generate-tools whenever VisionAgent releases new endpoints. The script fetches the latest OpenAPI spec and regenerates the local tool map automatically.
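For intuition, the generation step looks roughly like the sketch below: fetch the spec, derive one tool definition per operation, and write the map to disk. This is a simplified sketch; it assumes operationId supplies the tool name and only shows JSON request bodies, whereas the real script also derives Zod schemas:

```typescript
// Rough sketch of the OpenAPI → tool-map generation (simplified).
import { writeFile } from "node:fs/promises";

interface McpToolDefinition {
  name: string;
  description: string;
  inputSchema: unknown; // JSON Schema taken from the OpenAPI operation
}

async function generateTools(): Promise<void> {
  const spec: any = await (await fetch("https://api.va.landing.ai/openapi.json")).json();
  const tools = new Map<string, McpToolDefinition>();

  for (const [path, operations] of Object.entries<any>(spec.paths)) {
    for (const [method, op] of Object.entries<any>(operations)) {
      const name = op.operationId ?? `${method}:${path}`; // assumption: operationId names the tool
      tools.set(name, {
        name,
        description: op.summary ?? op.description ?? "",
        inputSchema: op.requestBody?.content?.["application/json"]?.schema ?? {},
      });
    }
  }

  await writeFile(
    "src/toolDefinitionMap.ts",
    `export const toolDefinitionMap = new Map(${JSON.stringify([...tools], null, 2)});`,
  );
}
```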
If you do not have a VisionAgent API key, create an account and obtain your API key.
# 1 Install
npm install -g vision-tools-mcp
# 2 Configure your MCP client with the following settings:
{
"mcpServers": {
"VisionAgent": {
"command": "npx",
"args": ["vision-tools-mcp"],
"env": {
"VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
"OUTPUT_DIRECTORY": "/path/to/output/directory",
"IMAGE_DISPLAY_ENABLED": "true" # or false, see below
}
}
}
}
# 3 Try a prompt, e.g.:
Detect all traffic lights in /path/to/mcp/vision-agent-mcp/assets/street.png
If your client supports inline resources, you’ll see bounding-box overlays; otherwise, the PNG is saved to your output directory, and the chat shows its path.
| Software | Minimum Version |
|---|---|
| Node.js | 20 (LTS) |
| VisionAgent account | Any paid or free tier (needs API key) |
| MCP client | Claude Desktop / Cursor / Cline / etc. |
| ENV var | Required | Default | Purpose |
|---|---|---|---|
| VISION_AGENT_API_KEY | Yes | — | Landing AI auth token. |
| OUTPUT_DIRECTORY | No | — | Where rendered images / masks / depth maps are stored. |
| IMAGE_DISPLAY_ENABLED | No | true | false ➜ skip rendering. |
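As a rough sketch of how these variables are read and validated (in the spirit of src/server/config.ts; the exact shape may differ):

```typescript
// Illustrative env loading/validation in the spirit of src/server/config.ts.
import "dotenv/config";

export interface EnvConfig {
  apiKey: string;
  outputDirectory?: string;
  imageDisplayEnabled: boolean;
}

export function loadEnvConfig(): EnvConfig {
  const apiKey = process.env.VISION_AGENT_API_KEY;
  if (!apiKey) throw new Error("VISION_AGENT_API_KEY is required");

  return {
    apiKey,
    outputDirectory: process.env.OUTPUT_DIRECTORY, // optional; only needed when rendering
    // Defaults to true; only the literal string "false" disables rendering.
    imageDisplayEnabled: process.env.IMAGE_DISPLAY_ENABLED !== "false",
  };
}
```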
Example config (.mcp.json for VS Code / Cursor):
{
"mcpServers": {
"VisionAgent": {
"command": "npx",
"args": ["vision-tools-mcp"],
"env": {
"VISION_AGENT_API_KEY": "912jkefief09jfjkMfoklwOWdp9293jefklwfweLQWO9jfjkMfoklwDK",
"OUTPUT_DIRECTORY": "/Users/me/documents/mcp/test",
"IMAGE_DISPLAY_ENABLED": "false"
}
}
}
}
For MCP clients without image display capabilities, like Cursor, set IMAGE_DISPLAY_ENABLED to "false". For MCP clients with image display capabilities, like Claude Desktop, set IMAGE_DISPLAY_ENABLED to "true" to visualize tool outputs. Generally, MCP clients that support resources (see this list: https://modelcontextprotocol.io/clients) will support image display.
| Scenario | Prompt (after uploading file) |
|---|---|
| Invoice extraction | “Extract vendor, invoice date & total from this PDF using agentic-document-analysis.” |
| Pedestrian recognition | “Locate every pedestrian in street.jpg via text-to-object-detection.” |
| Agricultural segmentation | “Segment all tomatoes in kitchen.png with text-to-instance-segmentation.” |
| Activity recognition (video) | “Identify activities occurring in match.mp4 via activity-recognition.” |
| Depth estimation | “Produce a depth map for selfie.png using depth-pro.” |
┌────────────────────┐ 1. human prompt ┌───────────────────┐
│ MCP-capable client │───────────────────────────▶│ VisionAgent MCP │
│ (Cursor, Claude) │ │ (this repo) │
└────────────────────┘ └─────────▲─────────┘
▲ 6. rendered PNG / JSON │ 2. JSON tool call
│ │
│ 5. preview path / data 3. HTTPS │
│ ▼
local disk ◀──────────┐ Landing AI VisionAgent
└────────────── Cloud APIs
4. JSON / media blob
Here’s how to dive into the code, add new endpoints, or troubleshoot issues.
Clone the repository:
git clone https://github.com/landing-ai/vision-agent-mcp.git
Navigate into the project directory:
cd vision-agent-mcp
Install dependencies:
npm install
Build the project:
npm run build
- VISION_AGENT_API_KEY: Required API key for VisionAgent authentication.
- OUTPUT_DIRECTORY: Optional directory for saving processed outputs (supports relative and absolute paths).
- IMAGE_DISPLAY_ENABLED: Set to "true" to enable image visualization features.

After building, configure your MCP client with the following settings:
{
"mcpServers": {
"VisionAgent": {
"command": "node",
"args": [
"/path/to/build/index.js"
],
"env": {
"VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
"OUTPUT_DIRECTORY": "../../output",
"IMAGE_DISPLAY_ENABLED": "true"
}
}
}
}
Note: Replace /path/to/build/index.js with the actual path to your built index.js file, and set your environment variables as needed. For MCP clients without image display capabilities, like Cursor, set IMAGE_DISPLAY_ENABLED to "false". For MCP clients with image display capabilities, like Claude Desktop, set IMAGE_DISPLAY_ENABLED to "true" to visualize tool outputs. Generally, MCP clients that support resources (see this list: https://modelcontextprotocol.io/clients) will support image display.
| Script | Purpose |
|---|---|
| npm run build | Compile TypeScript → build/ (adds executable bit). |
| npm run start | Build and run (node build/index.js). |
| npm run typecheck | Type-only check (tsc --noEmit). |
| npm run generate-tools | Fetch the latest OpenAPI spec and regenerate toolDefinitionMap.ts. |
| npm run build:all | Convenience: npm run build + npm run generate-tools. |
Pro Tip: If you modify any files under src/ or want to pick up new endpoints from VisionAgent, run npm run build:all to recompile and regenerate tool definitions.
vision-agent-mcp/
├── .eslintrc.json # ESLint config (optional)
├── .gitignore # Ignore node_modules, build/, .env, etc.
├── jest.config.js # Placeholder for future unit tests
├── mcp-va.md # Draft docs (incomplete)
├── package.json # npm metadata, scripts, dependencies
├── package-lock.json # Lockfile
├── tsconfig.json # TypeScript compiler config
├── .env # Your environment variables (not committed)
│
├── src/ # TypeScript source code
│ ├── generateTools.ts # Dev script: fetch OpenAPI → generate MCP tool definitions (Zod schemas)
│ ├── index.ts # Entry point: load .env, start MCP server, handle signals
│ ├── toolDefinitionMap.ts # Auto-generated MCP tool definitions (don’t edit by hand)
│ ├── toolUtils.ts # Helpers to build MCP tool objects (metadata, descriptions)
│ ├── types.ts # Core TS interfaces (MCP, environment config, etc.)
│ │
│ ├── server/ # MCP server logic
│ │ ├── index.ts # Create & start the MCP server (Server + Stdio transport)
│ │ ├── handlers.ts # `handleListTools` & `handleCallTool` implementations
│ │ ├── visualization.ts # Post-process & save image/video outputs (masks, boxes, depth maps)
│ │ └── config.ts # Load & validate .env, export SERVER_CONFIG & EnvConfig
│ │
│ ├── utils/ # Generic utilities
│ │ ├── file.ts # File handling (base64 encode images/PDFs, read streams)
│ │ └── http.ts # Axios wrappers & error formatting
│ │
│ └── validation/ # Zod schema generation & argument validation
│ └── schema.ts # Convert JSON Schema → Zod, validate incoming tool args
│
├── build/ # Compiled JavaScript (generated after `npm run build`)
│ ├── index.js
│ ├── generateTools.js
│ ├── toolDefinitionMap.js
│ └── … # Mirror of `src/` structure
│
├── output/ # Runtime artifacts (bounding boxes, masks, depth maps, etc.)
│
└── assets/ # Static assets (e.g., demo.gif)
└── demo.gif
src/generateTools.ts
- Fetches https://api.va.landing.ai/openapi.json (VisionAgent’s public OpenAPI spec).
- Regenerates toolDefinitionMap.ts with a Map<string, McpToolDefinition>.
- Run it via npm run generate-tools.

src/toolDefinitionMap.ts
- Auto-generated MCP tool definitions; don’t edit by hand.
src/server/handlers.ts
Implements handleListTools: returns [ { name, description, inputSchema } ].
Implements handleCallTool:
- Validates arguments with Zod.
- For file parameters (imagePath, pdfPath), reads & base64-encodes them via src/utils/file.ts.
- If IMAGE_DISPLAY_ENABLED=true, calls src/server/visualization.ts to save PNGs / JSON.
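Put together, the call path can be pictured as the condensed sketch below. The declare statements stand in for the real modules described in this section, and postToVisionAgent / renderOutputs are hypothetical helper names:

```typescript
// Condensed sketch of the handleCallTool flow: validate → inline files → call API → render.
declare const toolDefinitionMap: Map<string, { inputSchema: unknown }>;
declare function buildZodSchema(schema: unknown): { parse(v: unknown): Record<string, any> };
declare function readFileAsBase64(path: string): Promise<string>;
declare function postToVisionAgent(tool: string, body: unknown): Promise<any>; // hypothetical helper
declare function renderOutputs(result: any): Promise<void>;                    // hypothetical helper

async function handleCallTool(name: string, rawArgs: unknown): Promise<any> {
  const def = toolDefinitionMap.get(name);
  if (!def) throw new Error(`Unknown tool: ${name}`);

  // 1. Validate incoming arguments against the generated Zod schema.
  const args = buildZodSchema(def.inputSchema).parse(rawArgs);

  // 2. Inline file parameters as base64 before forwarding.
  if (typeof args.imagePath === "string") {
    args.image = await readFileAsBase64(args.imagePath);
  }

  // 3. Forward to VisionAgent and collect the JSON result.
  const result = await postToVisionAgent(name, args);

  // 4. Optionally render boxes / masks / depth maps to OUTPUT_DIRECTORY.
  if (process.env.IMAGE_DISPLAY_ENABLED !== "false") await renderOutputs(result);

  return result;
}
```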
src/server/visualization.ts
- Renders bounding boxes, masks, and depth maps and saves them under OUTPUT_DIRECTORY.

src/utils/file.ts
- readFileAsBase64(path: string): Promise<string>: reads any binary (image, PDF, video) and returns base64.
- loadFileStream(path: string): returns a Node.js stream for large file uploads.

src/utils/http.ts
- Axios wrappers targeting the base URL https://api.va.landing.ai.
- Attaches the Authorization: Bearer ${VISION_AGENT_API_KEY} header and formats errors.

src/validation/schema.ts
- buildZodSchema(jsonSchema: any): ZodObject: converts JSON Schema into a Zod schema; used by generateTools.ts.
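A toy version of that conversion, covering just enough of JSON Schema to show the idea (the real implementation handles far more of the spec):

```typescript
// Toy JSON Schema → Zod conversion; illustrative only.
import { z, ZodTypeAny } from "zod";

export function buildZodSchema(schema: any): ZodTypeAny {
  switch (schema?.type) {
    case "string":  return z.string();
    case "number":  return z.number();
    case "integer": return z.number().int();
    case "boolean": return z.boolean();
    case "array":   return z.array(buildZodSchema(schema.items ?? {}));
    case "object": {
      const required: string[] = schema.required ?? [];
      const shape: Record<string, ZodTypeAny> = {};
      for (const [key, prop] of Object.entries<any>(schema.properties ?? {})) {
        const field = buildZodSchema(prop);
        shape[key] = required.includes(key) ? field : field.optional();
      }
      return z.object(shape);
    }
    default:
      return z.any(); // fall through for unsupported keywords
  }
}
```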
src/index.ts
Loads dotenv (reads .env).
Validates required env vars (VISION_AGENT_API_KEY).
Imports generated toolDefinitionMap.
Creates an MCP Server (from @modelcontextprotocol/sdk/server) with StdioServerTransport.
Wires ListTools → handleListTools, CallTool → handleCallTool.
Logs startup info:
vision-tools-api MCP Server (v0.1.0) running on stdio, proxying to https://api.va.landing.ai
Listens for SIGINT/SIGTERM to gracefully shut down.
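That wiring corresponds roughly to the following sketch using the MCP TypeScript SDK; the handler declarations stand in for src/server/handlers.ts:

```typescript
// Sketch of the startup wiring in src/index.ts (simplified).
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";

// Implemented in src/server/handlers.ts; declared here to keep the sketch self-contained.
declare function handleListTools(request: any): Promise<any>;
declare function handleCallTool(request: any): Promise<any>;

const server = new Server(
  { name: "vision-tools-api", version: "0.1.0" },
  { capabilities: { tools: {} } },
);

// Route MCP requests to the handlers.
server.setRequestHandler(ListToolsRequestSchema, handleListTools);
server.setRequestHandler(CallToolRequestSchema, handleCallTool);

const transport = new StdioServerTransport();
await server.connect(transport);
// Log to stderr: stdout is reserved for the MCP protocol itself.
console.error("vision-tools-api MCP Server (v0.1.0) running on stdio, proxying to https://api.va.landing.ai");

// Graceful shutdown on SIGINT / SIGTERM.
for (const sig of ["SIGINT", "SIGTERM"] as const) {
  process.on(sig, async () => {
    await server.close();
    process.exit(0);
  });
}
```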
Validation Errors
If you send invalid or missing parameters, the server returns:
{
"id": 3,
"error": {
"code": -32602,
"message": "Validation error: missing required parameter ‘imagePath’"
}
}
Network Errors
Axios errors (timeouts, 5xx) are caught and returned as:
{
"id": 4,
"error": {
"code": -32000,
"message": "VisionAgent API error: 502 Bad Gateway"
}
}
Internal Exceptions
Uncaught exceptions in handlers produce:
{
"id": 5,
"error": {
"code": -32603,
"message": "Internal error: Unexpected token in JSON at position 345"
}
}
- Check that VISION_AGENT_API_KEY is correct and active.
- Check that api.va.landing.ai isn’t blocked by a proxy/VPN.
- If a tool seems missing, the local tool map may be stale. Run:
npm run generate-tools
npm start
The code relies on the Blob & FormData APIs that ship natively with Node 20.
Upgrade via nvm install 20 (mac/Linux) or download from nodejs.org if on Windows.
For other issues, refer to the MCP documentation: https://modelcontextprotocol.io/quickstart/user
Also note that specific clients have their own helpful documentation. For example, if you are using the OpenAI Agents SDK, refer to their documentation here: https://openai.github.io/openai-agents-python/mcp/
We love PRs!
- Create a feature branch: git checkout -b feature/my-feature.
- Make sure npm run typecheck passes (no errors).
- Keep your .env (VISION_AGENT_API_KEY, OUTPUT_DIRECTORY) on your machine only; never commit it.

Made with ❤️ by the LandingAI Team.
Alternatively, register the server with Claude Code: claude mcp add VisionAgent npx vision-tools-mcp