by Arize-ai
Open-source AI observability platform enabling tracing, evaluation, dataset versioning, experiment tracking, prompt management, and interactive playground for LLM applications.
Provides end‑to‑end observability for large language model (LLM) workflows: captures runtime traces via OpenTelemetry, runs automated evaluations, stores versioned datasets, tracks experiments, and offers a UI playground for prompt engineering.
pip install arize-phoenix
Or pull the container: docker pull arizephoenix/phoenix.

Quick start:
1. Start the Phoenix server (arize-phoenix serve).
2. Instrument your application with OpenInference packages (openinference-instrumentation-openai, openinference-instrumentation-langchain, etc.) or use the provided arize-phoenix-otel package for a simplified setup.
3. Open the UI at http://localhost:8080 (default) to explore traces, evaluation results, datasets, and experiments.
4. Use arize-phoenix-client (Python) or @arizeai/phoenix-client (JS/TS) to create datasets, upload evaluations, and query metadata.
5. Run evaluations with arize-phoenix-evals.

Q: Do I need a cloud account to run Phoenix? A: No. Phoenix can run locally via Docker or the Python package. Cloud‑hosted instances are optional.
Q: Which frameworks are supported out of the box? A: LangChain, LlamaIndex, Haystack, DSPy, Smolagents, OpenAI SDK, Bedrock, Vertex AI, MistralAI, LiteLLM, and many more via OpenInference.
Q: How are evaluations stored? A: Evaluations are persisted as part of the Phoenix backend and can be queried through the UI or client SDKs.
Q: Can I extend the evaluation suite? A: Yes. Custom eval functions can be added using arize-phoenix-evals or the TypeScript counterpart; see the sketch after this FAQ.
Q: Is there a licensing cost? A: Phoenix is released under the Elastic License 2.0 and is free to use; commercial support is offered by Arize AI.
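As referenced in the FAQ above, a custom evaluation can be expressed as a prompt template plus output rails and run with llm_classify from arize-phoenix-evals. A minimal sketch, where the politeness template, column name, and judge model are illustrative and exact parameter names may differ slightly across versions:

```python
import pandas as pd

from phoenix.evals import OpenAIModel, llm_classify

# Illustrative custom eval: classify the tone of each response.
POLITENESS_TEMPLATE = """You are judging the tone of a response.
[Response]: {response}
Answer with a single word: "polite" or "impolite"."""

df = pd.DataFrame(
    {"response": ["Thanks so much for your help!", "Figure it out yourself."]}
)

results = llm_classify(
    dataframe=df,                            # one row per example to evaluate
    model=OpenAIModel(model="gpt-4o-mini"),  # judge model; requires OPENAI_API_KEY
    template=POLITENESS_TEMPLATE,            # {response} is filled from the dataframe
    rails=["polite", "impolite"],            # constrain the output labels
    provide_explanation=True,
)
print(results[["label", "explanation"]])
```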
Phoenix is an open-source AI observability platform designed for experimentation, evaluation, and troubleshooting. It provides tracing, evaluation, dataset versioning, experiment tracking, prompt management, and an interactive playground.
Phoenix is vendor and language agnostic with out-of-the-box support for popular frameworks (🦙LlamaIndex, 🦜⛓LangChain, Haystack, 🧩DSPy, 🤗smolagents) and LLM providers (OpenAI, Bedrock, MistralAI, VertexAI, LiteLLM, Google GenAI and more). For details on auto-instrumentation, check out the OpenInference project.
Phoenix runs practically anywhere, including your local machine, a Jupyter notebook, a containerized deployment, or in the cloud.
Install Phoenix via pip or conda
pip install arize-phoenix
Phoenix container images are available via Docker Hub and can be deployed using Docker or Kubernetes. Arize AI also provides cloud instances at app.phoenix.arize.com.
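Once installed, Phoenix can also be started in-process, for example from a notebook or script. A minimal sketch assuming only the arize-phoenix package:

```python
import phoenix as px

# Start a local Phoenix instance in the current process.
# The returned session exposes the URL where the UI is served.
session = px.launch_app()
print(f"Phoenix UI available at {session.url}")
```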
The arize-phoenix package includes the entire Phoenix platform. However, if you have deployed the Phoenix platform separately, there are lightweight Python sub-packages and TypeScript packages that can be used in conjunction with the platform.
| Package | Language | Description |
|---|---|---|
| arize-phoenix-otel | Python | Provides a lightweight wrapper around OpenTelemetry primitives with Phoenix-aware defaults |
| arize-phoenix-client | Python | Lightweight client for interacting with the Phoenix server via its OpenAPI REST interface |
| arize-phoenix-evals | Python | Tooling to evaluate LLM applications including RAG relevance, answer relevance, and more |
| @arizeai/phoenix-client | JavaScript | Client for the Arize Phoenix API |
| @arizeai/phoenix-evals | TypeScript | TypeScript evaluation library for LLM applications (alpha release) |
| @arizeai/phoenix-mcp | JavaScript | MCP server implementation for Arize Phoenix providing unified interface to Phoenix's capabilities |
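As an example of working against a running Phoenix server, the client bundled with the full arize-phoenix package can pull captured spans into a dataframe; the lightweight arize-phoenix-client offers similar REST-based access. A minimal sketch assuming a local server reachable at its default endpoint:

```python
import phoenix as px

# Connect to a running Phoenix server; the endpoint can also be supplied
# via the PHOENIX_COLLECTOR_ENDPOINT environment variable.
client = px.Client()

# Pull captured spans into a pandas DataFrame for ad-hoc analysis.
spans_df = client.get_spans_dataframe()
print(spans_df.head())
```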
Phoenix is built on top of OpenTelemetry and is vendor, language, and framework agnostic. For details about tracing integrations and example applications, see the OpenInference project.
Python Integrations
| Integration | Package |
|---|---|
| OpenAI | openinference-instrumentation-openai |
| OpenAI Agents | openinference-instrumentation-openai-agents |
| LlamaIndex | openinference-instrumentation-llama-index |
| DSPy | openinference-instrumentation-dspy |
| AWS Bedrock | openinference-instrumentation-bedrock |
| LangChain | openinference-instrumentation-langchain |
| MistralAI | openinference-instrumentation-mistralai |
| Google GenAI | openinference-instrumentation-google-genai |
| Google ADK | openinference-instrumentation-google-adk |
| Guardrails | openinference-instrumentation-guardrails |
| VertexAI | openinference-instrumentation-vertexai |
| CrewAI | openinference-instrumentation-crewai |
| Haystack | openinference-instrumentation-haystack |
| LiteLLM | openinference-instrumentation-litellm |
| Groq | openinference-instrumentation-groq |
| Instructor | openinference-instrumentation-instructor |
| Anthropic | openinference-instrumentation-anthropic |
| Smolagents | openinference-instrumentation-smolagents |
| Agno | openinference-instrumentation-agno |
| MCP | openinference-instrumentation-mcp |
| Pydantic AI | openinference-instrumentation-pydantic-ai |
| Autogen AgentChat | openinference-instrumentation-autogen-agentchat |
| Portkey | openinference-instrumentation-portkey |
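To wire one of these integrations up, register a Phoenix-aware tracer provider with arize-phoenix-otel and attach the relevant instrumentor. A minimal sketch using the OpenAI integration (assumes a Phoenix collector reachable at its default endpoint and OPENAI_API_KEY set; the project name is illustrative):

```python
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

# Register an OpenTelemetry tracer provider with Phoenix-aware defaults;
# spans are exported to the Phoenix collector.
tracer_provider = register(project_name="my-llm-app")

# Attach the OpenInference instrumentor so OpenAI SDK calls are traced.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Regular OpenAI calls now emit traces that appear in the Phoenix UI.
client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, Phoenix!"}],
)
```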
JavaScript Integrations

| Integration | Package |
|---|---|
| OpenAI | @arizeai/openinference-instrumentation-openai |
| LangChain.js | @arizeai/openinference-instrumentation-langchain |
| Vercel AI SDK | @arizeai/openinference-vercel |
| BeeAI | @arizeai/openinference-instrumentation-beeai |
| Mastra | @arizeai/openinference-mastra |
Java Integrations

| Integration | Package |
|---|---|
| LangChain4j | openinference-instrumentation-langchain4j |
| SpringAI | openinference-instrumentation-springAI |

Platform Integrations
| Platform | Description | Docs |
|---|---|---|
| BeeAI | AI agent framework with built-in observability | Integration Guide |
| Dify | Open-source LLM app development platform | Integration Guide |
| Envoy AI Gateway | AI Gateway built on Envoy Proxy for AI workloads | Integration Guide |
| LangFlow | Visual framework for building multi-agent and RAG applications | Integration Guide |
| LiteLLM Proxy | Proxy server for LLMs | Integration Guide |
Join our community to connect with thousands of AI builders.
See the migration guide for a list of breaking changes.
Copyright 2025 Arize AI, Inc. All Rights Reserved.
Portions of this code are patent protected by one or more U.S. Patents. See the IP_NOTICE.
This software is licensed under the terms of the Elastic License 2.0 (ELv2). See LICENSE.
To use the Phoenix MCP server, add it to your MCP client configuration:

```json
{
  "mcpServers": {
    "phoenix-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@arizeai/phoenix-mcp"
      ],
      "env": {
        "API_KEY": "<YOUR_API_KEY>"
      }
    }
  }
}
```

Or add it with the Claude CLI: `claude mcp add phoenix-mcp npx -y @arizeai/phoenix-mcp`

Explore related MCPs that share similar capabilities and solve comparable challenges
by netdata
Delivers real‑time, per‑second infrastructure monitoring with zero‑configuration agents, on‑edge machine‑learning anomaly detection, and built‑in dashboards.
by msgbyte
Provides integrated website traffic analysis, uptime checking, and server health monitoring in a single self‑hosted platform.
by grafana
Provides programmatic access to a Grafana instance and its surrounding ecosystem through the Model Context Protocol, enabling AI assistants and other clients to query and manipulate dashboards, datasources, alerts, incidents, on‑call schedules, and more.
by dynatrace-oss
Provides a local server that enables real‑time interaction with the Dynatrace observability platform, exposing tools for querying data, retrieving problems, sending Slack notifications, and integrating AI assistance.
by pydantic
Provides tools to retrieve and query OpenTelemetry trace and metric data from Pydantic Logfire, allowing LLMs to analyze distributed traces and run arbitrary SQL queries against telemetry records.
by VictoriaMetrics-Community
Provides a Model Context Protocol server exposing read‑only VictoriaMetrics APIs, enabling seamless monitoring, observability, and automation through AI‑driven assistants.
by GeLi2001
Enables interaction with the Datadog API through a Model Context Protocol server, providing access to monitors, dashboards, metrics, logs, events, and incident data.
by last9
Provides AI agents with real‑time production context—including logs, metrics, traces, and alerts—through a Model Context Protocol server, enabling automatic code fixing and faster debugging.
by metoro-io
Provides an MCP server that exposes Metoro's eBPF‑based telemetry APIs to large language models, enabling AI‑driven queries and insights about Kubernetes clusters.