OpenAlex Author Disambiguation

OpenAlex Author Disambiguation MCP Server

A streamlined Model Context Protocol (MCP) server for author disambiguation and academic research using the OpenAlex API. It is designed for AI agents: responses keep only the fields needed to tell authors apart and to analyze their work.

🎯 Key Features

🔍 Core Capabilities

Advanced Author Disambiguation: Handles complex career transitions and name variations
Institution Resolution: Current and past affiliations with transition tracking
Academic Work Retrieval: Journal articles, letters, and research papers
Citation Analysis: H-index, citation counts, and impact metrics
ORCID Integration: Most accurate matching when an ORCID identifier is available

🚀 AI Agent Optimized

Streamlined Data: Focused on essential information for disambiguation
Fast Processing: Optimized data structures for rapid analysis
Smart Filtering: Enhanced filtering options for targeted queries
Clean Output: Structured responses optimized for AI reasoning

🤖 Agent Integration

Multiple Candidates: Ranked results for automated decision-making
Structured Responses: Clean, parseable output optimized for LLMs
Error Handling: Graceful degradation with informative messages
Enhanced Filtering: Journal-only, citation thresholds, and temporal filters

🏛️ Professional Grade

MCP Best Practices: Built with FastMCP following official guidelines
Tool Annotations: Proper MCP tool annotations for optimal client integration
Resource Management: Efficient HTTP client management and cleanup
Rate Limiting: Respectful API usage with proper delays

🚀 Quick Start

Prerequisites

Python 3.10 or higher
MCP-compatible client (e.g., Claude Desktop)
Email address (sent to the OpenAlex API to identify your requests)

Installation

For detailed installation instructions, see INSTALL.md.

Clone the repository:

git clone https://github.com/drAbreu/alex-mcp.git
cd alex-mcp

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package:
```
pip install -e .
```

Configure environment:

export OPENALEX_MAILTO=your-email@domain.com

Run the server:

./run_alex_mcp.sh
# Or, if installed as a CLI tool:
alex-mcp

⚙️ MCP Configuration

Claude Desktop Configuration

Add to your Claude Desktop configuration file:

{
  "mcpServers": {
    "alex-mcp": {
      "command": "/path/to/alex-mcp/run_alex_mcp.sh",
      "env": {
        "OPENALEX_MAILTO": "your-email@domain.com"
      }
    }
  }
}

Replace /path/to/alex-mcp with the actual path to the repository on your system.
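
If the Claude Desktop configuration file already lists other servers, the entry above has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, working on an in-memory dict; the helper name `add_alex_mcp` and the `other-server` entry are hypothetical and used only for illustration.

```python
import json


def add_alex_mcp(config: dict, repo_path: str, mailto: str) -> dict:
    """Return a copy of config with an "alex-mcp" entry added under mcpServers."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers["alex-mcp"] = {
        "command": f"{repo_path.rstrip('/')}/run_alex_mcp.sh",
        "env": {"OPENALEX_MAILTO": mailto},
    }
    return {**config, "mcpServers": servers}


# Hypothetical existing configuration with one unrelated server
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_alex_mcp(existing, "/path/to/alex-mcp", "your-email@domain.com")
print(json.dumps(updated, indent=2))
```

The function returns a new dict, so the original configuration can be kept as a backup before writing the result to disk.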

🤖 Using with AI Agents

OpenAI Agents Integration

You can load this MCP server in your OpenAI agent workflow using the agents.mcp.MCPServerStdio interface:

import asyncio

from agents.mcp import MCPServerStdio


async def main():
    # Entering the context manager starts the server process and connects to it,
    # so no separate connect() call is needed.
    async with MCPServerStdio(
        name="OpenAlex MCP For Author disambiguation and works",
        cache_tools_list=True,
        params={
            "command": "uvx",
            "args": [
                "--from", "git+https://github.com/drAbreu/alex-mcp.git@4.1.0",
                "alex-mcp"
            ],
            "env": {
                "OPENALEX_MAILTO": "your-email@domain.com"
            }
        },
        client_session_timeout_seconds=10
    ) as alex_mcp:
        tools = await alex_mcp.list_tools()
        print(f"Available tools: {[tool.name for tool in tools]}")


asyncio.run(main())

Academic Research Agent Integration

This MCP server is specifically optimized for academic research workflows:

# Optimized for academic research workflows
from alex_agent import run_author_research

# Enhanced functionality with streamlined data
result = await run_author_research(
    "Find J. Abreu at EMBO with recent publications"
)

# Clean, structured output for AI processing
print(f"Success: {result['workflow_metadata']['success']}")
print(f"Quality: {result['research_result']['metadata']['result_analysis']['quality_score']}/100")

Direct Launch with uvx

# Standard launch
uvx --from git+https://github.com/drAbreu/alex-mcp.git@4.1.0 alex-mcp

# With environment variables
OPENALEX_MAILTO=your-email@domain.com uvx --from git+https://github.com/drAbreu/alex-mcp.git@4.1.0 alex-mcp

🛠️ Available Tools

1. autocomplete_authors ⭐ NEW

Get multiple author candidates from the OpenAlex autocomplete API so that an agent can choose between them.

Parameters:

name (required): Author name to search (e.g., "James Briscoe", "M. Ralser")
context (optional): Context for disambiguation (e.g., "Francis Crick Institute developmental biology")
limit (optional): Maximum candidates (1-10, default: 5)

Key Features:

⚡ Fast: ~200ms response time
🎯 Smart: Multiple candidates with institutional hints
🧠 AI-Ready: Perfect for context-based selection
📊 Rich: Works count, citations, institution info

Streamlined Output:

{
  "query": "James Briscoe",
  "context": "Francis Crick Institute",
  "total_candidates": 3,
  "candidates": [
    {
      "openalex_id": "https://openalex.org/A5019391436",
      "display_name": "James Briscoe",
      "institution_hint": "The Francis Crick Institute, UK",
      "works_count": 415,
      "cited_by_count": 24623,
      "external_id": "https://orcid.org/0000-0002-1020-5240"
    }
  ]
}

Usage Pattern:

# Get multiple candidates for disambiguation
candidates = await autocomplete_authors(
    "James Briscoe", 
    context="Francis Crick Institute developmental biology"
)

# AI selects best match based on institutional context
# Much more accurate than single search result!

2. search_authors

Search for authors with streamlined output for AI agents.

Parameters:

name (required): Author name to search
institution (optional): Institution name filter
topic (optional): Research topic filter
country_code (optional): Country code filter (e.g., "US", "DE")
limit (optional): Maximum results (1-25, default: 20)

Streamlined Output:

{
  "query": "J. Abreu",
  "total_count": 3,
  "results": [
    {
      "id": "https://openalex.org/A123456789",
      "display_name": "Jorge Abreu-Vicente",
      "orcid": "https://orcid.org/0000-0000-0000-0000",
      "display_name_alternatives": ["J. Abreu-Vicente", "Jorge Abreu Vicente"],
      "affiliations": [
        {
          "institution": {
            "display_name": "European Molecular Biology Organization",
            "country_code": "DE"
          },
          "years": [2023, 2024, 2025]
        }
      ],
      "cited_by_count": 316,
      "works_count": 25,
      "summary_stats": {
        "h_index": 9,
        "i10_index": 5
      },
      "x_concepts": [
        {
          "display_name": "Astrophysics",
          "score": 0.8
        },
        {
          "display_name": "Machine Learning", 
          "score": 0.6
        }
      ]
    }
  ]
}

Features: Clean structure optimized for AI reasoning and disambiguation

3. retrieve_author_works

Retrieve works for a given author with enhanced filtering capabilities.

Parameters:

author_id (required): OpenAlex author ID
limit (optional): Maximum results (1-50, default: 20)
order_by (optional): "date" or "citations" (default: "date")
publication_year (optional): Filter by specific year
type (optional): Work type filter (e.g., "journal-article")
authorships_institutions_id (optional): Filter by institution
is_retracted (optional): Filter retracted works
open_access_is_oa (optional): Filter by open access status

Enhanced Output:

{
  "author_id": "https://openalex.org/A123456789",
  "total_count": 25,
  "results": [
    {
      "id": "https://openalex.org/W123456789",
      "title": "A platform for the biomedical application of large language models",
      "doi": "10.1038/s41587-024-02534-3",
      "publication_year": 2025,
      "type": "journal-article",
      "cited_by_count": 42,
      "authorships": [
        {
          "author": {
            "display_name": "Jorge Abreu-Vicente"
          },
          "institutions": [
            {
              "display_name": "European Molecular Biology Organization"
            }
          ]
        }
      ],
      "locations": [
        {
          "source": {
            "display_name": "Nature Biotechnology",
            "type": "journal"
          }
        }
      ],
      "open_access": {
        "is_oa": true
      },
      "primary_topic": {
        "display_name": "Biomedical Engineering"
      }
    }
  ]
}

Features: Comprehensive work data with flexible filtering for targeted queries

📊 Data Optimization

Focused Information Architecture

This MCP server provides focused, structured data specifically designed for AI agent consumption:

Author Data Features

Identity Resolution: Names, ORCID, alternatives for disambiguation
Affiliation Tracking: Current and historical institutional connections
Impact Metrics: Citation counts, h-index, and scholarly impact
Research Context: Fields, concepts, and domain expertise
Career Analysis: Temporal affiliation changes and transitions

Work Data Features

Publication Metadata: Title, DOI, venue, and publication details
Impact Assessment: Citation counts and scholarly influence
Access Information: Open access status and availability
Authorship Details: Complete author lists and institutional affiliations
Research Classification: Topics, concepts, and domain categorization

Enhanced Filtering

# Target high-impact journal articles
works = await retrieve_author_works(
    author_id="https://openalex.org/A123456789",
    type="journal-article",      # Focus on journal publications
    open_access_is_oa=True,      # Open access only
    order_by="citations",        # Most cited first
    limit=15
)

# Targeted author search by institution and topic
authors = await search_authors(
    name="J. Abreu",
    institution="EMBO",          # Current institution
    topic="Machine Learning",    # Research focus
    limit=10
)

🧪 Example Usage

Author Disambiguation

from alex_mcp.server import search_authors_core

# Comprehensive author search
results = search_authors_core(
    name="J Abreu Vicente",
    institution="EMBO",
    topic="Machine Learning",
    limit=20
)

print(f"Found {results.total_count} candidates")
for author in results.results:
    print(f"- {author.display_name}")
    if author.affiliations:
        current_inst = author.affiliations[0].institution.display_name
        print(f"  Institution: {current_inst}")
    print(f"  Metrics: {author.cited_by_count} citations, h-index {author.summary_stats.h_index}")
    if author.x_concepts:
        fields = [c.display_name for c in author.x_concepts[:3]]
        print(f"  Research: {', '.join(fields)}")

Academic Work Analysis

from alex_mcp.server import retrieve_author_works_core

# Comprehensive work retrieval
works = retrieve_author_works_core(
    author_id="https://openalex.org/A5058921480",
    type="journal-article",      # Academic focus
    order_by="citations",        # Impact-based ordering
    limit=20
)

print(f"Found {works.total_count} publications")
for work in works.results:
    print(f"- {work.title}")
    if work.locations:
        journal = work.locations[0].source.display_name
        print(f"  Published in: {journal} ({work.publication_year})")
    print(f"  Impact: {work.cited_by_count} citations")
    if work.open_access and work.open_access.is_oa:
        print("  ✓ Open Access")

Institution and Field Analysis

from alex_mcp.server import search_authors_core

# Analyze career transitions
def analyze_career_path(author_result):
    # Skip affiliations without year data so min() and max() cannot fail
    affiliations = [aff for aff in author_result.affiliations if aff.years]
    if len(affiliations) > 1:
        print("Career path:")
        for aff in sorted(affiliations, key=lambda x: min(x.years)):
            years = f"{min(aff.years)}-{max(aff.years)}"
            print(f"  {years}: {aff.institution.display_name}")

    # Research evolution
    if author_result.x_concepts:
        print("Research areas:")
        for concept in author_result.x_concepts[:5]:
            print(f"  {concept.display_name} (score: {concept.score:.2f})")

# Usage
results = search_authors_core("Jorge Abreu Vicente")
if results.results:
    analyze_career_path(results.results[0])

🔧 Configuration Options

Environment Variables

# Required
export OPENALEX_MAILTO=your-email@domain.com

# Optional settings
export OPENALEX_MAX_AUTHORS=100             # Maximum authors per query
export OPENALEX_USER_AGENT=research-agent-v1.0
export ALEX_MCP_VERSION=4.1.0

# Rate limiting (respectful usage)
export OPENALEX_RATE_PER_SEC=10
export OPENALEX_RATE_PER_DAY=100000
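
A client that wants to honor the same limits could read these variables as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply repeat the example values above rather than documented server defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mailto: str
    rate_per_sec: float
    rate_per_day: int

    @property
    def min_interval(self) -> float:
        """Seconds to wait between requests to stay under rate_per_sec."""
        return 1.0 / self.rate_per_sec


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    mailto = env.get("OPENALEX_MAILTO")
    if not mailto:
        raise ValueError("OPENALEX_MAILTO is required")
    rate_per_sec = float(env.get("OPENALEX_RATE_PER_SEC", "10"))
    if rate_per_sec <= 0:
        raise ValueError("OPENALEX_RATE_PER_SEC must be positive")
    rate_per_day = int(env.get("OPENALEX_RATE_PER_DAY", "100000"))
    return Settings(mailto, rate_per_sec, rate_per_day)


settings = load_settings({"OPENALEX_MAILTO": "your-email@domain.com"})
print(settings.min_interval)  # 0.1
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.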

Performance Tuning

# For comprehensive research applications
config = {
    "max_authors_per_query": 25,     # Detailed author analysis
    "max_works_per_author": 50,      # Complete publication history
    "enable_all_filters": True,      # Full filtering capabilities
    "detailed_affiliations": True,   # Complete institutional data
    "research_concepts": True        # Detailed concept analysis
}

🧑‍💻 Development & Testing

Project Structure

alex-mcp/
├── src/alex_mcp/
│   ├── server.py              # Main MCP server
│   ├── data_objects.py        # Data models and structures
│   └── utils.py               # Utility functions
├── examples/
│   ├── basic_usage.py         # Simple examples
│   ├── advanced_queries.py    # Complex query examples
│   └── integration_demo.py    # AI agent integration
├── tests/
│   ├── test_server.py         # Server functionality tests
│   └── test_integration.py    # Integration tests
└── docs/
    └── api_reference.md       # Detailed API documentation

Running Tests

# Install test dependencies
pip install -e ".[test]"

# Run functionality tests
pytest tests/test_server.py -v

# Test with real queries
python examples/basic_usage.py

# Test AI agent integration
python examples/integration_demo.py

Development Examples

# Test author disambiguation
python examples/basic_usage.py --query "J. Abreu" --institution "EMBO"

# Test work retrieval
python examples/advanced_queries.py --author-id "A123456789" --type "journal-article"

# Test integration patterns
python examples/integration_demo.py --workflow "career-analysis"

📈 Integration Examples

Academic Research Workflows

The server can be plugged into AI-powered research agents:

# Enhanced academic research agent
from alex_agent import AcademicResearchAgent

agent = AcademicResearchAgent(
    mcp_servers=[alex_mcp],  # Streamlined data processing
    model="gpt-4.1-2025-04-14"
)

# Complex research queries with structured data
result = await agent.research_author(
    "Find J. Abreu at EMBO with machine learning publications"
)

# Rich, structured output for AI reasoning
print(f"Quality Score: {result.quality_score}/100")
print(f"Author disambiguation: {result.confidence}")
print(f"Research fields: {result.research_domains}")

Multi-Agent Systems

# Collaborative research analysis
async def research_collaboration_network(seed_author):
    # Find primary author
    authors = await alex_mcp.search_authors(seed_author)
    primary = authors['results'][0]
    
    # Get their works
    works = await alex_mcp.retrieve_author_works(
        primary['id'], 
        type="journal-article"
    )
    
    # Collect co-authors, excluding the primary author themselves
    collaborators = set()
    for work in works['results']:
        for authorship in work.get('authorships', []):
            collaborators.add(authorship['author']['display_name'])
    collaborators.discard(primary['display_name'])
    
    return {
        'primary_author': primary,
        'publication_count': len(works['results']),
        'collaborator_network': list(collaborators),
        'research_impact': sum(w['cited_by_count'] for w in works['results'])
    }
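
A flat set of names does not show how strong each collaboration is. The following sketch counts shared works per co-author instead, assuming the `authorships` shape shown under retrieve_author_works; `rank_collaborators` and the sample records are hypothetical.

```python
from collections import Counter


def rank_collaborators(works: list[dict], primary_name: str) -> list[tuple[str, int]]:
    """Return (name, shared_works) pairs, most frequent co-authors first."""
    counts = Counter()
    for work in works:
        # A set ensures each co-author is counted once per work
        names = {a["author"]["display_name"] for a in work.get("authorships", [])}
        names.discard(primary_name)
        counts.update(names)
    # Sort by count descending, then by name, so the order is deterministic
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical works in the documented response shape
sample_works = [
    {"authorships": [
        {"author": {"display_name": "Jorge Abreu-Vicente"}},
        {"author": {"display_name": "Author A"}},
        {"author": {"display_name": "Author B"}},
    ]},
    {"authorships": [
        {"author": {"display_name": "Jorge Abreu-Vicente"}},
        {"author": {"display_name": "Author A"}},
    ]},
]
print(rank_collaborators(sample_works, "Jorge Abreu-Vicente"))
# [('Author A', 2), ('Author B', 1)]
```

Matching on display names is fragile when two people share a name; comparing author IDs would be more robust if the response includes them.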

🤝 Contributing

We welcome contributions to improve functionality and add new features:

Fork the repository
Create a feature branch: git checkout -b feature/enhanced-filtering
Add tests: Ensure your changes maintain data quality and structure
Submit a pull request: Include examples and documentation

Development Priorities

📄 License

This project is licensed under the MIT License. See LICENSE for details.
