by hyperbrowserai
Enables natural-language-driven browser automation using Playwright and LLMs, allowing tasks like navigation, data extraction, and multi‑page workflows without writing brittle scripts.
HyperAgent extends Playwright with large language model (LLM) capabilities, turning plain English commands into robust browser actions. It can navigate websites, interact with page elements, extract structured data, and orchestrate multi‑page flows, all while optionally falling back to standard Playwright APIs when AI isn’t required.
npm install @hyperbrowser/agent # or yarn add @hyperbrowser/agent
npx @hyperbrowser/agent -c "Find a route from Miami to New Orleans, and provide the detailed route information."
Options: -d/--debug for debug mode, --hyperbrowser to run against the Hyperbrowser cloud service.
import { HyperAgent } from "@hyperbrowser/agent";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";
const agent = new HyperAgent({
llm: new ChatOpenAI({ openAIApiKey: process.env.OPENAI_API_KEY, modelName: "gpt-4o" })
});
const result = await agent.executeTask(
"Navigate to amazon.com, search for 'laptop', and extract the prices of the first 5 results"
);
console.log(result.output);
await agent.closeAgent();
const agent = new HyperAgent({ browserProvider: "Hyperbrowser" });
const resp = await agent.executeTask(
"Go to hackernews, and list me the 5 most recent article titles"
);
console.log(resp);
await agent.closeAgent();
page.ai() and page.extract(): simple APIs for LLM‑driven actions and schema‑validated extraction.
By default, with no browserProvider set, HyperAgent uses a local Playwright instance.
For structured output, pass a Zod schema to page.extract() or agent.executeTask(); the LLM returns data that conforms to the schema.
HyperAgent is Playwright supercharged with AI. No more brittle scripts, just powerful natural language commands. Just looking for scalable headless browsers or scraping infra? Go to Hyperbrowser to get started for free!
page.ai(), page.extract() and executeTask() cover any AI automation task.
# Using npm
npm install @hyperbrowser/agent
# Using yarn
yarn add @hyperbrowser/agent
$ npx @hyperbrowser/agent -c "Find a route from Miami to New Orleans, and provide the detailed route information."
The CLI supports options for debugging or for using Hyperbrowser instead of a local browser:
-d, --debug Enable debug mode
-c, --command <task description> Command to run
--hyperbrowser Use Hyperbrowser for the browser provider
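If you want to drive the CLI from a script rather than a shell, the flags above can be assembled programmatically. This is a minimal sketch that assumes only the three documented flags (-c, -d, --hyperbrowser); `buildHyperAgentArgs` is a hypothetical helper, and the `spawn` call is shown in a comment but not executed.

```typescript
// Hypothetical helper: builds an argv array for `npx @hyperbrowser/agent`
// using only the flags documented above.
interface CliOptions {
  command: string; // natural-language task passed to -c
  debug?: boolean; // adds -d
  hyperbrowser?: boolean; // adds --hyperbrowser
}

function buildHyperAgentArgs(opts: CliOptions): string[] {
  const args = ["@hyperbrowser/agent", "-c", opts.command];
  if (opts.debug) args.push("-d");
  if (opts.hyperbrowser) args.push("--hyperbrowser");
  return args;
}

// Possible usage (not executed here):
// import { spawn } from "node:child_process";
// spawn("npx", buildHyperAgentArgs({ command: "Find a route from Miami to New Orleans" }), { stdio: "inherit" });

const args = buildHyperAgentArgs({
  command: "Find a route from Miami to New Orleans",
  hyperbrowser: true,
});
console.log(args.join(" ")); // @hyperbrowser/agent -c Find a route from Miami to New Orleans --hyperbrowser
```

Passing an argv array to `spawn` (without `shell: true`) keeps the task string as a single argument, so quotes or spaces in the task need no shell escaping.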
import { HyperAgent } from "@hyperbrowser/agent";
import { z } from "zod";
// Initialize the agent
const agent = new HyperAgent({
llm: {
provider: "openai",
model: "gpt-4o",
},
});
// Execute a task
const result = await agent.executeTask(
"Navigate to amazon.com, search for 'laptop', and extract the prices of the first 5 results"
);
console.log(result.output);
// Use page.ai and page.extract
const page = await agent.newPage();
await page.goto("https://flights.google.com", { waitUntil: "load" });
await page.ai("search for flights from Rio to LAX from July 16 to July 22");
const res = await page.extract(
"give me the flight options",
z.object({
flights: z.array(
z.object({
price: z.number(),
departure: z.string(),
arrival: z.string(),
})
),
})
);
console.log(res);
// Clean up
await agent.closeAgent();
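Because the quick start calls closeAgent() only at the end, an exception from executeTask() or page.ai() would skip the cleanup and leave the browser running. A try/finally wrapper guarantees the call. This is a sketch: `withAgent` and the `ClosableAgent` interface are hypothetical and assume only that the agent exposes `closeAgent()`, as in the examples above. `FakeAgent` stands in for a real HyperAgent.

```typescript
// Minimal shape assumed from the examples above: anything with closeAgent().
interface ClosableAgent {
  closeAgent(): Promise<void>;
}

// Hypothetical helper: runs `work`, then always closes the agent,
// even if `work` throws.
async function withAgent<A extends ClosableAgent, T>(
  agent: A,
  work: (agent: A) => Promise<T>
): Promise<T> {
  try {
    // `return await` makes the finally block wait until work has settled.
    return await work(agent);
  } finally {
    await agent.closeAgent();
  }
}

// Stand-in agent for demonstration only.
class FakeAgent implements ClosableAgent {
  closed = false;
  async executeTask(task: string): Promise<{ output: string }> {
    return { output: `done: ${task}` };
  }
  async closeAgent(): Promise<void> {
    this.closed = true;
  }
}

const fake = new FakeAgent();
withAgent(fake, (a) => a.executeTask("list 5 titles")).then((res) => {
  console.log(res.output, fake.closed); // done: list 5 titles true
});
```

With a real agent, the body would be something like `await withAgent(agent, (a) => a.executeTask("..."))`.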
HyperAgent provides two complementary APIs optimized for different use cases:
page.aiAction() - Single Granular Actions
Best for: Single, specific actions like "click login", "fill email with test@example.com"
Advantages:
Example:
const page = await agent.newPage();
await page.goto("https://example.com/login");
// Fast, reliable single actions
await page.aiAction("fill email with user@example.com");
await page.aiAction("fill password with mypassword");
await page.aiAction("click the login button");
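Each aiAction() call is a single, self-contained step, so it is a natural unit to retry when a page is still loading. The sketch below is a generic retry helper. `retryStep`, the attempt count, and the delay are illustrative assumptions rather than HyperAgent features, and the flaky step stands in for a call such as `page.aiAction("click the login button")`.

```typescript
// Hypothetical helper: retries an async step up to `attempts` times,
// waiting `delayMs` between tries, and rethrows the last error.
async function retryStep<T>(
  step: () => Promise<T>,
  attempts = 3,
  delayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Stand-in for a step that fails twice before succeeding.
let calls = 0;
const flakyStep = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("element not found yet");
  return "clicked";
};

retryStep(flakyStep, 3, 10).then((r) => console.log(r, calls)); // clicked 3
```

With a real page this would read `await retryStep(() => page.aiAction("click the login button"))`. Retrying a whole multi-step page.ai() task is less attractive, since a partial run may already have changed page state.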
page.ai() - Complex Multi-Step Tasks
Best for: Complex workflows requiring multiple steps and visual context
Advantages:
Parameters:
useDomCache (boolean): Reuse DOM snapshots for speed
enableVisualMode (boolean): Enable screenshots and overlays (default: false)
Example:
const page = await agent.newPage();
await page.goto("https://flights.google.com");
// Complex task with multiple steps handled automatically
await page.ai("search for flights from Miami to New Orleans on July 16", {
useDomCache: true,
});
Combine both APIs for optimal performance:
// Use aiAction for fast, reliable individual actions
await page.aiAction("click the search button");
await page.aiAction("type laptop into search");
// Use ai() for complex, multi-step workflows
await page.ai("filter results by price under $1000 and sort by rating");
// Extract structured data
const products = await page.extract(
"get the top 5 products",
z.object({
products: z.array(z.object({ name: z.string(), price: z.number() }))
})
);
You can scale HyperAgent with cloud headless browsers using Hyperbrowser:
Add your Hyperbrowser API key to your environment as HYPERBROWSER_API_KEY.
Set browserProvider to "Hyperbrowser".
const agent = new HyperAgent({
browserProvider: "Hyperbrowser",
});
const response = await agent.executeTask(
"Go to hackernews, and list me the 5 most recent article titles"
);
console.log(response);
await agent.closeAgent();
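A common pattern is to use Hyperbrowser when the API key is present and fall back to a local Playwright browser otherwise. This is a minimal sketch. It assumes the key lives in `HYPERBROWSER_API_KEY` as described above and that omitting `browserProvider` selects the local browser. `agentConfigFromEnv` and `BrowserConfig` are hypothetical names. The environment is passed in as a plain object so the function stays testable.

```typescript
// Subset of the HyperAgent constructor options used in this README (assumed shape).
interface BrowserConfig {
  browserProvider?: "Hyperbrowser";
}

// Hypothetical helper: choose the cloud provider only when a non-blank key is set.
function agentConfigFromEnv(
  env: Record<string, string | undefined>
): BrowserConfig {
  const key = env.HYPERBROWSER_API_KEY;
  return key && key.trim() !== "" ? { browserProvider: "Hyperbrowser" } : {};
}

console.log(agentConfigFromEnv({ HYPERBROWSER_API_KEY: "hb_test" })); // { browserProvider: 'Hyperbrowser' }
console.log(agentConfigFromEnv({})); // {}
```

In an application this would be spread into the constructor, e.g. `new HyperAgent({ ...agentConfigFromEnv(process.env) })`.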
// Create and manage multiple pages
const page1 = await agent.newPage();
const page2 = await agent.newPage();
// Execute tasks on specific pages
const page1Response = await page1.ai(
"Go to google.com/travel/explore and set the starting location to New York. Then, return to me the first recommended destination that shows up. Return to me only the name of the location."
);
const page2Response = await page2.ai(
`I want to plan a trip to ${page1Response.output}. Recommend me places to visit there.`
);
console.log(page2Response.output);
// Get all active pages
const pages = await agent.getPages();
await agent.closeAgent();
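In the example above, the second task depends on the first page's output, so the calls must run one after another. When tasks are independent, separate pages can run concurrently with `Promise.all`. This is a sketch that assumes only that each page exposes `ai(task)` resolving to an object with an optional `output` string, as `page1Response.output` suggests. `runInParallel` and the fake pages are hypothetical.

```typescript
// Minimal page shape assumed from the example above.
interface AiPage {
  ai(task: string): Promise<{ output?: string }>;
}

// Hypothetical helper: one task per page, all started at once.
// Promise.all preserves input order in its result array.
async function runInParallel(
  pages: AiPage[],
  tasks: string[]
): Promise<(string | undefined)[]> {
  if (pages.length !== tasks.length) {
    throw new Error("need exactly one page per task");
  }
  const responses = await Promise.all(
    pages.map((page, i) => page.ai(tasks[i]))
  );
  return responses.map((r) => r.output);
}

// Stand-in pages that echo their task after a short delay.
const makeFakePage = (delayMs: number): AiPage => ({
  ai: (task) =>
    new Promise<{ output?: string }>((resolve) =>
      setTimeout(() => resolve({ output: `result for: ${task}` }), delayMs)
    ),
});

runInParallel(
  [makeFakePage(20), makeFakePage(5)],
  ["weather in Paris", "weather in Rome"]
).then((outputs) => console.log(outputs));
```

Each concurrent page holds its own browser tab and issues its own LLM calls, so the practical degree of parallelism is bounded by memory and API rate limits.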
HyperAgent can return data in a specified schema. Pass the schema per task with the outputSchema option:
import { z } from "zod";
const agent = new HyperAgent();
const agentResponse = await agent.executeTask(
"Navigate to imdb.com, search for 'The Matrix', and extract the director, release year, and rating",
{
outputSchema: z.object({
director: z.string().describe("The name of the movie director"),
releaseYear: z.number().describe("The year the movie was released"),
rating: z.string().describe("The IMDb rating of the movie"),
}),
}
);
console.log(agentResponse.output);
await agent.closeAgent();
Example output:
{
"director": "Lana Wachowski, Lilly Wachowski",
"releaseYear": 1999,
"rating": "8.7/10"
}
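The Zod schema validates the LLM's answer inside HyperAgent. Once the result crosses a boundary, for example after being serialized as JSON and read back in another service, a plain TypeScript type guard can re-check it without importing Zod. This is a sketch: `MovieInfo` and `isMovieInfo` are hypothetical names that mirror the outputSchema fields above, and the sample is the example output shown.

```typescript
// Mirrors the outputSchema fields from the IMDb example.
interface MovieInfo {
  director: string;
  releaseYear: number;
  rating: string;
}

// Type guard: true only if `value` has the three fields with the expected types.
function isMovieInfo(value: unknown): value is MovieInfo {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.director === "string" &&
    typeof v.releaseYear === "number" &&
    typeof v.rating === "string"
  );
}

const sample: unknown = JSON.parse(
  '{"director":"Lana Wachowski, Lilly Wachowski","releaseYear":1999,"rating":"8.7/10"}'
);
if (isMovieInfo(sample)) {
  console.log(`${sample.director} (${sample.releaseYear})`); // Lana Wachowski, Lilly Wachowski (1999)
}
```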
Hyperagent supports multiple LLM providers with native SDKs for better performance and reliability.
// Using OpenAI
const openaiAgent = new HyperAgent({
llm: {
provider: "openai",
model: "gpt-4o",
},
});
// Using Anthropic's Claude
const claudeAgent = new HyperAgent({
llm: {
provider: "anthropic",
model: "claude-3-7-sonnet-latest",
},
});
// Using Google Gemini
const geminiAgent = new HyperAgent({
llm: {
provider: "gemini",
model: "gemini-2.5-pro-preview-03-25",
},
});
// Using DeepSeek
const deepseekAgent = new HyperAgent({
llm: {
provider: "deepseek",
model: "deepseek-chat",
},
});
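Since only the `llm` block changes between providers, one option is to keep the provider/model pairs in a lookup table and pick one at startup. This sketch uses exactly the pairs listed above. `LlmProvider`, `DEFAULT_MODELS`, and `llmConfigFor` are hypothetical names, and model identifiers change over time, so check each provider's current model list.

```typescript
type LlmProvider = "openai" | "anthropic" | "gemini" | "deepseek";

// Provider/model pairs taken from the examples above.
const DEFAULT_MODELS: Record<LlmProvider, string> = {
  openai: "gpt-4o",
  anthropic: "claude-3-7-sonnet-latest",
  gemini: "gemini-2.5-pro-preview-03-25",
  deepseek: "deepseek-chat",
};

function isLlmProvider(name: string): name is LlmProvider {
  return Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, name);
}

// Hypothetical helper: builds the `llm` option object, rejecting unknown names.
function llmConfigFor(name: string): { provider: LlmProvider; model: string } {
  if (!isLlmProvider(name)) {
    throw new Error(`unsupported provider: ${name}`);
  }
  return { provider: name, model: DEFAULT_MODELS[name] };
}

console.log(llmConfigFor("anthropic"));
```

A caller might then write `new HyperAgent({ llm: llmConfigFor(process.env.LLM_PROVIDER ?? "openai") })`, where `LLM_PROVIDER` is a hypothetical environment variable name.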
HyperAgent can act as a fully featured MCP client. For best results, we recommend using gpt-4o as your LLM.
Here is an example that reads from Wikipedia and inserts the information into a Google Sheet using the Composio Google Sheets MCP server.
const agent = new HyperAgent({
llm: { provider: "openai", model: "gpt-4o" }, // gpt-4o, as recommended above
debug: true,
});
await agent.initializeMCPClient({
servers: [
{
command: "npx",
args: [
"@composio/mcp@latest",
"start",
"--url",
"https://mcp.composio.dev/googlesheets/...",
],
env: {
npm_config_yes: "true",
},
},
],
});
const response = await agent.executeTask(
"Go to https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population and get the data on the top 5 most populous states from the table. Then insert that data into a google sheet. You may need to first check if there is an active connection to google sheet, and if there isn't connect to it and present me with the link to sign in. "
);
console.log(response);
await agent.closeAgent();
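Each `servers` entry passed to initializeMCPClient() above has the same command/args/env shape as an entry in the `mcpServers` map that many MCP clients read from a JSON config file. If you already keep such a file, a small converter can produce the array. This is a sketch: `McpServerEntry` and `toServerList` are hypothetical, and it assumes initializeMCPClient() needs only the fields shown in the example above. The elided Composio URL is kept as-is.

```typescript
// Shape of one server entry, as used in the initializeMCPClient example.
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Hypothetical converter from `{ mcpServers: { name: entry } }` to the array form.
// Server names (the map keys) are dropped because the array form above has no name field.
function toServerList(config: {
  mcpServers: Record<string, McpServerEntry>;
}): McpServerEntry[] {
  return Object.values(config.mcpServers).map((entry) => ({ ...entry }));
}

const servers = toServerList({
  mcpServers: {
    composio: {
      command: "npx",
      args: ["@composio/mcp@latest", "start", "--url", "https://mcp.composio.dev/googlesheets/..."],
      env: { npm_config_yes: "true" },
    },
  },
});
console.log(servers.length, servers[0].command); // 1 npx
```

The result would then be passed as `await agent.initializeMCPClient({ servers })`.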
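HyperAgent's capabilities can be extended with custom actions. A custom action requires three things:
type: a descriptive name for the action.
actionParams: a Zod object describing the parameters the action accepts.
run: a function that takes the action context and the parameters and returns a result.
Here is an example that performs a search using Exa: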
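import Exa from "exa-js";
import { z } from "zod";
import { HyperAgent } from "@hyperbrowser/agent";
// Type import path assumed; check the package's exported types
import {
AgentActionDefinition,
ActionContext,
ActionOutput,
} from "@hyperbrowser/agent/types";
const exaInstance = new Exa(process.env.EXA_API_KEY);
// Parameters the LLM must supply when it chooses this action
const searchSchema = z
.object({
search: z
.string()
.describe(
"The search query for something you want to search about. Keep the search query concise and to-the-point."
),
})
.describe("Search and return the results for a given query.");
export const RunSearchActionDefinition: AgentActionDefinition = {
type: "perform_search",
actionParams: searchSchema,
run: async function (
ctx: ActionContext,
params: z.infer<typeof searchSchema>
): Promise<ActionOutput> {
// Run the search and format one line per result
const results = (await exaInstance.search(params.search, {})).results
.map(
(res) =>
`title: ${res.title} || url: ${res.url} || relevance: ${res.score}`
)
.join("\n");
return {
success: true,
message: `Successfully performed search for query ${params.search}. Got results: \n${results}`,
};
},
};
// Register the custom action, then run a task that can use it
const agent = new HyperAgent({
customActions: [RunSearchActionDefinition],
});
const result = await agent.executeTask("Search about the news for today in New York");
console.log(result.output);
await agent.closeAgent();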
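The `run` function above mixes the network call with string formatting. Pulling the formatting into a pure function makes it testable without an Exa API key. This is a sketch: `SearchHit` and `formatSearchResults` are hypothetical names, and `SearchHit` declares only the three fields the action reads (`title`, `url`, `score`), treating `title` and `score` as possibly missing.

```typescript
// Only the fields the custom action reads from each search result.
interface SearchHit {
  title?: string | null;
  url: string;
  score?: number;
}

// Same line format as run() above, one hit per line, with fallbacks for missing fields.
function formatSearchResults(hits: SearchHit[]): string {
  return hits
    .map(
      (res) =>
        `title: ${res.title ?? "(untitled)"} || url: ${res.url} || relevance: ${res.score ?? "n/a"}`
    )
    .join("\n");
}

const text = formatSearchResults([
  { title: "Example headline", url: "https://example.com/a", score: 0.42 },
  { title: null, url: "https://example.com/b" },
]);
console.log(text);
```

Inside `run`, the call would become `formatSearchResults((await exaInstance.search(params.search, {})).results)`, assuming Exa's result objects carry these fields.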
HyperAgent speaks Chrome DevTools Protocol natively. Element lookup, scrolling, typing, frame management, and screenshots all go through CDP so every action has exact coordinates, execution contexts, and browser events. This allows for more custom commands and deep iframe tracking.
HyperAgent integrates seamlessly with Playwright, so you can still use familiar commands while the actions take full advantage of native CDP, with fast locators and advanced iframe tracking.
Key Features:
Keep in mind that CDP is still experimental, and stability is not guaranteed. If you’d like the agent to use Playwright’s native locators/actions instead, set cdpActions: false when you create the agent and it will fall back automatically.
The CDP layer is still evolving—expect rapid polish (and the occasional sharp edge). If you hit something quirky you can toggle CDP off for that workflow and drop us a bug report.
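Since cdpActions is set when the agent is created, one way to turn CDP off for a single problematic workflow is to keep a list of such workflows and build the option from it. This is a sketch: the workflow name and `agentOptionsFor` are hypothetical, and only the `cdpActions` option comes from the text above. For other workflows it returns no override, leaving the agent's default in place.

```typescript
// Workflows where CDP actions misbehaved and Playwright locators are preferred.
const CDP_DISABLED_WORKFLOWS = new Set<string>(["legacy-checkout"]);

// Hypothetical helper: returns a cdpActions override only for listed workflows.
function agentOptionsFor(workflow: string): { cdpActions?: false } {
  return CDP_DISABLED_WORKFLOWS.has(workflow) ? { cdpActions: false } : {};
}

console.log(agentOptionsFor("legacy-checkout")); // { cdpActions: false }
console.log(agentOptionsFor("search")); // {}
```

The result would be spread into the constructor alongside other options, e.g. `new HyperAgent({ llm, ...agentOptionsFor("legacy-checkout") })`.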
We welcome contributions to Hyperagent! Here's how you can help:
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
{
"mcpServers": {
"composio": {
"command": "npx",
"args": [
"@composio/mcp@latest",
"start",
"--url",
"https://mcp.composio.dev/googlesheets/..."
],
"env": {
"npm_config_yes": "true"
}
}
}
}
claude mcp add composio npx @composio/mcp@latest start --url https://mcp.composio.dev/googlesheets/...