In April 2025, Microsoft quietly dropped Playwright MCP, a new server that connects your AI agent (via MCP) directly to the Playwright browser API.

What does that mean in plain English? Your AI agent can now interact with real web pages using nothing but simple text instructions (and for free!). "Click this," "Take a screenshot": all without writing a single line of browser automation code.

Yeah, that's so big that Playwright MCP has already become one of the most-starred MCP servers on GitHub. And it's only just getting started. 🔥

In this step-by-step guide, you'll see exactly what this server can do, and how to plug it into a Python-based AI workflow using the OpenAI Agents SDK.

## Why Everyone's Talking About the Playwright MCP Server

Playwright MCP is an MCP (Model Context Protocol) server that gives your AI agents serious browser automation superpowers via the Playwright API.

Behind the scenes, instead of relying on screenshots or vision-tuned models, it lets LLMs interact with web pages using structured accessibility snapshots. That means faster, cleaner, and far more LLM-friendly interactions.

In the world of agentic AI and agentic RAG workflows, where AI needs to read, click, and navigate like a human, this is a total game-changer! 🤯

And get this: even though Microsoft didn't roll it out with much fanfare (no flashy keynote, no blog post blitz), this low-key library is already sitting at 12K+ ⭐ on GitHub… and climbing.

So why the hype? Because it just works. Here's what makes it special:

- ⚡ **Blazing fast and lightweight**: Uses the browser's accessibility tree, not slow, clunky pixel-based input.
- 🧠 **LLM-first design**: No need for a vision model. It's built for structured text interfaces.
- 🛠️ **Deterministic tool usage**: No guesswork, no ambiguity. Just clear, reliable actions via several tools.

Plus, it comes with a solid CLI and useful customization flags for fine-tuned browser control.

This technology lets you build serious agents that don't just hallucinate, but actually do stuff on the web. 💪 🌐

## How to Use the Playwright MCP Server: Step-by-Step Tutorial

Ready to put the Playwright MCP server into action? Follow the steps below to wire it up in your Python script using the OpenAI Agents SDK!

### Prerequisites

To follow along with this tutorial, make sure you've got your dev setup ready to roll:

- Python 3.10+ installed locally 🐍
- Node.js installed and working (the latest LTS version is your friend) ⚙️
- An API key from a supported LLM provider (the OpenAI Agents SDK needs it, and we recommend Gemini because it's free to use) 🔑

### Step #1: Project Setup & Configuration

This Playwright + MCP integration builds on what we covered in MCP + OpenAI Agents SDK: How to Build a Powerful AI Agent. So if you haven't checked that out yet, go skim it real quick. We'll wait. 🕒
But if you're in a rush (we get it), here's the TL;DR version to get you rolling:

1. Create a project folder. Inside it, add a file named `agent.py` and set up a Python virtual environment.
2. Install the required libraries with `pip install openai-agents python-dotenv`.
3. Add a `.env` file to your project folder and, inside it, drop your Gemini API key like this:

```
GEMINI_API_KEY=<your-gemini-api-key>
```

That's it! You're now locked, loaded, and ready to start building! 💥

### Step #2: Playwright MCP Server Integration

Get the Playwright MCP server up and running inside the OpenAI Agents SDK with the following code:

```python
async with MCPServerStdio(
    name="Playwright MCP server",
    params={
        "command": "npx",
        "args": ["-y", "@playwright/mcp@latest", "--output-dir", "./"],
    },
) as server:
    # Create and initialize the AI agent with the running MCP server
    agent = await create_mcp_ai_agent(server)
    # Agent request-response cycle...
```

This chunk of code basically runs the following shell command under the hood:

```bash
npx -y @playwright/mcp@latest --output-dir "./"
```
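If you find yourself toggling flags like `--output-dir` between runs, you can build the argument list programmatically. Here's a tiny sketch (`build_playwright_mcp_args` is our own helper name, not part of any SDK):

```python
def build_playwright_mcp_args(output_dir=None):
    """Build the npx argument list for launching the Playwright MCP server.

    output_dir is optional: leave it out when the agent never exports
    files such as screenshots or PDFs.
    """
    args = ["-y", "@playwright/mcp@latest"]
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    return args


print(build_playwright_mcp_args("./"))
# ['-y', '@playwright/mcp@latest', '--output-dir', './']
```

You'd then pass `"args": build_playwright_mcp_args("./")` inside the `params` dict instead of hardcoding the list.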
In plain English? It spins up the Playwright MCP server using Node.js. The `--output-dir "./"` bit tells it where to dump export files like screenshots and PDFs.

ℹ️ **Note**: If your agent doesn't need to export any files, you can skip the `--output-dir` flag entirely. That's optional!

### Step #3: Complete Code

Here's what your `agent.py` file should look like once everything's wired up and humming. This is your full OpenAI Agents SDK-built, Gemini-powered, Playwright-integrated AI agent, ready to take action through MCP:

```python
import asyncio
import os

from dotenv import load_dotenv
from agents import (
    Runner,
    Agent,
    OpenAIChatCompletionsModel,
    set_default_openai_client,
    set_tracing_disabled,
)
from openai import AsyncOpenAI
from agents.mcp import MCPServerStdio

# Load environment variables from the .env file
load_dotenv()

# Read the required secrets from the environment variables
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")


async def create_mcp_ai_agent(mcp_server):
    # Initialize the Gemini client using its OpenAI-compatible interface
    gemini_client = AsyncOpenAI(
        api_key=GEMINI_API_KEY,
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )
    # Set the default OpenAI client to Gemini
    set_default_openai_client(gemini_client)
    # Disable tracing to avoid tracing errors being logged in the terminal
    set_tracing_disabled(True)

    # Create an agent configured to use the MCP server and Gemini model
    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant",
        model=OpenAIChatCompletionsModel(
            model="gemini-2.0-flash",
            openai_client=gemini_client,
        ),
        mcp_servers=[mcp_server],
    )
    return agent


async def run():
    # Start the Playwright MCP server via npx
    async with MCPServerStdio(
        name="Playwright MCP server",
        params={
            "command": "npx",
            "args": ["-y", "@playwright/mcp@latest", "--output-dir", "./"],
        },
    ) as server:
        # Create and initialize the AI agent with the running MCP server
        agent = await create_mcp_ai_agent(server)

        # Main REPL loop to process user requests
        while True:
            # Read the user's request
            request = input("Your request -> ")

            # Exit condition
            if request.lower() == "exit":
                print("Exiting the agent...")
                break

            # Run the request through the agent
            output = await Runner.run(agent, input=request)

            # Print the result to the user
            print(f"Output -> \n{output.final_output}\n\n")


if __name__ == "__main__":
    asyncio.run(run())
```

🚨 Boom. Just like that, you've built a fully functional AI agent in ~75 lines of Python. It can think through Gemini and act through Playwright. 🧠 🛠️

### Step #4: Test Your Playwright-Powered AI Agent

It's time to test your agent by running:

```bash
python agent.py
```

👀 That's what you'll see on startup:

Now hit it with a prompt like this:

```
Visit hackernoon.com, click on "Trending Stories", wait for the page to fully load, and then export it as a PDF file named "hackernoon-trending-stories.pdf"
```

🎯 The result? Yup, it nailed it! 🎉

But that was fast, so let's break down what happened:

1. The agent launches a Playwright-powered Chrome instance. 🧭
2. It visits hackernoon.com. 🌐
3. Clicks on the "Trending Stories" link. 🖱️
4. Waits for the page to fully load. ⏳
5. Exports the page to a PDF file named `hackernoon-trending-stories.pdf`, which appears in the project folder. 📄

Note that the browser stays open in the background on the final page. That's perfect if you want to send another prompt and keep the current session alive!
Want to end the session and close the browser instead? Just type something like:

```
Now, close the browser.
```

The agent will shut down the browser for you. ✅

Now, go check the generated PDF file `hackernoon-trending-stories.pdf` in your project folder. Scroll through it, and you'll see: Fantastic! A clean, full-page export of Hackernoon's trending page, links and all, in crisp PDF format. That's real browser automation, powered by your own AI agent.

And that, folks, is how you build an AI agent with Playwright + MCP. This thing is seriously powerful. Let it cook. 👨‍🍳

## The Hidden Limitations of Playwright MCP (And How to Outsmart Them)

Cool! You might be thinking: "Alright, I've got the tools… an LLM that can think, an MCP server that can scrape, and Playwright to control the browser. Game over, right? Time to automate the entire internet!"

Well… not so fast. 🛑

Sure, there are only two core things AI agents need to thrive:

1. Access to real-time web data (✅ handled by scraping-capable MCP integrations)
2. The ability to interact with a browser (✅ enter Playwright MCP)

But here's where things get interesting… Try asking your Playwright-powered AI agent something like:

```
Visit g2.com, wait for the page to load, and take a screenshot
```

And what happens? This! 👇 It navigates to G2. ✅ Waits. ✅ Takes a screenshot.
✅ But there's a catch: the page is blocked behind a verification wall. 😬

And the agent is even polite enough to tell you:

```
Your request -> Visit g2.com, wait for the page to load, and take a screenshot
Output ->
Okay, I navigated to g2.com, waited for 5 seconds, took a snapshot of the page, and then took a screenshot. However, the page is showing a verification requirement. I am unable to complete the request.
```

Nope, it's not "game over" for us as human beings. But maybe... it's game over for the default Playwright browser setup. 😵‍💻

🧩 So what went wrong? Here's the issue: vanilla Chrome! 🙅‍♂️

The browser Playwright controls out of the box isn't designed to avoid blocks. It leaks signals all over the place, such as automation flags, weird configs, and so on. To the point that most websites, just like G2, instantly know it's a bot. 🤖 🚫

So what's the solution? 👉 A Playwright-compatible infrastructure built to support agentic workflows on remote browsers that don't get blocked. We're talking:

- Infinitely scalable ⚙️
- Works headless or headful 👁️
- Powered by a proxy network with 100M+ residential IPs 🌍
- Designed to blend in like a real user 🕵️‍♂️

Wondering what this magical tool is?
Say hello to Bright Data's Agent Browser: your stealthy, scalable, AI-friendly browser stack:

https://www.youtube.com/watch?v=T59GCkpk5zY&embedable=true

If you're wondering how to plug it into your Playwright MCP setup, it's as simple as passing a CDP endpoint:

```python
async with MCPServerStdio(
    name="Playwright MCP server",
    params={
        "command": "npx",
        "args": [
            "-y",
            "@playwright/mcp@latest",
            # other arguments...
            "--cdp-endpoint",
            "<YOUR_BRIGHT_DATA_AGENT_BROWSER_CDP_URL>",
        ],
    },
) as server:
    # Your agent logic here...
```

And just like that, no more blocks! No more bot walls. No more sad Gru memes.

Ready to build agentic workflows that actually work on the live web? Bright Data's Agent Browser has your back! See it in action in another integration here:

https://www.youtube.com/watch?v=JG_INdktGsk&embedable=true

## Final Thoughts

Now you know how to supercharge any AI agent built with the OpenAI Agents SDK, whether it's running on GPT, Gemini, or whatever's coming next, by plugging it into the Playwright MCP server for real browser control.

We also showed how to level up even further by overcoming browser blocks using Bright Data's Agent Browser, just one piece of the powerful AI infrastructure we've built to support real-world AI workflows at scale.

At Bright Data, our mission is simple: make AI accessible for everyone, everywhere. So until next time: stay curious, stay bold, and keep building the future of AI. ✨