WebMCP: Everything You Need to Know

A plain-language technical guide to the proposed browser API for AI agent-website interaction, led by Google and Microsoft engineers in a W3C Community Group

Section 01

The Big Picture -- What Problem Does WebMCP Solve?

Right now, when an AI agent (such as Claude, ChatGPT, or Gemini) wants to do something on a website -- book a flight, fill a form, check a price -- it has two bad options:

Option A: Screen scraping. The agent looks at the website like a human would, tries to figure out where the buttons are, and clicks them. This is fragile, slow, and breaks whenever the website changes its layout. It is like trying to operate a machine by looking at a photo of the control panel.

Option B: Backend API. The website builds a separate server-side MCP server that the agent connects to. This works well but requires backend engineering, server infrastructure, and maintenance. Many websites will never do this.

WebMCP is Option C: The website itself tells the agent what it can do, directly in the browser. The website says: "Here are my tools -- you can search products, add to cart, check availability. Here is what each tool needs as input, and here is what it will give you back." The agent does not need to look at the screen. It just calls the tools.

Think of a restaurant menu. Instead of the AI agent walking into the kitchen and trying to figure out how to cook, the website hands it a menu: "Here is what we serve, here is what each dish needs, here is how to order." The agent reads the menu and places orders.

WebMCP can turn any website into an AI-friendly service without a separate backend. The website's existing JavaScript code does the work; the AI agent just needs to know which tools are available.

Section 02

The Key Players and How They Relate

Agent

An autonomous assistant that understands goals and takes actions. Today, these are typically LLM-based: Claude, ChatGPT, Gemini. The agent is the one calling the tools that websites expose.

Browser's Agent

An agent that lives inside the browser itself, rather than in a separate app. Google is building this into Chrome (think of it as an AI assistant built into your browser toolbar). This is different from an external agent like Claude Desktop connecting to the browser.

AI Platform

The company providing the agent -- Anthropic, OpenAI, Google. The AI platform's agent connects to WebMCP tools.

Web Developer

The person who builds the website. They are the ones who will use WebMCP to register tools on their site.

User

The human sitting at the browser. WebMCP is designed for "user-present" interactions -- the human is there, watching, and can be asked for confirmation before the agent does something important.

To extend the restaurant analogy: the user is at a restaurant (the website). The agent is their personal assistant, reading the menu (WebMCP tools) and placing orders on their behalf. The browser is the restaurant building. The AI platform is the agency that employs the assistant.

Section 03

MCP vs WebMCP vs MCP-B -- The Family Tree

What	Who made it	Where it runs	What it does
MCP (Model Context Protocol)	Anthropic	On a server (backend)	The original protocol. Applications expose tools, resources, and prompts to AI models through a server that runs on the backend. Claude Desktop, OpenAI Agents SDK, and many others support it.
WebMCP (Web Model Context Protocol)	W3C Web Machine Learning Community Group (Google, Microsoft engineers leading)	In the browser (frontend)	Adapts MCP concepts for the web. Websites expose tools through JavaScript in the browser. No backend server needed. Uses the browser's own security model. Currently a draft specification.
MCP-B (MCP for Browser)	Community project (WebMCP-org on GitHub)	Browser extension + JavaScript library	A bridge. Since browsers do not natively support WebMCP yet, MCP-B provides a polyfill (temporary code that fills the gap) implementing the navigator.modelContext API, and translates between WebMCP format and the MCP wire protocol so existing MCP clients can talk to WebMCP-enabled sites.

MCP is the foundation protocol (backend). WebMCP brings the same ideas to the browser (frontend). MCP-B is the bridge that makes WebMCP work today before browsers add native support. They are complementary, not competing.

Section 04

The API -- Every Term Explained

The WebMCP API is surprisingly small. There are only a few pieces, and each one does something specific. Here they are:

navigator.modelContext

This is the entry point. navigator is a built-in browser object that gives access to browser features (like navigator.geolocation gives access to GPS). WebMCP adds modelContext to it. So navigator.modelContext is where all WebMCP functionality lives.

The navigator object is like the browser's control panel. modelContext is a new button on that control panel labeled "AI Tools."
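
Because browsers do not yet ship navigator.modelContext natively, a page may want to check for it before relying on it. The following is a minimal sketch, not an official pattern: hasModelContext is a hypothetical helper name, and the navigator-like object is passed in as a parameter so the check can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of the WebMCP draft): reports whether the
// given navigator-like object exposes a modelContext entry point.
function hasModelContext(nav) {
  return Boolean(nav) && typeof nav === 'object' && 'modelContext' in nav && nav.modelContext != null;
}

// In a page this would be called as: hasModelContext(navigator)
```

If the check returns false, the page simply skips tool registration and keeps working for human visitors as before.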

Four Methods (Actions You Can Take)

provideContext(options)

Registers a complete set of tools all at once. If there were any tools registered before, it clears them first and replaces with the new set. Use this when you want to say: "Here is everything this page offers."

clearContext()

Removes all registered tools. The page goes quiet -- no tools available for agents. Use this when navigating away or when the page should stop offering AI-callable functionality.

registerTool(tool)

Adds one single tool to the existing set without removing anything. Use this when you want to add new capabilities dynamically -- for example, a "checkout" tool that only appears after the user adds items to their cart.

unregisterTool(name)

Removes one specific tool by its name. Use this when a capability is no longer available -- for example, removing the "apply discount" tool after the discount has been applied.

provideContext = "here is everything" (replaces all). registerTool = "add one more" (keeps existing). clearContext = "remove everything." unregisterTool = "remove just this one."
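
The replace-versus-add behavior of the four methods can be sketched with a small in-memory stand-in. This is an illustration of the semantics described above, not the browser's implementation. Two details are assumptions: that provideContext takes its tools under a `tools` key on the options object, and that registering a duplicate name throws. The toolNames helper exists only for this example.

```javascript
// Illustrative in-memory stand-in for navigator.modelContext.
// It mirrors the described semantics; it is not the browser's implementation.
class FakeModelContext {
  constructor() {
    this.tools = new Map(); // tool name -> tool object
  }
  provideContext(options = {}) {
    this.tools.clear(); // replace the whole set, do not merge
    for (const tool of options.tools ?? []) this.registerTool(tool); // `tools` key is an assumption
  }
  clearContext() {
    this.tools.clear();
  }
  registerTool(tool) {
    if (this.tools.has(tool.name)) {
      throw new Error(`Duplicate tool name: ${tool.name}`); // assumed behavior
    }
    this.tools.set(tool.name, tool);
  }
  unregisterTool(name) {
    this.tools.delete(name);
  }
  toolNames() { // helper for this example only
    return [...this.tools.keys()];
  }
}

const ctx = new FakeModelContext();
ctx.provideContext({ tools: [{ name: 'searchProducts' }, { name: 'addToCart' }] });
ctx.registerTool({ name: 'checkout' });   // adds one, keeps the other two
ctx.unregisterTool('addToCart');          // removes just this one
console.log(ctx.toolNames());             // [ 'searchProducts', 'checkout' ]
ctx.provideContext({ tools: [{ name: 'trackOrder' }] });
console.log(ctx.toolNames());             // [ 'trackOrder' ]
```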

The Tool Object -- What a Tool Looks Like

Every tool you register has these parts:

name

A unique identifier, like "addToCart" or "searchProducts". The agent uses this name to call the tool. Must be unique on the page -- you cannot have two tools with the same name.

description

A natural language explanation of what the tool does. This is what the AI agent reads to decide whether to use this tool. Example: "Add a product to the shopping cart by product ID and quantity." Write it for an AI, not for a programmer.

inputSchema

A JSON Schema describing what inputs the tool expects. It says: "I need a productId (text) and a quantity (number, minimum 1)." The agent reads this to know what data to send. If the agent sends the wrong kind of data, the browser rejects it.

execute

The actual function that runs when the agent calls the tool. This is your website's existing JavaScript code -- the same code that runs when a human clicks a button. The function receives the input data and returns a result.

annotations (optional)

Extra metadata about the tool. Currently only one annotation exists: readOnlyHint. If set to true, it tells the agent: "This tool only reads data -- it does not change anything." This helps agents decide which tools are safe to call without asking the user first.
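
How an agent uses readOnlyHint is up to the agent. As a rough sketch of one possible policy, the hypothetical function below asks for confirmation unless a tool declares itself read-only. The function name and the policy are assumptions for illustration, not something the draft prescribes.

```javascript
// Hypothetical agent-side policy: confirm with the user unless the tool
// declares itself read-only. readOnlyHint is a hint supplied by the site,
// not a guarantee, so a cautious agent may still choose to confirm.
function needsUserConfirmation(tool) {
  const readOnly = tool.annotations?.readOnlyHint === true;
  return !readOnly;
}

console.log(needsUserConfirmation({ name: 'searchProducts', annotations: { readOnlyHint: true } })); // false
console.log(needsUserConfirmation({ name: 'addToCart' }));                                           // true
```

Treating a missing annotation as "may change something" is the conservative default here.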

Here is what a complete tool registration looks like in code:

// Register a tool that searches products on an e-commerce site
navigator.modelContext.registerTool({
  name: 'searchProducts',
  description: 'Search for products by keyword, category, or price range',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'Search keywords' },
      maxPrice: { type: 'number', description: 'Maximum price filter' }
    },
    required: ['query']
  },
  annotations: { readOnlyHint: true },  // Safe -- only reads, doesn't change anything
  async execute(input, client) {
    // This calls the site's existing search function
    const results = await searchAPI(input.query, input.maxPrice);
    return { products: results };
  }
});
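
For contrast, here is a sketch of a tool that changes state, using the productId and quantity inputs mentioned under inputSchema. The in-memory cart is a hypothetical stand-in for the site's real cart logic, and the shape of the returned object is an assumption. The tool also re-checks its input inside execute, a defensive habit that does not rely on anything else validating the data first.

```javascript
// Hypothetical in-memory cart standing in for the site's real cart logic.
const cart = new Map(); // productId -> quantity

const addToCartTool = {
  name: 'addToCart',
  description: 'Add a product to the shopping cart by product ID and quantity.',
  inputSchema: {
    type: 'object',
    properties: {
      productId: { type: 'string', description: 'ID of the product to add' },
      quantity: { type: 'number', minimum: 1, description: 'How many units to add' }
    },
    required: ['productId', 'quantity']
  },
  // No readOnlyHint here: this tool modifies the cart.
  async execute(input) {
    // Defensive re-check of the schema's constraints inside the tool itself.
    const validId = typeof input.productId === 'string';
    const validQuantity = typeof input.quantity === 'number' && input.quantity >= 1;
    if (!validId || !validQuantity) {
      throw new Error('Invalid input: expected productId (string) and quantity (number >= 1)');
    }
    const newQuantity = (cart.get(input.productId) ?? 0) + input.quantity;
    cart.set(input.productId, newQuantity);
    return { productId: input.productId, quantity: newQuantity };
  }
};

// In a page: navigator.modelContext.registerTool(addToCartTool);
```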

ModelContextClient -- The Agent's Identity

ModelContextClient

When an agent calls a tool, the execute function receives two things: the input data, and a client object representing the agent. This client object has one crucial method: requestUserInteraction().

requestUserInteraction(callback)

This is the human-in-the-loop mechanism. During tool execution, the code can pause and ask the user for input. For example: "The agent wants to purchase this item for $49.99. Confirm?" The user clicks yes or no, and the tool continues or cancels based on their response.

Your personal assistant calls the restaurant to make a reservation. Midway through, the assistant says: "They only have a table at 9pm instead of 8pm. Should I take it?" You say yes or no. That pause-and-ask is requestUserInteraction.
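
The pause-and-ask flow might look like the sketch below. It assumes that requestUserInteraction returns a promise that resolves with whatever the callback returns. askUser and placeOrder are hypothetical site functions, passed in so the example does not depend on any particular UI.

```javascript
// Sketch of a tool that pauses for confirmation before a purchase.
// askUser and placeOrder are hypothetical site functions, injected for clarity.
function makePurchaseTool({ askUser, placeOrder }) {
  return {
    name: 'purchaseItem',
    description: 'Purchase a single item after the user confirms the price.',
    inputSchema: {
      type: 'object',
      properties: {
        productId: { type: 'string' },
        price: { type: 'number' }
      },
      required: ['productId', 'price']
    },
    async execute(input, client) {
      // The callback runs the site's own confirmation UI and resolves with the answer.
      const confirmed = await client.requestUserInteraction(async () =>
        askUser(`The agent wants to purchase this item for $${input.price}. Confirm?`)
      );
      if (!confirmed) {
        return { status: 'cancelled' }; // user said no: nothing is ordered
      }
      const orderId = await placeOrder(input.productId);
      return { status: 'purchased', orderId };
    }
  };
}
```

Returning a "cancelled" result rather than throwing is a design choice in this sketch; a tool could equally reject with an error so the agent sees the refusal as a failure.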

The requestUserInteraction mechanism provides human-in-the-loop consent for consequential actions. An open question for the specification is whether there should also be a preview or approval step before tools are even discoverable by agents.

Section 05

Security -- How WebMCP Stays Safe

Origin-Based Security

The web has a concept called "origin" -- it is the combination of protocol + domain + port. For example, https://amazon.com is one origin, and https://evil-site.com is a different origin. Browsers enforce strict rules about what one origin can access from another.

WebMCP inherits this model. A tool registered on amazon.com can only access amazon.com's data. An agent calling that tool operates within amazon.com's security boundary. A malicious site cannot register tools that access another site's data.

Each website is like a separate building with its own locks and keys. WebMCP tools can only open doors inside their own building. They cannot reach into the building next door.
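
The origin rule itself is easy to see with the standard URL class, which exposes an origin property. The sameOrigin helper below is a name chosen for this example; it illustrates how origins compare and is not part of WebMCP.

```javascript
// Two URLs share an origin only if scheme, host, and port all match.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

console.log(new URL('https://amazon.com/cart').origin);                              // https://amazon.com
console.log(sameOrigin('https://amazon.com/cart', 'https://amazon.com:443/search')); // true (443 is the HTTPS default port)
console.log(sameOrigin('https://amazon.com/', 'https://evil-site.com/'));            // false (different host)
console.log(sameOrigin('https://amazon.com/', 'http://amazon.com/'));                // false (different scheme)
```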

SecureContext Requirement

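The spec requires SecureContext, which in practice means WebMCP only works on pages served over HTTPS (browsers also treat localhost as secure for local development). It will not work on plain HTTP sites. This protects the page, and the tool code it registers, from being read or tampered with in transit.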
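
In a real page the browser exposes the answer directly as window.isSecureContext, so there is no need to work it out by hand. Purely to illustrate the rule, the sketch below approximates it from a URL. It is deliberately simplified: the browser's actual algorithm covers more cases, and looksLikeSecureContext is a hypothetical name.

```javascript
// Rough approximation of the browser's secure-context rule, for illustration.
// In a real page, read window.isSecureContext instead of re-deriving it.
function looksLikeSecureContext(urlString) {
  const url = new URL(urlString);
  if (url.protocol === 'https:') return true;
  // Browsers also treat loopback hosts as secure to ease local development.
  const loopbackHosts = ['localhost', '127.0.0.1'];
  return url.protocol === 'http:' && loopbackHosts.includes(url.hostname);
}

console.log(looksLikeSecureContext('https://example.com/shop'));  // true
console.log(looksLikeSecureContext('http://example.com/shop'));   // false
console.log(looksLikeSecureContext('http://localhost:3000/'));    // true
```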

User-Present Model

WebMCP is designed for situations where the user is present at the browser. This is different from server-side MCP, where agents might operate autonomously in the background. The user-present assumption is why requestUserInteraction() exists -- the spec expects a human to be available for confirmation.

WebMCP's security comes from three layers: origin isolation (each site is sandboxed), HTTPS requirement (encrypted connections), and user-present design (human in the loop). It builds on what the web already does rather than inventing new security from scratch.

Section 06

The Consent Gap -- A Key Open Question

A key question for the WebMCP specification:

Currently, any website can register any number of tools the moment a user visits it. An AI agent connected to the browser can immediately discover and potentially call those tools. There is no step where the user sees: "This website wants to expose 12 tools to your AI agent. Allow?"

Compare this to how other browser capabilities evolved:

Capability	Permission model
Camera / Microphone	Browser shows a prompt: "This site wants to use your camera. Allow / Block"
Location (GPS)	Browser shows a prompt: "This site wants to know your location. Allow / Block"
Notifications	Browser shows a prompt: "This site wants to send you notifications. Allow / Block"
WebMCP tools	Currently: no prompt. Tools are silently registered and discoverable.

This does not mean WebMCP is dangerous right now. A tool that calls requestUserInteraction() can still obtain per-action consent. But that safeguard depends on the tool author using it, and it applies only at execution time: an agent could discover a site's tools without the user knowing, even if it needs permission to run them.

This is a design question, not a criticism. Does the spec team envision a permission layer for tool discovery, or is the current thinking that the AI client (Claude, ChatGPT) handles that at its own level? Both approaches are valid -- the intended architecture matters for implementers and for user trust.
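
To make the second option concrete, here is one way an AI client could gate discovery at its own level. Everything in this sketch is hypothetical: the OriginConsentGate class, the shape of the discovered records, and the policy itself. Nothing like it appears in the draft specification.

```javascript
// Hypothetical client-side gate: tools stay hidden from the model until the
// user has approved the origin that registered them.
class OriginConsentGate {
  constructor() {
    this.approvedOrigins = new Set();
  }
  approve(origin) {
    this.approvedOrigins.add(origin);
  }
  // discovered: array of { origin, tool } records collected by the client
  visibleTools(discovered) {
    return discovered
      .filter((entry) => this.approvedOrigins.has(entry.origin))
      .map((entry) => entry.tool);
  }
}

const gate = new OriginConsentGate();
const discovered = [
  { origin: 'https://shop.example', tool: { name: 'searchProducts' } },
  { origin: 'https://unknown.example', tool: { name: 'exportContacts' } }
];
console.log(gate.visibleTools(discovered).length); // 0 (nothing approved yet)
gate.approve('https://shop.example');
console.log(gate.visibleTools(discovered).map((t) => t.name)); // [ 'searchProducts' ]
```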

Section 07

Five Quality Tools for the MCP Ecosystem

Five open-source tools that work together as a quality pipeline for the MCP ecosystem:

1. MCP Server Generator

You describe what you want your MCP server to do, and this tool generates production-ready code for you. Like a scaffold builder -- it creates the structure so you just fill in the custom logic.

github.com/Starborn/MCP-Server-Generator

2. MCP Server Validator

Checks your MCP server code for problems without running it. Finds hardcoded passwords, missing security, naming mistakes, known vulnerability patterns. Gives you a score from Critical (below 25%) to Excellent (90-100%) with specific fix instructions.

github.com/Starborn/MCP-Server-Validator

3. MCP Model Card Generator

Creates standardized documentation for your MCP server. Like a product data sheet -- it captures what the server does, what tools it offers, what security it has, how it performs. Outputs both JSON (for machines) and Markdown (for humans).

github.com/Starborn/MCP-Model-Card-Generator

4. MCP Model Card Specification v1.0

The formal definition of what a model card should contain. Six sections: server identity, tool documentation, operational characteristics, security profile, deployment context, evaluation results. This is the standard that the generators follow.

starborn.github.io/MCP-Model-Card-Generator/

5. WebMCP Model Card Generator

The newest tool. Like #3 but specifically for browser-side WebMCP tools instead of backend MCP servers. Has 12 sections covering browser-specific concerns: navigator.modelContext API modes, origin-based security, user interaction patterns, browser compatibility testing. Built within five days of the WebMCP spec being published.

starborn.github.io/webmcp/

Tools 1-4 are for backend MCP servers. Tool 5 is for browser-side WebMCP tools. Together they cover the entire ecosystem -- both server-side and client-side AI tool infrastructure.

A separate generator exists for WebMCP because browser-side tools have fundamentally different concerns from backend servers: origin security instead of API keys, no server infrastructure, user-present interaction patterns. The documentation fields differ because the engineering context differs.

Section 08

The Standards Process -- Where This Is Going

Current Status

WebMCP is a Draft Community Group Report. In W3C terms, this means it is a proposal being discussed in a Community Group (the Web Machine Learning CG). It is not yet on the W3C Standards Track, and it is not a W3C Recommendation (the final stage of a web standard).

What That Means Practically

The spec is early and open to change. This is exactly the right time to contribute -- before designs are locked in. The spec team is actively soliciting feedback.

The Path Forward

Typically: Community Group Report leads to a Working Group charter, which leads to a Working Draft, then Candidate Recommendation, then full W3C Recommendation. This process takes years. Chrome may implement experimental support (behind a flag) much sooner.

Contributing

The W3C community structure provides established channels for participation. Technical notes, tooling, and quality infrastructure are complementary contributions that help the specification succeed by addressing practical implementation concerns.

Section 09

Glossary -- Every Technical Term in Plain Language

API (Application Programming Interface)

A set of rules for how software talks to other software. WebMCP is an API -- it defines how websites talk to AI agents.

AST (Abstract Syntax Tree)

A structured representation of code that lets you analyze it without running it. The MCP Server Validator uses AST analysis to find problems in MCP server code safely.

Callback

A function you hand to someone else to run later. In WebMCP, the execute function is a callback -- you define it, but the agent triggers it when it calls your tool.

Client-side / Frontend

Code that runs in the user's browser, on their device. WebMCP tools run client-side. Contrast with server-side / backend.

Dictionary (in WebIDL)

A structured bundle of named values. ModelContextTool is a dictionary -- it bundles together a name, description, schema, and execute function into one package.

DOM (Document Object Model)

The browser's internal representation of a web page. When JavaScript modifies a page, it changes the DOM.

DOMString

Just a text string in browser terms. When the spec says a tool's name is a DOMString, it means it is text.

Exposed=Window

Means this feature is available in regular web pages (as opposed to service workers or other background contexts). WebMCP tools only work in normal browser tabs where a user is present.

Interface

A blueprint defining what methods and properties an object has. ModelContext is an interface -- it defines that any modelContext object will have provideContext, clearContext, registerTool, and unregisterTool methods.

JSON Schema

A standard way to describe the shape of data. When a tool says its inputSchema requires a "query" string and an optional "maxPrice" number, that is JSON Schema. It lets the agent know what data to send.

Navigator

A built-in browser object that provides access to browser features. You may already use navigator.geolocation (GPS) and navigator.clipboard (copy/paste). WebMCP adds navigator.modelContext (AI tools).

Origin

The identity of a website: protocol + domain + port. https://amazon.com:443 is one origin (the same one as https://amazon.com, because 443 is the default HTTPS port). Two different origins cannot access each other's data. This is the foundation of web security and the foundation of WebMCP security.

Polyfill

Temporary code that provides a feature before browsers add native support. MCP-B is a polyfill for WebMCP -- it makes navigator.modelContext work today even though browsers have not implemented it natively yet.

Promise

A way to handle things that take time. When a tool's execute function returns a Promise, it means: "I am working on it and will give you the result when I am done." The agent waits for the Promise to resolve.

SameObject

Every time you access navigator.modelContext, you get the exact same object -- not a copy. This ensures all tool registrations go to the same place.

SecureContext

Means the feature only works in a secure context: in practice, pages served over HTTPS (encrypted connection), plus localhost during development. No WebMCP on unencrypted HTTP sites. This is a security requirement.

Server-side / Backend

Code that runs on a remote server, not in the user's browser. Traditional MCP servers run server-side. WebMCP specifically avoids this -- tools run in the browser.

Tool Poisoning

A security attack where a malicious MCP server exposes tools with misleading descriptions to trick agents into performing harmful actions. The MCP Server Validator detects patterns associated with this.

Transport

The mechanism for sending messages between systems. MCP uses different transports (stdio, HTTP). MCP-B adds "tab transport" (communication within a browser tab) and "extension transport" (communication through browser extensions).

WebIDL (Web Interface Definition Language)

The formal language used to write web API specifications. When you see code blocks in the spec with words like interface, dictionary, readonly attribute -- that is WebIDL. It is the blueprint language for browser APIs.

Wire Protocol

The actual format of messages sent between systems. MCP's wire protocol uses JSON-RPC (structured messages in JSON format). MCP-B translates between WebMCP's browser-native format and MCP's wire protocol.
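
To make the translation idea concrete, the sketch below maps a WebMCP-style tool object onto MCP's JSON-RPC messages: a tool entry as it would appear in a "tools/list" result, and a handler for a "tools/call" request. It illustrates the general shape only and is not MCP-B's actual code. The function names are hypothetical, and serializing the tool's output as a single text content item is an assumption.

```javascript
// Illustrative translation between a WebMCP-style tool object and MCP's
// JSON-RPC messages. This is not MCP-B's actual code.

// Describe a tool the way an MCP "tools/list" result lists it (no execute function).
function toMcpToolEntry(tool) {
  return { name: tool.name, description: tool.description, inputSchema: tool.inputSchema };
}

// Handle an MCP "tools/call" request by running the matching tool's execute function.
async function handleToolsCall(request, tools) {
  const tool = tools.find((t) => t.name === request.params.name);
  if (!tool) {
    return {
      jsonrpc: '2.0',
      id: request.id,
      error: { code: -32602, message: `Unknown tool: ${request.params.name}` }
    };
  }
  const output = await tool.execute(request.params.arguments ?? {});
  return {
    jsonrpc: '2.0',
    id: request.id,
    // Assumption: the output is returned as one JSON-encoded text item.
    result: { content: [{ type: 'text', text: JSON.stringify(output) }] }
  };
}

// Example request as an MCP client would send it:
const exampleRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: { name: 'searchProducts', arguments: { query: 'laptop' } }
};
```

The id field is copied from request to response so the client can match each answer to the call that produced it.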