
The security surface, measured

Name: The security surface of the MCP ecosystem
Creator: Major Labs
License: https://github.com/major-matters/mcp-scanner/blob/main/LICENSE

A weekly static read of the source code behind the most-used public MCP servers, looking for the patterns that matter when an AI agent is on the other end: command injection, SSRF surface, code execution, unsafe deserialization, committed secrets. Population statistics, updated as the sweep reruns. A companion to the population data.

The one hard rule

Static analysis of public source code only. We never connect to, run, install, probe, or exploit a running MCP server. The code is open; the signal is in the code. A finding is a pattern visible in the source, never a confirmed vulnerability, and a "High surface" score means the code does risky things in risky ways and deserves a closer look, never that a server is compromised.

2,001

servers swept (most-used, active)

35.4%

have at least one risky pattern

High surface (deserve a closer look)

2026-07-27

latest sweep

Risk-surface tiers

Each swept server gets a transparent score from the categories present in its source, capped at 100. Tiers are score bands, not verdicts.

High surface
Elevated surface	653
Low surface	1,293
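The tier counts can be cross-checked against the headline figures. The sketch below is an illustration, not part of the published data. It assumes that 653 is the Elevated count and 1,293 the Low count, that the three tiers partition the 2,001 swept servers, and that "at least one risky pattern" means High or Elevated. Under those assumptions the High surface count is the remainder.

```python
# Figures quoted on this page.
SWEPT = 2001
ELEVATED = 653
LOW = 1293

# Assumption: the three tiers partition the swept servers,
# so the High surface count is whatever remains.
high = SWEPT - ELEVATED - LOW

# Assumption: "at least one risky pattern" means High or Elevated.
flagged_share = (high + ELEVATED) / SWEPT

print(high)                    # 55
print(f"{flagged_share:.1%}")  # 35.4%
```

The derived share matches the 35.4% headline, which supports this reading of the tiers.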

By pattern (share of swept servers)

SSRF surface	32%
code execution	3.2%
command injection	2.2%
unsafe deserialization	0.5%
hardcoded secrets	0.4%

Heuristics are tuned for precision over recall, so every number here is a lower bound.

The supply chain

The risk is not only in a server's own code. We resolved 33,256 declared dependencies across 1,377 servers and checked every one against OSV.dev. Runtime dependencies only; dev and test tooling excluded.

75.2%

ship a vulnerable runtime dependency

380

distinct packages with a known advisory

6,843

unique packages depended on

33,256

dependency edges resolved

Blast radius — packages the most servers depend on

@modelcontextprotocol/sdk	npm	528 servers
zod	npm	404 servers
mcp	PyPI	314 servers
pytest	PyPI	198 servers
pydantic	PyPI	188 servers
dotenv	npm	168 servers
python-dotenv	PyPI	158 servers
httpx	PyPI	152 servers
fastmcp	PyPI	146 servers
pytest-asyncio	PyPI	139 servers

Exposure means a server declares a dependency whose pinned version carries a published advisory. Whether the version actually installed is patched depends on a lockfile, which public repos often do not commit, so read exposure as a hygiene signal, not a confirmed exploit. The widest reach belongs to the official @modelcontextprotocol/sdk, which carries advisories of its own.
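For readers who want to repeat the advisory lookup on a single dependency, here is a minimal sketch against the public OSV.dev query endpoint. It is not the sweep's own code. The function names and the example version are hypothetical, and the sweep may batch or cache its lookups differently.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, ecosystem: str, version: str) -> dict:
    """Build the JSON body for OSV's single-package query endpoint."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def advisory_ids(name: str, ecosystem: str, version: str, timeout: float = 10.0) -> list[str]:
    """Return the advisory IDs OSV lists for one package version (network call)."""
    body = json.dumps(build_osv_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OSV returns an empty object when it knows of no advisories.
    return [vuln["id"] for vuln in payload.get("vulns", [])]


# Hypothetical usage; the version string is an example, not a finding:
# advisory_ids("zod", "npm", "3.22.0")
```

The ecosystem strings follow the names used in the table above (npm, PyPI). A non-empty result means only that an advisory is published for that version, which matches the hygiene-signal reading.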

What the breakdowns show

The headline rate is steady across the population, so we cut it three ways to see what moves it. The first answer is the one people don't expect: being popular does not make a server safer. Servers with 500+ stars carry a risky pattern at nearly the same rate as those with fewer than 50 (36.8% versus 37.5%). Stars measure adoption, not review.

By popularity (GitHub stars)

Stars	Servers	Any pattern	SSRF surface
<50 stars	678	37.5%	32.4%
50-499 stars	962	33.4%	31.2%
500+ stars	361	36.8%	33.2%
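Because the three star bands cover all swept servers, their rates should average back to the headline figure when weighted by band size. A small sketch of that check, using only the numbers in the table above (the variable names are illustrative):

```python
# (servers, share with any risky pattern) per star band, from the table above.
bands = {
    "<50 stars": (678, 0.375),
    "50-499 stars": (962, 0.334),
    "500+ stars": (361, 0.368),
}

total = sum(servers for servers, _ in bands.values())
flagged = sum(servers * rate for servers, rate in bands.values())
overall = flagged / total

print(total)              # 2001
print(f"{overall:.1%}")   # 35.4%
```

The weighted average lands on the same 35.4% as the headline, so the popularity cut neither adds nor drops servers.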

By language — Python and JS/TS only (see note)

Language	Servers	Any pattern	SSRF surface
JavaScript	173	56.1%	52%
TypeScript	654	44.3%	42.7%
Python	574	35.5%	28.9%

By transport

Transport	Servers	Any pattern	SSRF surface
local (stdio)	245	39.6%	35.5%
network (HTTP)	863	36.7%	33.6%
Language. Within the languages the checks actually cover, the JavaScript and TypeScript ecosystem, which is most of MCP, carries a markedly higher SSRF surface than Python, because building a request from a variable (fetch(url)) is idiomatic there. We restrict this cut to Python and JS/TS on purpose: the heuristics are Python- and JS-specific, so ranking Go or Rust against them would measure our checks, not their code. We don't publish a number we'd have to caveat into meaninglessness.

Transport. Counter to intuition, the local (stdio) servers flag more often than the network-facing ones (39.6% versus 36.7% for any pattern). The servers running on your machine, with filesystem and shell reach, are the dirtier set.

The series

Share of swept servers with at least one risky pattern, captured per sweep. The question this series answers over time: is the ecosystem getting safer as it professionalizes, or just bigger?

Do high-risk servers fix themselves?

We froze the highest-risk cohort and re-scanned it. Before any maintainer was contacted, 43 of 47 were unchanged; of the other four, 1 was fixed, 1 was reduced, and 2 got worse. Only 2 of 47 (4.3%) improved on their own. This is the pre-outreach baseline the disclosure programme is measured against.

unchanged	43
fixed	1
reduced	1
got worse	2
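The cohort percentages follow directly from the counts. In this short tally, 4.3% corresponds to the two servers that improved (fixed or reduced), while all four servers that changed in either direction come to 8.5%. The dictionary keys mirror the labels above; the other names are illustrative.

```python
# Re-scan outcomes for the frozen highest-risk cohort, as quoted above.
cohort = {"unchanged": 43, "fixed": 1, "reduced": 1, "got worse": 2}

cohort_size = sum(cohort.values())
improved = cohort["fixed"] + cohort["reduced"]
changed = cohort_size - cohort["unchanged"]

print(cohort_size)                                 # 47
print(f"improved: {improved / cohort_size:.1%}")   # improved: 4.3%
print(f"changed:  {changed / cohort_size:.1%}")    # changed:  8.5%
```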

What the sweep looks for

command injection	subprocess with shell=True, os.system, child_process.exec with interpolated arguments. An agent tool that shells out with model-controlled input is a direct remote-code-execution path.
SSRF surface	outbound requests/fetch/axios calls to a URL built from a variable, with no allow-list in sight. The classic MCP risk: an agent fetches an attacker-chosen internal URL, like a cloud metadata endpoint.
code execution	eval, exec, new Function(), vm.runInNewContext on non-literal input. Arbitrary code from tool arguments.
unsafe deserialization	pickle.loads, yaml.load without SafeLoader. Deserializing untrusted input is code execution wearing a different hat.
hardcoded secrets	live-looking API key patterns committed to the source. Keys in public repos get harvested in minutes.

Weights are per category present, not per hit, so one noisy file cannot inflate a score. Comment lines are skipped. Findings carry file, line, and snippet internally so a false positive is obvious and cheap to dismiss.
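The scoring rules described above can be sketched in a few lines. This is an illustration of the approach, not the scanner's source: the regular expressions, the weights, and the names CHECKS, Finding, scan, and score are all hypothetical, and it covers only three of the five categories.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns and weights, for illustration only.
CHECKS = {
    "command injection": (re.compile(r"shell\s*=\s*True|os\.system\("), 40),
    "code execution": (re.compile(r"\beval\(|\bexec\("), 40),
    "unsafe deserialization": (re.compile(r"pickle\.loads\("), 30),
}
COMMENT_PREFIXES = ("#", "//")


@dataclass(frozen=True)
class Finding:
    category: str
    file: str
    line: int
    snippet: str


def scan(file: str, source: str) -> list[Finding]:
    """Return one finding per matching line, skipping comment lines."""
    findings = []
    for number, text in enumerate(source.splitlines(), start=1):
        stripped = text.strip()
        if stripped.startswith(COMMENT_PREFIXES):
            continue
        for category, (pattern, _weight) in CHECKS.items():
            if pattern.search(stripped):
                findings.append(Finding(category, file, number, stripped))
    return findings


def score(findings: list[Finding]) -> int:
    """Sum the weight of each category present (not each hit), capped at 100."""
    present = {finding.category for finding in findings}
    return min(100, sum(CHECKS[category][1] for category in present))


sample = """\
import os, pickle
# os.system("ignored: this line is a comment")
os.system("ls " + user_input)
os.system("cat " + other_input)
data = pickle.loads(blob)
"""

findings = scan("server.py", sample)
print(len(findings))    # 3
print(score(findings))  # 70
```

The two os.system hits count once toward the score because weights attach to categories, and each finding keeps its file, line, and snippet so a reviewer can dismiss a false positive quickly.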

Check your own server

The exact checks behind this scoreboard run as a GitHub Action. Drop it into your MCP server's CI and you get a security-surface score on every push, plus a README badge. Read-only, same hard rule: it never connects to or probes anything.

- uses: major-matters/mcp-surfacecheck@v1

mcp-surfacecheck on GitHub →

Embed the live number

Writing about MCP security? Embed the current sweep figure as a badge that updates with every weekly re-sweep, so your piece never carries a stale number.

[![MCP security sweep](https://img.shields.io/endpoint?url=https://majorlabs.co/data/badge-security.json)](https://majorlabs.co/security)

[![MCP identity gap](https://img.shields.io/endpoint?url=https://majorlabs.co/data/badge-identity.json)](https://majorlabs.co/identity)
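Both snippets use the Shields.io endpoint badge, which renders whatever a small JSON document at the given URL says. The sketch below shows the general shape of such a document. The label and message values are placeholders, not the contents of the actual badge-security.json file, and badge_payload is a hypothetical helper.

```python
import json


def badge_payload(label: str, message: str, color: str = "blue") -> str:
    """Serialize a Shields.io endpoint-badge document (schemaVersion is always 1)."""
    return json.dumps(
        {"schemaVersion": 1, "label": label, "message": message, "color": color}
    )


# Placeholder values for illustration only.
print(badge_payload("MCP security sweep", "35.4% flagged"))
```

Because the badge image is fetched from the endpoint each time it renders, the embedded figure tracks the published JSON rather than the text of your article.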

Why no names

Per-repo findings stay in the database and go to maintainers through coordinated disclosure, not into a public feed. Publishing a ranked list of risky servers would be a target list, and a static heuristic does not earn the right to put a name on it. What we publish is the population: how common each pattern is, how the tiers are distributed, and whether the trend is improving.

After the first disclosure cycle completes, one exception arrives: a fixed-since-last-sweep feed naming servers whose maintainers resolved findings. Good actors get named. Target lists do not get made.

Check our work

Scanner source and methodology — the sweep is ~200 lines of readable Python. Read the checks yourself.
The population dataset — what we swept, how it was discovered, and the weekly series.
The State of MCP report — the narrative read on the same data.
The identity gap — the companion sweep: who can call these servers, and is anyone checking?
mcp-surfacecheck — run the same checks on your own server in CI, and earn the badge.