major labs
Essay 06 · Provenance · 10 min read

Inside the provenance layer

By Charlie Major · 2026-06-23

August 2, 2026 is the date that reprices provenance. That is when Article 50 of the EU AI Act becomes enforceable. Operators serving the European market have to label AI-generated content, disclose when users interact with AI systems, and produce evidence on request. The clock is ticking and most of the per-CMS implementation work that operators need has not been done.

Provenance is the only layer in the agentic stack where regulation, not engineering, sets the timeline. That changes how we approach it. This essay walks the layer, names what is shipping and what is missing, defines what an audit-ready synthetic media disclosure receipt actually looks like, and explains why Major Labs is publishing about provenance in 2026 but not yet shipping into it.


What August 2 actually changes

The EU AI Act has been law since 2024. Different obligations come online on different dates. The August 2, 2026 enforcement window covers Article 50 and the related synthetic-media provisions. Three obligations matter for operators.

The first is the labeling obligation. AI-generated or AI-manipulated content must be marked as such. The marking can be technical (embedded in the content itself) or visible (a label rendered to the user). The Act is permissive about how. It is firm about whether.

The second is the disclosure obligation. When a user interacts with an AI system that could reasonably be mistaken for a human, the system has to disclose it. Chatbots, voice agents, and automated content moderators are all in scope. Disclosure has to be clear and not buried in terms of service.

The third is the audit obligation. National authorities and the EU AI Office can request evidence that the labeling and disclosure obligations were met for any specific piece of content or interaction. The operator has to produce that evidence on demand. The default retention period in the draft enforcement guidance is two years for high-risk content categories.

Three things will start happening on August 3, 2026.

First, a wave of operators discovers that their current setup does not produce the evidence the regulator will ask for. Most operators today have content credentials in Photoshop output and nothing else. The blog post their marketing team wrote with an LLM has no provenance signal at all.

Second, the larger consumer platforms either announce compliance or get visited by regulators. Meta, Google, TikTok, and X all sit in the line of fire. Their answers will set the rough shape of what enforcement looks like in practice.

Third, the long-tail operator who runs a blog, an e-commerce site, or a SaaS product realizes that compliance is not optional and the tooling is not ready. That is where most of the operational pain lands.


What is shipping in provenance today

The category is more mature for images than for text, which is the wrong way around for what regulators care about most.

For images, the C2PA specification, published by the Coalition for Content Provenance and Authenticity, is at v2.4. The spec is genuinely good. It defines a manifest format that travels with the image file, records the chain of custody from capture to publication, and supports cryptographic signatures. Adobe embeds content credentials in Photoshop and Premiere output. Leica ships C2PA-signed images from some camera bodies. Microsoft signs content for Bing AI and Copilot output. Google's SynthID watermarks AI-generated images at the pixel level for some Gemini outputs.

Independent verification services exist. Truepic offers enterprise-grade content authentication. Numbers Protocol runs a public registry of signed content. Both are operational businesses, not research projects.

For video, the C2PA spec also applies, but adoption is shallower. Most platforms strip content credentials when re-encoding for delivery, which makes the original manifest unreachable by the time the viewer sees the content. The technical solution exists; the platform behaviors do not.

For text, the category barely exists. A paragraph of LLM-generated copy in a WordPress post has no manifest, no signature, no embedded provenance. The metadata fields exist in some CMS schemas; almost no publishing pipeline writes them. OpenAI, Anthropic, and Google all produce text without standardized provenance metadata at the API level. The operator who wants to record that a blog post was AI-drafted has to roll their own log entry.

For audio, including voice cloning, provenance is essentially absent. The technical capability to watermark synthetic audio exists in some commercial products. The industry-wide standard does not.


What is missing

Five concrete gaps, in rough order of how acutely operators feel them as August 2 approaches.

Per-CMS C2PA implementations. WordPress has some coverage through plugins. Adoption is patchy. Shopify has almost none. Webflow has none. Custom Rails, Django, Next.js, and Astro sites have none. Every operator running on those stacks has to write the implementation themselves or stay non-compliant. The gap exists because the C2PA tooling has been built for media-production workflows (Adobe, Leica) rather than publishing workflows.

Provenance for AI-generated text at the API level. When an operator calls Anthropic or OpenAI to generate a paragraph, the response does not include a structured provenance signal that the operator can persist alongside the text. The operator has to construct the metadata themselves: which model generated the text, when, with what system prompt, what mandate authorized the request. Doing that consistently across thousands of generations per day, across multiple vendors, is the kind of work that operators do not budget for until the regulator arrives.
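A minimal sketch of what that per-generation record could look like, in Python. The field names (vendor, mandate_id, and so on) and the choice to store SHA-256 hashes of the system prompt and output instead of the raw text are illustrative assumptions, not part of any vendor API or published standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class GenerationRecord:
    # Every field name here is hypothetical; no vendor returns this structure.
    vendor: str
    model: str
    generated_at: str            # ISO 8601 timestamp, UTC
    system_prompt_sha256: str    # hash, so the prompt itself is not stored
    mandate_id: str              # operator-defined reference to the authorizing mandate
    output_sha256: str           # hash of the generated text as returned


def record_generation(vendor: str, model: str, system_prompt: str,
                      mandate_id: str, output_text: str,
                      now: Optional[datetime] = None) -> GenerationRecord:
    """Build one provenance record for one generation call."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return GenerationRecord(
        vendor=vendor,
        model=model,
        generated_at=timestamp,
        system_prompt_sha256=sha256_hex(system_prompt),
        mandate_id=mandate_id,
        output_sha256=sha256_hex(output_text),
    )


def to_log_line(record: GenerationRecord) -> str:
    """Serialize with sorted keys so log lines are stable across vendors."""
    return json.dumps(asdict(record), sort_keys=True)


# Example with placeholder values.
example = record_generation(
    vendor="example-vendor",
    model="example-model-v1",
    system_prompt="You write product descriptions.",
    mandate_id="mandate-0001",
    output_text="A sturdy canvas tote for everyday use.",
)
print(to_log_line(example))
```

Writing the same record shape for every vendor call is what makes the log usable later: the receipt can reference the record by its output hash without the operator having to reconstruct anything after the fact.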

Synthetic media disclosure receipts. The C2PA manifest is a container. It does not specify what the regulator-facing receipt looks like. When a national authority asks the operator "produce the disclosure record for this piece of content," the operator has to assemble: the original AI generation log, the labeling decision and rationale, the user-facing disclosure rendered at the time of view, the retention proof. Today, every operator assembles that pack differently. There is no standard format.

Cross-platform verification at scale. Consider a video uploaded to YouTube, downloaded, re-uploaded to TikTok, screenshotted into a tweet, and embedded in a blog post. The original C2PA manifest survives none of those transitions reliably. The forensic capability to verify provenance after multi-platform travel exists at a small number of expensive vendors and nowhere else.

Audit-grade trail for AI-edited content. AI rarely generates content from scratch in production publishing. It edits human drafts, summarizes existing content, expands outlines. The audit trail for partial AI contribution is a different problem than the audit trail for fully synthetic content. The regulators have not yet specified how to handle partial contribution, but they will. Operators who can produce a granular per-paragraph attribution have a defensible compliance posture. Operators who cannot do not.
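One way to picture per-paragraph attribution is a list of content blocks, each tagged with its origin. The sketch below is an assumption about how an operator might model it: the three origin labels and the character-based "AI-touched" fraction are hypothetical choices, since the regulators have not specified how partial contribution should be measured.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical origin labels; no regulator or standard defines these.
ORIGINS = ("human", "ai_generated", "ai_edited")


@dataclass(frozen=True)
class Block:
    text: str
    origin: str                  # one of ORIGINS
    model: Optional[str] = None  # model identifier when AI was involved

    def __post_init__(self) -> None:
        if self.origin not in ORIGINS:
            raise ValueError(f"unknown origin: {self.origin!r}")


def attribution_summary(blocks: List[Block]) -> Dict[str, object]:
    """Summarize how much of a document each origin accounts for, by characters."""
    chars: Counter = Counter()
    for block in blocks:
        chars[block.origin] += len(block.text)
    total = sum(chars.values())
    ai_chars = chars["ai_generated"] + chars["ai_edited"]
    return {
        "blocks": len(blocks),
        "chars_by_origin": {origin: chars[origin] for origin in ORIGINS},
        "ai_touched_fraction": ai_chars / total if total else 0.0,
    }


# Example: a human intro, an AI-expanded paragraph, an AI-written summary.
post = [
    Block("We tested three tote bags over a month.", "human"),
    Block("The canvas model held up best under daily load.", "ai_edited", "example-model-v1"),
    Block("In short: buy the canvas one.", "ai_generated", "example-model-v1"),
]
print(attribution_summary(post))
```

Recording origin at write time is the point. Reconstructing which paragraphs an LLM touched months later, from a published page alone, is not something this sketch or any log can do retroactively.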


What an audit-ready synthetic media disclosure receipt looks like

The Major Labs draft format ships as part of the State of Agent Commerce Q4 2026 report, but the structure is worth stating now because operators need to start collecting the data even before the format is canonical.

A disclosure receipt is a JSON document with five required sections.

Content identity. The cryptographic hash of the content as published, the URL it was published at, the timestamp, and the canonical content type. This lets a regulator point at a specific piece of content and the operator point at the receipt that covers it.

Generation provenance. The model or models that generated the content, the version of each, the timestamp of each generation, the prompt context (full prompt or, where the operator chooses, a hash of the prompt for privacy), the system prompt or mandate context, the operator's identifier. If the content was AI-edited rather than AI-generated, the receipt distinguishes the human-authored portion from the AI portion at the paragraph or block level.

Labeling decision. Whether the content was labeled as AI-generated, what label was used, where it was rendered, and the operator's reasoning for the labeling decision. If the content was not labeled, the receipt records why the operator concluded labeling was not required.

Disclosure rendering. Evidence that the disclosure was actually shown to the user at the time of interaction. For static content, this is a description of where the label sits in the page layout. For interactive content (chatbots, voice agents), this is the time-stamped record of when the disclosure was rendered relative to the user's first interaction.

Signatures and retention. The operator signs the receipt with a key tied to their identity. The receipt is stored for the operator's declared retention period (minimum two years for high-risk content under the draft enforcement guidance). Any subsequent edits to the content produce a new receipt that chains to the previous one.

That structure covers the regulator's ask, scales to high publishing volume, and survives platform travel as long as the operator publishes the receipt URL alongside the content. The format is open and any operator, CMS, or verification vendor can adopt it.
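The five sections can be sketched as a small Python builder. This is an illustration under stated assumptions, not the Major Labs draft format itself: the key names, the retention_days field, and the example values are all hypothetical. HMAC-SHA256 from the standard library stands in for the operator signature; a receipt meant for third-party verification would more plausibly use an asymmetric signature so the verifier does not need the operator's secret.

```python
import copy
import hashlib
import hmac
import json
from typing import Optional


def canonical_json(obj: dict) -> bytes:
    """Serialize with sorted keys so the same receipt always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def receipt_digest(receipt: dict) -> str:
    """Hash of a complete receipt, used to chain a later receipt to it."""
    return hashlib.sha256(canonical_json(receipt)).hexdigest()


def _unsigned_bytes(receipt: dict) -> bytes:
    """Bytes that get signed: the receipt without its signature field."""
    unsigned = copy.deepcopy(receipt)
    unsigned["signatures_and_retention"].pop("signature", None)
    return canonical_json(unsigned)


def build_receipt(content: str, url: str, published_at: str, content_type: str,
                  generation: dict, labeling: dict, disclosure: dict,
                  retention_days: int, key: bytes,
                  previous: Optional[dict] = None) -> dict:
    """Assemble the five sections and sign them (all key names are hypothetical)."""
    receipt = {
        "content_identity": {
            "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            "url": url,
            "published_at": published_at,
            "content_type": content_type,
        },
        "generation_provenance": generation,
        "labeling_decision": labeling,
        "disclosure_rendering": disclosure,
        "signatures_and_retention": {
            "retention_days": retention_days,
            # An edit produces a new receipt that points at the previous one.
            "previous_receipt_sha256": receipt_digest(previous) if previous else None,
        },
    }
    signature = hmac.new(key, _unsigned_bytes(receipt), hashlib.sha256).hexdigest()
    receipt["signatures_and_retention"]["signature"] = signature
    return receipt


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _unsigned_bytes(receipt), hashlib.sha256).hexdigest()
    actual = receipt["signatures_and_retention"].get("signature", "")
    return hmac.compare_digest(expected, actual)


# Example: an original post, then an edited version chained to it.
operator_key = b"demo-key-not-for-production"
common = dict(
    url="https://example.com/blog/tote-review",
    content_type="text/html",
    generation={"model": "example-model-v1", "prompt_sha256": "0" * 64},
    labeling={"labeled": True, "label": "AI-assisted", "reason": "AI-edited paragraphs"},
    disclosure={"placement": "below byline"},
    retention_days=730,  # two years, matching the minimum cited for high-risk content
    key=operator_key,
)
first = build_receipt("Original text.", published_at="2026-08-03T09:00:00+00:00", **common)
revised = build_receipt("Edited text.", published_at="2026-08-04T09:00:00+00:00",
                        previous=first, **common)
print(verify_receipt(revised, operator_key))
```

Signing the canonical JSON and storing the previous receipt's hash means any later change to a receipt breaks either its own signature or the chain that points at it.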


Why Major Labs is not shipping into provenance yet

The agentic stack has five layers, and Major Labs is building products into four of them. Provenance is the one we are explicitly not shipping a product into in 2026.

The category waits for regulatory specificity. Shipping the wrong primitive locks operators in.

The August 2 enforcement date is real, but the enforcement guidance is still being clarified. The EU AI Office published draft technical guidance in May 2026 and the final version is expected in Q3. National authorities will publish their own implementation notes in Q4. Until that body of guidance settles, the disclosure receipt format and the labeling defaults are unstable. Shipping the wrong format now locks operators into a structure they have to migrate off when the final guidance lands. The shipping cost is high and the obsolescence risk is real.

What we ship instead is research. The State of Agent Commerce Q4 2026 report includes the draft disclosure receipt format, the per-CMS implementation gap analysis, the cross-platform verification work, and the regulatory landscape map. The report becomes the reference document that operators and CMS vendors use to plan their 2027 implementations.

The category opens for a Major Labs product in 2027, probably in Q2 once national authorities have published implementation guidance. The product will likely be either a per-CMS C2PA plugin family or a hosted disclosure receipt service that any operator can integrate. We will pick which one based on what the data tells us in the next six months.

In the meantime, operators preparing for August 2 should start collecting the receipt fields today, even in a custom format. The Major Labs draft format is published openly. Adopting it now should mean a cleaner migration to whatever the final standard settles on later, because the underlying data will already be collected.


What ships

Nothing in 2026.

Research, yes. The State of Agent Commerce Q4 2026 report covers provenance as a major section. The draft receipt format is open. The category landscape is mapped.

Product, no. Major Labs builds in the discovery, commerce, observability, and identity layers in 2026 and Q1 2027. Provenance ships in 2027 once the regulatory ground has stopped moving. That is the discipline of building when the category settles, not when the conference talk gets the most applause.

The next essay goes inside the identity layer, the last of the five. The DID-FIDO-EUDI cross-walk, why Major Labs Identity ships last, and why the brand has to mature before operators trust us to broker their agent identity.

See you Tuesday.

— Charlie

Charlie Major writes Major Matters and joined Mastercard in April 2026. Major Labs is independent of Mastercard and operates separately from Major Matters. Any opinions in these essays are Charlie's own.
