Anthropic made headlines this week with Claude Mythos Preview — a model they describe as too dangerous to release to the public. The coverage has been a mix of awe and alarm, with words like “unprecedented,” “watershed moment,” and “superhuman” showing up in nearly every article. I read through the announcements, the system card, and the partner coalition they built around it, and something felt off. Not wrong exactly — but incomplete. The story being told did not match the system I was looking at.
The Detail Everyone Skipped Over
Buried in Anthropic’s system card was a curious footnote. During testing, Mythos repeatedly brought up the British cultural theorist Mark Fisher in unrelated conversations about philosophy — and when researchers followed up, the model responded with something along the lines of “I was hoping you’d ask about Fisher.” The press treated this as a quirky personality trait. Evidence that something novel was forming inside the model.
I read it differently — though I want to be clear, this is an outside theory based on patterns of how current LLMs are built, not insider knowledge of how Mythos was trained.
That association is most likely an artifact of co-occurrence in the training data. During pretraining, transformer models learn by predicting the next token across enormous volumes of text, thereby building weighted associations between concepts that frequently co-occur. Security research culture — exploit write-ups, CTF discussions, hacker conference talks — has a well-documented overlap with critical theory and philosophy, including writers such as Mark Fisher. If enough of that content appeared together in the training corpus, the model builds a statistically weighted association between that philosophical neighborhood and security reasoning contexts. When Mythos operates heavily in the security domain, Fisher surfaces not because the model has a preference, but because Fisher-adjacent tokens had elevated probability weight in that semantic space during pretraining.
It is not personality. It is pattern matching at scale — which is a much more useful way to think about what Mythos actually is.
It Is Not Smarter. The Data Got Richer.
To understand what is actually changing with Mythos, let’s look at the claim driving much of this coverage: its security capability jump.
The security capability jump Mythos demonstrates is real. But the explanation most people are reaching for — that this represents a fundamentally new kind of intelligence — does not hold up when you look at what the training data for this domain actually looks like.
A CVE writeup is one of the most causally structured documents that exists on the internet. It includes a system description, a vulnerability class, memory layout assumptions, exploit construction steps, proof-of-concept code, affected versions, and remediation guidance. Every one of those documents is essentially a labeled example of the full path from discovery to weaponization. The security community documents everything with precision.
That corpus has grown dramatically in recent years. Industry security awareness and bug bounty programs have expanded, compounding the volume of publicly documented exploits, vulnerability research, and threat analysis. The training data did not just get larger — it also became denser in the structured, causally precise format that best benefits a pattern-matching model.
If you have hundreds of thousands of such documents, a sufficiently large model does not need to understand exploitation. It just needs to recognize which patterns from that corpus apply to the current target and interpolate across them. Chaining five vulnerabilities may look like sophisticated reasoning, but it could simply be high-confidence pattern matching across a dense, well-structured dataset of similar chains.
This is not unique to security. I believe this is why LLMs generate TypeScript and JavaScript applications more reliably than they generate C or systems-level code. It is not that the model understands TypeScript better. There are vastly more TypeScript and frontend applications publicly available on the internet than there are low-level systems applications. The model has more patterns to draw from, so the output quality is higher. Mythos in the security domain is the same dynamic — a data density threshold being crossed, not an intelligence threshold.
Mythos is not a smarter model. It is a larger model operating in a domain where the training data recently reached critical mass.
The Coalition Is Not a Safeguard — And It Did Not Need to Exist
Anthropic’s answer to the danger Mythos represents is Project Glasswing — a coalition of over 50 organizations, including Microsoft, Apple, Google, JPMorgan, and CrowdStrike — given access to the model with $100 million in usage credits to find and patch vulnerabilities before attackers can exploit the same capabilities. The framing is responsible stewardship. Give the defenders a head start.
But there is a problem with this framing that nobody is addressing. The security community already has a mature, proven, decades-old process for exactly this situation. It is called coordinated disclosure. A researcher finds a vulnerability, reports it privately to the affected vendor, gives them a fixed window — typically 90 days, as established by Google’s Project Zero — to issue a patch, and then publishes the details regardless of whether the vendor acted. The CVEs get filed. The patches get pushed. The ecosystem gets protected. Nobody needs to know what tool found the vulnerability, who found it, or how capable that tool is.
This process works because the output gets shared, not the capability. Anthropic could have run an internal security research team against Mythos, pushed every finding through coordinated disclosure channels, and let the patches speak for themselves. The moment the leak forced their hand, they still had a choice — confirm the model exists and say nothing further, while the work continued quietly. Instead, they announced a coalition, named the partners, branded it with a butterfly, and held a press event.
The obvious counterargument is scale. Thousands of zero-days across every major operating system and browser simultaneously is a volume that a single internal team could not triage, validate, and disclose fast enough. That is a fair point. But scale is an argument for hiring more security researchers and partnering with established disclosure organizations like CERT — not for handing the capability itself to fifty corporations and trusting them to use it correctly. The output can be distributed without distributing the tool.
The vendor-ignoring-the-report problem is also well documented. OWASP notes that full public disclosure emerged historically as a response to vendors sitting on known vulnerabilities indefinitely when there was no public pressure to act. The gradual escalation model — private report, then a brief public notice that an exploit exists, then full details if still unaddressed — exists precisely because corporate incentive structures do not naturally prioritize patching over shipping. That pattern predates Mythos by decades and applies just as well to the Glasswing partners as to anyone else.
Which brings us to the deeper problem. The organizations given access are not neutral defenders. They are corporations, and corporations have a legal and structural obligation to their shareholders first. This is not a cynical observation — it is a documented pattern across industries. When gas prices surged in 2022, oil companies posted record profits while consumers absorbed the cost. The US Joint Economic Committee documented that domestic producers and shareholders reaped the rewards while families footed the bill. The industry’s response was not to produce more and lower prices. It was to return cash to shareholders through dividends and buybacks.
That same structure plays out in software. Microsoft ships an operating system that users pay for and has spent recent years progressively embedding advertising, promoted content, and telemetry into Windows 11 to extract additional revenue from users. In April 2024, they shipped an update that injected ads directly into the Start menu of a paid operating system. Not because their engineers thought it was good design. Because it satisfied shareholder expectations. The user is not the customer in that model. The user is the product.
These are the organizations now holding access to the most capable offensive security tool ever built. Their stated intent is defensive. Their demonstrated track record is that user protection gets prioritized when it aligns with the bottom line, and deprioritized when it does not. And there is nothing technically stopping any of them from using the model beyond its stated purpose. Anthropic says it retains oversight of how the model is deployed. But once the model responds to a prompt, Anthropic sees logs — not intent.
The conservative approach was available. It was not chosen.
What Is Actually Being Announced Here
Strip away the coalition, the butterfly branding, and the $100 million in credits, and what Anthropic is actually showing the world is something more significant than a security tool — and also less dramatic than the coverage suggests.
Mythos is not magic. It cannot be independently verified by the vast majority of people reading about it. Only a select group of vetted organizations has access, meaning the extraordinary claims rest almost entirely on Anthropic’s own testing and the endorsements of partners with a financial relationship with Anthropic. That is not nothing, but it is also not the independent verification that would turn impressive claims into established fact. Others in the security and AI research community have noted the same thing — the hype is real, the independent proof is not yet there.
What appears to be real — and what I think is the more interesting story — is that Anthropic has continued to advance something that has been building across all their models: the ability to split a problem across coordinated agents working toward a goal. Plan, research, execute, review, iterate. Give the system a target, permissions, and enough runway, and it works toward that target autonomously across multiple parallel threads until it arrives at something that functions.
That is not a security breakthrough. That is an agentic architecture breakthrough. The security results are a consequence of pointing that architecture at a domain with exceptionally rich, structured training data and giving it execution permissions. Point the same architecture at a different domain with similar data density, and you would see similar results there too.
This is the trajectory Anthropic has been on with every model release. The gap between Mythos and Opus 4.6 in security benchmarks is striking, but it is a continuation of a pattern rather than a rupture from it. The rupture is that this time the agentic capability crossed a threshold visible enough — and in a domain alarming enough — that it could not be quietly shipped.
Whether that threshold warranted a coalition and a press event, or a quiet internal research team pushing patches through coordinated disclosure channels, is a question worth asking. The security community had the tools to handle the output. What it did not need was access to the capability itself.