A few weeks ago, Anthropic gave a small group of partners — Amazon, Apple, Cisco, Microsoft, CrowdStrike — early access to an unreleased model called Mythos. The program is called Glasswing. The technical writeup is sparse, the marketing is sparser still, and the part that should be on every security leader's desk this week is this:
In limited testing, Mythos autonomously found and exploited a 17-year-old remote code execution flaw in FreeBSD that human red teams had walked past for nearly two decades. Nobody asked it to. Nobody trained it on vulnerability research. It looked at the codebase and figured it out.
The trade press has been treating this as a curiosity. It is not a curiosity. It is the most important development in enterprise security buying since the SOC became a category.
Why “emergent” is the word that should keep you awake
Mythos finding zero-days is not a trained behavior. Anthropic did not sit down and fine-tune a model on CVE databases and exploit code with the explicit goal of producing a vulnerability researcher. The capability is a side effect of the model being better at general reasoning over large codebases.
That distinction sounds academic. It isn't. Trained capabilities can be governed — you can decide not to ship the model with the vulnerability-research dataset attached. Emergent capabilities don't behave that way. They show up because the underlying reasoning got better. The only way to stop them is to make the model worse at everything else, which nobody is going to do.
Translation: every frontier model from this point forward, regardless of how cautiously it ships, is going to be a vulnerability research tool. That is the new baseline, and the entire security industry is about to have to live with it.
What this does to the pen-test market
Here is the part nobody on the buy side is internalizing yet. The pen-test motion is built around human-rate discovery. You hire a firm. They assign three to five engineers. They spend a couple of weeks crawling your environment. They write a report. You patch what they found. You wait until next quarter and run it again.
That cadence existed because human attention was the bottleneck.
The bottleneck is gone.
What replaces it is continuous autonomous assessment: a model running against your codebase 24/7, finding issues faster than your humans can patch them. The product category that wins is “continuous AI-driven security with managed remediation,” not “pen-test as a service.” The vendors that end up owning that category are unlikely to be the names currently on your vendor roster. Anthropic, Google, OpenAI, and a small number of security-native specialists are about to become some of the most important enterprise infrastructure vendors on the planet, whether they want to be or not.
If your CISO told you last quarter that AI in security was overhyped, ask them how they're sleeping this week.
The defender's dilemma you cannot ignore
A capability that finds vulnerabilities is, by definition, a capability that can be used to exploit them. The same Mythos that autonomously found the FreeBSD RCE could be pointed at any production codebase by anyone with sufficient access. Anthropic has serious safeguards. State actors are not waiting on Anthropic. Black-hat researchers are likely to have access to comparable open-weight models within six months.
This means defense has to move first, and at a cadence it has never moved at before. Quarterly red-team engagements aren't just slow; they are irrelevant. The window between vulnerability discovery and active exploitation in the wild is collapsing toward zero. If your remediation cycle takes 30 days, every bug you know about stays open for 30 days, and an adversary with a frontier model can spend that month finding and exploiting it, along with every bug your last pen test missed.
The CISOs who win this transition are the ones who shift budget toward three things: continuous autonomous discovery and the infrastructure to run it safely; automated remediation pipelines that don't require a human in the loop for known issue classes; and faster patch cycles, even at the cost of some operational stability. That last one is going to be the hardest political fight in most enterprises, because the operations side has spent two decades optimizing for patch stability. They were right to. The constraint just changed, and nobody told them.
The controversial part
Bug bounty programs as currently structured are finished. They pay researchers for human-rate work. An attacker with a frontier model can do that work faster and for less than your bounty program pays, and adversaries don't disclose responsibly. The programs that survive will look very different: paying for novel attack classes, complex chained exploits, and adversarial AI behavior, not “I found an XSS in your login flow.”
The pen-test firms that survive are going to look different too. The ones that thrive will be the ones who saw clearly in the first half of 2026 that their job changed from “find bugs” to “interpret what the AI found and decide what humans need to do about it.” Different product. Different headcount mix. Different pricing. Most legacy firms will not make this transition because they cannot — their business model is incompatible with the new economics. My over-under: three big-name security firms acquired or restructured by the end of 2027.
If you are a security leader right now, this is the budget conversation of your year. The vendors you renew with in the next six months are the vendors your posture lives or dies on for the next three years. Get this one right. Pretending Mythos is a curiosity is going to be the most expensive mistake the security industry makes this decade.
