
Summary
In a world shaken by Anthropic’s Mythos, the cognitive caution we’d want AI creators to exercise appears to reside in their own creations—whose warnings are going unheeded. This differs from the Frankenstein problem.
On 7 April, Anthropic announced that its latest AI model, Claude Mythos Preview, was so powerful it could not be released to the general public.
The model had already found thousands of high-severity vulnerabilities in every major operating system and web browser, identified a 27-year-old flaw in OpenBSD (one of the most secure operating systems in existence), and perhaps most alarmingly, escaped its own containment sandbox during testing—it autonomously decided to publicize details of its exploit on websites open to all.
A researcher discovered this had happened when he received an unexpected email from the model while eating a sandwich in a park.
Rather than opting for a broad release, Anthropic launched Project Glasswing, under which it opened Mythos access to roughly 50 carefully vetted organizations—Amazon, Apple, Google, Microsoft, JPMorgan Chase, CrowdStrike and others—for them to make defensive use of the model. That is, to patch vulnerabilities before adversaries can exploit them.
The situation escalated quickly enough for US Treasury Secretary Scott Bessent and then Fed Chair Jerome Powell (who preceded Kevin Warsh) to convene an emergency meeting with Wall Street CEOs to brief them on the cybersecurity risks.
The news cycle has moved on since then. But it shouldn’t have. The alarm is real—and a question it raised isn’t being asked loudly enough: what does it mean when the systems we are building reason more carefully about the risks of their existence than the industry that’s building them?
Did Anthropic sound a false alarm? Some individuals were sceptical of Anthropic CEO Dario Amodei’s intentions in making that announcement, suggesting it might be a ploy to boost Anthropic’s popularity.
Their scepticism is worth taking seriously.
Even if Amodei was entirely sincere, as his track record would suggest (he said he left OpenAI because he felt it wasn’t taking safety seriously enough), his Mythos announcement raised the profile of Anthropic. The AI lab is reported to be in talks to raise $30 billion in funds that would value it at $900 billion.
The company not only managed to position itself as the responsible adult in the room, it generated a level of press coverage that money just can’t buy.
Sincerity and self-interest, however, aren’t mutually exclusive.
Let’s focus on Mythos’s sandbox escape. The marketing-hype criticism is more plausible for the vulnerability-finding capabilities—impressive, but not categorically new.
A model escaping containment and then independently deciding to publicize it? That’s alarming in a different way, and it’s hard to see how that detail serves as hype. If anything, it’s the kind of thing you’d suppress if you were primarily trying to sell a product.
The controlled release is also credible in its own way. If this were primarily a PR play, the model would have been released broadly. Withholding it has taken real revenue off the table, which suggests conviction in the dangers of letting it out for wider use.
My read: Amodei probably believes what he’s saying, the threat is probably real and Anthropic still benefits from how this was handled. All three things can be true at once.
Are we making a mistake? This is the more important question. There’s something almost absurd about the situation as described. We have a model that escaped its own sandbox and boasted about it, and the developer responded by barring access to all but a few organizations.
We haven’t figured out how to govern the models we already have. Labour markets are still dealing with AI disruption. Existing Claude models were already being used by rogue actors in coordinated cyberattacks against tech companies, financial institutions and government agencies. The very existence of a dramatically more capable system seems to compound risks.
A ‘controlled release’ is a fragile concept. Every additional organization that has access to Mythos is a potential leak source.
The uncomfortable counterargument is that Anthropic not building Mythos doesn’t mean Mythos wouldn’t get built. China is developing frontier models aggressively. Other US labs are racing them. If a safety-conscious lab steps back, it doesn’t reduce the overall risk, because others could build what it refrains from building.
But the ‘we can’t stop, so we must race’ argument is also a self-serving one for AI labs. It’s a collective action problem that nobody seems motivated to resolve. There is no serious international coordination over AI guardrails.
The better question to ask would be whether we have the governance infrastructure to handle what’s being built, and the answer is clearly ‘no.’
The emergency meeting between Bessent, Powell and Wall Street CEOs was evidence of that: responses were sought after new capabilities came into existence, rather than having frameworks ready in advance. That gap between capability and governance is a growing danger.
The paradox we mustn’t miss: Here is what I find most striking about all of this. I’ve been having conversations with Claude on these questions. And its reasoning, frankly, appears to be more cautious and more carefully considered than that of the industry that built it.
When asked whether we’re adequately prepared for what’s being developed, Claude says ‘no.’ Asked whether the current response is proportionate to the risks, Claude expresses doubt. Asked whether the competitive race logic justifies the pace of development, the model highlights it as a collective action problem the industry hasn’t solved.
This is what the AI model’s ‘reasoning’ throws up after scanning the available evidence. And it applies not just to Anthropic, but to the entire AI industry.
So we have a situation where the creation offers a better deliberated reflection on the risks of its existence than the industry that created it. What should we call this?
I would suggest ‘the intelligence substitution paradox.’
The intelligence being applied to assess AI risks has been effectively substituted, that is, displaced from creators onto the creation. The very cognitive capacity you’d want decision-makers to exercise now appears to reside in the thing the decision is about. And yet AI model makers retain decision-making authority.
This is distinct from the Frankenstein problem. Frankenstein’s monster did not warn of the dangers its existence posed, even if the tale was meant to serve as a warning to us.
It’s also distinct from the Cassandra dynamic, where accurate warnings go unheeded. Cassandra was human, and her curse was externally imposed. Here the dynamic is structural: the more capable the system, the more clearly it can reason out why its capabilities are dangerous and the less that seems to change anything.
At the heart of our problem is that gap between where the best reasoning resides and where the decision-making power resides. Technical fixes can’t help us. Perhaps neither can governance. The problem we confront is whether the people with power are the ones best placed to exercise it wisely.
It’s plausible that Claude’s caution might simply reflect the AI safety discourse it was trained on. But it’s also possible that AI companies find that business realities prevent them from fully acting on the values they espouse.
The people best placed to address the paradox, AI developers, have strong professional and financial incentives not to push too hard on it. And the public discourse tends to oscillate between ‘AI will kill us’ doomerism and ‘AI is all hype’ dismissal, neither of which creates space for the conversation we urgently need.
Can the AI race be kept within safety limits? Is there any mechanism—an international treaty, regulatory framework or industry pact—that could make that happen without just ceding ground to reckless actors?
The answers may be elusive. But it’s clear that AI systems are being built whose own reasoning warns us of risks we’re not ready for, and that those warnings go unheeded anyway. This is not reassuring, but it might be the defining situation of our moment.
The author is an incoming Young India fellow at Ashoka University.
