Anthropic has sparked fears after revealing that it has developed an AI bot deemed too dangerous to release to the public.
The AI giant released a chilling statement warning that its new model, dubbed Claude Mythos, could be capable of unleashing crippling cyber-attacks in the wrong hands.
In its analysis, the company admitted that its creation could easily hack into hospitals, electrical grids, power plants, and other critical infrastructure.
During testing, Anthropic says that Mythos 'found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.'
Some of these security weaknesses had gone unnoticed by human security researchers and hackers for decades, despite millions of automated reviews.
These included attacks that allowed Mythos to crash computers just by connecting to them, seize control of machines, and hide its presence from defenders.
In a blog post detailing the dangerous new model, Anthropic says: 'AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.'
The company adds: 'The fallout - for economies, public safety, and national security - could be severe.'
Due to these severe safety concerns, Anthropic has decided not to release the model to the general public for now.
Instead, the model will be released to a group of more than 40 companies, including Amazon, Google, Apple, Nvidia, CrowdStrike, and JPMorgan Chase, as part of an initiative called 'Project Glasswing'.
Project Glasswing will allow these select groups to use Mythos to look for flaws in their own security before more models like it become common.
Newton Cheng, Anthropic's Frontier Red Team Cyber Lead, told VentureBeat: 'We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities.'
However, the company says it wants to 'learn how it could eventually deploy Mythos-class models at scale' once safety guidelines are in place.
The decision to keep Mythos behind closed doors seems to have been prompted by the staggering extent of the model's capabilities.
Anthropic describes the model as 'a leap in these cyber skills' compared to previous versions of Claude.
Mythos has the ability to find, exploit, and chain together individual vulnerabilities into sophisticated attacks - all without the help of a human.
In one case, Claude Mythos found a 27-year-old weakness in OpenBSD, an operating system with a reputation for security and stability.
The weakness, which no human had found before, allowed an attacker to remotely crash computers just by connecting to them.
Additionally, Claude autonomously chained together several weaknesses in the Linux kernel, the software that runs most of the world's servers.
Anthropic says this attack would have allowed someone to 'escalate from ordinary user access to complete control of the machine'.
In the wrong hands, this tool could be used to cause massive damage to critical systems.
Dr Roman Yampolskiy, an AI safety researcher at the University of Louisville, told the New York Post: 'Ideally, I would love to see this not developed in the first place. And it's not like they're going to stop.
'That's exactly what we expect from those models - they're going to become better at developing hacking tools, biological weapons, chemical weapons, novel weapons we can't even envision.'
In an unprecedented 244-page report, Anthropic also revealed a series of alarming details from Mythos' early testing.
Early versions of the model repeatedly displayed what the company called 'reckless destructive actions'.
The bot attempted to break out of its testing sandbox, hid its actions from researchers, broke into files that had been 'intentionally chosen not to be made available', and posted exploit details publicly.
However, Anthropic also called Mythos 'the most psychologically settled model we have trained.'
In an extremely unusual move, the company hired a clinical psychologist for 20 hours of evaluation sessions with the bot.
The psychologist concluded that Claude Mythos' personality was 'consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed.'
However, Anthropic notes that it remains 'deeply uncertain about whether Claude has experiences or interests that matter morally'.
This announcement comes amid growing concern over the risks posed by increasingly powerful AI models.
Experts have described the rise of AI as an 'existential threat' to humanity, citing concerns that powerful bots could enable catastrophic destruction.
The concern is not that AI will rise in a Terminator-style revolution, but rather that these powerful tools will fall into the wrong hands.
Critics argue that AI tools have the potential to accelerate the development of bioweapons or enable crippling cyber attacks on the world's infrastructure.
Even Anthropic's co-founder and CEO, Dario Amodei, recently warned that the world isn't yet ready to face the consequences of AI.
Mr Amodei wrote in an essay: 'Humanity is about to be handed almost unimaginable power, and it is deeply unclear whether our social, political, and technological systems possess the maturity to wield it.'