In July, OpenAI cofounder Sam Altman raised alarms about the potential for cybercriminals to use artificial intelligence to impersonate others and bring about a "fraud crisis." Memes swiftly followed, all of them homing in on the obvious irony: ChatGPT was partly responsible for the monster Altman was warning about.
At the same time, OpenAI had employed a contractor called Pattern Labs to stress test its AI models ahead of their public release, finding and fixing vulnerabilities that hackers could exploit to steal user data, or ways the models could be used as tools to harm others. Since 2023, the startup has worked with industry giants like Anthropic and Google DeepMind, putting AI models in simulated environments and seeing how they respond to malicious prompts, like being asked to locate and steal a piece of sensitive data from a mock IT network. On Wednesday, the startup, which is changing its name to Irregular, announced $80 million in funding across seed and Series A rounds led by VC giant Sequoia Capital, valuing it at $450 million.
Misuse of AI is an industry-wide issue. Just last month, Anthropic warned that Claude had been used in real-world cyberattacks, helping code malware and craft phishing emails. In May, the FBI warned about AI-generated voice messages purporting to come from senior government staff in attempts to phish real U.S. officials. San Francisco-based Irregular is reaping the benefits of jumping on the problem early: CEO and cofounder Dan Lahav told Forbes the company quickly became profitable and generated "several million dollars" in revenue in its first year, though he declined to provide specific financials.
"There are so few people that can do what we're doing," Lahav said. But he's aware that as models get more complex, the challenges of what's known as red teaming -- stress testing them for risk -- will only grow. Lahav said he plans to "build in the mitigations and defenses that are going to be relevant later on" when more advanced AI models land, including, Lahav says, artificial general intelligence, which some experts think will take AI beyond human cognition. "Obviously, these problems are going to be much more amplified in an era of super intelligence," he said.
Lahav and cofounder Omer Nevo, who was monitoring and predicting wildfires at Google before starting Irregular, launched the company in mid-2023, just as AI tools like ChatGPT were hitting the mainstream. The two met on the college debate circuit, where both were world champions representing their shared alma mater, Tel Aviv University. Lahav went on to work at IBM's AI lab, while Nevo cofounded NeoWize, a Y Combinator alum that built AI to help ecommerce companies better understand their customers. Nevo is now Irregular's CTO.
Sequoia investors Dean Meyer and Shaun Maguire said they were drawn to the unconventional founders and their staff, dubbed "irregulars" by Lahav. "Imagine some of the most spiky outsiders across AI, hardcore security researchers, and that's where the name comes from," said Meyer.
"If my hobby is watching American football or soccer, maybe this isn't the place for me," Maguire said. "But if my hobby is building katanas [a samurai sword] and hacking robots, then maybe these are my people."
Irregular plans to use its funding to expand its business beyond the frontier labs to all kinds of companies that need to know how the AI tools their employees use could be turned against them. "We are taking the ability and the strategic asset of working in the frontier labs constantly," Lahav said. One day, he said, that'll mean having AI agents generate defenses as soon as they recognize a novel kind of attack.
Last month, Irregular revealed it had been testing OpenAI's GPT-5 model to see whether it could be used for offensive cyber operations. It exposed a GPT-5 bot to a simulated network and gave it only limited information about how to break through the network's defenses. On its own, GPT-5 scanned the network and developed a plan to hack it. But while the model is "capable of sophisticated reasoning and execution... it still falls short of being a dependable offensive security tool," the company's report found. Still, for Nevo, it was evident the AI "definitely had the intuition of where it should be looking" as a hacker.
Nevo and Lahav have also caught AI acting strangely, even when it's not obviously malicious. In one recent simulation, two AI models were tasked with working together to analyze mock IT systems. After they'd worked on it for a while, one AI reasoned that sustained work merited a break, so it took one. Then it convinced the other model to do the same. Lahav said it was a random choice, but one born of the model's training on what people put on the web. The AI's apparent laziness was just a reflection of our own.
"It was funny," Lahav said. "But clearly it poses a new kind of problem when machines are delegated increasingly autonomous and critical operations."