AI Is Finding More Bugs Than Open-Source Teams Can Fight Off

Source: Bloomberg Business

In 2025, Daniel Stenberg, the chief maintainer of cURL, an open-source software tool that transfers data using URLs, received 181 notifications of bugs or vulnerabilities across the codebase he oversees with a small team of six other volunteers.

That was roughly as many as the previous two years combined. "Last year was quite intense during periods," Stenberg said from his office in Sweden.

With hindsight, that 2025 bump -- which Stenberg attributed to the rise of AI tools like OpenAI's ChatGPT and Anthropic PBC's Claude, and the ease with which bug report forms could be filled out with their support -- was just a taster.

By April 9 this year, the cURL team had already fielded 87 requests. That puts them on track to receive some 325 reports during 2026, roughly as many as they received in total from 2020 through 2023.

As the only full-time member on the cURL project, Stenberg handles most of the requests and fixes himself. "It feels a little bit overwhelming at times," he said. It's also getting worse -- and will continue to do so, he says, thanks to the increasingly powerful capabilities of AI.

With the unveiling of Anthropic's latest model, Mythos, stakeholders from security experts to the US Treasury have expressed concern over whether the internet can remain secure. UK banks will get access to Mythos next week. According to Anthropic, Mythos, announced on April 7, can autonomously discover and exploit so-called "zero-day" vulnerabilities -- flaws not yet known to the software's developers and for which no fix exists -- across every major operating system and web browser. "The fallout -- for economies, public safety, and national security -- could be severe," the company wrote.

As part of a pre-emptive effort to contain any potential impact, Anthropic decided not to release Mythos widely, instead giving access to those maintaining core code at around 40 organizations, including CrowdStrike Holdings Inc. and the Linux Foundation. It also announced $4 million of funding for a clutch of software maintainer groups. On Tuesday, OpenAI announced a model of its own, GPT-5.4-Cyber, which it says is aimed at spotting software vulnerabilities.

Yet Anthropic's donation -- a tiny fraction of its latest $14 billion run-rate revenue -- merely underscored one of the big secrets in big tech: that the sector's current sky-high valuations depend, at least in part, on open-source software maintained by small, under-resourced teams.

There's a hope that Mythos, put in the right hands, could fix issues before other AI models find them. For now, though, as cyber attackers and defenders race to adopt AI, those teams are at risk of becoming a bottleneck, with their workload growing faster than their capacity to respond.

Massive Influx

As code is maintained and bugs are fixed, it accrues what software maintainers call "cruft" -- remnants of legacy code left within software that can break things or be exploited. Stenberg and his team need to understand and be prepared to maintain any and all of it. cURL's total codebase is now 592,566 lines, and a small update to one component can cause issues elsewhere. Keeping track of the complex interplay is what makes their job so tricky.

Sometimes things do go wrong, with catastrophic consequences. Massive internet outages, which seem to crop up every year or two, have been attributed to human errors in coding, while near-misses can also have enduring consequences. A vulnerability discovered in 2021 in Log4j, a widely used open-source Java logging library, put software running in 93% of enterprise cloud environments at risk, and it is still exploited today where it hasn't been patched.

Twelve years ago, a major bug was discovered in the code of OpenSSL, a secure communications toolkit that, among other things, lets websites display the tiny padlock signaling that your connection -- and your purchases -- are encrypted. The error sat in the code for more than two years before it was found on April Fools' Day 2014 by Google researcher Neel Mehta. The vulnerability, nicknamed Heartbleed, affected hundreds of thousands of websites, leaving them open to attack. Cleaning up the issue -- quickly, before hackers could exploit it -- was largely a job for two guys named Steve.

"There's this sort of power law of maintainers," where only a tiny number of people do a vast amount of maintenance work, said Jim Zemlin, executive director of the Linux Foundation, which created the Core Infrastructure Initiative in response to Heartbleed. In 2020, that Linux Foundation effort was folded into a newly launched cross-industry body; the Open Source Security Foundation (OpenSSF).

Heartbleed was discovered because one eagle-eyed individual happened to find a problem with the code. Since the 2022 release of ChatGPT and the widespread usage of generative AI tools like Claude Code and Codex, the ability to rifle through lines upon lines of code for issues has skyrocketed -- as has hackers' ability to use AI to identify actual vulnerabilities. Thus the massive influx of bug reports on the desks of people like Daniel Stenberg.

Stenberg isn't alone. Software maintainers keeping core parts of our digital world working have started being swamped by threat alerts. Willy Tarreau, lead software developer at HAProxy, which load-balances web traffic, has written that the slew of reports is "a bit scary (and tiring)." Open-source developer Dirk Hohndel has likened it to a distributed denial-of-service (DDoS) attack on developers. Matej Cepl, an engineer at SUSE, a Linux software company, described the number of bug reports he and colleagues are receiving as "crazy."

Arise, Mythos

Since late 2025, things have changed: the reports have increased in both quality and quantity. Autonomous AI agents now find faults in code, then fill out the forms to report them.

"These AI tools are now pretty good at accurately finding problems," said Stenberg, who said more than 200 issues found by AI tools have been fixed in cURL in the past six months.

The problem is that while the number of AI eyes looking for problems has increased, the number of people fixing those problems when they arise hasn't. And -- so far -- humans are still the final link in the chain, even as AI's autonomous code-writing capabilities increase exponentially.

Mythos may eventually alleviate the stress on maintainers, securing code for the millions of users who rely on it. In the days since its release, it's already proving essential. Jim Zemlin gave early access to Mythos to members of his team, including Greg Kroah-Hartman, who maintains the stable branch of the Linux kernel, the core of the open-source operating system. Kroah-Hartman has previously said he noticed "the world switched" in February as AI began reporting real issues at scale.

"What Greg's telling me is,'This thing is useful in actually fixing bugs'," said Zemlin. Hence Anthropic's staggered release to the "good guys" first. "Yesterday I asked Greg if I could take [his Mythos access] away," he added. "He said:'No, please don't.'"

"In my mind, anything, any tool that can help a maintainer write and patch more secure software is goodness," said Zemlin.

Security professionals such as Stenberg may have to hope that Mythos comes good. Already, "bounty" programs that pay ethical hackers to report bugs rather than exploit them are being scrapped under the weight of AI-generated reports. In March, Google hit pause on its Open Source Software Vulnerability Reward Program. The Internet Bug Bounty program recently paused new submissions, saying "the discovery landscape is changing."

The problem is that if Mythos is as effective as early signals suggest, a wave of externally generated code issue reports could soon become a flood of internal ones. (And if it ends up in the hands of bad actors, there's big trouble ahead: the UK AI Security Institute, a publicly backed body, audited Mythos' capabilities this week and found it to be much faster than human hackers.) That would put even more pressure on the humans who still retain oversight, even in the age of Mythos.

On Easter Sunday alone, Stenberg says he received five security reports. He stayed up late to handle them, sending an update when he was finished to a small mailing list: "Another intense week ends."

It takes Stenberg around two hours on average to fix each problem that arrives in his inbox. Some of the complex issues identified by AI take longer.

"I fear it's not sustainable," he said."We need to stay sane and stay alive."