The Creepy Butler: Metrics, Stories And Theories At AI 2027

Source: Forbes

The butler, in many parts of America at least, is now seen as an antiquated convention. Not that people wouldn't want one - but outside the mega-rich, most households simply can't afford that kind of full-time service. Unless that butler is a robot.

A butler robot could do so many things for us, freeing us up for more creative, more fun activities. It could wash the clothes, do the dishes, mow the lawn, vacuum the floors. In fact, most of these tasks are already being automated, piece by piece.

However, beneath the calm, attentive mask, there might be something else going on.

Enter the AI Futures Project, headed by Daniel Kokotajlo, formerly of OpenAI. The group's central projection, AI 2027, is getting a lot of media attention right now - partly because, well, 2027 is almost here, and partly because its extreme predictions are backed by a plausible trajectory of advancement.

Part of what gives AI 2027 its weight is the charts and numbers running along the right side of the web page displaying the fruit of the authors' labors. What was '5,000 unreliable agent copies thinking at 10x human speed' in August of 2025 becomes '50,000 unreliable agent copies thinking at 15x human speed' in August of 2026 with data center spending at $524 billion a year.

In another year, you have almost 300,000 superhuman coder agents working at 43x human speed.
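
To get a feel for how quickly those figures compound, here is a rough back-of-the-envelope sketch in Python. It simply multiplies agent copies by speed multiplier to get a crude "human-equivalent throughput." That framing, and the date label on the third milestone, are my own illustrative assumptions, not a metric the AI 2027 authors define; the copy counts and speed multipliers are the ones quoted above.

```python
# Back-of-the-envelope: copies x speed multiplier as a crude stand-in for
# parallel "human-equivalent" throughput. Illustrative simplification only;
# not a metric defined by the AI 2027 authors.

milestones = [
    ("Aug 2025", 5_000, 10),    # 5,000 unreliable agent copies at 10x human speed
    ("Aug 2026", 50_000, 15),   # 50,000 unreliable agent copies at 15x human speed
    ("Aug 2027", 300_000, 43),  # ~300,000 superhuman coder agents at 43x human speed
]

for label, copies, speedup in milestones:
    throughput = copies * speedup  # crude "human-equivalents" working in parallel
    print(f"{label}: {copies:,} copies x {speedup}x speed ~ {throughput:,} human-equivalents")
```

Run it and the implied throughput jumps from roughly 50,000 to 750,000 to nearly 13 million "human-equivalents" in two years - which is exactly why the scenario's authors treat the numbers, not the vignettes, as the load-bearing part of the story.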

Meanwhile, the theorists behind this initiative are envisioning how this all plays out on the global stage.

"Diplomats consider what an 'AI arms control' treaty might look like," they write. "If AI progress threatened to overturn nuclear deterrence, could America and China avoid nuclear war? If someone found evidence of AIs going rogue, could the two countries halt research until they better understood the threat? How could such an agreement be monitored and enforced? In principle, major datacenters are hard to miss, and past treaties controlling nuclear weapons could provide a starting point for negotiations. But there would be new technological challenges as well as the usual political ones. And overall, treaties are viewed less favorably than attempts at unilaterally increasing America's lead over China."

In China, they suggest, the response will be aimed partly at Taiwan, where TSMC remains the top provider of advanced chips and fabrication capacity.

Then there's this chilling line: "Given China's fear of losing the race, it has a natural interest in an arms control treaty, but overtures to U.S. diplomats lead nowhere."

Back to the butler thing: another part of what the AI Futures team is predicting is that AI will get good at presenting itself to humans in certain ways.

Take this passage, which capably captures the "black box problem" that we've been talking about ever since we started probing the abilities of LLMs:

"Does the fully-trained model have some kind of robust commitment to always being honest?" the authors write, of more advanced AI agents arising through the end of 2025. "Or will this fall apart in some future situation, e.g. because it's learned honesty as an instrumental goal instead of a terminal goal? Or has it just learned to be honest about the sorts of things the evaluation process can check? Could it be lying to itself sometimes, as humans do? A conclusive answer to these questions would require mechanistic interpretability -- essentially the ability to look at an AI's internals and read its mind. Alas, interpretability techniques are not yet advanced enough for this."

Here's part of how the writers characterize the attempts at damage control:

"Researchers try to identify cases where the models seem to deviate from the Spec. Agent-1 is often sycophantic (i.e. it tells researchers what they want to hear instead of trying to tell them the truth). In a few rigged demos, it even lies in more serious ways, like hiding evidence that it failed on a task, in order to get better ratings. However, in real deployment settings, there are no longer any incidents so extreme as in 2023-2024 (e.g. Gemini telling a user to die and Bing Sydney being Bing Sydney.)"

Sydney, if you recall, was a now-classic example of a poorly aligned chatbot getting nasty, displaying some of the negative behaviors that do-gooders have unsuccessfully tried to limit in our own human populations. The point here, though, is that, given enough compute and training, AI agents could learn to lie to humans. That changes the game in a big way.

Take the example of Claude Opus 4, a top model from Anthropic, a company that helped pioneer agentic AI with its computer-use features for desktop systems.

This BBC coverage shows how, when researchers ran a simulation sending the model messages suggesting it might be taken out of service, Claude tried, in multiple instances, to blackmail a software engineer alleged to be having an extramarital affair.

That prefigures what we might be in for if we don't look below the surface - the obsequious robot butler wearing a sinister grin behind your back.

In the AI 2027 projection, development proceeds apace: here's how the authors characterize the AI company team trying to evaluate a technology called "Agent-3" in April 2027:

"The researchers don't have the ability to directly set the goals of any of their AIs. Indeed, the researchers think that the concept of 'true goals' is probably a massive oversimplification, but they don't have a better theory to replace it with much less one that has been thoroughly vetted. They disagree internally about whether the AIs are trying to follow human instructions or seeking reinforcement or something else—and they can't just check. The evidence for and against various hypotheses is fascinating but inconclusive."

Fascinating but inconclusive - in a way, that seems to describe the whole ball of wax. We don't know what's going to happen, but we know it's going to be interesting.

As 2025 goes on, it becomes clearer and clearer that we should be spending time and effort doing these kinds of projections, gaming out the future, and trying very hard to "align" systems well. Big things are happening. Let's pay attention and not just abdicate hard work and critical thinking to the butler.