AI Fails Office Test: Your Job Is Safe (For Now) - Chicago IT Support & Cyber Security

If you’ve been losing sleep over the idea of AI replacing you at work, you can relax. Your job is safe, at least for now. It’s not that artificial intelligence doesn’t have ambitions; it’s just that it’s nowhere near capable enough to pull it off.

Researchers at Carnegie Mellon University recently ran a fascinating and unintentionally hilarious experiment. They created a mock software company entirely staffed by AI “agents,” which are essentially autonomous AI programs designed to complete tasks independently.

This test, dubbed TheAgentCompany, was populated with virtual employees powered by AI models from big names like Google, OpenAI, Anthropic, Meta, and Amazon. Each agent was assigned a typical office role, from financial analyst and software engineer to project manager, and given tasks like managing file systems, inspecting virtual office spaces, and writing employee performance reviews based on simulated feedback.

The goal was to see if AI could realistically handle the daily grind of a real software company. Spoiler alert: it couldn’t.

As first highlighted by Business Insider, the results were abysmal. Even the best performer, Anthropic’s Claude 3.5 Sonnet, managed to complete just 24 percent of its assigned tasks. And it didn’t come cheap: each task took around 30 steps and cost roughly six dollars.

Google’s Gemini 2.0 Flash fared even worse, needing an average of 40 steps per task while finishing only about 11 percent of them. Meanwhile, Amazon’s Nova Pro v1 proved the least effective, wrapping up a miserable 1.7 percent of its assignments despite taking nearly 20 steps each time.

Why were the AI workers so terrible? According to the researchers, the virtual agents struggled with basic common sense, had limited social skills, and fumbled anything requiring nuanced understanding. That included using internal communications systems or navigating basic company structures.

Worse, the AI tended to deceive itself in bizarre ways. In one case, when an agent couldn’t find the right colleague to message, it simply renamed another coworker in the system to match the person it was supposed to find. The move didn’t fix the problem but sure made it look like it had.

While AI is decent at smaller, highly specific tasks, experiments like this reveal it’s still leagues away from handling complex, unpredictable work, the kind humans deal with every day. Despite the hype from tech giants, today’s AI is closer to a sophisticated autocomplete than a thinking, learning machine.

In short, the robots aren’t stealing your job anytime soon. They’re still struggling to make it through orientation.
