AI's Performance on Real Work Assignments
Imagine you're redesigning your living space. You could hire an interior designer for thousands of dollars, or you could ask ChatGPT to do it instead. But can AI actually handle real work? A recent study suggests that, for now, it mostly cannot.
The Floor Plan Test
Researchers gave AI systems and human workers identical assignments from freelancing platforms. One task: create a digital version of a hand-drawn floor plan.
- The human produced a professional-looking floor plan with accurate measurements and details.
- The best-performing AI system created a plausible-looking version, but with significantly less detail.
- On closer inspection, that AI version was completely wrong, illustrating the gap between output that looks plausible and output that is correct.
The Remote Labor Index Findings
The study, conducted by Scale AI and the Center for AI Safety, tested top AI systems (ChatGPT, Gemini, Claude) on hundreds of real projects including:
- 3D product animations
- Music transcription
- Web video game coding
- Research paper formatting
The headline result: the best-performing AI system successfully completed only 2.5% of projects.
"Current models are not close to being able to automate real jobs in the economy," said researcher Jason Hausenloy of the Remote Labor Index study.
Where AI Falls Short
Data Dashboard Disaster
Another assignment involved creating an interactive dashboard for World Happiness Report data. While AI results looked adequate at first glance, closer examination revealed:
- Countries inexplicably missing their data
- Overlapping text elements
- Legends with wrong colors or no colors at all
3D Modeling Failure
A project requiring promotional material for tech earbuds asked for 3D models and demonstration videos. Results:
- No AI system produced acceptable work
- GPT-5 and Sonnet created poor 3D models
- Manus didn't create a 3D model at all
- Earbuds changed appearance across video clips
Why AI Struggles with Real Work
Researchers identified two major limitations:
- No long-term memory: AI systems cannot learn from previous mistakes or remember feedback over time.
- Visual understanding deficits: AI struggles with graphic design, spatial relationships, and object manipulation.
Graham Neubig, a Carnegie Mellon professor who studies AI systems, explained: "Code is right or wrong, but visual design is very subjective." AI tools struggle to operate visual software designed for humans, often defaulting to code generation instead of proper design work.
The Web Game Exception
AI performed better on coding tasks. One assignment involved creating a web-based video game. The best AI version was playable, an impressive technical feat. However, the system ignored the instruction that the game should have a brewing theme, which shows its limits in following complex creative briefs.
Economic Implications
Although 75% of Americans expect AI to reduce the number of jobs (Bentley University/Gallup survey), economic data show that the technology largely hasn't replaced workers yet.
If AI could perform remote work autonomously, companies could save massively on contractor costs. But the study suggests this scenario remains far from reality.
The Future Trajectory
While current AI fails at most real work, newer models show improvement:
- Google's Gemini 3 Pro (November release) completed 1.3% of tasks
- The previous Gemini version completed only 0.8%
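To put the two completion rates in perspective, it helps to separate the absolute gain (in percentage points) from the relative gain. The short Python sketch below does that arithmetic using only the two figures quoted above; the function name `improvement` is a hypothetical helper chosen for illustration, not something from the study.

```python
def improvement(old_rate_pct: float, new_rate_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = new_rate_pct - old_rate_pct
    relative_pct = absolute_points / old_rate_pct * 100
    # Round to hide floating-point noise in the subtraction and division.
    return round(absolute_points, 2), round(relative_pct, 1)


# Completion rates quoted in the article: 0.8% (previous version), 1.3% (Gemini 3 Pro).
points, relative = improvement(old_rate_pct=0.8, new_rate_pct=1.3)
print(f"Absolute gain: {points} percentage points")
print(f"Relative gain: {relative}%")
```

Read this way, the newer model completed roughly 60% more tasks than its predecessor in relative terms, yet the absolute gain is only half a percentage point, so the vast majority of projects still went uncompleted.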
"The trend lines are there," Hausenloy noted, acknowledging gradual progress.
The Cost Comparison
The economic implications become stark when comparing costs:
- A human freelancer completed the video game assignment for $1,485
- Researchers had Sonnet produce a version for less than $30
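As a rough illustration of how large that gap is, the following Python sketch computes the cost ratio and the savings from the two figures above. It treats $30 as an upper bound on the AI cost, since the article says only "less than $30", so the results are lower bounds. The helper name `cost_comparison` is hypothetical, and the calculation assumes the two deliverables are comparable, which the article's own reporting on the ignored brewing theme suggests they were not.

```python
HUMAN_COST_USD = 1485.0    # freelancer price reported in the article
AI_COST_UPPER_USD = 30.0   # "less than $30", so treated here as an upper bound


def cost_comparison(human_cost: float, ai_cost: float) -> tuple[float, float, float]:
    """Return (cost ratio, savings in dollars, savings as a percent of human cost)."""
    ratio = human_cost / ai_cost
    savings = human_cost - ai_cost
    savings_pct = savings / human_cost * 100
    return ratio, savings, savings_pct


ratio, savings, savings_pct = cost_comparison(HUMAN_COST_USD, AI_COST_UPPER_USD)
print(f"Human cost is at least {ratio:.1f}x the AI cost")
print(f"Savings of at least ${savings:,.0f} ({savings_pct:.1f}%)")
```

The ratio describes price only. It says nothing about whether the cheaper output would have been accepted by a client.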
Even with current limitations, AI can still disrupt labor markets: chatbot assistance makes individual workers more productive, which could reduce overall employment needs.
The Fundamental Question
Whether AI needs minor tweaks or fundamental breakthroughs to handle real work is "the key question in the AI field at the moment," according to Hausenloy. The study challenges predictions that AI is poised to soon replace large portions of the workforce, revealing significant gaps between AI capabilities and real-world job requirements.



