The Washington Post • 1 month ago

AI vs Human Jobs: The Surprising Truth About Who's Really Winning


Summary:

AI systems successfully completed only 2.5% of real work assignments in a comprehensive study comparing AI performance to human workers
The Remote Labor Index tested ChatGPT, Gemini, and Claude on hundreds of actual freelancing projects including 3D modeling, coding, and design work
Major AI limitations include no long-term memory and poor visual understanding, causing failures in graphic design and spatial tasks
Despite 75% of Americans expecting AI to reduce jobs, current economic data shows the technology hasn't significantly replaced human workers yet
Newer AI models show gradual improvement, with Google's Gemini 3 Pro completing 1.3% of tasks compared to 0.8% for previous versions

AI's Performance on Real Work Assignments

Imagine you're redesigning your living space. You could hire an interior designer for thousands of dollars—or ask ChatGPT to do it instead. But can AI actually handle real work? A groundbreaking study reveals the answer.

The Floor Plan Test

Researchers gave AI systems and human workers identical assignments from freelancing platforms. One task: create a digital version of a hand-drawn floor plan.

The human produced a professional-looking floor plan with accurate measurements and details.
The best-performing AI system created a plausible-looking version, but with significantly less detail.
Despite looking plausible, the AI version was completely wrong, showing that output can appear convincing and still fail the assignment.

The Remote Labor Index Findings

The study, conducted by Scale AI and the Center for AI Safety, tested top AI systems (ChatGPT, Gemini, Claude) on hundreds of real projects including:

3D product animations
Music transcription
Web video game coding
Research paper formatting

Shocking result: The best-performing AI system successfully completed only 2.5% of projects.

"Current models are not close to being able to automate real jobs in the economy," said Jason Hausenloy, a researcher on the Remote Labor Index study.

Where AI Falls Short

Data Dashboard Disaster

Another assignment involved creating an interactive dashboard for World Happiness Report data. While AI results looked adequate at first glance, closer examination revealed:

Countries missing data inexplicably
Overlapping text elements
Legends with wrong colors or no colors at all

3D Modeling Failure

A project requiring promotional material for tech earbuds asked for 3D models and demonstration videos. Results:

No AI system produced acceptable work
GPT-5 and Sonnet created poor 3D models
Manus didn't create a 3D model at all
Earbuds changed appearance across video clips

Why AI Struggles with Real Work

Researchers identified two major limitations:

No long-term memory: AI systems cannot learn from previous mistakes or remember feedback over time.
Visual understanding deficits: AI struggles with graphic design, spatial relationships, and object manipulation.

Graham Neubig, a Carnegie Mellon professor who studies AI systems, explained: "Code is right or wrong, but visual design is very subjective." AI tools struggle to operate visual software designed for humans, often defaulting to code generation instead of proper design work.

The Web Game Exception

AI performed better on coding tasks. One assignment involved creating a web-based video game. The best AI version was playable—an impressive technical feat. However, the system ignored the instruction that the game should have a brewing theme, showing limitations in following complex creative briefs.

Economic Implications

Although 75% of Americans expect AI to reduce jobs (Bentley University/Gallup survey), economic data shows the technology largely hasn't replaced workers yet.

If AI could perform remote work autonomously, companies could save massively on contractor costs. But the study suggests this scenario remains far from reality.

The Future Trajectory

While current AI fails at most real work, newer models show improvement:

Google's Gemini 3 Pro (November release) completed 1.3% of tasks
The previous version completed only 0.8%

"The trend lines are there," Hausenloy noted, acknowledging gradual progress.

The Cost Comparison

The economic implications become stark when comparing costs:

A human completed the video game assignment for $1,485
Researchers had Sonnet make it for less than $30
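
To put the two figures above side by side, here is a minimal Python sketch of the price ratio. The function and constant names are illustrative assumptions, not part of the study. Because the article gives the AI cost only as "less than $30", the sketch treats $30 as an upper bound, so the result is a lower bound on the ratio. It compares price alone and says nothing about whether the output met the brief.

```python
# Figures quoted in the article for the web video game assignment.
HUMAN_COST_USD = 1485.0
# The article says "less than $30"; using 30 as an upper bound is an assumption.
AI_COST_UPPER_BOUND_USD = 30.0


def cost_ratio(human_cost: float, ai_cost: float) -> float:
    """Return how many times more the human cost is than the AI cost."""
    if ai_cost <= 0:
        raise ValueError("ai_cost must be positive")
    return human_cost / ai_cost


ratio = cost_ratio(HUMAN_COST_USD, AI_COST_UPPER_BOUND_USD)
# The true AI cost was below the bound, so the real ratio is at least this large.
print(f"The human cost was at least {ratio:.1f}x the AI cost")
```
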

Even with current limitations, AI can still disrupt labor markets by making individual workers more productive with chatbot assistance—potentially reducing overall employment needs.

The Fundamental Question

Whether AI needs minor tweaks or fundamental breakthroughs to handle real work is "the key question in the AI field at the moment," according to Hausenloy. The study challenges predictions that AI is poised to soon replace large portions of the workforce, revealing significant gaps between AI capabilities and real-world job requirements.

Source: The Washington Post
