Projects of AI
Posts
Microsoft shows AI struggles with the Debugging of Code

Microsoft shows AI struggles with the Debugging of Code

Plus: NVIDIA animates Tom & Jerry.

Amandeep Singh
April 11, 2025

Together with

Welcome Humans🤖,

Here is what we have Today:

💻 Microsoft confirms AI still sucks at debugging.
🤖 Samsung's Gemini powers long-awaited Ballie finally rolls into reality
🐱 NVIDIA's AI creates minute-long animated stories from prompts.
⚖️ OpenAI sues Musk back.
💼 New Job Opportunities

SAMSUNG

🤖 Samsung’s Robot Just Got a Brain Upgrade — Thanks to Google

Samsung just dropped a major update on Ballie—the rolling, projector-equipped home robot it’s been teasing since CES 2020.

And now? It’s finally real… and it’s got Google’s Gemini AI under the hood.

Image Source: Samsung

What’s the deal?

Ballie is a smart home assistant the size of a soccer ball that can:

Roam your home on its own
Project videos on walls or ceilings
Control smart devices
Respond to voice commands

And it’s not just running Samsung’s AI—it’s combining forces with Gemini. That means multimodal AI, capable of understanding voice, visuals, and even environmental context.

Where and when?
Ballie rolls out this summer in the U.S. and South Korea. Third-party app support is coming too, which could unlock a whole new world of skills.

Why it matters:
The home robot race is heating up fast, and Samsung’s no stranger to smart ecosystems. With SmartThings already in millions of homes, Ballie could be the first truly useful home robot—not just a novelty.

Add Google’s Gemini into the mix, and you’ve got a seriously intelligent sidekick.

Together with Superhuman AI

Find out why 1M+ professionals read Superhuman AI daily.

AI won't take over the world. People who know how to use AI will.

Here's how to stay ahead with AI:

Sign up for Superhuman AI. The AI newsletter read by 1M+ pros.
Master AI tools, tutorials, and news in just 3 minutes a day.
Become 10X more productive using AI.

Join 1 million pros and start learning AI

AI RESEARCH

🧠 Microsoft Finds AI Debuggers Still Need Debugging

Even the smartest AI still can’t fix your broken code.

A new Microsoft Research study just dropped—and it shows AI agents, powered by the most advanced models out there, are still struggling with one of software engineering’s most essential skills: debugging.

Image Source: Microsoft

Here’s what they did:

Microsoft took nine leading LLMs—including Claude 3.7 Sonnet—and threw them into the deep end with 300 real-world debugging tasks from the SWE-bench Lite dataset.

And the result?

Even the best models flopped on half the problems.

Claude 3.7 Sonnet came out on top—when paired with debugging tools—solving 48.4% of the issues. Not exactly a victory lap. OpenAI’s o1 and o3-mini? They managed just 30.2% and 22.1%, respectively.

So, why the struggle?
Turns out, these models were never really trained to think like human debuggers. Microsoft found that they lacked exposure to sequential decision-making—aka the step-by-step logic real developers use to fix code.

Why it matters:
With billions flowing into AI coding agents from Google, Meta, and others, this study is a hard dose of reality. While AIs are great at spitting out code, they're still not ready to fix it when things break. And until that changes, human devs aren’t out of a job anytime soon.

Trending Today

AI Swarm is an AI system that is able to multiple itself to complete different task at the same time. This AI is made to multiple itself to complete different segments of the same question at the same time. If I had a email that requires me to set a date, write a message, give a location and answer back to the sender the AI swarm would multiple itself to complete all the segments at the same time.
NVIDIA, Stanford, and others have developed an AI technique called Test-Time Training (TTT) that turns detailed prompts into minute-long animated clips. It produces dynamic, multi-scene stories with consistent characters—no post-editing needed. The team demoed the tech with Tom and Jerry clips, surpassing the 8–20 second generation limits of earlier methods.
Anthropic's new "Max Plan" targets heavy users managing long chats, large documents, and complex tasks. The $100 and $200 tiers offer 5x and 20x more usage than the Pro plan, plus priority access to new features like upcoming voice support.
OpenAI has filed a countersuit against Elon Musk, accusing him of harassment and disruptive tactics—asking a federal judge to block any further “unlawful and unfair” initiatives, that are designed to harm the company—marking the latest plot twist in the ongoing Musk vs OpenAI legal battle.

Job Opportunities

Hippocratic AI - Staff Machine Learning Engineer, Applied Science - Apply
People AI - Sr. Product Manager, GenAI Sales Apps - Apply
Metropolis - Transformation Office Analyst - Apply
Tempus- Manager, Strategy & Operations - Apply
Glean - Product Design Intern - Apply

AI News

Source: Ideogram

Ideas? Comments? Complaints?

We read your emails, comments and poll replies daily.

Get the most important AI, tech, and science news in a free daily email.
New Here? Subscribe!
Sponsorship Slots Open for April and Reach over 800+ active readers. (Now 40% off) 🤯

Until next time, Stay Informed!

Reply

or to participate.