Microsoft shows AI struggles with the Debugging of Code

Plus: NVIDIA animates Tom & Jerry.

Welcome Humans 🤖,

Here is what we have today:

  • 💻 Microsoft confirms AI still sucks at debugging.

  • 🤖 Samsung's long-awaited, Gemini-powered Ballie finally rolls into reality.

  • 🐱 NVIDIA's AI creates minute-long animated stories from prompts.

  • ⚖️ OpenAI sues Musk back.

  • 💼 New Job Opportunities

SAMSUNG

Samsung just dropped a major update on Ballie, the rolling, projector-equipped home robot it's been teasing since CES 2020.

And now? It's finally real... and it's got Google's Gemini AI under the hood.

What's the deal?

Ballie is a smart home assistant the size of a soccer ball that can:

  • Roam your home on its own

  • Project videos on walls or ceilings

  • Control smart devices

  • Respond to voice commands

And it's not just running Samsung's AI; it's combining forces with Gemini. That means multimodal AI, capable of understanding voice, visuals, and even environmental context.

Where and when?
Ballie rolls out this summer in the U.S. and South Korea. Third-party app support is coming too, which could unlock a whole new world of skills.

Why it matters:
The home robot race is heating up fast, and Samsung's no stranger to smart ecosystems. With SmartThings already in millions of homes, Ballie could be the first truly useful home robot, not just a novelty.

Add Google's Gemini into the mix, and you've got a seriously intelligent sidekick.

AI RESEARCH

Even the smartest AI still can't fix your broken code.

A new Microsoft Research study just dropped, and it shows AI agents, powered by the most advanced models out there, are still struggling with one of software engineering's most essential skills: debugging.

Here's what they did:

Microsoft took nine leading LLMs, including Claude 3.7 Sonnet, and threw them into the deep end with 300 real-world debugging tasks from the SWE-bench Lite dataset.

And the result?

Even the best model failed on more than half the problems.

Claude 3.7 Sonnet came out on top when paired with debugging tools, solving 48.4% of the issues. Not exactly a victory lap. OpenAI's o1 and o3-mini? They managed just 30.2% and 22.1%, respectively.

So, why the struggle?
Turns out, these models were never really trained to think like human debuggers. Microsoft found that they lacked exposure to sequential decision-making, aka the step-by-step logic real developers use to fix code.

Why it matters:
With billions flowing into AI coding agents from Google, Meta, and others, this study is a hard dose of reality. While AIs are great at spitting out code, they're still not ready to fix it when things break. And until that changes, human devs aren't out of a job anytime soon.

Trending Today
  • AI Swarm is an AI system that can replicate itself to work on different parts of the same request at the same time. For example, if an email requires setting a date, writing a message, giving a location, and replying to the sender, the swarm would spin up copies of itself to handle all of those segments in parallel.

  • NVIDIA, Stanford, and others have developed an AI technique called Test-Time Training (TTT) that turns detailed prompts into minute-long animated clips. It produces dynamic, multi-scene stories with consistent characters, with no post-editing needed. The team demoed the tech with Tom and Jerry clips, surpassing the 8 to 20 second generation limits of earlier methods.

  • Anthropic's new "Max Plan" targets heavy users managing long chats, large documents, and complex tasks. The $100 and $200 tiers offer 5x and 20x more usage than the Pro plan, plus priority access to new features like upcoming voice support.

  • OpenAI has filed a countersuit against Elon Musk, accusing him of harassment and disruptive tactics and asking a federal judge to block any further "unlawful and unfair" initiatives designed to harm the company. It's the latest plot twist in the ongoing Musk vs. OpenAI legal battle.
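
For the curious, the "split one request into parallel segments" idea from the AI Swarm item can be sketched in a few lines of Python. This is only an illustration of the general fan-out pattern, not how AI Swarm is actually built: the function names `handle_segment` and `run_swarm` are made up for this example, and the short sleep stands in for real model work.

```python
import asyncio


async def handle_segment(name: str) -> str:
    """Hypothetical worker: one 'copy' of the agent handling one segment."""
    await asyncio.sleep(0.01)  # placeholder for real model work (assumption)
    return f"{name}: done"


async def run_swarm(segments: list[str]) -> list[str]:
    """Fan out one worker per segment and wait for all of them together."""
    # asyncio.gather runs the workers concurrently and returns the
    # results in the same order as the input segments.
    return await asyncio.gather(*(handle_segment(s) for s in segments))


if __name__ == "__main__":
    email_segments = [
        "set a date",
        "write a message",
        "give a location",
        "reply to sender",
    ]
    for line in asyncio.run(run_swarm(email_segments)):
        print(line)
```

All four segments finish in roughly the time of the slowest one, rather than one after another, which is the appeal of the swarm approach.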

Recommended Reading
If we had to recommend other newsletters

AI Tool Report is a newsletter we enjoy reading daily. It delivers top techniques on how AI can transform your business, filled with practical tips and real-world examples.

A Smart Bear is for people who love thinking about strategy, startups, product, marketing, decision-making, and founder psychology.

Job Opportunities
  • Hippocratic AI - Staff Machine Learning Engineer, Applied Science - Apply

  • People AI - Sr. Product Manager, GenAI Sales Apps - Apply

  • Metropolis - Transformation Office Analyst - Apply

  • Tempus - Manager, Strategy & Operations - Apply

  • Glean - Product Design Intern - Apply

Ideas? Comments? Complaints?

We read your emails, comments and poll replies daily.

Until next time, stay informed!