Poetiq claims top AI reasoning spot without building models

Plus: NYT sues Perplexity.

Welcome Humans🤖,

Here is what we have today:

  • 🚀 Six-person Poetiq beats Gemini on reasoning benchmark.

  • ⚖️ NYT goes to war with AI: Perplexity sued

  • 👁️ Ultra-personalized AI across all Google apps raises privacy concerns.

  • 🤖 Claude becomes research interviewer in Anthropic's bold experiment.

  • 💼 New Job Opportunities

POETIQ

🏆 Poetiq Beats Google on ARC-AGI-2 — With a Twist

A six-person AI startup just outsmarted the giants.
Poetiq officially took #1 on the ARC-AGI-2 reasoning benchmark, outperforming Google’s Gemini 3 Deep Think while spending less than half as much per task and without building a model of its own.

Image source: POETIQ

🔍 How Poetiq Pulled This Off

  • Adapts to new models in hours, hitting top scores shortly after Gemini 3 launched — no retraining needed

  • 🧠 Built on Gemini 3 Pro, Poetiq scored 54% at $30/task

  • 🆚 Google’s Deep Think variant: 45% at $77/task

  • 🔓 First system ever to break the 50% barrier on ARC-AGI-2

  • 📈 Just six months ago, models struggled to reach 5%
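
For a rough sense of the gap, the figures above can be turned into cost per percentage point. This is a back-of-the-envelope sketch that uses only the scores and per-task costs quoted in this section. The `cost_per_point` helper is a made-up name for illustration, and cost per point is not an official ARC-AGI-2 metric.

```python
# Figures quoted above: (score in %, cost per task in USD).
results = {
    "Poetiq (Gemini 3 Pro)": (54, 30),
    "Gemini 3 Deep Think": (45, 77),
}


def cost_per_point(score_pct, cost_usd):
    """Dollars per task for each percentage point of score (illustrative only)."""
    return cost_usd / score_pct


for name, (score, cost) in results.items():
    print(f"{name}: ${cost_per_point(score, cost):.2f} per point")

# Per-task cost of Poetiq relative to Deep Think.
cost_ratio = results["Poetiq (Gemini 3 Pro)"][1] / results["Gemini 3 Deep Think"][1]
print(f"Poetiq spends {cost_ratio:.0%} of Deep Think's per-task cost")
```

On these numbers Poetiq comes out at roughly $0.56 per point against about $1.71 for Deep Think, and its per-task cost is about 39% of Deep Think's.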

🔄 How It Works

Poetiq uses an open-source refinement engine:

  • LLMs repeatedly improve their own answers

  • A built-in self-auditing loop checks quality

  • Smarter orchestration = smarter outputs

No giant model.
No giant compute budget.
Just clever AI engineering.
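
To make the idea concrete, here is a minimal sketch of a generate, audit, refine loop. It is not Poetiq's actual code: the `refine` function, its `generate` and `audit` parameters, and the toy stand-ins below are all hypothetical names chosen for illustration. In a real system, `generate` and `audit` would each wrap a call to an LLM.

```python
from typing import Callable, Optional, Tuple


def refine(
    task: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    audit: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> str:
    """Draft an answer, audit it, and revise with feedback until it passes."""
    answer = generate(task, None, None)  # first draft, no feedback yet
    for _ in range(max_rounds):
        ok, feedback = audit(task, answer)
        if ok:
            break  # the audit is satisfied, stop early
        answer = generate(task, answer, feedback)  # revise using the feedback
    return answer


# Toy stand-ins for model calls (assumptions, for demonstration only).
def toy_generate(task, previous, feedback):
    if previous is None:
        return task.lower()  # deliberately flawed first draft
    return previous.upper()  # the "revision" applies the feedback


def toy_audit(task, answer):
    if answer.isupper():
        return True, ""
    return False, "answer must be upper case"


print(refine("hello", toy_generate, toy_audit))
```

The loop stops as soon as the audit passes or after `max_rounds` revisions, which keeps the number of model calls (and so the cost per task) bounded.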

🌍 Why It Matters

ARC-AGI-2 scores jumping from under 5% to over 50% in months shows how fast AI reasoning is advancing.
Poetiq's result points to two tracks for rapid AI progress:
1️⃣ Frontier models
2️⃣ Smarter orchestration on top of them

And the second path is now wide open — even for small teams.

ANTHROPIC

📝 Anthropic Turns Claude Into a Research Interviewer

Anthropic just gave Claude a new job title: research interviewer.
Its new tool, Anthropic Interviewer, runs qualitative interviews at scale and analyzes them automatically. Its first project is a big one: 1,250 professionals sharing how AI is shaping their work lives.

Image Source: Anthropic

🔍 What This New Tool Actually Does

  • 🧠 Plans interview questions

  • 🎙️ Runs 10–15 min conversations at scale

  • 🗂️ Clusters themes + insights for human researchers

  • 📊 Delivers full analysis — fast

It’s basically a full research team in one Claude-powered system.

📣 What Workers Actually Said

  • 86%: AI saves time

  • 😬 69%: There's social stigma around using AI

  • 😟 55%: Concerned about AI’s future impact on their jobs

💡 Breakdown by role:

  • Creatives 🎨

    • Often hide their AI use

    • Fear their work will be replaced

  • Scientists 🔬

    • Want AI as a “research partner”

    • Still don’t fully trust current models

Anthropic is releasing all 1,250 transcripts publicly and plans ongoing studies to track how our relationship with AI evolves over time.

🌍 Why This Matters

Companies usually rely on dashboards, metrics, and feedback forms.
But open-ended interviews — scaled massively by Claude — reveal what people feel, not just what they click.

The early takeaway?
People are using AI more than ever, but many are unsure how it affects their reputation, identity, and future.

Trending Today
  • Bloom is a new AI-powered content creation platform that scans your website, extracts your styles, and creates brand assets in seconds. This tool can help you create graphics for your website, social media accounts, or marketing efforts.

  • Google is moving toward ultra-personalized AI that learns from your activity across Gmail, Search, Maps, Photos, Calendar, Drive, and your browsing history. Robby Stein, Google’s VP of Product for Search, says the company sees its biggest AI opportunity in tailoring advice and recommendations to each user’s preferences. Gemini already pulls in emails, documents, photos, and location data to shape answers, which could make results more useful but also raises privacy and consent concerns.

  • EngineAI has released a new video to prove its combat-ready T800 humanoid robot is real after days of criticism that its earlier clips were CGI. The company showed its CEO wearing protective gear and taking a full-force kick from the 75 kg robot, aiming to silence doubts about the machine’s balance and power. The T800 uses actuators rated at 450 N·m of torque, which explains the impact seen in the clip but also raises safety concerns.

  • The New York Times has sued Perplexity for copyright infringement, its second major lawsuit against an AI company. The Times says Perplexity uses its articles in answers and summaries without permission, sometimes copying passages nearly verbatim and even attributing false information to the outlet. The case adds to a growing list of publishers accusing Perplexity of scraping paywalled or restricted content. Perplexity has tried to ease tensions with revenue-sharing programs and licensing deals, but the Times says it has repeatedly asked the startup to stop using its work unless a formal agreement is reached.

Job Opportunities
  • xAI - Network Engineer - Edge

  • Shield AI - Senior Flight Test Engineer

  • Waymo - Product Manager, Driving Behaviors

  • DeepL - People Operations Service & Support Manager

  • Metropolis - Manager, Technical Operations

Ideas? Comments? Complaints?

We read your emails, comments and poll replies daily.

Until next time, Stay Informed!
