Cerebras claims world's fastest AI inference

Plus: Google drops trio of souped-up Gemini models for devs.


Welcome Humans🤖,

Here's what we have today:

  • Penny-Pincher's AI Dream

    Cerebras launched the world's fastest AI inference solution

  • Google AI Studio Powered Up

    DeepMind released three new Gemini models

Trending Today
  • GNSS (Global Navigation Satellite System) tolling, which tracks a vehicle's location on highways and toll roads via satellite and calculates charges by distance traveled, is now making its way into India and Europe.

  • OpenAI researchers are preparing to launch a new AI model, code-named Strawberry (previously Q*), that demonstrates superior reasoning capabilities in solving complex problems; it could be integrated into ChatGPT as soon as this fall.

  • Google has released three new experimental Gemini 1.5 models, including a compact 8B parameter version, an improved Pro model, and an enhanced Flash model — all available for developers on Google AI Studio.

  • Google Meet has a new “Take Notes for Me” feature that—instead of simply transcribing work meetings, like Meet currently does—provides AI-generated notes, summarizing what people are talking about.

  • Anthropic released the prompts that power its Claude models—some include directives to avoid facial recognition, and to appear “intellectually curious.”

Recommended Reading

News for humans, by humans.

  • Today's news.

  • Edited to be as unbiased as humanly possible.

  • Every morning, we triple-check headlines, stories, and sources for bias.

  • All by hand with no algorithms.

AI Model

Cerebras has unveiled a new AI inference solution it claims is the world's fastest, delivering 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B.

How fast is it really?

  • 20 times faster than traditional GPU-based cloud solutions

  • Powered by the third-generation Wafer Scale Engine (WSE-3)

  • 21 petabytes per second aggregate memory bandwidth
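Taking the claimed figures at face value, a quick back-of-the-envelope sketch shows what the 20x speedup means for a single long response. Note that the GPU baseline below is simply derived from the claimed 20x factor, not an independent measurement:

```python
# Latency comparison using the throughput figures claimed above.
# The GPU baseline is derived from the claimed 20x slowdown, not measured.

def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `tokens` at a given throughput."""
    return tokens / tokens_per_second

CEREBRAS_8B_TPS = 1_800   # claimed, Llama 3.1 8B
CEREBRAS_70B_TPS = 450    # claimed, Llama 3.1 70B
GPU_SLOWDOWN = 20         # claimed speedup factor over GPU clouds

tokens = 1_000  # a typical long response

print(f"Cerebras 8B:   {generation_time(tokens, CEREBRAS_8B_TPS):.2f} s")
print(f"GPU cloud 8B:  {generation_time(tokens, CEREBRAS_8B_TPS / GPU_SLOWDOWN):.2f} s")
print(f"Cerebras 70B:  {generation_time(tokens, CEREBRAS_70B_TPS):.2f} s")
```

At these rates, a 1,000-token answer arrives in roughly half a second on the 8B model versus over ten seconds on the implied GPU baseline.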

What makes it special?

  • Entire model stored on a single chip

  • 7,000 times more memory bandwidth than leading GPUs like the H100

  • Uses Meta's original 16-bit weights for highest accuracy

How much does it cost?

  • Pricing starts at 10 cents per million tokens for Llama 3.1 8B

  • Accessible via an open API for easy integration
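At the quoted 10 cents per million tokens, cost scales linearly with volume. A minimal sketch (the request volume and response size below are illustrative assumptions, not Cerebras figures):

```python
# Cost sketch at the quoted $0.10 per million tokens (Llama 3.1 8B tier).
# Request counts and token sizes are illustrative assumptions.

PRICE_PER_MILLION_TOKENS = 0.10  # USD, quoted Llama 3.1 8B rate

def monthly_cost(requests_per_day: int, tokens_per_request: int) -> float:
    """Approximate USD per 30-day month at the quoted rate."""
    total_tokens = requests_per_day * tokens_per_request * 30
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# e.g. 10,000 requests/day at 1,000 tokens each:
print(f"${monthly_cost(10_000, 1_000):.2f}/month")
```

Even at 10,000 thousand-token requests a day, the quoted rate works out to about $30 a month, which is where the "penny-pincher's dream" framing comes from.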

What models are supported?

  • Currently supports Llama 3.1 8B and 70B

  • Plans to add larger models like Llama 3 405B and Mistral Large

Why does it matter?

  • Opens new possibilities for real-time AI applications

  • Could reshape AI development and deployment

  • Addresses long-standing challenges in AI inference speed and cost

What's next?

  • Potential for widespread adoption across various AI applications

  • May influence future AI hardware development

  • Could accelerate advancements in real-time AI decision-making and processing

This breakthrough could significantly impact the AI industry, potentially accelerating the development and deployment of more sophisticated, real-time AI applications.

Job Opportunities
  • Mistral AI - AI Scientist - Internship - Apply

  • Mistral AI - Site Reliability Engineer - Apply

  • Observe AI - Principal Product Designer - Apply

  • Meta - Program Manager - Apply

  • Waymo - Hardware Test Engineer - Apply

  • Databricks - Sr Manager (Specialist), Data & AI - Apply

AI ART

krakatoa in the middle of the amazon forest --v 6.1

Image: Ideogram

Ideas? Comments? Complaints?

We read your emails, comments and poll replies daily.

Get the most important AI, tech, and science news in a free daily email.
New Here? Subscribe!
Sponsorship slots are open for August: reach 800+ active readers (now 40% off) 🤯

What'd you think of today's edition?

Login or Subscribe to participate in polls.

Until next time, Stay Informed!
