Cerebras claims world's fastest AI inference

Plus: Google drops trio of souped-up Gemini models for devs.


Welcome Humans🤖,

Here's what we have today:

  • Penny-Pincher's AI Dream

    Cerebras launched the world's fastest AI inference solution

  • Google AI Studio Powered Up

    DeepMind released three new Gemini models

Trending Today
  • GNSS (Global Navigation Satellite System) tolling, which tracks a vehicle's location on highways and toll roads via satellite and calculates charges by distance traveled, is now making its way into India and Europe.

  • OpenAI researchers are preparing to launch a new AI model, code-named Strawberry (previously Q*), that demonstrates superior reasoning capabilities in solving complex problems; it could be integrated into ChatGPT as soon as this fall.

  • Google has released three new experimental Gemini 1.5 models, including a compact 8B parameter version, an improved Pro model, and an enhanced Flash model — all available for developers on Google AI Studio.

  • Google Meet has a new “Take Notes for Me” feature that—instead of simply transcribing work meetings, like Meet currently does—provides AI-generated notes, summarizing what people are talking about.

  • Anthropic released the prompts that power its Claude models—some include directives to avoid facial recognition, and to appear “intellectually curious.”

Recommended Reading

News for humans, by humans.

  • Today's news.

  • Edited to be as unbiased as humanly possible.

  • Every morning, we triple-check headlines, stories, and sources for bias.

  • All by hand with no algorithms.

AI Model

Cerebras has unveiled a new AI inference solution it claims is the world's fastest, delivering 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B.

How fast is it really?

  • 20 times faster than traditional GPU-based cloud solutions

  • Powered by the third-generation Wafer Scale Engine (WSE-3)

  • 21 petabytes per second aggregate memory bandwidth
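Taking the claimed figures at face value, a quick back-of-the-envelope sketch shows what the 20x speedup means for a single long response. Note that the GPU baseline below is simply derived from the claimed 20x factor, not an independent measurement:

```python
# Latency comparison using the throughput figures claimed above.
# The GPU baseline is derived from the claimed 20x slowdown, not measured.

def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `tokens` at a given throughput."""
    return tokens / tokens_per_second

CEREBRAS_8B_TPS = 1_800   # claimed, Llama 3.1 8B
CEREBRAS_70B_TPS = 450    # claimed, Llama 3.1 70B
GPU_SLOWDOWN = 20         # claimed speedup factor over GPU clouds

tokens = 1_000  # a typical long response

print(f"Cerebras 8B:   {generation_time(tokens, CEREBRAS_8B_TPS):.2f} s")
print(f"GPU cloud 8B:  {generation_time(tokens, CEREBRAS_8B_TPS / GPU_SLOWDOWN):.2f} s")
print(f"Cerebras 70B:  {generation_time(tokens, CEREBRAS_70B_TPS):.2f} s")
```

At these rates, a 1,000-token answer arrives in roughly half a second on the 8B model versus over ten seconds on the implied GPU baseline.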

What makes it special?

  • Entire model stored on a single chip

  • 7,000 times more memory bandwidth than leading GPUs like the H100

  • Uses Meta's original 16-bit weights for highest accuracy

How much does it cost?

  • Pricing starts at 10 cents per million tokens for Llama 3.1 8B

  • Accessible via an open API for easy integration
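At the quoted 10 cents per million tokens, cost scales linearly with volume. A minimal sketch (the request volume and response size below are illustrative assumptions, not Cerebras figures):

```python
# Cost sketch at the quoted $0.10 per million tokens (Llama 3.1 8B tier).
# Request counts and token sizes are illustrative assumptions.

PRICE_PER_MILLION_TOKENS = 0.10  # USD, quoted Llama 3.1 8B rate

def monthly_cost(requests_per_day: int, tokens_per_request: int) -> float:
    """Approximate USD per 30-day month at the quoted rate."""
    total_tokens = requests_per_day * tokens_per_request * 30
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# e.g. 10,000 requests/day at 1,000 tokens each:
print(f"${monthly_cost(10_000, 1_000):.2f}/month")
```

Even at 10,000 thousand-token requests a day, the quoted rate works out to about $30 a month, which is where the "penny-pincher's dream" framing comes from.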

What models are supported?

  • Currently supports Llama 3.1 8B and 70B

  • Plans to add larger models like Llama 3 405B and Mistral Large

Why does it matter?

  • Opens new possibilities for real-time AI applications

  • Could reshape AI development and deployment

  • Addresses long-standing challenges in AI inference speed and cost

What's next?

  • Potential for widespread adoption across various AI applications

  • May influence future AI hardware development

  • Could accelerate advancements in real-time AI decision-making and processing

This breakthrough could significantly impact the AI industry, potentially accelerating the development and deployment of more sophisticated, real-time AI applications.

Job Opportunities
  • Mistral AI - AI Scientist - Internship - Apply

  • Mistral AI - Site Reliability Engineer - Apply

  • Observe AI - Principal Product Designer - Apply

  • Meta - Program Manager - Apply

  • Waymo - Hardware Test Engineer - Apply

  • Databricks - Sr Manager (Specialist), Data & AI - Apply

AI ART

krakatoa in the middle of the amazon forest --v 6.1

Image: Ideogram

Ideas? Comments? Complaints?

We read your emails, comments and poll replies daily.

Get the most important AI, tech, and science news in a free daily email.
New Here? Subscribe!
Sponsorship slots are open for August: reach 800+ active readers (now 40% off) 🤯

What'd you think of today's edition?

Login or Subscribe to participate in polls.

Until next time, Stay Informed!
