This week, a largely unknown company, Groq, demonstrated unprecedented speed running open-source LLMs: Llama-2 (70 billion parameters) at more than 100 tokens per second, and Mixtral at nearly 500 tokens per second per user, both on Groq’s Language Processing Unit (LPU).

For comparison: “According to Groq, in similar tests, ChatGPT loads at 40–50 tokens per second, and Bard at 70 tokens per second on typical GPU-based computing systems. Context for 100 tokens per second per user: A user could generate a 4,000-word essay in just over a minute.”

So: What is LPU, how does it work, and where is Groq (such an unfortunate name, given Musk’s Grok is all over the media) coming from?

Remember that game of Go in 2016 when AlphaGo played against the world champion Lee Sedol and won? Well, about a month before the competition, there was a test game which AlphaGo lost. The researchers from DeepMind ported AlphaGo to the Tensor Processing Unit (TPU), and then the computer program was able to win by a wide margin.

The realization that computational power was a bottleneck for AI’s potential led to the inception of Groq and the creation of the LPU. This realization came to Jonathan Ross, who initially began what became the TPU project at Google. He started Groq in 2016.

The LPU is a special kind of computer brain designed to handle language tasks very quickly. Unlike other computer chips that do many things at once (parallel processing), the LPU works on tasks one after the other (sequential processing), which is perfect for understanding and generating language. Imagine it like a relay race where each runner (chip) passes the baton (data) to the next, making everything run super fast. The LPU is designed to overcome the two LLM bottlenecks: compute density and memory bandwidth.

Groq took a novel approach right from the start, focusing on software and compiler development before even thinking about the hardware. They made sure the software could guide how the chips talk to each other, ensuring they work together seamlessly like a team in a factory. This makes the LPU really good at processing language efficiently and at high speed, ideal for AI tasks that involve understanding or creating text.

This led to a highly optimized system that not only runs circles around traditional setups in terms of speed but does so with greater cost efficiency and lower energy consumption. This is big news for industries like finance, government, and tech, where quick and accurate data processing is key.

Now, don’t go tossing out your GPUs just yet! While the LPU is a beast when it comes to inference, making light work of applying trained models to new data, GPUs still reign supreme in the training arena. The LPU and GPU might become the dynamic duo of AI hardware, each excelling in its respective role.

As Elvis Saravia put it: “With breakthroughs in inference and long context understanding, we are officially entering a new era in LLMs.”

To better understand the architecture, Groq offers two papers: from 2020 (Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads) and 2022 (A Software-defined Tensor Streaming Multiprocessor for Large-scale Machine Learning). The term “LPU” must be a recent addition to Groq’s narrative, since it’s never mentioned in the papers.

Additional read:

Compute is also a part of this paper: Computing Power and the Governance of Artificial Intelligence, which discusses managing AI development through compute control, focusing on its potential for regulation, benefits, and risks, and suggests balanced governance approaches.

Meanwhile, the U.S. awards GlobalFoundries, the world’s third-largest contract chipmaker, $1.5 billion to boost semiconductor production, enhancing domestic supply chains, with expansions in New York and Vermont.

The paper published by Berkeley Artificial Intelligence Research (BAIR) argues that “compound AI systems will likely be the best way to maximize AI results in the future, and might be one of the most impactful trends in AI in 2024.”

News From The Usual Suspects ©

Y Combinator

Since 2009, Y Combinator has published its Request for Startups, which hints at what “ideas we’d want to see made real, in spaces that we believe will be important in the coming decades.”

20 Big Names

Twenty tech giants, including Adobe, Amazon, Google, IBM, Meta, Microsoft, OpenAI, and TikTok, have agreed to take “reasonable precautions” to prevent the misuse of AI in disrupting elections worldwide.

OpenAI

OpenAI completes a deal that values the company at $80 billion, nearly tripling its valuation in less than 10 months.

Models Making Headlines:

Introducing Aya. Aya’s dataset: https://arxiv.org/pdf/2402.06619.pdf

Introducing Sora: This paper introduces Sora, a breakthrough in video generation technology by OpenAI, capable of producing high-fidelity videos. It leverages spacetime patches to handle videos of varying durations and resolutions, making strides toward simulating the physical world with impressive 3D consistency and long-range coherence. It represents a leap in the ability to create detailed simulations that could be used for a myriad of applications, from entertainment to virtual testing environments.

Additional read:

Take on the Sora technical report.

Jim Fan on why he believes Sora is learning physics.

Yann LeCun on why Sora doesn’t understand the physical world and why “modeling the world for action by generating pixel is as wasteful and doomed to failure as the largely-abandoned idea of ‘analysis by synthesis.’”

Francois Chollet on why “the inner physics model doesn’t generalize to novel situations at all.”

Sora and Gemini 1.5 follow-ups: code-base in context, deepfakes, pixel-peeping, inference costs, and more by Interconnects.

Introducing V-JEPA (Yann LeCun’s vision of advanced machine intelligence (AMI)): Meta’s V-JEPA model revolutionizes unsupervised learning from videos by using feature prediction as its sole objective. This approach bypasses the need for pre-trained image encoders or text annotations, relying instead on the intrinsic dynamics of video data to learn versatile visual representations. It’s a significant contribution to the field of unsupervised visual learning, promising advancements in how machines understand motion and appearance without explicit guidance.

Introducing Gemini 1.5: Google DeepMind’s Gemini 1.5 introduces a Mixture-of-Experts architecture, enhancing the model’s performance across a broader array of tasks. Notably, it expands the context window to 1 million tokens, enabling deep analysis over large datasets. Gemini 1.5 represents a significant step forward in AI’s capability to process and understand extensive contexts, marking a milestone in the development of multimodal models.

Introducing Stable Cascade: Stable Cascade from Stability AI introduces a novel text-to-image generation framework that prioritizes efficiency, ease of training, and fine-tuning on consumer-grade hardware. The model’s hierarchical compression technique represents a significant reduction in the resources required for training high-quality generative models, providing a pathway for wider accessibility and experimentation in the AI community.

The Freshest Research Papers, Categorized for Your Convenience

Language Understanding and Generation

OpenToM: Explores evaluating Theory-of-Mind reasoning in LLMs, addressing their capability to understand complex social and psychological narratives.

In Search of Needles in a 10M Haystack: Demonstrates the capability of NLP models to process exceptionally long documents, pushing the boundaries of document length comprehension.

Premise Order Matters in Reasoning with LLMs: Investigates the sensitivity of LLMs to the order of premises, revealing implications for reasoning tasks.

Chain-of-Thought Reasoning Without Prompting: Uncovers the inherent ability of LLMs to generate reasoning paths, suggesting an alternative to explicit prompting.

Suppressing Pink Elephants with Direct Principle Feedback: Addresses the challenge of topic avoidance in LLMs, proposing a novel fine-tuning method for enhanced controllability.

GhostWriter: Develops an AI-powered writing environment focusing on personalization and increased user control in collaborative writing.

Speech and Text-to-Speech Technology

BASE TTS: Presents a billion-parameter TTS model, showcasing advancements in speech synthesis through large-scale training.

Mathematical and Scientific Reasoning

OpenMathInstruct-1: Develops a dataset for math instruction tuning, aiming to improve LLMs’ mathematical reasoning capabilities.

InternLM-Math: Introduces a specialized LLM for math reasoning, incorporating various techniques for enhanced problem-solving in mathematics.

ChemLLM: Creates the first LLM dedicated to chemistry, transforming structured chemical data into dialogue for diverse chemical tasks.

Efficiency and Data Utilization in AI

How to Train Data-Efficient LLMs: Proposes sampling methods for enhancing data efficiency in LLM training, optimizing example selection.

FIDDLER: Introduces a system for efficient inference of MoE models, leveraging CPU-GPU orchestration for improved performance in resource-limited settings.

Tandem Transformers: Presents an architecture for improving the inference efficiency of LLMs, utilizing a dual-model system for faster and more accurate predictions.

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers: Proposes an advanced post-training quantization (PTQ) algorithm for efficient deployment of large Transformer models on edge devices.

Multimodal and Vision-Language Models

Lumos: Details the first end-to-end multimodal question-answering system with enhanced text understanding from images, advancing MM-LLMs.

Reinforcement Learning and Model Behavior

ODIN: Addresses reward hacking in RLHF, proposing a method to mitigate verbosity bias in LLMs for more concise and content-focused responses.

Mixtures of Experts Unlock Parameter Scaling for Deep RL: Shows the impact of MoE modules on deep RL networks, enhancing parameter scalability and performance.

Operating Systems and Generalist Agents

OS-COPILOT: Proposes a framework for developing generalist computer agents, enabling automation of tasks across different applications with minimal supervision.

Graph Learning and State Space Models

Graph Mamba: Explores applying State Space Models to graph learning, addressing challenges like over-squashing and long-range dependencies.

Challenges and Innovations in AI

A Tale of Tails: Explores the effects of synthetic data on neural model performance, theorizing potential risks of model collapse with synthetic data reliance.

Transformers Can Achieve Length Generalization But Not Robustly: Investigates Transformers’ ability to generalize to longer sequences, highlighting the challenge of maintaining robust performance.

Also published here


What is the Language Processing Unit (LPU)? Is It GPU's Rival?
