It’s early 2025, and we may already be witnessing a redefining moment for AI as we’ve come to know it in the last couple of years. Is the canon of “more GPUs is all you need” about to change?
What an unusual turn of events. First, the Stargate Project. The joint venture created by OpenAI, SoftBank, Oracle, and investment firm MGX is aiming to invest up to US$500 billion in AI infrastructure in the United States by 2029.
Arm, Microsoft, Nvidia, Oracle, and OpenAI are the key initial technology partners in what has been dubbed “the Manhattan project of the 21st century”, with direct support from the US administration. President Donald Trump called it “the largest AI infrastructure project in history”.
The list of leading US-based technology partners and the vast investment in what has been a strategic initiative for the US – building AI infrastructure to secure leadership in AI – is what’s driving the comparison to the Manhattan Project.
Both AI chip makers in the list – Arm and Nvidia – are led by CEOs of Taiwanese origins. That is notable, considering Taiwan’s ongoing tense relations with China, and the fact that the Stargate Project is the latest in a lineage of recent US policies aiming to invigorate domestic AI infrastructure and know-how while imposing limitations to the rest of the world, primarily China.
However, none of that mattered for the market, which sent Nvidia’s stock soaring for yet another time in the last couple of years at the announcement of the Stargate Project. But that was all before the release of DeepSeek R1.
DeepSeek R1 is a new open-source reasoning model, released just days after the announcement of the Stargate Project. The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s o1 on multiple key benchmarks while operating at a fraction of the cost.
What is remarkable about DeepSeek R1 is that it has been developed in China, despite all the restrictions on AI chips meant to hamper the country’s progress on AI. Does that mean that the OpenAI- and US-centric conventional wisdom of “more GPUs is all you need” in AI is about to be upended?
Truth is, when we arranged a conversation on AI chips with Chris Kachris a few days ago, neither the Stargate Project nor DeepSeek R1 had burst onto the AI scene. Even though we did not consciously anticipate these developments, we knew AI chips are a topic that deserves attention, and Kachris is an insider.
It’s become somewhat of a tradition for Orchestrate all the Things to analyze AI chips and host insights from experts in the field, and the conversation with Kachris is the latest piece in this tradition.
Chris Kachris is the founder and CEO of InAccel, a company that helps organizations speed up their applications using hardware accelerators in the cloud more easily than ever. He is also a widely cited researcher with more than 20 years of experience in FPGAs and hardware accelerators for machine learning, network processing and data processing.
After InAccel was recently acquired by Intel, Kachris went back to research, currently working as an Assistant Professor in the Department of Electrical and Electronics Engineering at the University of West Attica.
When setting the scene for the conversation with this timely news, Kachris’ opening remark was that innovation in AI chips is an “expensive sport”, which is why it mostly happens in industry as opposed to academia. At the same time, however, he noted that the resources needed do not come down to money alone; they also entail talent and engineering.
For Kachris, US policies have been on the right track in terms of their aim to repatriate expertise and make the country self-sufficient. Being a European citizen, he also called for the EU to pursue similar initiatives, joining the many voices calling for the EU to step up its GPU game. Would looking at how DeepSeek’s success was achieved, however, have anything to teach us?
According to the “Generative AI in the BRICS+ Countries” report, unlike other BRICS countries, China uses both foreign graphics cards (via the cloud and in its own data centers) and local cards made by Chinese companies.
Currently, there are more than 10 companies in China that are developing their own graphics cards, and the process of switching to local GPUs after using NVIDIA is reportedly not difficult for Chinese companies.
It seems like in order to stay competitive in the AI race, nations will have to reconsider their options, potentially borrowing pages from China’s playbook. Kachris concurred that China has been progressing in leaps and bounds, first imitating and then developing innovative techniques of its own.
“They can mix and match. They can combine different versions of GPUs and other processing units in order to create a powerful data center or cloud. This is very useful, especially if you think that in the past, you had to buy new equipment every three or four years maybe.
Now the innovation is so fast that almost every year, you have more and more powerful chips and more powerful processors. Does it make sense to throw away processors that are one or two years old? So definitely, you need to find a way to utilize resources, even if it is heterogeneous resources. This would be much more cost efficient”, said Kachris.
DeepSeek R1’s reported training cost is a strong argument in support of this approach. In addition to training on heterogeneous infrastructure, DeepSeek’s approach included reducing numerical precision, multi-token prediction, and an intelligent Mixture of Experts technique.
The result, reportedly, is slashing training costs from $100 million to around $5 million and reducing hardware needs from 100,000 GPUs to a mere 2,000, making AI development accessible on standard gaming GPUs. What’s more, even if DeepSeek is not 100% open source – whatever that means for LLMs – its process can be replicated.
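The Mixture of Experts idea mentioned above can be conveyed in a few lines of code. The sketch below is a toy illustration with made-up names, not DeepSeek’s implementation: a router (standing in for a learned gating network) activates only the top-k of many experts per token, so most of the model’s parameters sit idle on any given step – which is where the compute savings come from.

```python
import random

# Toy Mixture-of-Experts routing sketch (illustrative only).
# A real router is a learned network; here we fake its scores
# deterministically from the token so the example is self-contained.

NUM_EXPERTS = 8
TOP_K = 2

def router_scores(token, num_experts=NUM_EXPERTS):
    # Stand-in for a learned gating network: pseudo-random but
    # deterministic scores per token within a single run.
    rng = random.Random(hash(token) % (2**32))
    return [rng.random() for _ in range(num_experts)]

def route(token, top_k=TOP_K):
    scores = router_scores(token)
    # Pick the indices of the top-k scoring experts; only these run.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

experts_used = route("hello")
# Only TOP_K of NUM_EXPERTS experts are active per token:
active_fraction = TOP_K / NUM_EXPERTS  # here 0.25, i.e. 75% of expert compute skipped
```

The point of the sketch is the ratio at the end: with 8 experts and top-2 routing, each token touches only a quarter of the expert parameters, even though the full model remains available.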
The immediate reaction to the news was a selloff, with Nvidia’s stock dropping 17%. The market had already started course-correcting at the time of writing, with both the downward and upward moves being somewhat predictable.
On the one hand, what DeepSeek demonstrated was that there is lots of room for efficiency gains in training top-performing AI models, actively undermining the conventional wisdom. On the other hand, that doesn’t mean that Nvidia isn’t still the leader, and we can expect to see the Jevons paradox in action once again.
Nvidia kept the pace of innovation in 2024, announcing and subsequently shipping its latest Blackwell architecture, expanding its ecosystem and hitting multiple financial and business milestones. Kachris highlighted that Nvidia is not just selling chips anymore, but they’ve moved towards vertical integration of their NVLink technology with their chips on the DGX platform.
But Nvidia GPUs are not the only game in town. AMD on its part announced a new AI accelerator, the Instinct MI325X. As Kachris noted, the MI300 series is very powerful, featuring specialized units to accelerate transformers – a key architecture for Large Language Models. AMD’s growth is purportedly driven by data center and AI products.
The vast majority of people and organizations will be AI users, not AI builders. For them, using or even building AI applications is not really a matter of training their own model, but rather of using or fine-tuning a pre-trained one.
Kachris also called out Intel’s progress with Gaudi. Despite the high performance capabilities of Gaudi 3, however, Intel seems to be behind in terms of market share, largely due to software. At the same time, Intel is making moves to sell its FPGA unit, Altera.
FPGAs, Kachris maintains, may not be the most performant solution for AI training, but they make lots of sense for inference, and this is where there is ample room for competition and innovation. It’s precisely this – building a software layer to work with FPGAs – that InAccel was working on, and what led to the acquisition by Intel.
Naturally, Kachris emphasized the importance of the software layer. At the end of the day, even if a chip has superior performance, if it’s not easy for developers to use via the software layer, that is going to hinder adoption. Nvidia maintains a significant advantage on the software layer due to its ubiquitous CUDA stack, which it keeps investing in.
The rest of the industry, led by Intel via the UXL Foundation and its oneAPI initiative, is making efforts to catch up. AMD has its own software layer, ROCm. But catching up is not going to happen overnight. As Kachris put it, the software layer has to enable using the hardware layer without changing a single line of code.
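Kachris’ “without changing a single line of code” point can be made concrete with a minimal sketch. All names here are hypothetical, not any real vendor API: the application calls one abstraction, and the runtime – not the application code – decides which backend executes the work.

```python
# Hypothetical hardware-abstraction sketch: application code calls
# backend.matmul() and never changes, regardless of which hardware
# the runtime selects underneath.

class CPUBackend:
    name = "cpu"
    def matmul(self, a, b):
        # Naive pure-Python matmul standing in for an optimized kernel.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]

class FPGABackend(CPUBackend):
    # In a real stack this would dispatch to an FPGA bitstream;
    # here it only differs in name to keep the sketch runnable.
    name = "fpga"

def get_backend(preferred="cpu"):
    # The runtime picks the backend; unknown choices fall back to CPU.
    return {"cpu": CPUBackend, "fpga": FPGABackend}.get(preferred, CPUBackend)()

backend = get_backend("fpga")
result = backend.matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The design choice to note: the application only ever sees `get_backend()` and `matmul()`. Swapping hardware means changing a configuration value, not the code – which is precisely the bar Kachris says competing software stacks have to clear.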
Nvidia is ramping up its inference and software strategy too with its newly released NIM framework, which seems to have gained some adoption. The competition is also focusing on inference, with a range of challengers such as Groq, Tenstorrent, Graphcore, Cerebras and SambaNova vying for a piece of the inference market pie.
While DeepSeek is a prominent display of the benefits of optimization, it’s not the only one. Kachris was involved in a recent comprehensive survey and comparison of hardware acceleration of LLMs, with many of the surveyed accelerators geared towards inference.
One way to use pre-trained models is via AI provider APIs – typically OpenAI’s or Anthropic’s. For more sophisticated use cases, however, for reasons having to do with privacy, compliance, competitive advantage, application requirements or cost, end users will want to deploy AI models on their own infrastructure.
Gary Marcus points out five things most people don’t seem to understand about DeepSeek
That may include a whole range of environments, from on-premise and private cloud to edge and bare metal. Especially with LLMs, there is even the option to run them locally on off-the-shelf machines. We asked Kachris whether he believes that local / edge deployment of LLMs makes sense.
Kachris noted that inference may work with “shrunk”, i.e. quantized, versions of AI models. Research suggests that even 1-bit versions of models are viable. Kachris pointed out that even though there are specialized hardware architectures, among the broadly available options GPUs and FPGAs provide the best performance, with FPGAs being more energy efficient.
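What “shrinking” a model by quantization means can be shown in miniature. The sketch below is illustrative only – real toolchains (and the 1-bit research mentioned above) are far more involved – but it captures the core trade: float weights become small integers plus a scale factor, cutting memory at some cost in precision.

```python
# Minimal post-training quantization sketch (illustrative only):
# symmetric int8 quantization, where each weight w is approximated
# as q * scale with q an integer in [-127, 127]. Stored as int8,
# this is roughly a 4x memory saving versus float32.

def quantize_int8(weights):
    """Map float weights to (int list, scale) such that w ≈ q * scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized form."""
    return [v * scale for v in q]

weights = [0.02, -0.75, 0.5, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value matches its original within one quantization step.
```

Inference then runs on the integer representation, which is exactly why Kachris sees GPUs and especially FPGAs – which can be tailored to arbitrary bit widths – as a good fit for it.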
As far as future developments go, Kachris highlighted in-memory computing as an area to keep an eye on. The main idea is to combine storage and compute in the same unit, eliminating the need for data transfer and yielding better performance. The approach is inspired by the way biological neural networks work, and is referred to as neuromorphic computing.
There are more areas of noteworthy developments, such as chiplets, specialized chips tailored for the transformer architecture that powers LLMs, photonic technology and new programming languages for AI.
In terms of more short- to mid-term prospects, and the question of whether there is room for innovation in an Nvidia-dominated world, Kachris believes that embedded systems and Edge AI represent an opportunity for challengers:
“There are different requirements and different specifications in the domain of Edge AI. I think there is room for innovation in Edge AI, for example in video applications for hospitals, or autonomous driving and aviation.
I think it’s going to happen. Let’s talk about GPUs. So NVIDIA is the leader in GPUs, but there was a lack of GPUs for wearable devices. And we saw a great company, Think Silicon, stepping up and developing a GPU that is specialized for fitness bands or smartwatches, and then being acquired by Applied Materials.
Innovation is going to happen in areas that are too small for companies like Nvidia or Intel, but good enough for smaller companies that can make specialized products”.