
After seeing rival Nvidia run away with the AI game through its graphics processing units (GPUs) and, more importantly, its tightly integrated software tools, AMD has hit back recently with an alternative that promises strong performance without the “lock-in” that many businesses are increasingly worried about.
The AMD Helios system, unveiled last month, brings a large number of AMD GPUs into play with the chipmaker’s already-popular CPUs for the data centre. Plus, a crucial sweetener – a more open technology stack.
This, according to Alexey Navolokin, general manager for Asia-Pacific at AMD, is crucial for the next step in AI, when it’s not just the CPUs and GPUs that have to work together – other components, such as memory and networking, also have to catch up to the processors’ groundbreaking performance.
Heterogeneous and open systems can bring that performance to customers, especially with an open software stack such as AMD’s ROCm, he argues.
“What we hear consistently is that they want performance now, without being locked into a single proprietary stack long term,” he tells Techgoondu in this month’s Q&A.
NOTE: Responses have been edited for style.
Q: There has been a scramble for not just GPUs but also flash memory chips of late. Briefly, how would you describe the market for AI data centre hardware this year?
A: We are seeing unprecedented demand across the entire AI data centre stack – not just GPUs, but memory, networking, storage and power-efficient CPUs as well.
AI training and inference are scaling rapidly across data centres, while those same systems still need to support mission-critical, traditional workloads.
That combination is driving customers to look for balanced, scalable platforms that can grow AI capacity without forcing them to redesign their existing infrastructure.
AMD is addressing this by delivering the open compute foundation for this next phase of AI: a broad portfolio of AI-optimised CPUs, GPUs, networking and software that helps customers deploy AI with confidence today and scale as their compute needs evolve.
Q: One AI platform (from your rival) has locked in many customers, but it also seems to be the straightest path to the AI performance that is much needed now. How do you convince them to move, say, to an “open” standard?
A: Customers are under pressure to deploy AI quickly today, but they are also thinking about how those deployments scale over time. What we hear consistently is that they want performance now, without being locked into a single proprietary stack long term.
Open standards and open software give customers that flexibility. With platforms built on industry standards and the AMD ROCm open software stack, customers can move from prototype to production faster, integrate across cloud and on-premises environments, and avoid costly re-architecture as workloads evolve – all while tailoring solutions to their needs.
That’s why AMD invests in an end-to-end open ecosystem – spanning leadership silicon, the ROCm open software stack and rack-scale systems built on industry standards – ensuring that the open AI building blocks are not just accessible but performant and enterprise-grade.
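To make that portability argument concrete, here is a minimal sketch (ours, not AMD’s) of the kind of device-agnostic code an open stack enables. PyTorch ships ROCm builds that expose AMD GPUs through the familiar torch.cuda API, so the same script runs unchanged on either vendor’s hardware.

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs are exposed through the familiar
# torch.cuda API, so this check also succeeds on AMD Instinct hardware.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A trivial model and forward pass; nothing here is vendor-specific.
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
with torch.no_grad():
    y = model(x)

print("Ran on:", device)
# ROCm builds report a HIP version here; CUDA builds report None.
print("HIP runtime:", getattr(torch.version, "hip", None))
```

The toy example matters less than the principle: code written against the open, framework-level interface does not have to be rewritten when the hardware underneath changes.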
Q: How ready is a heterogeneous platform with compute, networking and storage that are linked up with high-bandwidth connections? With the software to support as well?
A: High-bandwidth, tightly integrated compute platforms are no longer optional for AI at scale; they are foundational. Customers need systems where compute, networking and memory are designed together, and where software is ready to support large, distributed workloads from Day One.
AMD is prepared for this shift with our upcoming “Helios” rack-scale platform, which brings together AMD EPYC CPUs, AMD Instinct GPUs, AMD Pensando networking and the ROCm open software stack into a cohesive, scalable solution no matter the workload.
Over the last decade, we have redefined how we engineer, accelerating time-to-market while maximising IP (intellectual property) leverage across products and platforms.
A consistent focus on R&D, co-optimised design cycles and IP sharing across product lines has built a durable model for rapid innovation and consistent execution on our competitive roadmap.
The platform is designed to simplify deployment of large AI clusters, improve interconnect efficiency, and reduce time to solution. All this, while giving customers the flexibility to scale across cloud, enterprise and research environments with confidence.
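What “software ready for distributed workloads” looks like in practice is easiest to show with a small example. The sketch below (again ours, not AMD’s, with a hypothetical file name) runs a multi-GPU all-reduce, the basic collective behind large training jobs. On ROCm builds of PyTorch, the “nccl” backend name is backed by AMD’s RCCL library, so the code is identical across vendors.

```python
# Launch with, for example: torchrun --nproc_per_node=8 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    # On ROCm builds of PyTorch the "nccl" backend is backed by AMD's
    # RCCL library, so this line is the same on either vendor's GPUs.
    dist.init_process_group(backend="nccl")

    # torchrun sets LOCAL_RANK; pin each process to one GPU.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums them across GPUs,
    # exercising the interconnect the way large training jobs do.
    t = torch.ones(4, device="cuda") * (dist.get_rank() + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: {t.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Scaled up, it is exactly this kind of collective traffic that rack-scale designs like Helios aim to keep fed, by pairing the GPUs with the right networking and memory bandwidth.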
Q: How does AMD’s purchase of Xilinx, for example, add to your strategy to develop a fuller stack that provides the performance for AI workloads?
A: We recognise that AI systems are becoming increasingly complex, and there is a growing demand from customers for full-stack solutions that span training, inference, networking and deployment across cloud, data centre and edge environments.
As part of our long-term AI strategy to deliver leadership training and inferencing solutions, we have significantly expanded our investments over the last few years, both organically and through strategic acquisitions.
The acquisition of Xilinx significantly strengthened our portfolio by adding adaptive computing, while the acquisition of ZT Systems expanded our capabilities in rack- and data centre-scale design.
Combined with continued investments in our AI software ecosystem, this enables AMD to deliver a more complete, scalable platform that meets customers’ performance needs today while giving them flexibility as workloads evolve.
