
As infrastructure costs rise and businesses seek real returns on the AI investments they have made over the past two years, Red Hat thinks it has the answer in open-source software libraries that make large language models (LLMs) run more efficiently.
The trick, it believes, is to reduce the cost of generating AI outputs and to cut dependence on Nvidia’s much sought-after graphics processing units (GPUs), which do much of today’s AI heavy lifting.
Through open-source software libraries built to run LLMs faster – even on competing hardware from AMD and Intel – Red Hat is betting that it can boost efficiency enough to overcome today’s bottlenecks and spur AI adoption.
Previously, IBM (Red Hat’s parent company) had been advising customers to go for smaller AI models, said Brian Stevens, the AI chief technology officer (CTO) for Red Hat.
However, businesses can now rely on bigger models because they won’t have to worry as much about the cost of the GPUs needed to get the job done, he told Techgoondu in an interview in Singapore last week.
“How do we get existing customers to be more efficient? We dropped 30 per cent of inference costs… so they can start a platform for innovation,” he said.
In March, Red Hat launched its AI Inference Server, which promises to let businesses generate AI outputs more efficiently.
It packs in software libraries from an open-source project called vLLM, which are used to run AI models on different hardware, including custom chips made by Google and Amazon Web Services.
It improves inference performance partly by cutting back on GPU memory usage and by allocating resources more effectively across different workloads.
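For readers curious what the vLLM libraries look like in practice, below is a minimal sketch of the project’s offline inference API in Python. The model name is only an illustrative choice; under the hood, vLLM’s paged management of the GPU key-value cache and continuous batching of requests are the main levers for cutting memory usage and keeping the hardware busy.

```python
# Minimal sketch of vLLM's offline inference API (model name is illustrative).
from vllm import LLM, SamplingParams

prompts = ["Explain what an inference server does, in one paragraph."]
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Loading a model starts the vLLM engine, which manages the GPU's key-value
# cache in fixed-size blocks and batches incoming requests continuously.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Generate completions for all prompts in one batched call.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```

The same engine can also be exposed as an OpenAI-compatible HTTP endpoint (for example, via the vllm serve command), which is closer to how an inference server such as Red Hat’s would typically be deployed.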
Perhaps more importantly, Red Hat promises that the software runs efficiently on non-Nvidia hardware as well, so a bank building its own AI infrastructure, say, has more hardware choices.
Nvidia’s powerful CUDA software tools, which let its GPUs accelerate AI workloads, have been instrumental in keeping the company in the lead over the past couple of years.
However, if other platforms and accelerators can tap Red Hat’s software tools to deliver good performance at lower cost, they could turn out to be stronger alternatives in future.
“This frees up organisations from worrying about AI infrastructure,” said Stevens. “Instead, you think about how to build your agentic AI app or reasoning service… you don’t have to worry about the platform.”
Nvidia also works with Red Hat on vLLM, he noted, and the development teams have “multiple meetings” every week. “We will make it the best interface for Nvidia.”
Could the current AI gold rush turn out like the dotcom boom more than 20 years ago? Back then, Sun Microsystems was the go-to maker of the powerful servers needed to handle the heavy traffic of any popular website.
However, it stumbled when cheap servers running commodity Intel chips proved just as powerful, essentially delivering the early cloud computing model that enabled anyone to run a website cheaply.
Could more affordable AI servers deliver the same impact now? Stevens, who worked for 14 years at Digital Equipment Corp, a Sun rival, said this could be the way forward.
Doing more with less would help businesses unlock the potential of AI, which has remained elusive for many because of the costs involved, he explained.
A more efficient way forward will benefit those looking to adopt new AI models, such as Meta’s Llama 4 and DeepSeek, which are being released at a rapid pace, he noted.
A year from now, inference – the generation of AI outputs and analyses – will be cheaper and easier, he said, because the technology will have become more “democratised and commoditised”.