© 2023 Goondu Media Pte Ltd. All Rights Reserved.
Enterprise / Software

Red Hat pitches open-source software for more efficient AI inference

By Alfred Siew
Published: June 27, 2025 | Last updated: June 27, 2025 at 7:08 PM
5 Min Read
ILLUSTRATION: Unsplash

As infrastructure costs rise and businesses begin seeking tangible returns on the AI investments they have made over the past two years, Red Hat thinks it has the answer: open-source software libraries that make large language models (LLMs) run more efficiently.

The trick, it believes, is to reduce the cost of generating AI outputs and the dependence on Nvidia’s much-sought-after graphics processing units (GPUs), which do much of today’s AI heavy lifting.

Through open-source software libraries built to help run LLMs faster – even on competing hardware from AMD and Intel – Red Hat is betting it can improve efficiency enough to overcome today’s bottlenecks and spur AI adoption.

Previously, IBM (Red Hat’s parent company) had been advising customers to go for smaller AI models, said Brian Stevens, the AI chief technology officer (CTO) for Red Hat.

However, businesses can now rely on bigger models because they won’t have to worry as much about the cost of the GPUs needed to get the job done, he told Techgoondu in an interview in Singapore last week.

“How do we get existing customers to be more efficient? We dropped 30 per cent of inference costs… so they can start a platform for innovation,” he said.

In March, Red Hat launched its AI Inference Server that promises to let businesses generate AI outputs more efficiently.

It packs in software libraries from vLLM, an open-source project used to run AI models on different hardware, including custom chips made by Google and Amazon Web Services.

It improves inference performance partly by cutting back on GPU memory usage and by allocating resources more effectively across different workloads.
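To give a sense of the tuning involved, vLLM exposes knobs for GPU memory and request batching when serving a model. A minimal sketch, assuming vLLM is installed on a GPU machine; the model name and flag values here are placeholders, not a recommendation:

```shell
# Launch vLLM's OpenAI-compatible inference server.
# --gpu-memory-utilization caps the fraction of GPU memory vLLM
# pre-allocates (largely for its paged KV cache), leaving headroom
# for other processes; --max-num-seqs bounds how many requests are
# batched together in one scheduling step.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --gpu-memory-utilization 0.85 \
    --max-num-seqs 128
```

Tuning settings like these is how the same GPU can be made to serve more concurrent requests, which is where much of the claimed efficiency gain comes from.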

Perhaps more importantly, Red Hat promises that the software runs efficiently on non-Nvidia hardware as well, giving, say, a bank building its own AI infrastructure more hardware options to choose from.

Nvidia’s powerful CUDA software tools, which let the company’s GPUs accelerate AI workloads, have been instrumental in keeping it in the lead over the past couple of years.

However, if other platforms and accelerators use Red Hat’s software tools to deliver good performance at lower cost, they could turn out to be stronger alternatives in future.

“This frees up organisations from worrying about AI infrastructure,” said Stevens. “Instead, you think about how to build your agentic AI app or reasoning service… you don’t have to worry about the platform.”

Nvidia also works with Red Hat on vLLM, he noted, and the development teams have “multiple meetings” every week. “We will make it the best interface for Nvidia.”

Could the current AI gold rush turn out like the dotcom boom more than 20 years ago? Back then, Sun Microsystems was the only one making the powerful servers needed to handle the high traffic volumes of any popular website.

However, it stumbled when cheap servers running commodity Intel chips proved just as powerful, essentially delivering the early cloud computing model that enabled anyone to run a website cheaply.

Could more cheaply available AI servers deliver the same impact now? Stevens, who worked for 14 years at Digital Equipment Corp, a Sun rival, said this could be the way forward.

Doing more with less, he explained, lets businesses unlock the potential of AI, which has been elusive for many because of the costs involved.

A more efficient way forward will benefit those looking to adopt new AI models, such as Meta’s Llama 4 and DeepSeek, which are coming out fast, he noted.

A year from now, inference, or the generation of AI outputs and analyses, will be cheaper and easier, he noted, because the technology would be more “democratised and commoditised”.

Tagged: AI, AI Inference Server, Brian Stevens, GPU, inference, LLM, Nvidia, open source, Red Hat, vLLM

By Alfred Siew
Alfred is a writer, speaker and media instructor who has covered the telecom, media and technology scene for more than 20 years. Previously the technology correspondent for The Straits Times, he now edits the Techgoondu.com blog and runs his own technology and media consultancy.