In association with Thales

By Daniel Toh
Digital businesses today depend entirely on Web apps, application programming interfaces (APIs), and online services as their primary customer touchpoints.
In this environment, the traditional definition of “business continuity” has shifted; it is no longer about disaster recovery after a complete stop, but the ability to deliver products and services during disruptive events, including cyberattacks and infrastructure failures.
The true measure of risk is no longer if a failure occurs, but whether the architecture is resilient enough that customers barely notice the disruption.
As we look towards 2026, resilience is the new uptime imperative: Designing application security and operations so the business keeps running even when systems are under stress.
Deconstructing high availability versus fault tolerance
To build a resilient architecture, IT leaders must first distinguish between two terms often used interchangeably: High Availability (HA) and Fault Tolerance (FT). HA systems are designed to recover from failures quickly enough to minimise service loss. FT systems, on the other hand, use redundancy so that services continue operating even after a component fails.
In an ideal situation, FT should complement HA rather than replace it. Think of a twin-engine aircraft; if one engine fails, the other keeps the plane flying (FT), allowing it to land safely (HA).
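The complementary relationship between FT and HA can be sketched in a few lines: redundancy keeps the service answering while a failed node is recovered. This is a minimal illustration, not a production pattern; the endpoint names and the injected `probe` function are hypothetical.

```python
def fetch_with_failover(endpoints, probe):
    """Return the first endpoint whose health probe succeeds.

    Fault tolerance: the service stays reachable as long as at least
    one redundant replica answers. High availability: recovery is the
    next request simply landing on a healthy node.
    probe(url) -> bool; raises RuntimeError if every node is down.
    """
    for url in endpoints:
        try:
            if probe(url):
                return url
        except OSError:
            pass  # this node failed; fall through to the standby
    raise RuntimeError("all endpoints down")
```

In practice the probe would be an HTTP health check and the failover decision would live in a load balancer or DNS layer, but the logic is the same: a single component failure never becomes a customer-visible outage.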
True resilience requires a “specialist” architectural philosophy. It is a common misconception that more servers equal better reliability. Take Google, for example – industry leaders like it are selective about where they build data centres, reinforcing the philosophy that reliability depends on strategically placed infrastructure engineered for quality and capacity, rather than on thousands of Points of Presence (PoPs).
In contrast, some security vendors have expanded into general cloud computing and edge services to compete with hyperscalers. While this broadens their offerings to the market, mixing security functions with general workloads creates “noisy neighbour” risks, cross-service complexity, and the potential for cascading failures.
When attacks look like outages
The reason architecture matters more than ever is that modern outages are just as likely to be caused by application-layer attacks as by hardware failure. Today’s threat landscape includes multiple vectors that mimic infrastructure collapse:
- DDoS attacks: Layer 3, 4, and 7 attacks can make networks and the domain name system (DNS) unreachable, resulting in significant revenue loss – often measured in tens of thousands of dollars per hour for digital-first services.
- Malicious bots: Bad bots now make up 37 per cent of all internet traffic, according to the 2025 Imperva Bad Bot Report. When these bots engage in scraping and credential stuffing, they exhaust application resources (like login or search functions), distorting traffic and severely impacting performance for real users.
- API Abuse: APIs are now the “nervous system” of digital business. Logic flaws and broken authorisation can lead to outages just as damaging as a full site crash.
- Operational limits: Organisations also face self-inflicted outages from routine IT activities such as patching operating systems. The inability to patch zero-day threats (like React2Shell) fast enough forces emergency maintenance cycles that introduce risk and instability.
The solution: Unified, single-stack architecture
To combat these hybrid threats, we propose a Unified Web Application & API Protection (WAAP) approach. Fewer gaps between point products mean fewer weak links that an attacker can exploit to take a system offline.
This is where a single-stack approach matters for resilience: it allows shared telemetry and consistent policies across web apps, APIs, and microservices simultaneously. When security and delivery are integrated, the architecture can adapt in real time.
Organisations should also consider the importance of a resilient Content Delivery Network (CDN) and Edge Architecture. The use of Anycast routing and intelligent caching means that attacks or localised infrastructure issues at any one node do not necessarily translate into a global outage.
To ensure optimal performance, security functions should execute within their own containerised paths, avoiding the complexity and resource-contention of multi-purpose networks.
Crucially, a WAAP enables virtual patching at the edge. This allows teams to block exploits immediately, buying time to assess critical impact and fix code safely, without taking the application offline for emergency maintenance.
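The mechanics of a virtual patch can be illustrated with a small request filter: a signature rule deployed in front of the application blocks the exploit while developers fix the code behind it. This is a simplified sketch; the request shape and handler are hypothetical, and the signature shown is a Log4Shell-style `${jndi:` payload used purely as an example.

```python
import re

# Example virtual-patch rule: block requests carrying a known exploit
# signature at the edge while the underlying application code is fixed.
EXPLOIT_PATTERN = re.compile(r"\$\{jndi:", re.IGNORECASE)

def virtual_patch(handler):
    """Wrap a request handler; reject requests matching the signature."""
    def patched(request: dict):
        payload = request.get("query", "") + request.get("body", "")
        if EXPLOIT_PATTERN.search(payload):
            # Exploit attempt blocked without touching application code
            return {"status": 403, "body": "blocked by virtual patch"}
        return handler(request)
    return patched
```

A real WAAP applies such rules across its edge network within minutes of a disclosure, which is what buys the development team time to test and ship a proper fix without an emergency outage window.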
Moving from SLAs to business resilience
Ultimately, business and security leaders must look beyond standard provider SLAs, such as 99.999 per cent uptime. Instead, they should define Service Level Objectives (SLOs) that map directly to business impact, such as maximum acceptable downtime for critical customer journeys.
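The gap between an uptime SLA and a business-facing SLO is easiest to see as arithmetic: a given availability percentage translates directly into a downtime budget, which can then be weighed against the maximum acceptable interruption for a critical customer journey. A minimal calculation:

```python
def downtime_budget_minutes(availability: float, period_hours: float = 365.25 * 24) -> float:
    """Allowed downtime, in minutes, for an availability target over a period.

    E.g. "five nines" (99.999%) permits only about five minutes of
    downtime per year, while 99.9% permits nearly nine hours.
    """
    return (1 - availability) * period_hours * 60
```

Framing targets this way lets leaders ask the more useful question: not “what uptime does the provider promise?” but “how many minutes of checkout downtime can we absorb before the business impact is unacceptable?”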
To achieve this, chief information security officers (CISOs) should consider a pragmatic four-step roadmap:
- Assess: Use management frameworks to map critical apps and align security controls to those priorities.
- Consolidate: Move from fragmented point products to integrated platforms for simpler, more resilient operations.
- Engage: Bring in specialist expertise to accelerate maturity and maintain resilience over time.
- Iterate: Treat resilience as a continuous programme, involving regular reviews of runbooks, architecture, and SLAs as the threat landscape evolves.
Outages are inevitable, but losing customers is not. Resilience is about ensuring that when the inevitable happens, the business survives the failure.
Daniel Toh is chief solutions architect, APJ, Thales
