In association with Thales

By Daniel Toh
Digital businesses today depend entirely on Web apps, application programming interfaces (APIs), and online services as their primary customer touchpoints.
In this environment, the traditional definition of “business continuity” has shifted; it is no longer about disaster recovery after a complete stop, but the ability to deliver products and services during disruptive events, including cyberattacks and infrastructure failures.
The true measure of risk is no longer if a failure occurs, but whether the architecture is resilient enough that customers barely notice the disruption.
As we look towards 2026, resilience is the new uptime imperative: Designing application security and operations so the business keeps running even when systems are under stress.
Deconstructing high availability versus fault tolerance
To build a resilient architecture, IT leaders must first distinguish between two terms often used interchangeably: High Availability (HA) and Fault Tolerance (FT). HA systems are designed to recover from failures quickly enough to minimise service loss. FT systems, on the other hand, use redundancy so that services continue operating even after a component fails.
In an ideal situation, FT should complement HA rather than replace it. Think of a twin-engine aircraft; if one engine fails, the other keeps the plane flying (FT), allowing it to land safely (HA).
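The complementary relationship between FT and HA can be sketched in a few lines: redundancy keeps the service answering while a failed node is recovered. This is a minimal illustration, not a production pattern; the endpoint names and the injected `probe` function are hypothetical.

```python
def fetch_with_failover(endpoints, probe):
    """Return the first endpoint whose health probe succeeds.

    Fault tolerance: the service stays reachable as long as at least
    one redundant replica answers. High availability: recovery is the
    next request simply landing on a healthy node.
    probe(url) -> bool; raises RuntimeError if every node is down.
    """
    for url in endpoints:
        try:
            if probe(url):
                return url
        except OSError:
            pass  # this node failed; fall through to the standby
    raise RuntimeError("all endpoints down")
```

In practice the probe would be an HTTP health check and the failover decision would live in a load balancer or DNS layer, but the logic is the same: a single component failure never becomes a customer-visible outage.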
True resilience requires a “specialist” architectural philosophy. It is a common misconception that more servers equal better reliability. Take Google, for example – industry leaders like it are selective about where they build data centres, reinforcing the philosophy that reliability depends on strategically placed infrastructure engineered for quality and capacity, rather than on thousands of Points of Presence (PoPs).
In contrast, some security vendors have expanded into general cloud computing and edge services to compete with hyperscalers. While this broadens their offerings to the market, mixing security functions with general workloads creates “noisy neighbour” risks, cross-service complexity, and the potential for cascading failures.
When attacks look like outages
The reason architecture matters more than ever is that modern outages are just as likely to be caused by application-layer attacks as by hardware failure. Today’s threat landscape includes multiple vectors that mimic infrastructure collapse:
- DDoS attacks: Layer 3, 4, and 7 attacks can make networks and the domain name system (DNS) unreachable, resulting in significant revenue loss – often measured in tens of thousands of dollars per hour for digital-first services.
- Malicious bots: Bad bots now make up 37 per cent of all internet traffic, according to the 2025 Imperva Bad Bot Report. When these bots engage in scraping and credential stuffing, they exhaust application resources (like login or search functions), distorting traffic and severely impacting performance for real users.
- API Abuse: APIs are now the “nervous system” of digital business. Logic flaws and broken authorisation can lead to outages just as damaging as a full site crash.
- Operational limits: Organisations also face self-inflicted outages from routine IT activities such as patching operating systems. The inability to patch zero-day threats (like React2Shell) fast enough forces emergency maintenance cycles that introduce risk and instability.
The solution: Unified, single-stack architecture
To combat these hybrid threats, we propose a Unified Web Application & API Protection (WAAP) approach. Fewer gaps between point products mean fewer weak links that an attacker can exploit to take a system offline.
This is where a single-stack approach matters for resilience: it allows shared telemetry and consistent policies across web apps, APIs, and microservices simultaneously. When security and delivery are integrated, the architecture can adapt in real time.
Organisations should also consider the importance of a resilient Content Delivery Network (CDN) and Edge Architecture. The use of Anycast routing and intelligent caching means that attacks or localised infrastructure issues at any one node do not necessarily translate into a global outage.
To ensure optimal performance, security functions should execute within their own containerised paths, avoiding the complexity and resource-contention of multi-purpose networks.
Crucially, a WAAP enables virtual patching at the edge. This allows teams to block exploits immediately, buying time to assess critical impact and fix code safely, without taking the application offline for emergency maintenance.
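The mechanics of a virtual patch can be illustrated with a small request filter: a signature rule deployed in front of the application blocks the exploit while developers fix the code behind it. This is a simplified sketch; the request shape and handler are hypothetical, and the signature shown is a Log4Shell-style `${jndi:` payload used purely as an example.

```python
import re

# Example virtual-patch rule: block requests carrying a known exploit
# signature at the edge while the underlying application code is fixed.
EXPLOIT_PATTERN = re.compile(r"\$\{jndi:", re.IGNORECASE)

def virtual_patch(handler):
    """Wrap a request handler; reject requests matching the signature."""
    def patched(request: dict):
        payload = request.get("query", "") + request.get("body", "")
        if EXPLOIT_PATTERN.search(payload):
            # Exploit attempt blocked without touching application code
            return {"status": 403, "body": "blocked by virtual patch"}
        return handler(request)
    return patched
```

A real WAAP applies such rules across its edge network within minutes of a disclosure, which is what buys the development team time to test and ship a proper fix without an emergency outage window.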
Moving from SLAs to business resilience
Ultimately, business and security leaders must look beyond standard provider SLAs, such as 99.999 per cent uptime. Instead, they should define Service Level Objectives (SLOs) that map directly to business impact, such as maximum acceptable downtime for critical customer journeys.
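The gap between an uptime SLA and a business-facing SLO is easiest to see as arithmetic: a given availability percentage translates directly into a downtime budget, which can then be weighed against the maximum acceptable interruption for a critical customer journey. A minimal calculation:

```python
def downtime_budget_minutes(availability: float, period_hours: float = 365.25 * 24) -> float:
    """Allowed downtime, in minutes, for an availability target over a period.

    E.g. "five nines" (99.999%) permits only about five minutes of
    downtime per year, while 99.9% permits nearly nine hours.
    """
    return (1 - availability) * period_hours * 60
```

Framing targets this way lets leaders ask the more useful question: not “what uptime does the provider promise?” but “how many minutes of checkout downtime can we absorb before the business impact is unacceptable?”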
To achieve this, chief information security officers (CISOs) should consider a pragmatic four-step roadmap:
- Assess: Use management frameworks to map critical apps and align security controls to those priorities.
- Consolidate: Move from fragmented point products to integrated platforms for simpler, more resilient operations.
- Engage: Bring in specialist expertise to accelerate maturity and maintain resilience over time.
- Iterate: Treat resilience as a continuous programme, involving regular reviews of runbooks, architecture, and SLAs as the threat landscape evolves.
Outages are inevitable, but losing customers is not. Resilience is about ensuring that when the inevitable happens, the business survives the failure.
Daniel Toh is chief solutions architect, APJ, Thales
