Resilience vs. Recovery - How the Facebook outage highlights important lessons

15 October 2021

The recent Facebook outage disrupted all of its key global platforms, including Instagram and WhatsApp - attracting significant attention and creating a six-hour communication vacuum.

Following this, Facebook published a detailed press release setting out what happened, why, and most importantly, how it was learning from the incident. This sort of public communication offers a fascinating insight into the processes behind recovering from a major outage, and marks a shift in Facebook’s handling of communication. Had there been a significant loss of data leading to a personal data breach, or heavy involvement of insurers, it is unlikely that a multinational would have been this transparent.

What lessons can businesses learn from Facebook’s response and what legal and business issues does it bring into question?

What happened?

According to Facebook’s press release, the disconnections in its network also broke the tools normally used to investigate and resolve network outages. Repair and restoration of service therefore required engineers to be physically present at data centres, with access to router hardware, software and configurations that are designed to be difficult to modify, even with physical access.

Facebook specified that bringing the data centres back online had to be done carefully, to manage increasing loads, as a full power up could have brought about further system failures. A key passage in the final paragraph of the press release stated:

“we have done extensive work hardening our systems to prevent unauthorized access and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity but an error of our own making. I believe a trade-off like this is worth it – greatly increased day to day security versus a slower recovery from a hopefully rare event like this.”

Is there a balance to strike in effective cyber security?

Cyber security is part of modern warfare, and similar thinking exists in the military sphere. The design of tanks, for example, involves trade-offs within the so-called “iron triangle” of mobility, protection and firepower.

Tank designs vary significantly depending on their function, the context in which they are used, and the offensive or defensive capabilities required. In the case of cyber and infrastructure protection, Facebook has suggested that greater resilience and cyber protection are worth the trade-off, even if they slow down the recovery of systems in the unlikely or, at least, reasonably uncommon circumstances of human error or force majeure.

There are some important lessons from this analysis that may be relevant to contracts and services reliant on technology infrastructure.

How can businesses limit the impact of a platform or IT outage? Legal terms and conditions considerations

Large enterprises, including hosting and infrastructure providers such as Amazon Web Services (AWS) and Microsoft, commonly exclude or limit their liability for service unavailability and may cover losses exclusively by way of service credits. For businesses, careful evaluation of the remedies available for service downtime is vital, in particular to establish whether termination rights are required for a so-called “catastrophic” failure, by which we mean an outage of sufficient duration that it may affect the viability of the customer’s operations. Smaller businesses, with little or no bargaining power over supplier terms and conditions, must instead weigh whether an occasional temporary outage of this nature is an acceptable price for greater availability and security the rest of the time.

Many businesses face a choice between running their own IT or relying on smaller IT service providers, who may offer more attractive commercial terms or liability caps in the event of an outage, and using larger providers who promise greater resilience and robustness, backed by best-of-breed information security controls and IT certifications.

This is a judgement that each business will have to make for itself. Understanding the consequences of a slower than anticipated recovery, should an incident occur, is important for business continuity planning and operational resilience provision. A wider review of the risk profile of a business can also be balanced with appropriate insurance cover for loss of business or business interruption.

A careful review of Service Level Agreements (SLAs) may be worth considering in some cases. In an incident of this nature, the actual point at which services could technically be deemed to become available, thereby stopping the clock for the purposes of service resolution and service credit duration, may not be the point at which the service is actually fully operationally restored.
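
To illustrate why the point at which the clock stops matters, the short sketch below compares the service credit due under two readings of the same outage: one where downtime ends when the service is technically available again, and one where it ends at full operational restoration. The credit tiers, timestamps and function names are hypothetical assumptions chosen for illustration only and are not drawn from any real provider’s SLA.

```python
from datetime import datetime, timedelta

# Hypothetical credit tiers: (minimum monthly availability %, credit as % of monthly fee).
# Illustrative assumptions only, not taken from any real provider's SLA.
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25), (0.0, 50)]


def availability_pct(downtime: timedelta, period: timedelta) -> float:
    """Percentage of the billing period for which the service counted as available."""
    return 100.0 * (1 - downtime / period)


def service_credit_pct(availability: float) -> int:
    """Return the credit for the first (highest) tier whose threshold is met."""
    for threshold, credit in CREDIT_TIERS:
        if availability >= threshold:
            return credit
    return CREDIT_TIERS[-1][1]


# Hypothetical 30-day billing period and incident timeline.
period = timedelta(days=30)
outage_start = datetime(2024, 1, 10, 9, 0)
technically_available = datetime(2024, 1, 10, 15, 0)  # service responds again
fully_restored = datetime(2024, 1, 10, 21, 0)         # normal operations resume

# Downtime as an SLA might measure it versus downtime as the business experiences it.
contract_downtime = technically_available - outage_start
business_downtime = fully_restored - outage_start

contract_credit = service_credit_pct(availability_pct(contract_downtime, period))
business_credit = service_credit_pct(availability_pct(business_downtime, period))

print(f"Credit if the clock stops at technical availability: {contract_credit}%")
print(f"Credit if the clock stops at full restoration: {business_credit}%")
```

On these assumed figures the two readings fall into different credit tiers, which is why the SLA definition of “available” deserves close attention when terms are negotiated or reviewed.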

Operational Resilience

Many organisations are reviewing business continuity in the light of operational resilience, which is now a mandatory consideration for many regulated businesses.

In particular, operational resilience requires businesses to assess realistically what will happen when services fail, rather than assuming that services can never fail. It is essential that recovery point objectives (RPOs) and recovery time objectives (RTOs) are understood and set realistically in the light of incidents of this nature, so that they are not unrealistically short in a way that could imperil the business.
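
As a rough illustration of testing objectives against a realistic failure scenario, the sketch below adds up the estimated duration of each recovery step, including time for engineers to reach a site when remote tools are unavailable, and compares the total with the stated RTO. It also compares the backup interval with the RPO. The class, the step names and every duration are hypothetical assumptions, not a description of any organisation’s actual plan.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    rto: timedelta              # recovery time objective: maximum tolerable time to restore service
    rpo: timedelta              # recovery point objective: maximum tolerable window of lost data
    backup_interval: timedelta  # time between successive backups
    recovery_steps: dict[str, timedelta]  # estimated duration of each sequential recovery step

    def worst_case_data_loss(self) -> timedelta:
        # A failure just before the next backup loses up to one full interval of data.
        return self.backup_interval

    def estimated_recovery_time(self) -> timedelta:
        # Steps are assumed to run one after another, so their durations add up.
        return sum(self.recovery_steps.values(), timedelta())

    def gaps(self) -> list[str]:
        """List the objectives that the scenario would miss."""
        problems = []
        if self.estimated_recovery_time() > self.rto:
            problems.append(
                f"Estimated recovery time {self.estimated_recovery_time()} exceeds RTO {self.rto}"
            )
        if self.worst_case_data_loss() > self.rpo:
            problems.append(
                f"Worst-case data loss {self.worst_case_data_loss()} exceeds RPO {self.rpo}"
            )
        return problems


# Hypothetical scenario: remote diagnostic tools are down, so engineers must attend in person.
plan = RecoveryPlan(
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    backup_interval=timedelta(minutes=30),
    recovery_steps={
        "diagnose without remote tooling": timedelta(hours=1),
        "engineers travel to and enter the site": timedelta(hours=2),
        "staged restart to manage load": timedelta(hours=3),
    },
)

for problem in plan.gaps():
    print(problem)
```

In this assumed scenario the RPO is met but the RTO is not, because the objective was set without allowing for physical attendance and a staged restart. Either the objective or the recovery arrangements would need revisiting.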

Implications for remote maintenance and “dark” or edge data centres

Finally, there is increasing focus on trying to ensure that networks and data centres, in particular “edge” sites, can be supported and maintained remotely.

Clearly, the implications of diagnostic tools, virtual or remote means of access, or even entry door controls being disabled during an incident must be considered very carefully, as this global outage has shown.

The new generation of “edge” processing will, of necessity, require buildings and networks to be supported remotely, and fully “dark” data centres or microsites to be deployed, simply to ensure timely and cost-effective maintenance.

In this case, the relationship between infrastructure resilience, access, and physical and cyber security will have to be examined very carefully to ensure that protection is properly balanced against ease of incident resolution, in much the same way as the “iron triangle” applies to military hardware.
