Facebook outage caused by a single error


The Facebook outage of October 4, which brought down Facebook Messenger, Instagram and WhatsApp as well as the core service, was the result of a mistake by the company’s own network engineers.

The error left all of Facebook’s services inaccessible; one analogy compared it to a failure of “air traffic control” for network traffic …

The outage affected all platforms owned by Facebook, according to data from Downdetector and Twitter. This includes Instagram, Facebook, WhatsApp and Facebook Messenger […] While some outages of Facebook, Instagram and WhatsApp affect only certain geographic regions, this time the services were down worldwide.

It seemed that the problem might be related to DNS, the Domain Name System, whose servers tell devices which IP addresses to use to reach a service. But it was unclear exactly what had happened, and whether it was an external hack, a malicious action by an insider or a catastrophic mistake.
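To make the DNS step concrete, here is a minimal Python sketch of a lookup using the standard library's `socket.getaddrinfo`. The helper name `resolve` is ours, chosen for illustration. When the name servers for a domain cannot be reached, a lookup like this fails, and the helper returns an empty list instead of raising an error.

```python
import socket


def resolve(hostname):
    """Return the sorted, unique IP addresses for hostname, or [] if the lookup fails."""
    try:
        # Ask the system resolver (and, through it, DNS) for the addresses.
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved, e.g. because no name server answered.
        return []
    # Each result ends with a socket address tuple whose first item is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("www.facebook.com")` on a working connection should return one or more addresses. A name whose servers are unreachable would give an empty list, and that failed lookup is the symptom a device sees.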

Facebook clarified that it was a mistake:

“Our engineering teams have learned that configuration changes to the backbone routers that coordinate network traffic between our data centers caused problems that disrupted this communication. This disruption in network traffic had a cascading effect on the way our data centers communicate, stopping our services.”

Reports say lower-level employees had to gain physical access to data centers and then rely on step-by-step instructions from senior engineers to undo the mistake. To complicate matters, the network outage meant that Facebook’s own door access systems were also offline, preventing physical entry.

How to understand the Facebook outage

We will certainly get the full story over time, but the emerging consensus is that the problem was a combination of Domain Name System (DNS) settings and the Border Gateway Protocol (BGP).
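One way such a combination could take services offline is sketched below as a toy model in Python. It is an illustration under stated assumptions, not a description of Facebook's actual network. The class `ToyNetwork`, the function `lookup`, and all addresses and names are hypothetical, and the addresses come from ranges reserved for documentation. The idea is that BGP announcements tell the rest of the internet which address blocks can be reached; if the block holding the name servers is withdrawn, DNS lookups fail even though the records themselves are unchanged.

```python
import ipaddress


class ToyNetwork:
    """Tracks which address prefixes are currently announced (a stand-in for BGP)."""

    def __init__(self):
        self.announced = set()

    def announce(self, prefix):
        self.announced.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix):
        self.announced.discard(ipaddress.ip_network(prefix))

    def reachable(self, ip):
        # An address is reachable only if some announced prefix contains it.
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.announced)


def lookup(network, name_server_ip, records, hostname):
    """Return the IP for hostname, or None if the name server cannot be reached."""
    if not network.reachable(name_server_ip):
        return None  # no route to the name server, so the query goes unanswered
    return records.get(hostname)


# Hypothetical setup: one name server inside one announced prefix.
net = ToyNetwork()
net.announce("192.0.2.0/24")
records = {"example.test": "198.51.100.10"}

before = lookup(net, "192.0.2.53", records, "example.test")

# A configuration change withdraws the route; the DNS records are untouched.
net.withdraw("192.0.2.0/24")
after = lookup(net, "192.0.2.53", records, "example.test")
```

In this sketch `before` holds the address and `after` is `None`: the service's records still exist, but nobody can reach the server that would hand them out.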