
The Fallout of the Massive Facebook Outage Offers a Reminder of the Follies of Both Facebook and Tech

In the interest of full disclosure, I’m on Facebook quite a lot, especially at night. Browsing my Newsfeed, a.k.a. Timeline, has become sort of my night-time routine, my go-to activity before I call it a day. Perusing Facebook (I mostly scan what my friends are up to and watch videos) before sleeping isn’t the healthiest of bedtime routines but in these trying times, anything that works is a keeper.

So, imagine my horror when Facebook went down just before midnight on the 4th of October and stayed offline until about five the following morning, the 5th of October. I distinctly remember that night, actually. I was watching my favourite home chef, a Filipino known simply as “Ninong Ry”, preparing this mouth-watering pork belly dish, only for the video to be cut off midway by a prompt saying I had no internet connection.

The catch was, my internet connection was fine. A bit slow that night but working.

Then, it hit me: Facebook was down. I remember double-checking on Downdetector and, of course, Twitter to confirm the outage.

Yes, Facebook went down. Good thing YouTube was still up at that time for this night owl.

The Cause of the Facebook Outage

It wasn’t just Facebook that was down; so were its “friends”, Instagram, WhatsApp, Messenger and Oculus. A cyber attack was immediately dismissed as the likely culprit, though. Instead, cybersecurity experts theorised that the Facebook outage on the 4th of October, the company’s worst since 2019, was due to Domain Name System (DNS) issues.

About an hour after Facebook stopped working, ThousandEyes, the internet analysis division of Cisco, announced on Twitter that DNS failure caused the Facebook outage as “Facebook’s authoritative DNS nameservers became unreachable.” The DNS is the phonebook of the internet, translating domain names, like disruptivetechasean.com and cybersecurityasean.com, into IP addresses so that web browsers can load them.

“What is DNS?” asked Lotem Finkelstein, Head of Cyber Intelligence at Check Point Software Technologies, in an email interview with CSA. “Simply, it is the internet protocol to convert the words we use like Facebook.com to language computers know—numbers, or internet address. They do the conversion and route us to the services and applications we asked to use. When this service falls, the services look like they are down, but actually are just inaccessible”.
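Finkelstein’s point, that a DNS failure makes a service look dead when it is merely unfindable, can be sketched in a few lines of Python. This is a minimal illustration, not how any browser or Facebook itself does it: the `resolve` helper and its `lookup` parameter are names made up for this example, and only `socket.getaddrinfo` and `socket.gaierror` come from Python’s standard library.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or [] if the name cannot be resolved.

    `lookup` is a hypothetical hook so the function can be tried without a network;
    by default it is the operating system's resolver via socket.getaddrinfo.
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # Name resolution failed. The servers behind the name may be running
        # perfectly well, but nobody can find their address.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})
```

Calling `resolve("example.com")` on a working connection returns that domain’s current addresses. An empty list says nothing about whether the servers are up, only that the “phonebook” had no answer.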

It is thanks to DNS that internet users do not need to memorise IP addresses, which are written as four decimal numbers in IPv4 (192.168.1.1, for instance) or as groups of hexadecimal digits in IPv6 (2400:cb00:2048:1::c629:d7a2, for example).
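To see why nobody would want to memorise these, here is a small sketch using Python’s standard `ipaddress` module on the two sample addresses above. The `describe` helper is a name invented for this illustration.

```python
import ipaddress


def describe(address):
    """Summarise an IP address: its version, its size in bits and its raw integer value."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,        # 4 or 6
        "bits": ip.max_prefixlen,     # 32 for IPv4, 128 for IPv6
        "as_integer": int(ip),        # the number computers actually work with
    }


v4 = describe("192.168.1.1")
v6 = describe("2400:cb00:2048:1::c629:d7a2")
print(v4)
print(v6)
```

Either way, the address is ultimately one large number (32 bits for IPv4, 128 bits for IPv6), and the dotted or colon-separated notation is only a friendlier way to write it down.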

Details that trickled in all but confirmed the root cause of the Facebook outage: it was apparently a routine update to Facebook’s Border Gateway Protocol (BGP) configuration that went haywire. This mistake, in turn, withdrew the routes leading to Facebook’s DNS servers, leaving browsers unable to find Facebook and its friends. BGP is often described as the postal service of the internet: networks use it to tell one another which IP addresses they can deliver traffic to, which is what lets data find an efficient path to the websites users want to reach.
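One way to picture a routing mistake cutting off a nameserver is the toy model below. It is only a sketch under loud assumptions: the `RouteTable` class, the network name “ExampleSocial” and the addresses (taken from the 192.0.2.0/24 range reserved for documentation) are all invented for illustration, and real BGP involves sessions between routers, path attributes and policy that are left out here.

```python
import ipaddress


class RouteTable:
    """Toy model of the routes one network has learned from its neighbours."""

    def __init__(self):
        self.routes = {}  # announced prefix -> name of the network announcing it

    def announce(self, prefix, origin):
        """Record that `origin` says it can deliver traffic for `prefix`."""
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        """Forget a previously announced prefix (no error if it was never there)."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def next_hop(self, address):
        """Return who to hand traffic to for `address`, or None if no route is known."""
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        # Prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


NAMESERVER = "192.0.2.53"  # hypothetical DNS server address

table = RouteTable()
table.announce("192.0.2.0/24", "ExampleSocial")
before = table.next_hop(NAMESERVER)   # a route exists, so the nameserver can be reached

table.withdraw("192.0.2.0/24")        # the faulty update removes the announcement
after = table.next_hop(NAMESERVER)    # no route left: the nameserver is unreachable
```

In this model the nameserver itself never stops running. Once the route is withdrawn, the rest of the internet simply no longer knows where to send its questions.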

Boris Cipot, Senior Security Engineer at Synopsys Software Integrity Group, pointed out that “Facebook is maintaining millions of servers to provide different offerings such as the Facebook platform itself, Instagram, WhatsApp and Oculus Rift VR services to their users”. He added, “Part of this maintenance is also changing certain server settings that define how the server and the services on it are working”.

This, apparently, was where things went haywire for Facebook.

“In this outage, the problem was caused by the change of the Domain Name Service settings. Domain Name Service, or DNS for short, is a hierarchical and decentralised naming system that makes it possible for computers on the internet to find each other. If this setting is wrong, then your servers will no longer be reachable as other computers will not be able to find them”, Cipot explained. “In simpler terms, think about a phone number. If you have a phone number, other people can call you. If you hand out the wrong phone number, others will not be able to contact you—and this is what has happened to Facebook. Due to a misconfiguration of those DNS settings, the Facebook servers were no longer accessible and therefore took all services offline”.

Cipot, however, is surprised that Facebook’s people made such a colossal mistake in what should have otherwise been a routine procedure—if done right.

“Changes of that magnitude are not done by hand. IT administrators use scripts and specialised programs to execute their work on several machines automated and fast. However, the risk of running into glitches is always present. As with any programming language, bugs can be also a part of scripting languages”, he explained. “In order to avoid glitches on a large scale, the best way is to do it step by step. That way, you are rolling out the changes in a small, controlled environment and containing any possible threat in a sustainable manner. Why was this not possible here and why did this happen is a question only Facebook can answer. However, changes of any kind should be done in smaller steps until it’s confirmed that everything is working”.
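The step-by-step approach Cipot describes can be sketched roughly as follows. This is a hedged illustration of a staged rollout in general, not Facebook’s tooling: `staged_rollout` and its callbacks `apply_change`, `is_healthy` and `rollback` are hypothetical names, and the batch sizes are arbitrary.

```python
def staged_rollout(servers, apply_change, is_healthy, rollback, batch_sizes=(1, 10, 100)):
    """Apply a change in growing batches; on the first unhealthy batch, undo everything.

    All four callables are supplied by the caller (hypothetical hooks for this sketch).
    """
    remaining = list(servers)
    done = []
    sizes = list(batch_sizes)
    while remaining:
        # Use the next planned batch size; once the plan runs out, take everything left.
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for server in batch:
            apply_change(server)
        done.extend(batch)
        if not all(is_healthy(server) for server in batch):
            # Stop immediately and restore every server touched so far, newest first.
            for server in reversed(done):
                rollback(server)
            return {"ok": False, "changed": [], "rolled_back": done}
    return {"ok": True, "changed": done, "rolled_back": []}
```

The point of the design is containment: a bad change is caught after touching a handful of machines, and the servers still waiting in `remaining` are never touched at all.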

Candid Wuest, Vice President of Cyber Protection Research at Acronis, also thinks that a DNS problem was, indeed, the culprit behind the Facebook outage. He pointed out how DNS-related outages can be caused by “non-malicious actions,” like a mistake in what would have otherwise been routine maintenance. Wuest, nonetheless, is not entirely dismissing a cyber attack, noting how both BGP and DNS protocols “happen to be popular targets among cybercriminals.”

“There are various potential attacks against DNS infrastructure—from DDoS attacks to local DNS rebinding or hijacking a DNS with social engineering against the registrar,” said Wuest. “Looking at overall attack statistics, they are a lot less popular than common malware and ransomware attacks but they can be extremely devastating if successful in a sophisticated attack. It’s like pulling the electric cable to your server room—the whole enterprise suddenly goes dark.”

Given Facebook’s rather controversial history of being tight-lipped about its practices, it might take a while before the true cause of this outage comes out in public—if at all. What is clear, a week after the outage, is that the entirety of the Facebook enterprise went dark and that it took the company several hours to fix.

Facebook Is Down—Should You Care?  

On the 5th of October, just as Facebook was going back online, its founder and CEO, Mark Zuckerberg, posted an apology on Facebook (oh, the irony) for Facebook, WhatsApp, Instagram and Messenger all going off the grid for several hours.

“Facebook, Instagram, WhatsApp and Messenger are coming back online now,” Zuckerberg wrote. “Sorry for the disruption today—I know how much you rely on our services to stay connected with the people you care about.”

For many of Facebook’s over 2.85 billion monthly active users, yours truly included, Facebook and its friends going back online coupled with Zuckerberg’s apology means the end of the story. Facebook, after all, is up and running again, so everyone can go back to their merry ways posting statuses and memes, reacting to photos, commenting on videos and connecting with family and friends.

Indeed, the Facebook outage on the 4th of October 2021 was a minor inconvenience for what is likely a majority of Facebook users, some 40% of whom are in the Asia Pacific, which makes the region both a bright spot and a crucial market for the social media monolith. In other words, as Facebook scrambled to get everything back online, many of its users in South and Southeast Asia (140 million in Indonesia alone, plus another 170 million or so in the Philippines, Vietnam, Bangladesh and Thailand combined) were sleeping, or on their way to sleep.

But in the grand scheme of things, last week’s Facebook outage exposed a discomforting truth that all of us must inevitably confront: despite rapid, unabated advancements on so many frontiers, technology is far from fool-proof, and it might stay that way forever. Facebook, with its advanced, best-in-class platforms maintained by hundreds of the world’s brightest engineers, is in theory not supposed to go down like that, shut off entirely from the World Wide Web with the root cause of the problem unaddressed for several hours.

This is the reality of technology, and the sooner we embrace it, the better off we will be. Once we accept that technology can, at least to a certain degree, be feeble, unpredictable and even unreliable, we can then consume it differently, with an eye out, in particular, for instituting fail-safes in case issues arise and being proactive with the technologies being deployed. This is especially true at the enterprise level, where executives and IT personnel are relying more and more on technology to enhance business processes, deliver higher-quality products and services and improve the customer experience.

Businesses on Facebook: Bearing the Brunt of the Outage

The follies of Facebook—and of technology, to a degree—were felt the most by small- and mid-sized businesses that relied heavily on Facebook, as well as Instagram, Messenger and WhatsApp. Despite the Facebook outage lasting mostly through the wee hours of dawn for Southeast Asia, the fallout on businesses was nonetheless both adverse and wide-ranging, causing engagement and traffic to fall precipitously and sales to drop considerably.  

A bamboo socks retailer, for example, saw its sales drop by 25% due to the Facebook outage, while the founder of a jewellery brand was unable to finalise contracts with influencers for sponsored posts. The CEO of a power equipment supplier in India, meanwhile, had to switch from WhatsApp to Telegram during the outage to keep communication lines open in case power station problems came up (fortunately, no issues arose that time).

These examples underscore not only the expansive reach and growing importance of Facebook but also the increasing reliance of smaller enterprises and entrepreneurs on the social media giant to do business. This heavy reliance on Facebook has, by and large, been a boon for businesses, particularly in Southeast Asia, but last week’s outage is a reminder of the need to diversify and be agile both in leveraging social media for business and in using technology as a whole.

Facebook Encounters Another Outage

If that nearly six-hour outage did not highlight the feebleness of Facebook enough, the social media monolith experienced another outage four days later, on the 8th of October. This one, however, lasted just about two hours and its issues were only intermittent, unlike the first outage, when Facebook was completely inaccessible for nearly six hours.

Not to be an alarmist, but those are two outages in a week, and they might be a sign of trouble brewing in the land of Facebook. Said incidents, though, are far from a death knell, at least not for a tech titan worth about USD 1 trillion and counting. That said, it is fair to wonder what will come next for the world’s largest social media platform to date.

Perhaps more importantly, the two outages should compel every one of us to rethink not only our use of Facebook and social media but more so the way we consume technology in its entirety.
