Blacklisted: When Google Classified the Entire Web as Malware

The 10th anniversary of the Google search incident that incorrectly classified the entire World Wide Web as malware is another opportunity to reflect on computer system defects, human error, process flaws, organizational mistakes, and the best principles and practices for delivering solutions in the IT industry. In this blog and my upcoming book, Bugs: A Short History of Computer System Failure, I will chronicle some important system failures of the past and discuss ideas for improving system quality in the future. As information technology becomes increasingly woven into everyday life, the quality of hardware and software affects our commerce, health, infrastructure, military, politics, science, security, and transportation. The Big Idea is that we have no choice but to get better at delivering technology solutions, because our lives depend on it.

On 31 January 2009, a Google engineer manually updated the search engine’s blacklist of sites classified as malware to include the URL ‘/’. Because that entry matched every URL, every organic Google search result for the entire World Wide Web (WWW or Web) was incorrectly flagged as malware. Fortunately, Google’s on-call Site Reliability Engineering (SRE) team quickly identified the problem and fixed it within an hour. Besides organic search results, the error also affected Google’s email service, Gmail, where users reported legitimate messages being routed to spam folders; interestingly, advertised or promoted search results were not affected. This essay explores some of the business and technology factors that contributed to the defect, the incident’s timely resolution, and the wider implications for the Web, search, and malware classification.

Source: VisualCapitalist.com

According to multiple sources, including JumpShot, Netmarketshare.com, and Statista.com, Google holds 60–80% of the market share for web search traffic, depending on the country. Google is also the default search engine on most smartphones running the Android operating system, and according to Gartner Research and Statista.com, Android has held about 85% of the smartphone market since 2017. If one also accounts for sister properties such as Google Images, Maps, and YouTube, then Google holds an impressive 90% market share of web, mobile, and in-app searches. There are some potential threats to Google’s dominance in search on the horizon, ranging from Amazon’s Alexa and Echo devices used to search for and buy products, to users spending more time on Facebook, to some users opting out of data sharing entirely through ad- and cookie-blocking browser plugins. In the end, though, Google handles 3.5 billion searches per day, has more than 1.5 billion unique users, and earns about $32B annually in advertising revenue from search.

Malware is software intentionally designed to harm an individual user, a computing device, or a larger network of nodes by attacking the system’s availability, confidentiality, or integrity. There are different types of malware, such as computer viruses, worms, spam, Trojan horses, ransomware, spyware, adware, and others. What began as curiosity and fun when the Internet was an academic computing environment has turned into malice and profit, because malware means big business and serious trouble for corporations, governments, and individuals across the world. According to various computer security reports from McAfee, the Center for Strategic and International Studies (CSIS), IBM, the Ponemon Institute, and Symantec, there are several cybercrime statistics one should be concerned about:

The cost of cybercrime was estimated by McAfee and CSIS at $600B in 2018, almost 0.8% of global GDP and up from $500B in 2014.
Mobile malware attacks increased by 54% in 2017 according to Symantec, and third-party app stores (i.e., those not run by Apple or Google) are the source of 99.9% of discovered mobile malware.
Nearly 60M Americans have been affected by identity theft according to a 2018 online survey conducted by the Harris Poll, and about 140M people worldwide (about 2% of the world’s population) according to ENISA in Europe. These stolen identities are used to commit various crimes and impersonation frauds involving credit cards, utilities, banking, loans, and government documents. Since 2014, almost 3 billion internet credentials, along with other personally identifiable information (PII), have been stolen by hackers.
The largest sources of cyber attacks in 2017 were China (20%), USA (11%), and Russia (6%); however, Iran and North Korea are growing state sponsors of cyberterrorism.
Cryptocurrency mining is a huge growth area in cybercrime: detections of cryptojacking on endpoint computers surged 8,500% according to Symantec, as criminal botnets try to add new machines, such as your computer, to their resource pool. Tor and Bitcoin have facilitated the growth of the Dark Web and are among the preferred tools of cybercriminals.
The cost of the average global corporate data breach is almost $2M, and the average time to detect and react to a breach is 196 days, according to IBM and Ponemon. The costs include funds to help victims with losses and notification expenses, as well as business disruption, customer turnover, lost revenue, and reputation damage.
Every day, more than 300,000 new malware variants, 33,000 phishing attacks, and 80 billion malicious scans are detected, according to data compiled by AV-Test and McAfee.
Microsoft Office file formats such as Word, PowerPoint, and Excel make up the most prevalent group of malicious file extensions, comprising 38% of the total.

Google Search Warning for Malware

So with great power comes great responsibility. Since 2006, through the StopBadware.org initiative, Google has partnered with the likes of Consumer Reports, Mozilla, PayPal, Verisign, Verizon, and others to prevent, mitigate, and remediate malware websites. StopBadware receives data from various content and hosting providers, defines criteria for classifying malware sites, maintains a common clearinghouse of URLs blacklisted by community members, aggregates malware statistics, manages the appeal process when a site is blocked by providers, and publishes advisory documents and best practices to reduce the incidence of malware. Although Google supports StopBadware through data sharing, participates in its working groups, and contributes financially to the organization, Google’s Safe Browsing initiative and its associated APIs are separate services that use Google’s own private blacklist, curated by both people and machines.

This list is updated periodically, and on 31 January 2009, a Google engineer accidentally added and committed the “/” URL to the blacklist. Google’s system interpreted this entry as matching every URL on the Web. Twitter was briefly ablaze with people reporting the error under the hashtags #googmayharm and #googmayhem. The warning message in Google’s organic search results also linked to StopBadware.org, and the torrent of users clicking that link overwhelmed its website, effectively knocking it offline much like a DDoS attack. Users could still copy and paste links into the browser’s address bar and visit sites manually, but the widespread perception on that Saturday morning was that the Web was experiencing a malware catastrophe.

The good news for Google and the Web was that the Google SRE team was on call and actively monitoring and supporting its cloud services. The team was notified of user complaints, identified the root cause, communicated a response to the global community through its blog and Twitter, reverted the blacklist change, and deployed the updated configuration to its services.

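
To make the failure mode concrete, here is a minimal Python sketch of a blacklist matcher. Google has not published exactly how its matcher treated the entry, so the matching rule below is an assumption: an entry is either a host with a path prefix ("bad.example/downloads/"), or a bare path prefix starting with "/" that applies to every host. Under that hypothetical rule, a bare "/" matches any URL, because every URL path begins with "/". The function names and example URLs are illustrative only.

```python
from urllib.parse import urlsplit


def entry_matches(entry: str, url: str) -> bool:
    """Hypothetical matching rule used only for illustration.

    An entry is either "host/path-prefix" (scoped to one host) or a bare
    "/path-prefix" that applies to every host. The second form is an
    assumption that shows how a single "/" can match the whole Web.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # "http://example.com" has an empty path; treat it as "/"
    if entry.startswith("/"):
        # No host given: the path prefix is checked against every site.
        return path.startswith(entry)
    host, _, prefix = entry.partition("/")
    return parts.hostname == host and path.startswith("/" + prefix)


def is_flagged(url: str, blacklist: list[str]) -> bool:
    """Return True if any blacklist entry matches the URL."""
    return any(entry_matches(entry, url) for entry in blacklist)


urls = [
    "https://www.wikipedia.org/",
    "http://bad.example/downloads/trojan.exe",
    "https://news.example.com/story?id=1",
]

# A narrowly scoped entry flags only the intended site.
good_list = ["bad.example/downloads/"]
print([is_flagged(u, good_list) for u in urls])  # [False, True, False]

# Adding the bare "/" entry flags every URL.
bad_list = good_list + ["/"]
print([is_flagged(u, bad_list) for u in urls])  # [True, True, True]
```

The entry is syntactically harmless, yet under a rule like this its scope is the entire Web, which is why such mistakes are easy to commit and hard to spot by eye.
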
Google’s search services, like much of its cloud platform, are distributed across servers located around the world, so the blacklist configuration update was released in a staggered, rolling fashion. The search errors began appearing between 6:27 and 6:40 AM PST as the blacklist change propagated, and began disappearing between 7:10 and 7:25 AM PST as the reverted configuration propagated.
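
A staggered rollout limits the damage only if each step is checked before the next one proceeds. The following Python sketch simulates a rolling configuration deployment with a health check and automatic rollback; it is an illustration under stated assumptions, not Google’s actual pipeline. The fleet of 16 data centers, the "dc-NN" names, the 1% threshold, the known-good sample paths, and the simple path-prefix matching rule are all hypothetical.

```python
def flagged_fraction(blacklist: list[str], sample_paths: list[str]) -> float:
    """Share of known-good sample paths that a blacklist would flag.

    Assumes a simple path-prefix match, a stand-in for the real matcher.
    """
    flagged = sum(any(path.startswith(entry) for entry in blacklist) for path in sample_paths)
    return flagged / len(sample_paths)


def rolling_deploy(
    data_centers: dict[str, list[str]],
    new_config: list[str],
    sample_paths: list[str],
    max_flagged: float = 0.01,
) -> list[str]:
    """Push new_config to one data center at a time, checking health after each step.

    If the flagged share of known-good traffic exceeds max_flagged, every data
    center updated so far is rolled back and an empty list is returned.
    Otherwise the names of all updated data centers are returned.
    """
    previous = dict(data_centers)  # snapshot of each data center's current config
    updated: list[str] = []
    for name in data_centers:
        data_centers[name] = new_config
        updated.append(name)
        if flagged_fraction(data_centers[name], sample_paths) > max_flagged:
            for done in updated:  # halt and revert to limit the blast radius
                data_centers[done] = previous[done]
            return []
    return updated


# Hypothetical fleet: 16 data centers, each starting with one narrow entry.
fleet = {f"dc-{i:02d}": ["/downloads/trojan.exe"] for i in range(1, 17)}
known_good = ["/", "/index.html", "/news/today.html"]

narrow = rolling_deploy(fleet, ["/downloads/trojan.exe", "/malware/kit.zip"], known_good)
print(len(narrow))  # 16

broad = rolling_deploy(fleet, ["/downloads/trojan.exe", "/"], known_good)
print(broad)           # []
print(fleet["dc-02"])  # ['/downloads/trojan.exe', '/malware/kit.zip']
```

In this simulation the narrow change reaches all 16 data centers, while the change containing "/" fails the health check at the first data center and is reverted before it spreads. A production system would measure live traffic rather than a fixed sample, but the control loop is the same idea.
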

While this story is about a negative incident involving Google, there are several positive lessons to be learned for IT professionals.

Customer service matters in a crisis, and investing in a strong support and operations organization for your IT solutions can make the difference when problems inevitably unfold, per Murphy’s Law. Accounting and IT departments often treat operations as overhead; however, Google’s SRE team is first class and a key ingredient in the company’s success as a leading cloud service provider. The team worked fast to identify the root cause, to acknowledge the issue and communicate ongoing updates, and to resolve the problem.
DevOps automation is a major competitive advantage for IT departments and companies that are willing to invest in their personnel, platforms, and processes. In less than thirty minutes, Google was able to reliably deploy a production configuration update to its servers distributed across 16 data centers. If you and your organization are new to DevOps, focus on two things: first, one-touch automation of the deployment pipeline, and second, production environment monitoring. IT processes can then execute faster while reducing costs, defects, and risks, and the business can be more agile, which means faster time-to-market as well as higher potential revenue growth and market share. Sure, a DevOps capability has initial and ongoing costs in new skills and tools, but on balance the long-term ROI comes from reliably delivering new features and fixes to customers more frequently, and from the productive hours saved on deployments and on reacting to production issues. For more information about the multifaceted benefits of DevOps, check out DORA’s State of DevOps research report and read The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis.
Quality assurance is not so simple. The “/” URL is syntactically and semantically valid, but it made no sense in the specific context of a malware blacklist. It was an obscure and subtle configuration data-entry error, and while Google almost certainly had automated tests for the core components of its Safe Browsing services and API, it did not test this blacklist configuration change in a staging environment. Having visited Google’s headquarters in Mountain View for work, I can attest to the cultural importance of QA there: articles about quality assurance and best practices for verifying system integrity are posted on the walls of cafeterias, hallways, and yes, even bathrooms. The lesson here is one of humility and separation of concerns. It motivates the business need for distinct teams developing and testing software: developers want things to work and sometimes believe that they do to a fault, while testers want to break things to make them better.
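
One way to act on this lesson is a validation gate that an independent test step runs on every proposed blacklist change before it ships. The Python sketch below assumes a hypothetical entry format ("host/path-prefix", matched as a prefix of the URL’s host plus path) and a hypothetical sample of known-good URLs; it rejects entries that name no host or that would flag any known-good URL. It illustrates the idea and is not a description of Google’s actual checks.

```python
from urllib.parse import urlsplit

# Hypothetical sample of URLs that must never be flagged.
KNOWN_GOOD = [
    "https://www.example.com/",
    "https://en.wikipedia.org/wiki/Malware",
    "https://news.example.org/today",
]


def validate_entry(entry: str, known_good: list[str]) -> list[str]:
    """Return the problems found with one proposed entry (an empty list means it passes).

    Assumed entry format: "host/path-prefix", matched as a prefix of host + path.
    """
    problems = []
    host = entry.split("/", 1)[0]
    if "." not in host:
        problems.append("entry does not name a specific host")
    for url in known_good:
        parts = urlsplit(url)
        key = f"{parts.hostname}{parts.path or '/'}"
        if key.startswith(entry):
            problems.append(f"entry would flag known-good URL {url}")
    return problems


def check_change(entries: list[str], known_good: list[str]) -> dict[str, list[str]]:
    """Validate every entry in a proposed change; return only the failing ones."""
    results = {entry: validate_entry(entry, known_good) for entry in entries}
    return {entry: problems for entry, problems in results.items() if problems}


proposed = ["bad.example/downloads/", "/", "en.wikipedia.org/"]
failures = check_change(proposed, KNOWN_GOOD)
for entry, problems in failures.items():
    print(f"REJECT {entry!r}: {'; '.join(problems)}")
```

Run against this proposed change, the narrow entry passes while "/" and "en.wikipedia.org/" are rejected. A real gate might draw its known-good sample from production traffic, and it would complement, not replace, staged rollout and monitoring.
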

In subsequent articles, I will discuss specific system incidents involving malware that resulted in security breaches as well as strategies and tactics for preventing and reacting to these events.

