Enron, Scandal, and Spam Emails: The Fall of "America's Most Innovative Company" by @historicalemails

by Historical Emails, November 12th, 2022

Too Long; Didn't Read

Until its collapse, Enron was one of the great American corporate success stories. Then it turned out to be built on fraud. One silver lining of the collapse was the release of over half a million corporate emails, which still form the largest public database of company emails today.

The email above doesn’t seem like anything special. In fact, it is only one inconsequential email in a sample set of over half a million sent between 1997 and 2004 to, from, and within one company, the Enron Corporation.

Including all 500,000+ emails within this article seemed excessive, so I have picked out a few samples. The history here is not so much about the individual emails as about the whole journey of the Enron Corporation to its final demise, the collapse of one of the largest accountancy firms in the world (turning the Big Five into the Big Four), and the development of anti-spam filters.

This was a dramatic enough event that, over two decades later, it still comes up in popular culture, even though many no longer recall what it refers to.

The Enron Scandal: A Short Summary

Founded in 1985 through a merger of two small regional companies, Enron Corporation sold energy, commodities, and services up until declaring bankruptcy in 2001. With over 20,000 staff, it claimed revenue of over $100 billion, and Fortune named it “America’s Most Innovative Company” six years in a row; it was a massive success story.

Towards the end of 2001 it became clear that the reason for its massive (even disproportionate) success was deliberate and creative fraud, overlooked by (and, it was alleged at the time, aided by) its auditor Arthur Andersen, then one of the Big Five accountancy firms. On a per-employee basis, Enron was reporting profits an order of magnitude higher than almost any other similarly-sized company, and more than twice that of Exxon Mobil.

The fallout was immense and rapid: Enron filed for bankruptcy in 2001, Arthur Andersen was dissolved (hence we now have the Big Four of Deloitte, EY, KPMG, and PwC), and WorldCom collapsed in 2002 in an even larger accounting scandal, again with Arthur Andersen as its auditor. A number of faulty audits of other companies also came to light.

In 2002 the Sarbanes-Oxley Act was enacted to place some controls around audits and avoid similar events in future.

The Emails

During the investigation into Enron, the Federal Energy Regulatory Commission (FERC) obtained a sample of the company’s email data, spanning several years and 150 Enron employees (mostly senior management). The data was used as part of the investigation to identify persons of interest, and then the FERC took an unusual and controversial decision.

Every cloud has a silver lining, and the Enron scandal led to the release of one of the largest and most comprehensive email datasets ever compiled. What was once used to gather evidence of fraud and conspiracy would become one of the greatest tools against spam and phishing fraud the world has ever seen.

For transparency, historical, and academic research purposes the FERC made the dataset public and posted it to the internet.

It was later purchased by Leslie Kaelbling of MIT, and a number of people at SRI International worked hard to correct integrity errors and carry out some redactions following requests from affected employees. The latest version of the dataset is from 2015 and comes to around 1.7 GB compressed.

The impact of the emails on research is hard to overstate. At over 500,000 messages, this was the largest collection of emails publicly available. To put it in perspective, the well-known Sony Pictures hack consisted of under 200,000 emails. Working through the emails, the big tell is how normal they all are: simple conversations and office chatter. There is no sense of a grand accounting fraud conspiracy behind the scenes.

Then there’s the spam. While the structure of the dataset makes it hard to analyse, sampling at different points in time is an effective way to see spam volumes increasing and phishing developing. For those trying to develop anti-spam tools or phishing filters, that was incredibly valuable. These are genuine emails from an organisation, not a simple set of dummy data, so a filter that works effectively on the Enron dataset is likely to be effective elsewhere.
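
As a rough illustration of sampling by time, the sketch below groups raw messages by the year in their Date header and reports what share of each year's mail gets flagged. It assumes the messages are available as plain RFC 822 strings. The function names and the keyword check in looks_like_spam are hypothetical placeholders invented for this example, not anything shipped with the dataset or used by the researchers.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime


def looks_like_spam(subject):
    """Hypothetical placeholder heuristic, NOT a real spam classifier."""
    keywords = ("free", "winner", "click here")
    lowered = subject.lower()
    return any(word in lowered for word in keywords)


def spam_share_by_year(raw_messages):
    """Return {year: fraction of that year's messages flagged as spam}."""
    totals, flagged = Counter(), Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        date_header = msg["Date"]
        if not date_header:
            continue  # skip messages with no Date header
        try:
            year = parsedate_to_datetime(date_header).year
        except (TypeError, ValueError):
            continue  # skip messages whose Date header cannot be parsed
        totals[year] += 1
        if looks_like_spam(msg["Subject"] or ""):
            flagged[year] += 1
    return {year: flagged[year] / totals[year] for year in sorted(totals)}
```

Swapping the placeholder for a trained classifier would leave the bucketing logic unchanged, which is the point of the exercise: the trend over time matters more than any single message.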

What Do the Enron Emails Tell Us?

This dataset was initially used to train the very filters we rely on today to detect spam and protect us from phishing, and is still the largest publicly available collection of company emails. Another team used the dataset to train a compliance tool which would alert users about sensitive elements in text, a technique still at the core of data leak prevention tools applied to email today. Others used the Enron emails to examine how people organised and stored emails to see if it could be automated effectively (largely, as anyone relying on automated sorting will know, the answer appears to be no).
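
To give a feel for what training a filter on real email involves, here is a minimal sketch of a naive Bayes text classifier, a common textbook approach to spam filtering. It is an assumption made for illustration, not a description of the method any particular research team used on the Enron data. The class name, the tokenizer, and the need for hand-labelled "spam" and "ham" examples are all part of that assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration; needs at least one training example
    for each of the two labels before scoring.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        # Log prior: share of training messages carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        return self.log_score(text, "spam") > self.log_score(text, "ham")
```

A filter like this is only as good as its training messages, which is why a large body of genuine corporate mail was so useful compared with invented examples.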

Still more looked at the data to better understand companies and organisations. Social graphs of the senior management were built, revealing a nest of connections around a few nodes, with thin pathways to everyone else.
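
A social graph of this kind can be approximated from message headers alone. The sketch below counts who wrote to whom using the From, To and Cc headers, then counts distinct contacts per address. It assumes raw RFC 822 message strings as input. The function names are hypothetical, and real studies would need extra steps (such as merging multiple addresses for one person) that are left out here.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def build_edges(raw_messages):
    """Count (sender, recipient) pairs found in From, To and Cc headers."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [
            addr.lower()
            for _, addr in getaddresses(msg.get_all("From", []))
            if addr
        ]
        recipients = [
            addr.lower()
            for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", [])
            )
            if addr
        ]
        for sender in senders:
            for recipient in recipients:
                if sender != recipient:  # ignore mail sent to oneself
                    edges[(sender, recipient)] += 1
    return edges


def contact_counts(edges):
    """Number of distinct addresses each address exchanged mail with."""
    neighbours = {}
    for sender, recipient in edges:
        neighbours.setdefault(sender, set()).add(recipient)
        neighbours.setdefault(recipient, set()).add(sender)
    return {person: len(others) for person, others in neighbours.items()}
```

Addresses with unusually high contact counts correspond to the central nodes described above, while most other addresses connect to only a handful of people.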


Text analytics, language processing, autocomplete, grammar correction, spam filtering: all kinds of research have made use of the Enron dataset. One study by an English teacher, Evan Frendo, discovered a fixation on ‘ball’ metaphors in American business language.

The Enron dataset captures a period in the history of corporate America, of technology (a number of the emails were written on BlackBerry devices, for example), and of human communication. It also marks a change in the way datasets were approached in research: a move from a focus on authorship (value comes from an expert creating the data) to the commons (the data is valuable not because of individual contributions, but because of what they show collectively).

Since the dataset covers over a decade, it shows the evolution of email etiquette and usage from 1991 through to the mid-2000s. There are even a few jokes that people may recognise today (one about explaining different government systems with cows), along with racism, misogyny, and pornography.

If you want a lived historical email experience, The Good Life (Enron Simulator) will send you every single one of the over half a million emails in chronological order, over periods ranging from 7 to 28 years.