
Here's How I Mined My Mailbox to Find The Top Email Service Providers

by Opeyemi Obembe, February 23rd, 2020

Too Long; Didn't Read

I was curious about which email services top apps use, particularly for transactional email. Using IMAP, I went through a Yahoo mailbox dedicated to product signups, subscriptions and newsletters, pulled all the emails and checked the sending service for each. After reviewing a couple of headers, I found a pattern in the Received headers to match the provider.


I was reviewing some stats for Mailintel and was curious about what email service top apps use; being particularly interested in transactional email services. Is there a way I can go through my mailbox, check product emails and check the service for each? Sounds like a fun experiment to give a shot.

Connecting my mailbox

With my recent experiments with IMAP, connecting to my mailbox wasn’t difficult. I am using emailjs-imap-client, a JS IMAP client.

const ImapClient = require('emailjs-imap-client').default

;(async function () {
  try {
    const confOption = {
      auth: {
        user: process.env.EMAIL,
        pass: process.env.PASS
      },
      logLevel: 'error'
    }

    if (process.env.PORT === '993'){ confOption.useSecureTransport = true }
    const imap = new ImapClient(process.env.HOST, process.env.PORT, confOption)
    await imap.connect()
  } catch (e) {
    console.log(e)
  }
})()

My environment variables look something like this:

EMAIL="[email protected]"
PASS="mypassword"
PORT=993
HOST="imap.mail.yahoo.com"
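A small helper can fail fast when one of these variables is missing, before any connection attempt is made. This is only a minimal sketch: buildImapConfig is a hypothetical name, not part of emailjs-imap-client, and the options it builds are just the ones used in the snippet above.

```javascript
// Hypothetical helper: turns environment variables into the arguments
// passed to `new ImapClient(host, port, options)` in the snippet above.
function buildImapConfig (env) {
  const missing = ['EMAIL', 'PASS', 'PORT', 'HOST'].filter((key) => !env[key])
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }

  // Environment variables are always strings, so convert the port explicitly
  const port = Number(env.PORT)
  if (!Number.isInteger(port)) {
    throw new Error(`PORT must be a number, got "${env.PORT}"`)
  }

  const options = {
    auth: { user: env.EMAIL, pass: env.PASS },
    logLevel: 'error'
  }
  // 993 is the standard port for IMAP over TLS
  if (port === 993) { options.useSecureTransport = true }

  return { host: env.HOST, port, options }
}

// Usage: const { host, port, options } = buildImapConfig(process.env)
```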

If you are connecting to a Yahoo Mail account, you will need to generate an app password. Using your real email password won’t work.

Gmail is a little more complex. To start with, you need to enable IMAP access for your account. You can then go with either of these options:

  1. Allow less secure apps. But don’t forget to turn this off once you are done with the experiment. If you still can’t connect, you may need to allow app access to your account.
  2. Create an app password (the more secure option).

Checking product emails

Getting product emails is a tricky one. How do you tell product emails apart from regular emails? How do you differentiate marketing from transactional emails? This can be done, but the ideas I came up with were not worth the effort. In the end, I decided to pull all the emails instead. (This was also easier because I have a Yahoo Mail account dedicated to product signups, subscriptions and newsletters.)

const ImapClient = require('emailjs-imap-client').default

;(async function () {
  try {
    // …connection code
    await imap.connect()
    const box = await imap.selectMailbox('INBOX')

    // Reading 5k mails at once can choke the process, so let's chunk into 50 mails per request.
    // We are also assuming there are > 5k emails in the mailbox.
    let start = +box.exists - 5000
    while (true) {
      const messages = await imap.listMessages('INBOX', `${start + 1}:${start + 50}`, ['uid', 'body[]'])
      if (!Array.isArray(messages)) { return }
      // Do stuff with email here
      start += 50
      if (start >= +box.exists) { break }
    }
  } catch (e) {
    console.log(e)
  }
})()
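The range arithmetic in the loop can be pulled out into a small pure function, which makes it easy to check what happens when the mailbox holds fewer messages than expected. This is a sketch under my own assumptions: sequenceRanges is a hypothetical helper, not part of emailjs-imap-client, and it clamps the ranges to the mailbox size instead of assuming more than 5k emails.

```javascript
// Hypothetical helper: returns IMAP sequence ranges ("first:last") covering
// the newest `pull` messages of a mailbox holding `exists` messages.
function sequenceRanges (exists, pull, chunkSize = 50) {
  const ranges = []
  // Sequence numbers start at 1, so never start below the first message
  let first = Math.max(exists - pull, 0) + 1
  while (first <= exists) {
    // Clamp the last chunk so it never runs past the end of the mailbox
    const last = Math.min(first + chunkSize - 1, exists)
    ranges.push(`${first}:${last}`)
    first = last + 1
  }
  return ranges
}

console.log(sequenceRanges(120, 5000)) // [ '1:50', '51:100', '101:120' ]
```

Each returned string can be passed as the second argument of listMessages in place of the template string used in the loop.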

Checking the service provider

How do I know the email service used for the mail? I checked a couple of email headers and noticed a couple of places the provider details can be extracted from. Here is what an email header looks like:

At first glance, the Message-ID header looks the most straightforward. Here are the message IDs from 3 different mail headers.

Message-ID: <[email protected]>
Message-ID: <010001703764a668-4ebb47b9-4454-4081-afc4-e6448cd22897-000000@email.amazonses.com>
Message-ID: <[email protected]>
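Pulling the host out of a Message-ID takes only a small regular expression. The sketch below is an illustration, not the code used for the experiment; messageIdHost and baseDomain are hypothetical names, and the sample value is the Amazon SES message ID shown above.

```javascript
// Hypothetical helper: returns the host part of a Message-ID value,
// or null when the value has no "@host" part.
function messageIdHost (messageId) {
  const match = /@([^>\s]+)>?\s*$/.exec(messageId)
  return match ? match[1].toLowerCase() : null
}

// Keeps only the last two labels of a host name (e.g. "amazonses.com")
function baseDomain (host) {
  return host.split('.').slice(-2).join('.')
}

const sampleId = '<010001703764a668-4ebb47b9-4454-4081-afc4-e6448cd22897-000000@email.amazonses.com>'
console.log(messageIdHost(sampleId)) // email.amazonses.com
console.log(baseDomain(messageIdHost(sampleId))) // amazonses.com
```

Keeping only the last two labels is a simplification: it would give the wrong answer for hosts under multi-part suffixes such as .co.uk.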

By looking at the host part of the Message-ID, it's easy to identify the service provider. But not so fast. Looking at more headers, I noticed some Message-IDs have a host that is different from the sender value in the Received header. Below are examples from Bitbucket and Let's Encrypt. (Some parts truncated for brevity.)

Message-ID: <[email protected]>
Received: by filter0225p1iad2.sendgrid.net with ~
Message-Id: <[email protected]>
Received: from mail132-5.atl131.mandrillapp.com ~

The Received headers seem the most accurate place to get the provider, but they pose their own challenges. The Received headers are structured in different ways depending on how the mail is sent. After reviewing a couple of headers in Yahoo Mail, I was able to find a pattern to match the provider. (Not 100% accurate.)

  1. For more than one Received header, match by * of the second Received header.
  2. If that doesn’t exist or does not have a host address or there is just one Received header, then match the (EHLO *) of the first Received header.

Now we can rewrite our script to use Received headers. To be able to easily extract this, I will be bringing in Mailparser.

const ImapClient = require('emailjs-imap-client').default
const simpleParser = require('mailparser').simpleParser
const PULL = 10000

;(async function () {
  const sites = {}
  try {
    const confOption = {
      auth: {
        user: process.env.EMAIL,
        pass: process.env.PASS
      },
      logLevel: 'error'
    }

    if (process.env.PORT === '993') { confOption.useSecureTransport = true }
    const imap = new ImapClient(process.env.HOST, process.env.PORT, confOption)
    await imap.connect()
    const box = await imap.selectMailbox('INBOX')
    const exists = +box.exists

    if (exists <= PULL) {
      console.log('You specified a pull number more than or equal to the number of emails in the mailbox')
      return
    }
    let start = exists - PULL
    while (true) {
      const messages = await imap.listMessages('INBOX', `${start + 1}:${start + 50}`, ['uid', 'body[]'])

      // Stop fetching, but still log out and print what was collected so far
      if (!Array.isArray(messages)) { break }
      for (const message of messages) {
        const mail = await simpleParser(message['body[]'])
        // get() may return a string (one Received header), an array (several) or undefined (none)
        const received = mail.headers.get('received')
        const headers = Array.isArray(received) ? received : (received ? [received] : [])
        let esp
        if (headers.length > 1) {
          // Rule 1: the "by <host>" part of the second Received header
          const match = headers[1].match(/by ([^\s]*)/)
          if (match && match[1].indexOf('.') !== -1) {
            esp = match[1].split('.').slice(-2).join('.')
          }
        }
        if (!esp && headers.length > 0) {
          // Rule 2: the "(EHLO <host>)" part of the first Received header
          const match = headers[0].match(/EHLO ([^)]*)/)
          if (match && match[1].indexOf('.') !== -1) {
            esp = match[1].split('.').slice(-2).join('.')
          }
        }
        if (!esp) {
          // Todo: Track failed matches
          continue
        }
        if (sites[esp]) {
          sites[esp]++
        } else {
          sites[esp] = 1
        }
      }
      start += 50
      if (start >= exists) { break }
    }

    await imap.logout()
    await imap.close()
  } catch (e) {
    console.log(e)
  }
  console.log(sites)
})()
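The matching rules can also live in a standalone function, which makes them easy to try against a few headers copied from a mailbox without connecting to IMAP. This is a sketch of the same two rules; detectProvider is a hypothetical name, and the sample headers below are made up for illustration (only the sendgrid.net and mandrillapp.com host names come from the examples earlier in this post).

```javascript
// Hypothetical helper mirroring the two matching rules used in the script.
// `received` is the value of the Received header(s): a string, an array of
// strings, or undefined when the mail has none.
function detectProvider (received) {
  const headers = Array.isArray(received) ? received : (received ? [received] : [])
  const toDomain = (host) => host.toLowerCase().split('.').slice(-2).join('.')

  // Rule 1: "by <host>" of the second Received header
  if (headers.length > 1) {
    const match = headers[1].match(/by ([^\s]*)/)
    if (match && match[1].includes('.')) { return toDomain(match[1]) }
  }
  // Rule 2: "(EHLO <host>)" of the first Received header
  if (headers.length > 0) {
    const match = headers[0].match(/EHLO ([^)]*)/)
    if (match && match[1].includes('.')) { return toDomain(match[1]) }
  }
  return null
}

// Made-up headers, for illustration only
const twoHeaders = [
  'from o1.mail.example.com (EHLO o1.mail.example.com) by mta.example.net with SMTPS',
  'by filter0225p1iad2.sendgrid.net with SMTP id hypothetical-id'
]
const oneHeader = 'from mail132-5.atl131.mandrillapp.com (EHLO mail132-5.atl131.mandrillapp.com) by mta.example.net with SMTPS'

console.log(detectProvider(twoHeaders)) // sendgrid.net
console.log(detectProvider(oneHeader)) // mandrillapp.com
console.log(detectProvider(undefined)) // null
```

Returning null for unmatched mails leaves room to count the failures, which is what the Todo in the script asks for.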

Results

This was the breakdown after stripping the results to the top ones.
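Trimming the counts down to the top entries is a sort followed by a slice. The helper below is a sketch with a hypothetical name (topProviders), and the counts are made up for illustration; they are not the results of the experiment.

```javascript
// Hypothetical helper: returns the `limit` most frequent providers
// from a { domain: count } object, highest count first.
function topProviders (sites, limit = 10) {
  return Object.entries(sites)
    .sort(([, a], [, b]) => b - a)
    .slice(0, limit)
}

// Made-up counts, for illustration only
const exampleCounts = { 'sendgrid.net': 12, 'example.org': 1, 'amazonses.com': 7 }
console.log(topProviders(exampleCounts, 2))
// [ [ 'sendgrid.net', 12 ], [ 'amazonses.com', 7 ] ]
```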


A few notes:

  1. rsgsv.net, mcsv.net and mcdlv.net are from Mailchimp, which provides a marketing email service. Mandrillapp.com is the transactional email service for Mailchimp. It used to be a standalone service until it became deeply integrated into Mailchimp.
  2. spmta.com is for SparkPost.
  3. marketo.org is for marketo.com, currently owned by Adobe. It provides a marketing email service.
  4. Interesting to see Intercom there. It's amazing the number of products that use Intercom.
  5. mailgun.net and mailgun.org belong to Mailgun, obviously. Mailgun also recently acquired Mailjet. Mailjet provides both marketing and transactional email services. It will take a deeper dive into the headers (or content) to figure out whether a mail was sent as a marketing or transactional email.
  6. SendGrid, like Mailjet, offers both marketing and transactional email services. It’s the most used in my experiment, but it’s hard to know what fraction of that is marketing and what fraction is transactional. PS: they were recently acquired by Twilio.

Conclusion

I merged domains that belong to the same provider for a more accurate chart.
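One way to do this kind of merge is with an alias table that maps each sending domain to a provider name. The sketch below is an assumption about how it could be done, not the code behind the chart: mergeProviders is a hypothetical name, the alias entries are taken from the notes in the Results section, and the counts are made up. Whether Mandrill should be folded into Mailchimp is a judgment call, so mandrillapp.com is left unmapped here.

```javascript
// Domain-to-provider aliases, taken from the notes in the Results section
const ALIASES = new Map([
  ['rsgsv.net', 'Mailchimp'],
  ['mcsv.net', 'Mailchimp'],
  ['mcdlv.net', 'Mailchimp'],
  ['spmta.com', 'SparkPost'],
  ['marketo.org', 'Marketo'],
  ['mailgun.net', 'Mailgun'],
  ['mailgun.org', 'Mailgun']
])

// Hypothetical helper: sums the counts of all domains that map to the
// same provider; domains without an alias keep their own name.
function mergeProviders (sites, aliases = ALIASES) {
  const merged = {}
  for (const [domain, count] of Object.entries(sites)) {
    const name = aliases.get(domain) || domain
    merged[name] = (merged[name] || 0) + count
  }
  return merged
}

// Made-up counts, for illustration only
console.log(mergeProviders({ 'rsgsv.net': 3, 'mcsv.net': 2, 'mailgun.org': 4, 'sendgrid.net': 9 }))
// { Mailchimp: 5, Mailgun: 4, 'sendgrid.net': 9 }
```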

This is only a fun experiment and doesn’t show the true market share of the services. Since the data is based on the product emails in just my mailbox, it is flawed by selection bias.

However, it’s something you can run on your mailbox for the fun of it. You can also extend it to see what service your favourite product/app uses.

Remember, you may need to look at the full headers of some of your emails to come up with the best way to identify the service provider.

(Originally published here)