You might have heard of a tweet from @realDonaldTrump that recently got ‘hijacked’ by Belgian security researcher Inti De Ceukelaire (@securinti):
Trump had mentioned the website of the National Achievers Congress, nac2012.com, but that domain name wasn’t renewed by its original owner. Inti was therefore able to buy it and redirect it to YouTube.
In this era of fake news, CTR shills & trolls, this might be a great way to create confusion or to trick people into clicking a link. Wouldn’t you be more likely to click a link tweeted by Katy Perry than one from an account you don’t know?
So I thought it would be a good idea to have a look at the top 1,000 Twitter accounts (:
To do that I wrote a little Python script. It’s a bit messy, and I’ll get into how it can be improved later, but to keep it simple this is what it does:
- Download all the tweets it can from a user
- Fetch domain names in those tweets
- Verify if the domain names are available for registration
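The second step can be sketched roughly like this. It is a minimal illustration and not the actual script: the `extract_domains` helper and the URL regex are my own assumptions, and it returns host names as they appear in the links, without reducing them to the registrable domain or checking availability.

```python
import re
from urllib.parse import urlparse

# Assumption: links in tweet text start with http:// or https://
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_domains(tweets):
    """Return the set of host names found in URLs across the given tweet texts."""
    domains = set()
    for text in tweets:
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that often trails a link at the end of a sentence.
            host = urlparse(url.rstrip(".,;:!?)")).hostname
            if not host:
                continue
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    return domains


if __name__ == "__main__":
    sample = ["Speaking at http://www.nac2012.com/ soon!"]
    print(extract_domains(sample))
```

Each host name collected this way would then go through the availability check in the last step.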
I was pretty sure someone would already be actively doing this and that I wouldn’t find any available domains, but I was wrong. I found 109 available domains from the top 1,000 Twitter accounts, so a solid 10%, and I think this result could be greatly improved.
Here is the top 10:
- Katy Perry, @katyperry, 95.6M
- Shakira, @shakira, 42.7M
- Jennifer Lopez, @JLo, 39.3M
- Aamir Khan, @aamir_khan, 19.8M
- Agnez Mo, @agnezmo, 16.2M
- Triple X Movie???, @deepikapadukone, 17.3M
- Maroon 5, @maroon5, 13.7M
- Shaquille O’Neal, @SHAQ, 13.2M
- Thalia, @thalia, 8.77M
- Pegg News, @simongpegg, 6.63M
I redirected this tweet from Shakira. Note that there was no embedded media in this tweet at first. Twitter then re-crawled the link and embedded the video for me, nice (:
Limitations & improvements
The most problematic limitation is the one put in place by the Twitter API. You can only call user_timeline() 16 times with count=200 for one user. That means we can only download 16*200=3200 tweets. Moreover, API access is restricted in general, so the process is quite time-consuming.
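To make the 16*200 cap concrete, here is a rough sketch of the paging loop. The `fetch_page` callable is hypothetical and stands in for the real timeline request; I’m assuming it takes `count` and `max_id` and returns tweets newest first as dicts with an integer "id".

```python
def download_timeline(fetch_page, page_size=200, max_pages=16):
    """Collect tweets page by page, walking backwards in time with max_id.

    fetch_page(count=..., max_id=...) is a hypothetical stand-in for the
    timeline request; it must return a list of dicts with an integer "id",
    newest first, and an empty list when nothing older is left.
    """
    tweets = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = page[-1]["id"] - 1
    return tweets
```

With the defaults the loop stops after 16 pages, so it never returns more than 3200 tweets no matter how many the account has posted.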
The best way to do this would be to either have access to all tweets or to start saving them. I didn’t find any service that really gives all tweets for all users. Some websites, like TrumpTwitterArchive, archive tweets from @POTUS & other political figures, but they don’t provide an API. However, it should be possible to crawl them using Selenium/Scrapy.
I also noticed a lot of bit.ly & smarturl.it links, which I don’t follow when they are double-shortened. Following those could potentially give more results. On top of that, the pythonwhois module that I used raises exceptions for certain TLDs…
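Following double-shortened links could look something like the sketch below. It is not part of the script: `resolve_chain` and `get_redirect` are names I made up, and `get_redirect` is assumed to return the redirect target of a URL, or None when there is none (in practice it could be a HEAD request that doesn’t follow redirects and reads the Location header).

```python
def resolve_chain(url, get_redirect, max_hops=5):
    """Follow a chain of shortened links and return every URL visited.

    get_redirect(url) is a hypothetical callable returning the redirect
    target of url, or None if url does not redirect.
    """
    chain = [url]
    for _ in range(max_hops):
        target = get_redirect(chain[-1])
        # Stop on a final URL or when the chain loops back on itself.
        if target is None or target in chain:
            break
        chain.append(target)
    return chain
```

Every URL in the returned chain is a candidate for the domain check, not only the last one, since an intermediate shortener domain could have expired too.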
Feel free to make a PR or fork it on GitHub. Excuse my Python (: