Twitter is beginning to take over the social media realm. As more communities move to Twitter, we begin to see how valuable data is to advertisers, researchers, and even consumers.
Data is the next gold rush: we are beginning to understand how it needs to be extracted, transformed, loaded, and, for full benefit, turned into information. In theory, like gold, data is a commodity.
In this article, I explain how to scrape Tweets from Twitter in Python 3 using Tweepy, a Python library for the Twitter API. Of the approaches I tried, accessing the data through Twitter’s API with Tweepy ended up being the most successful. I focus on scraping the replies to a specific user's Tweet, since I have not found any tutorials that specifically show how to extract Tweet replies.
If you want to jump straight into the code, you can find the full code on my GitHub. The Python code requires your Twitter API consumer keys and access tokens, as well as the Twitter username whose replies you plan to extract and the Tweet ID.
Make sure you have Python installed on your machine. If you do not, I suggest using Anaconda; otherwise, read the official Python documentation to find additional resources.
To perform Twitter operations from your machine, I suggest using Tweepy. To install Tweepy, navigate to your environment and run:
Python3:
pip install tweepy
If you’re using Anaconda for Python:
conda install -c conda-forge tweepy
To interact with Twitter programmatically, you will need to apply for a Twitter Developer account. The application is straightforward: be honest about your intentions, and you will be approved if Twitter deems you trustworthy. Once approved, you will be able to create an app on the platform, which provides the credentials you need to authorize from Tweepy or another Python Twitter library.
Twitter for Developers provides access to the Twitter API in order to publish and analyze Tweets, optimize ads, and create unique customer experiences. Check out the Twitter API documentation here.
Before you are able to use the Twitter API endpoints, create a developer account and generate your API keys. You can apply for a developer account directly here. You must answer questions on how you plan to use the API and accept the Twitter Developer Agreement, and then you will be granted access to the Developer Dashboard.
Once you are approved for Twitter Developers access, log in to the developer site and create your app. This step automatically generates your consumer API keys and access tokens. Remember to keep them secret.
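One way to keep the keys out of your script is to read them from environment variables. The sketch below is only an illustration: the variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials helper are hypothetical, not something Twitter or Tweepy requires.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to tweepy.OAuthHandler and set_access_token in place of hard-coded strings.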
The developer account should be linked to the Twitter account where you want to have the bot active. From the Twitter Development platform, you are able to edit the app permissions. In my example, I have granted my app permission to read, write and send direct messages.
We must import Tweepy, which provides the OAuth interface used to collect data, as well as csv and ssl.
import csv
import tweepy
import ssl
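# Workaround for local certificate errors: this disables HTTPS certificate verification in Python's default SSL context, so avoid it in production
ssl._create_default_https_context = ssl._create_unverified_context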
# Oauth keys
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_SECRET"
# Authentication with Twitter
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
To collect the replies to a specific Tweet, we need the username of the account that posted it and the Tweet ID, which can be copied from the Tweet's URL.
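If you would rather not copy the ID by hand, it can be pulled out of the URL. This is a minimal sketch that assumes the usual https://twitter.com/<username>/status/<id> layout; the parse_tweet_url helper is my own name, not part of Tweepy.

```python
from urllib.parse import urlparse


def parse_tweet_url(url):
    """Return (username, tweet_id) from a URL shaped like .../<username>/status/<id>."""
    # Split the path into its non-empty segments, ignoring any query string
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[1] != "status" or not parts[2].isdigit():
        raise ValueError("Not a tweet URL: " + url)
    return parts[0], parts[2]


name, tweet_id = parse_tweet_url(
    "https://twitter.com/LunarCRUSH/status/1270923526690664448"
)
print(name, tweet_id)
```

The ID is kept as a string because the scraper compares it against in_reply_to_status_id_str, which is also a string.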
# update these for whatever tweet you want to process replies to
name = 'LunarCRUSH'
tweet_id = '1270923526690664448'
replies = []
# Note: Tweepy 4.x renamed api.search to api.search_tweets
for tweet in tweepy.Cursor(api.search, q='to:'+name, result_type='recent', timeout=999999).items(1000):
    if hasattr(tweet, 'in_reply_to_status_id_str'):
        if tweet.in_reply_to_status_id_str == tweet_id:
            replies.append(tweet)
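The search returns every recent Tweet addressed to the user, so the loop has to discard anything that is not a reply to the chosen Tweet. That filtering step can be tried without calling the API at all. The sketch below uses stand-in objects in place of real Tweepy results, and the replies_to helper is a hypothetical name.

```python
from types import SimpleNamespace


def replies_to(tweets, tweet_id):
    """Keep only the tweets that reply directly to tweet_id."""
    return [
        t for t in tweets
        if getattr(t, "in_reply_to_status_id_str", None) == str(tweet_id)
    ]


# Stand-ins shaped like search results: only the first replies to our tweet
sample = [
    SimpleNamespace(text="direct reply", in_reply_to_status_id_str="1270923526690664448"),
    SimpleNamespace(text="reply to another tweet", in_reply_to_status_id_str="111"),
    SimpleNamespace(text="plain mention", in_reply_to_status_id_str=None),
]
matched = replies_to(sample, 1270923526690664448)
print([t.text for t in matched])
```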
Since I wanted to analyze the responses, I decided to export all replies to a .csv file, which can be opened in Microsoft Excel or Google Sheets.
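The export can be sketched as a small helper that writes the same two columns, user and text, as the full listing at the end of this article. The write_replies name and the choice to pass in an already open file are my own assumptions, and the stand-in reply only imitates the attributes the code reads.

```python
import csv
import io
from types import SimpleNamespace


def write_replies(replies, f):
    """Write one row per reply (user, text) to an open text file."""
    writer = csv.DictWriter(f, fieldnames=("user", "text"))
    writer.writeheader()
    for tweet in replies:
        writer.writerow({
            "user": tweet.user.screen_name,
            # Newlines inside a tweet are flattened so each reply stays on one line
            "text": tweet.text.replace("\n", " "),
        })


# Stand-in object shaped like a Tweepy status, so the sketch runs without the API
fake_reply = SimpleNamespace(
    user=SimpleNamespace(screen_name="example_user"),
    text="@LunarCRUSH first line\nsecond line",
)
buffer = io.StringIO()
write_replies([fake_reply], buffer)
print(buffer.getvalue())

# With real data, you would write to disk instead:
# with open("replies_clean.csv", "w", newline="", encoding="utf-8") as f:
#     write_replies(replies, f)
```

Opening the file with newline="" and an explicit UTF-8 encoding helps avoid blank rows on Windows and keeps emoji readable.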
Here’s a brief glimpse of the returned csv:
user,text
CryptoEntuziast,@LunarCRUSH @zilliqa ofcourse 🚀🚀🚀😎😎😎
ecossexrp1,@LunarCRUSH $VET $XRP 👌🏻
crypto19711,@LunarCRUSH @DAPScoin the best privacy coin in this world! https://t.co/xFHs3cYFmK
lacryptohero,@LunarCRUSH @Theta_Network
Greenmi74576867,@LunarCRUSH https://t.co/vwmznwu77V
SplendidMite,@LunarCRUSH #VeChain $VET
DAPS_CLimaDelta,"@LunarCRUSH Because I am judging a project for the best technology, transparency, reliable team and dedicated commu… https://t.co/6xS9vdx1oY"
DigiBur,@LunarCRUSH #digibyte
M_SRHI,@LunarCRUSH $ICX 💎 $ELA 💎❤️ $NOIA💎
SURAJ_041,@LunarCRUSH @electroneum #ETN .
GbhoyDownUnder,@LunarCRUSH @maticnetwork
jodibreeze86,@LunarCRUSH Zilliqa and Vechain
ghurabar1,@LunarCRUSH $EWT
SamManzi,@LunarCRUSH @NoiaNetwork @NoiaFr $NOIA
IamDavidGC,@LunarCRUSH Proud of DigiByte community and technology. $dgb
holder2017,@LunarCRUSH @Falcon_SBS #FNT token traded on #exchanges. #Anonymous coin #FNC is not traded anywhere. connected b… https://t.co/0mz7bmaG1k
Lilt8888,@LunarCRUSH It would have to be $ICX
Creeptwo_guy13,@LunarCRUSH That question is way too easy. Absolutely its $ICX #ICON.
BitStreetSheep,@LunarCRUSH #VeChain without question
jms3333333,@LunarCRUSH LInk UBT KNC EWT SOLVE
einnorka,@LunarCRUSH Digibyte
HamishDonalds0n,@LunarCRUSH $icx $vet $zil $ada $eth $link
amity3013,@LunarCRUSH $zil you know it
elianhuesca,"@LunarCRUSH @decredproject by far: hybrid PoW/PoS blockchain, formal governance in place, Treasury with 10% of bloc… https://t.co/oRnMc4UD5P"
AaronMilo,@LunarCRUSH #digibyte https://t.co/000HoTfLqB
majjjubu,@LunarCRUSH Chz
Benjy25680913,@LunarCRUSH $LUNARCRUSH
ItchyTommi,@LunarCRUSH https://t.co/y8l2WwP3qK Stakenet. The one and only
siggebaskero,@LunarCRUSH #PIVX thanks to @PIVX_Community who's doing a great job 💜 Engaging with a growing community like… https://t.co/CBlhJm7gZj
DanXrp,@LunarCRUSH $VET no doubt
crypto1618,@LunarCRUSH #icx
thelionshire,@LunarCRUSH ICON $icx
ChillMa27939777,@LunarCRUSH #Zilliqa #ZIL ✌😎
BeholdTheBeard,@LunarCRUSH Tezos $XTZ Theta $THETA
lennyshots,@LunarCRUSH #DigiByte
Shatochzi,@LunarCRUSH $CHZ #chiliz
RonDalton01,@LunarCRUSH #VET
Realmikeleonard,@LunarCRUSH #XMR no doubt about it
Incognitor00st1,@LunarCRUSH $DGB 🔥
Cryptowhale10,@LunarCRUSH $ICX https://t.co/WQTbyPkpEB
XxVegetta,@LunarCRUSH We are DAPS soliders I have been dedicated to our project for 2 years and I think for many years to co… https://t.co/QLk7kKJkhk
CaliCryptoCo,@LunarCRUSH $ICX man
MoonShotCaller,@LunarCRUSH #VeChain 💙 $VET
Dominic_LTC_DGB,@LunarCRUSH @DigiByteCoin
GrowlerGregg,@LunarCRUSH $LINK
adflondon,@LunarCRUSH We all know its $ICX
SajawalOnTech,@LunarCRUSH To many projects but I guess $Wan $link $Zil $Icx
IconPilipinas,@LunarCRUSH $ICX
jonade,@LunarCRUSH $ZIL
twills2,@LunarCRUSH Do we really have to say it...... $zil 🚀
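As a first pass at analyzing a file like this, you could tally the cashtags and hashtags that appear in the replies. This is only a rough sketch: the count_tags helper is hypothetical, and treating $ICX and #icx as the same tag is an assumption that may not suit every analysis. The sample rows are copied from the output above.

```python
import csv
import io
import re
from collections import Counter

# Matches $TICKER and #hashtag and captures the word after the symbol
TAG_PATTERN = re.compile(r"[$#](\w+)")


def count_tags(rows):
    """Count how many replies mention each tag, ignoring case and the $/# prefix."""
    counts = Counter()
    for row in rows:
        # A set, so a reply that repeats a tag is only counted once
        counts.update({tag.upper() for tag in TAG_PATTERN.findall(row["text"])})
    return counts


sample_csv = """user,text
SplendidMite,@LunarCRUSH #VeChain $VET
DanXrp,@LunarCRUSH $VET no doubt
crypto1618,@LunarCRUSH #icx
IconPilipinas,@LunarCRUSH $ICX
"""
counts = count_tags(csv.DictReader(io.StringIO(sample_csv)))
print(counts.most_common())
```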
You can view all of the code to get this working by visiting this link.
import csv
import tweepy
import ssl
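# Workaround for local certificate errors: this disables HTTPS certificate verification in Python's default SSL context, so avoid it in production
ssl._create_default_https_context = ssl._create_unverified_context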
# Oauth keys
consumer_key = "XXX"
consumer_secret = "XXX"
access_token = "XXX"
access_token_secret = "XXX"
# Authentication with Twitter
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# update these for the tweet you want to process replies to: 'name' is the account username, and the tweet id can be found in the tweet URL
name = 'LunarCRUSH'
tweet_id = '1270923526690664448'
replies = []
# Note: Tweepy 4.x renamed api.search to api.search_tweets
for tweet in tweepy.Cursor(api.search, q='to:'+name, result_type='recent', timeout=999999).items(1000):
    if hasattr(tweet, 'in_reply_to_status_id_str'):
        if tweet.in_reply_to_status_id_str == tweet_id:
            replies.append(tweet)
# newline='' avoids blank rows on Windows; utf-8 keeps emoji intact
with open('replies_clean.csv', 'w', newline='', encoding='utf-8') as f:
    csv_writer = csv.DictWriter(f, fieldnames=('user', 'text'))
    csv_writer.writeheader()
    for tweet in replies:
        row = {'user': tweet.user.screen_name, 'text': tweet.text.replace('\n', ' ')}
        csv_writer.writerow(row)
With just a few lines of code, your configurable Twitter reply scraper now pulls data from Twitter and automatically saves the Tweet replies to your machine.
There are a few things that can be done to improve the code, such as mapping multiple replies, or getting the responses from those that replied to the original post. Please let me know in the comments if you have any questions or suggestions.
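As a starting point for the second idea, here is a sketch of how replies to replies could be gathered once the Tweets are already collected. It assumes each Tweet object exposes id_str and in_reply_to_status_id_str, and the collect_thread helper is hypothetical. Note that a search for to:name may not return nested replies in the first place, so the collection step would likely need to change too.

```python
from types import SimpleNamespace


def collect_thread(tweets, root_id):
    """Return tweets that reply to root_id directly or through other replies."""
    # Group the tweets by the ID of the tweet they reply to
    by_parent = {}
    for t in tweets:
        parent = getattr(t, "in_reply_to_status_id_str", None)
        by_parent.setdefault(parent, []).append(t)

    # Walk the reply tree breadth-first, starting from the original tweet
    thread, queue = [], [str(root_id)]
    while queue:
        parent_id = queue.pop(0)
        for reply in by_parent.get(parent_id, []):
            thread.append(reply)
            queue.append(reply.id_str)
    return thread


# Stand-in tweets: "2" replies to the root, "3" replies to "2", "4" is unrelated
sample = [
    SimpleNamespace(id_str="2", in_reply_to_status_id_str="1"),
    SimpleNamespace(id_str="3", in_reply_to_status_id_str="2"),
    SimpleNamespace(id_str="4", in_reply_to_status_id_str="9"),
]
thread = collect_thread(sample, "1")
print([t.id_str for t in thread])
```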
Knowledge is power! Share your knowledge, open source your projects, participate in a community (any community!), and maybe, just maybe, publish a blog post about it.
Thank you for reading
Constructive criticism and feedback are welcome. Nicholas Resendez can be reached on Instagram @nirholas, on LinkedIn, and on Twitter @nickresendez for updates on new articles.