Today, we're going to build a script that scrapes Twitter to gather stock ticker symbols. We'll use those symbols to scrape Yahoo Finance for stock options data. To ensure we can download all the options data, we'll make each web request through High Availability Onion Routing. In the end, we'll do some Pandas magic to pull the first out-of-the-money call contract for each symbol into the final watchlist.
You should be familiar with Docker, Python 3.8, and setting up a virtual environment with Poetry, and you should be comfortable working with Pandas DataFrames. It also helps to know a little bit about OTM call option contracts and implied volatility.
You will need all of the above requirements properly installed to follow along with this tutorial. The script may work on another OS, but I've only tested it on Linux.
Let's prepare the virtual environment using Poetry. Open a terminal and run the following commands.
$ poetry new options_bot
$ cd options_bot
.
├── options_bot
│   └── __init__.py
├── pyproject.toml
├── README.rst
└── tests
    ├── __init__.py
    └── test_options_bot.py

2 directories, 5 files
You should now have a directory tree like the one above. The options_bot and tests directories won't be used; Poetry is only here to manage the dependencies and the virtual environment.
Now, let’s add the dependencies. I'll explain each one in the import section.
$ poetry add nitter-scraper requests-whaor yfs pandas
Activate the virtual environment and create a main.py file.
$ poetry shell
$ touch main.py
Open main.py with your favorite code editor and start coding.
from concurrent.futures import as_completed, ThreadPoolExecutor
from nitter_scraper import NitterScraper
import pandas
from requests_whaor import RequestsWhaor
from yfs import fuzzy_search, get_options_page
from concurrent.futures import as_completed, ThreadPoolExecutor
The ThreadPoolExecutor will be used to call the yfs functions fuzzy_search and get_options_page with a pool of threads. This lets us make many requests concurrently.
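If you haven't used this pattern before, here's a minimal, self-contained sketch of the submit/as_completed flow we'll rely on later. The work function and its inputs are just placeholders, not part of the tutorial's script.

from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    # Placeholder for a network-bound call like fuzzy_search or get_options_page.
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(work, n) for n in range(10)]
    for future in as_completed(futures):  # yields futures as they finish
        print(future.result(timeout=60))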
from nitter_scraper import NitterScraper
The nitter_scraper library is used to scrape tweets. It starts and stops a Docker container running an instance of nitter.
import pandas
The Pandas library is used to clean and concatenate the final DataFrame.
from requests_whaor import RequestsWhaor
The requests_whaor library will supply a rotating proxy pool to proxy our requests, giving each request a unique proxy address. If a request times out or gets an erroneous response code from the server, it will retry with another proxy address.
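As a rough sketch of the idea (not part of our script), requests_whaor can also be used directly. This assumes it exposes a requests-style get method and that Docker is running; double-check the library's documentation for the exact API.

from requests_whaor import RequestsWhaor

# Sketch only: assumes RequestsWhaor exposes a requests-style get() method.
# Requires Docker, since each onion proxy runs in its own container.
with RequestsWhaor(onion_count=2) as whaor:
    response = whaor.get("https://httpbin.org/ip")  # example URL, not from the tutorial
    print(response.text)
    whaor.restart_onions()  # swap in fresh TOR circuits for later requests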
from yfs import fuzzy_search, get_options_page
The yfs library is used for ticker symbol validation and for fetching options data.
We'll be scraping the @eWhispers Twitter account for cashtags. A cashtag is similar to a hashtag but begins with a $ and is normally associated with a stock ticker symbol, Bitcoin, or shitcoin. @eWhispers tweets about upcoming stock earnings and averages about 30 to 40 cashtags per tweet. This should give you a ton of stock ticker symbols to play with.
cashtag_list = []
with NitterScraper(port=8008) as nitter:
    for tweet in nitter.get_tweets("eWhispers", pages=1):
        if tweet.is_pinned:
            continue
        if tweet.is_retweet:
            continue
        if tweet.entries.cashtags:
            cashtag_list += tweet.entries.cashtags
        print(".", end="", flush=True)  # Simple progress bar.
print()  # End progress bar with newline.
cashtag_list = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), cashtag_list)))
Here's the code. Let's break it down.
cashtag_list = []
The cashtag_list will hold all cashtags found from the @eWhispers tweets.
with NitterScraper(port=8008) as nitter:
The NitterScraper context manager starts the nitter Docker container and yields a nitter object. Port 8008 is used to ensure we start the container on a port that isn't already in use.
for tweet in nitter.get_tweets("eWhispers", pages=1):
Here we use the nitter.get_tweets method to scrape tweets. We only want to search the first page. Each page will yield approximately 20 tweets.
if tweet.is_pinned:
    continue
if tweet.is_retweet:
    continue
if tweet.entries.cashtags:
    cashtag_list += tweet.entries.cashtags
We skip the pinned tweet and any retweets, then check whether the tweet's content contains cashtags. If a list of cashtags is found, they are added to the cashtag_list.
print(".", end="", flush=True) # Simple progress bar.
print() # End progress bar with newline.
This will print a simple progress bar to keep us from getting bored.
cashtag_list = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), cashtag_list)))
Finally, we strip the dollar signs from each cashtag, remove duplicates, and sort the result. Now we have a clean list of ticker symbols, almost ready to start searching for options data.
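For example, a raw cashtag list like the one below (toy data, not real tweet output) would be cleaned up as follows:

raw_cashtags = ["$AAPL ", "$TSLA", "$AAPL", "$MSFT"]
cleaned = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), raw_cashtags)))
print(cleaned)  # ['AAPL', 'MSFT', 'TSLA']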
Before we scrape options data, we will prepare some variables which will make it easier to change settings, store validated symbols, and store call option chains.
valid_symbols = []
call_chains = []
MAX_THREADS = 6
MAX_PROXIES = 6
Here's the code. Let's break it down.
valid_symbols = []
Before downloading options data, we'll validate each ticker against yahoo finance's quote lookup. The yfs library provides the fuzzy_search function, which uses the quote lookup to verify each symbol is a US stock ticker. After we verify each symbol exists and is a US stock symbol, we append it to the valid_symbols list.
This is the quote lookup fuzzy_search uses to validate symbols.
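Here's a rough idea of how a single lookup behaves, based on how we use it later in the script. Calling fuzzy_search without a session argument is an assumption on my part (it should fall back to plain requests), and the result object exposes the validated symbol.

from yfs import fuzzy_search

# Sketch only: a falsy result means the symbol couldn't be validated.
result = fuzzy_search("TSLA")
if result:
    print(result.symbol)  # "TSLA"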
call_chains = []
Call option chain objects found for each symbol are stored in the call_chains list.
MAX_THREADS = 6
MAX_THREADS is the maximum number of threads the ThreadPoolExecutor and RequestsWhaor are allowed to use. RequestsWhaor can use threads to speed up starting and stopping Docker containers.
MAX_PROXIES = 6
MAX_PROXIES is the size of the rotating proxy pool. Each proxy is a separate Docker container running a TOR circuit.
You can modify the MAX_THREADS and MAX_PROXIES variables to fit your system's performance.
with RequestsWhaor(onion_count=MAX_PROXIES, max_threads=MAX_THREADS) as request_whaor:
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        futures = [
            executor.submit(fuzzy_search, ticker, session=request_whaor) for ticker in cashtag_list
        ]
        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)
                # timeout if the response takes too long.
                if result:
                    valid_symbols.append(result.symbol)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:
                # We want to pass on exceptions.
                print("\n", exc)
        print()  # End progress bar with newline.
        print("twitter cashtag count:", len(cashtag_list))
        print("validated symbol count:", len(valid_symbols))
        request_whaor.restart_onions()  # Fresh proxy pool.
        futures = [
            executor.submit(
                get_options_page,
                ticker,
                after_days=60,
                first_chain=True,
                use_fuzzy_search=False,
                session=request_whaor,
                page_not_found_ok=True,
            )
            for ticker in valid_symbols
        ]
        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)
                # timeout if the response takes too long.
                if result:
                    call_chains.append(result.calls)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:
                # We want to pass on exceptions.
                print("\n", exc)
        print()  # End progress bar with newline.
Here's the bulk of the script. Let's break it down.
with RequestsWhaor(onion_count=MAX_PROXIES, max_threads=MAX_THREADS) as request_whaor:
The RequestsWhaor context manager takes care of starting up the rotating proxy network. In this example, only two arguments are passed: MAX_PROXIES is passed to onion_count, which is the number of Docker containers running TOR circuits, and MAX_THREADS determines how many threads are used to start and stop the TOR containers concurrently. We will use the yielded request_whaor object both as a session-like object and to restart the TOR circuits whenever we want a fresh pool of proxies.
with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
The ThreadPoolExecutor is used to execute the fuzzy_search and get_options_page functions asynchronously.
futures = [
    executor.submit(fuzzy_search, ticker, session=request_whaor) for ticker in cashtag_list
]
A list comprehension is used to iterate over each ticker in the cashtag_list. Each ticker will be passed to the fuzzy_search function as an argument. Additionally, we pass the request_whaor object to the session keyword argument. This lets fuzzy_search send GET requests with requests_whaor vs. the vanilla requests module. requests_whaor will ensure requests are retried on failed responses and connection, timeout, and proxy errors.
The executor's submit method takes care of scheduling the fuzzy_search function and returns a Future object. You can read more about Executor objects in the concurrent.futures documentation.
for future in as_completed(futures):
    try:
        result = future.result(timeout=60)
        # timeout if the response takes too long.
        if result:
            valid_symbols.append(result.symbol)
        print(".", end="", flush=True)  # Simple progress bar.
    except Exception as exc:
        # We want to pass on exceptions.
        print("\n", exc)
print()  # End progress bar with newline.
We use the as_completed function to iterate over the returned futures as they complete. The result method is called on each future to get its return value. If a valid result is returned, its symbol is appended to the valid_symbols list.
We pass on all Exceptions to keep things running. Also, we use the same progress bar pattern as we did when we scraped tweets.
print("twitter cashtag count:", len(cashtag_list))
print("validated symbol count:", len(valid_symbols))
Here we'll print the count of cashtags we found on Twitter and symbols we found when validating with fuzzy_search to compare.
request_whaor.restart_onions() # Fresh proxy pool.
Now that we have made 100-plus requests to Yahoo Finance's servers, we'll want a fresh pool of proxies before making the next round of requests. The restart_onions method restarts the TOR containers, giving us a new set of proxy addresses to route the next round of requests through.
futures = [
    executor.submit(
        get_options_page,
        ticker,
        after_days=60,
        first_chain=True,
        use_fuzzy_search=False,
        session=request_whaor,
        page_not_found_ok=True,
    )
    for ticker in valid_symbols
]
This is similar to the fuzzy_search futures section; we're just passing the get_options_page function to the executor along with a few more arguments, and iterating over the validated tickers in valid_symbols instead of the raw cashtags. Note that use_fuzzy_search is set to False because we've already validated each symbol, and request_whaor is again passed as the session.
for future in as_completed(futures):
    try:
        result = future.result(timeout=60)
        # timeout if the response takes too long.
        if result:
            call_chains.append(result.calls)
        print(".", end="", flush=True)  # Simple progress bar.
    except Exception as exc:
        # We want to pass on exceptions.
        print("\n", exc)
print()  # End progress bar with newline.
Again, this is pretty similar to the fuzzy_search section. We iterate over the returned futures as they complete and call the result method to get the values. This time, get_options_page returns an OptionsChain object containing both the call and put options data. After checking that the result exists, we append only the call option chain to the call_chains list. As before, we pass on any exceptions.
Pandas Magic
options_watchlist = []
for chain in call_chains:
    dataframe = chain.dataframe
    otm = dataframe["in_the_money"] == False
    single_contract = dataframe[otm].head(1)
    options_watchlist.append(single_contract)
final = pandas.concat(options_watchlist, ignore_index=True)
final["expiration"] = final["expiration_date"].dt.date
final.sort_values(by="implied_volatility", inplace=True)
final.reset_index(inplace=True)
final.drop(
    columns=["index", "timestamp", "contract_name", "expiration_date", "in_the_money"],
    inplace=True,
)
print(final)
This is the final section. Here we'll use Pandas to clean, concatenate, and sort the final DataFrame. Here's the code. Let's break it down.
options_watchlist = []
Now we have a bunch of call option chains from multiple symbols. We will store the first out-of-the-money strike from each option chain in the options_watchlist.
for chain in call_chains:
    dataframe = chain.dataframe
    otm = dataframe["in_the_money"] == False
    single_contract = dataframe[otm].head(1)
    options_watchlist.append(single_contract)
Next, we iterate over each call option chain and convert each OptionChain object into a DataFrame using the dataframe property. Then we filter down to the rows where the in_the_money column is False and use the DataFrame's head method to grab the first OTM contract. After that, the single-row OTM contract DataFrame is appended to the options_watchlist.
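Here's a tiny worked example of the boolean-mask-plus-head(1) trick, using made-up data with the same column name:

import pandas

chain_df = pandas.DataFrame(
    {
        "symbol": ["VZ", "VZ", "VZ", "VZ"],
        "strike": [55.0, 57.5, 60.0, 62.5],
        "in_the_money": [True, True, False, False],
    }
)

otm = chain_df["in_the_money"] == False  # boolean mask: True for OTM rows
print(chain_df[otm].head(1))  # keeps only the first OTM row (the 60.0 strike)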
final = pandas.concat(options_watchlist, ignore_index=True)
We use the Pandas concat function to concatenate the options_watchlist of single-row DataFrames into one DataFrame named final.
final["expiration"] = final["expiration_date"].dt.date
Here we convert the expiration_date column from a datetime to a date and store it in a new column named expiration; the original expiration_date column gets dropped later. This will help shorten up the output.
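A quick illustration of the .dt.date accessor with made-up timestamps:

import pandas

expirations = pandas.Series(pandas.to_datetime(["2020-12-18 16:00:00", "2021-01-15 16:00:00"]))
print(expirations.dt.date)  # 2020-12-18 and 2021-01-15 as datetime.date objects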
final.sort_values(by="implied_volatility", inplace=True)
Now, we sort_values by implied_volatility because why not. ¯\_(ツ)_/¯
final.reset_index(inplace=True)
Here reset_index is used to create a new index since we sorted the values by implied volatility.
final.drop(
    columns=["index", "timestamp", "contract_name", "expiration_date", "in_the_money"],
    inplace=True,
)
We drop a few columns to shorten up the output some more.
print(final)
Finally, the DataFrame is printed to the console.
Run the script
$ python3 main.py
symbol contract_type strike last_price bid ask change percent_change volume open_interest implied_volatility expiration
0 CMTL call 17.5 1.50 0.00 0.00 0.00 None 6.0 318.0 3.13 2021-01-15
1 ONB call 15.0 0.75 0.00 0.00 0.00 NaN 19.0 519.0 6.25 2020-12-18
2 VZ call 60.0 1.30 1.27 1.32 -0.10 -7.14 414.0 35068.0 18.34 2021-01-15
3 PG call 145.0 4.60 4.15 4.50 0.10 2.22 70.0 1014.0 20.01 2020-12-18
4 JNJ call 150.0 3.92 3.90 4.05 0.17 4.53 73.0 3615.0 20.01 2020-12-18
.. ... ... ... ... ... ... ... ... ... ... ... ...
148 ACB call 5.0 0.45 0.45 0.46 -0.15 -25 342.0 1935.0 116.41 2020-12-18
149 QTT call 2.5 0.40 0.20 0.75 -0.10 -20 17.0 100.0 117.19 2021-01-15
150 LLNW call 6.0 1.25 1.20 1.25 -0.10 -7.41 49.0 2775.0 125.78 2020-12-18
151 SANW call 2.5 0.55 0.35 1.55 0.00 None 1.0 6.0 193.75 2021-02-19
152 BCLI call 15.0 6.25 5.70 6.50 -0.65 -9.42 11.0 918.0 296.00 2020-12-18
[153 rows x 12 columns]
And the final output is a DataFrame of OTM call options sorted by IV.
I hope you had fun writing the script. It would be easy to modify it to search multiple Twitter users or to periodically dump the options data to a database. Thanks for reading.
CONTACT INFO
GitHub / Telegram / Twitter / TradingView / Discord: @dgnsrekt
Email: [email protected]
The full script:

from concurrent.futures import as_completed, ThreadPoolExecutor

from nitter_scraper import NitterScraper
import pandas
from requests_whaor import RequestsWhaor
from yfs import fuzzy_search, get_options_page

cashtag_list = []

with NitterScraper(port=8008) as nitter:
    for tweet in nitter.get_tweets("eWhispers", pages=1):
        if tweet.is_pinned:
            continue
        if tweet.is_retweet:
            continue
        if tweet.entries.cashtags:
            cashtag_list += tweet.entries.cashtags
        print(".", end="", flush=True)  # Simple progress bar.
print()  # End progress bar with newline.

cashtag_list = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), cashtag_list)))

valid_symbols = []
call_chains = []

MAX_THREADS = 6
MAX_PROXIES = 6

with RequestsWhaor(onion_count=MAX_PROXIES, max_threads=MAX_THREADS) as request_whaor:
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        futures = [
            executor.submit(fuzzy_search, ticker, session=request_whaor) for ticker in cashtag_list
        ]
        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)
                # timeout if the response takes too long.
                if result:
                    valid_symbols.append(result.symbol)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:
                # We want to pass on exceptions.
                print("\n", exc)
        print()  # End progress bar with newline.

        print("twitter cashtag count:", len(cashtag_list))
        print("validated symbol count:", len(valid_symbols))

        request_whaor.restart_onions()  # Fresh proxy pool.

        futures = [
            executor.submit(
                get_options_page,
                ticker,
                after_days=60,
                first_chain=True,
                use_fuzzy_search=False,
                session=request_whaor,
                page_not_found_ok=True,
            )
            for ticker in valid_symbols
        ]
        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)
                # timeout if the response takes too long.
                if result:
                    call_chains.append(result.calls)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:
                # We want to pass on exceptions.
                print("\n", exc)
        print()  # End progress bar with newline.

options_watchlist = []

for chain in call_chains:
    dataframe = chain.dataframe
    otm = dataframe["in_the_money"] == False
    single_contract = dataframe[otm].head(1)
    options_watchlist.append(single_contract)

final = pandas.concat(options_watchlist, ignore_index=True)
final["expiration"] = final["expiration_date"].dt.date
final.sort_values(by="implied_volatility", inplace=True)
final.reset_index(inplace=True)
final.drop(
    columns=["index", "timestamp", "contract_name", "expiration_date", "in_the_money"],
    inplace=True,
)
print(final)