Today, we're going to build a script that scrapes Twitter to gather stock ticker symbols. We'll use those symbols to scrape yahoo finance for stock Options data. To ensure we can download all the Options data, we'll make each web request with High Availability Onion Routing (WHAOR). In the end, we'll do some Pandas magic to pull the first out of the money call contract for each symbol into the final watchlist.

What you should know

You should be familiar with Docker, Python 3.8, and setting up a virtual environment with Poetry, and be comfortable working with Pandas DataFrames. Also, it helps to know a little bit about call options, OTM contracts, and implied volatility.

Requirements

- Docker Engine
- Python 3.8
- Poetry
- Linux

You will need all the above requirements properly installed to follow along with this tutorial. It may work on another OS, but I've only tested it on Linux.

Environment Preparation

Let's prepare the virtual environment using poetry. Open the terminal and run the following commands.

```
$ poetry new options_bot
$ cd options_bot
```

You will have a directory tree that looks like this. The options_bot and tests directories won't be used; Poetry will only be used to manage the dependencies and virtual environment.

```
.
├── options_bot
│   └── __init__.py
├── pyproject.toml
├── README.rst
└── tests
    ├── __init__.py
    └── test_options_bot.py

2 directories, 5 files
```

Python Dependencies

- yfs
- nitter-scraper
- requests-whaor
- pandas

Now, let's add the dependencies. I'll explain each one in the import section.

```
$ poetry add nitter-scraper requests-whaor yfs pandas
```

Start the virtual environment and create a main.py file.

```
$ poetry shell
$ touch main.py
```

Open main.py with your favorite code editor and start coding.

Script Overview

1. Scrape @eWhispers tweets for cashtags with nitter-scraper.
2. Clean the cashtags to build a list of stock ticker symbols.
3. Scrape yahoo finance for call option data using the yfs library. To avoid rate-limiting and download errors, we use requests-whaor to build a network of TOR nodes to proxy all yfs requests.
4. Clean and concatenate the resulting options data into a single Pandas DataFrame.

What are we importing?

```python
from concurrent.futures import as_completed, ThreadPoolExecutor

from nitter_scraper import NitterScraper
import pandas
from requests_whaor import RequestsWhaor
from yfs import fuzzy_search, get_options_page
```

```python
from concurrent.futures import as_completed, ThreadPoolExecutor
```

The ThreadPoolExecutor will be used to call the yfs functions fuzzy_search and get_options_page with a pool of threads. This will allow asynchronous requests.

```python
from nitter_scraper import NitterScraper
```

The nitter_scraper library is used to scrape tweets. It starts and stops a docker container instance of nitter.

```python
import pandas
```

The Pandas library is used to clean and concatenate the final DataFrame.

```python
from requests_whaor import RequestsWhaor
```

The requests_whaor library will supply a rotating proxy pool to proxy our requests, giving each request a unique proxy address. If a request times out or gets an erroneous response code from the server, it will retry with another proxy address. (A quick sketch of this session-like usage follows below.)

```python
from yfs import fuzzy_search, get_options_page
```

The yfs library is used for ticker symbol validation and options data.
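Since requests_whaor is probably the least familiar piece here, a minimal standalone sketch of that session-like usage may help. This is not part of the final script, and it assumes RequestsWhaor mirrors the requests.Session interface (e.g. a .get method); the URL is just a placeholder.

```python
# Minimal sketch: RequestsWhaor as a session-like object (assumed interface).
from requests_whaor import RequestsWhaor

with RequestsWhaor(onion_count=2) as whaor:
    # Each request is routed through one of the rotating TOR proxies.
    response = whaor.get("https://check.torproject.org")  # assumed .get, placeholder URL
    print(response.status_code)
```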
Scrape Twitter for cashtags

We'll be scraping the @eWhispers Twitter account for cashtags. A cashtag is similar to a hashtag but begins with a $ and is normally associated with a stock ticker symbol, Bitcoin, or shitcoin. @eWhispers tweets about upcoming stock earnings and averages about 30 to 40 cashtags per tweet. This should give you a ton of stock ticker symbols to play with.

```python
cashtag_list = []

with NitterScraper(port=8008) as nitter:
    for tweet in nitter.get_tweets("eWhispers", pages=1):
        if tweet.is_pinned:
            continue
        if tweet.is_retweet:
            continue
        if tweet.entries.cashtags:
            cashtag_list += tweet.entries.cashtags
        print(".", end="", flush=True)  # Simple progress bar.
    print()  # End progress bar with newline.

cashtag_list = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), cashtag_list)))
```

Here's the code. Let's break it down.

```python
cashtag_list = []
```

The cashtag_list will hold all cashtags found in the @eWhispers tweets.

```python
with NitterScraper(port=8008) as nitter:
```

The NitterScraper ContextManager will start the nitter docker container and returns a nitter object. The port 8008 is used to ensure we start up the docker container on a unique port.

```python
    for tweet in nitter.get_tweets("eWhispers", pages=1):
```

Here we use the nitter.get_tweets method to scrape tweets. We only want to search the first page. Each page will yield approximately 20 tweets.

```python
        if tweet.is_pinned:
            continue
        if tweet.is_retweet:
            continue
        if tweet.entries.cashtags:
            cashtag_list += tweet.entries.cashtags
```

Skip pinned tweets and retweets, then check if the content of the tweet has cashtags. If a list of cashtags is found, they are added to the cashtag_list.

```python
        print(".", end="", flush=True)  # Simple progress bar.
    print()  # End progress bar with newline.
```

This will print a simple progress bar to keep us from getting bored.

```python
cashtag_list = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), cashtag_list)))
```

Now we sort, remove duplicates, and strip the dollar signs ($) from each cashtag. We now have a clean list of cashtag symbols, almost ready to start searching for option data.

Preparing the variables

Before we scrape options data, we will prepare some variables which will make it easier to change settings, store validated symbols, and store call option chains.

```python
valid_symbols = []
call_chains = []

MAX_THREADS = 6
MAX_PROXIES = 6
```

Here's the code. Let's break it down.

```python
valid_symbols = []
```

Before downloading options data, we'll validate each ticker against yahoo finance's quote lookup. The yfs library provides the fuzzy_search function, which uses the quote lookup to verify each symbol is a US stock ticker. After we verify a symbol exists and is a US stock symbol, we append it to the valid_symbols list.

```python
call_chains = []
```

The Call Options chain objects found for the symbols are stored in the call_chains list.

```python
MAX_THREADS = 6
```

MAX_THREADS is the maximum number of threads the ThreadPoolExecutor and RequestsWhaor are allowed to use. RequestsWhaor can use threads to speed up starting and stopping Docker containers.

```python
MAX_PROXIES = 6
```

MAX_PROXIES is the size of the rotating proxy pool. Each proxy is a separate docker container running a TOR circuit.

You can modify the MAX_THREADS and MAX_PROXIES variables to fit your system's performance.
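Before wiring fuzzy_search into a thread pool, you can try it on a single ticker. A minimal sketch, assuming the session argument is optional (plain requests when omitted) and that the result exposes the symbol attribute used later in the script:

```python
# Quick sanity check of fuzzy_search on one ticker (assumed optional session).
from yfs import fuzzy_search

result = fuzzy_search("AAPL")

if result:  # A falsy result means the symbol didn't validate.
    print(result.symbol)
```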
Validate symbols and scrape options data

```python
with RequestsWhaor(onion_count=MAX_PROXIES, max_threads=MAX_THREADS) as request_whaor:
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        futures = [
            executor.submit(fuzzy_search, ticker, session=request_whaor) for ticker in cashtag_list
        ]

        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)  # timeout if the response takes too long.
                if result:
                    valid_symbols.append(result.symbol)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:  # We want to pass on exceptions.
                print("\n", exc)

        print()  # End progress bar with newline.

        print("twitter cashtag count:", len(cashtag_list))
        print("validated symbol count:", len(valid_symbols))

        request_whaor.restart_onions()  # Fresh proxy pool.

        futures = [
            executor.submit(
                get_options_page,
                ticker,
                after_days=60,
                first_chain=True,
                use_fuzzy_search=False,
                session=request_whaor,
                page_not_found_ok=True,
            )
            for ticker in valid_symbols
        ]

        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)  # timeout if the response takes too long.
                if result:
                    call_chains.append(result.calls)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:  # We want to pass on exceptions.
                print("\n", exc)

        print()  # End progress bar with newline.
```

Here's the bulk of the script. Let's break it down.

```python
with RequestsWhaor(onion_count=MAX_PROXIES, max_threads=MAX_THREADS) as request_whaor:
```

The RequestsWhaor ContextManager will take care of starting up the rotating proxy network. In this example, only two arguments are passed. MAX_PROXIES is passed to onion_count, which is the number of docker containers running TOR circuits. MAX_THREADS determines how many threads will be used to start and stop the TOR containers asynchronously. We will use the yielded request_whaor object as a session-like object and to restart the TOR circuits when we want a fresh pool of proxies.

```python
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
```

The ThreadPoolExecutor is used to execute the fuzzy_search and get_options_page functions asynchronously.

```python
        futures = [
            executor.submit(fuzzy_search, ticker, session=request_whaor) for ticker in cashtag_list
        ]
```

A list comprehension is used to iterate over each ticker in the cashtag_list. Each ticker will be passed to the fuzzy_search function as an argument. Additionally, we pass the request_whaor object to the session keyword argument. This lets fuzzy_search send GET requests with requests_whaor instead of the vanilla requests module. requests_whaor will ensure requests are retried on failed responses and on connection, timeout, and proxy errors.

The executor's submit method takes care of scheduling the fuzzy_search function and returns a Future object. Read more about the Executor in the concurrent.futures docs.

```python
        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)  # timeout if the response takes too long.
                if result:
                    valid_symbols.append(result.symbol)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:  # We want to pass on exceptions.
                print("\n", exc)

        print()  # End progress bar with newline.
```

We use the as_completed function to iterate over the futures as they complete. The result method is called on each Future object to get the returned value. If a valid result is returned, it is appended to the valid_symbols list. We pass on all Exceptions to keep things running, and we use the same progress bar pattern as we did when we scraped tweets.
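If the submit/as_completed pattern is new to you, here is a tiny standard-library-only example showing it in isolation, with no scraping involved:

```python
# Toy example of the submit / as_completed pattern.
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(number):
    return number * number


with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(square, n) for n in range(5)]

    for future in as_completed(futures):
        # Results arrive in completion order, not submission order.
        print(future.result())
```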
```python
        print("twitter cashtag count:", len(cashtag_list))
        print("validated symbol count:", len(valid_symbols))
```

Here we print the count of cashtags we found on Twitter and the count of symbols that passed fuzzy_search validation, so we can compare them.

```python
        request_whaor.restart_onions()  # Fresh proxy pool.
```

Now that we have made about 100-plus requests to yahoo finance's servers, we'll want a fresh pool of proxies before making the next round of requests. We use the restart_onions method to get a fresh pool of proxies to route the next round of requests through.

```python
        futures = [
            executor.submit(
                get_options_page,
                ticker,
                after_days=60,
                first_chain=True,
                use_fuzzy_search=False,
                session=request_whaor,
                page_not_found_ok=True,
            )
            for ticker in valid_symbols
        ]
```

This is similar to the fuzzy_search futures section. We are just passing the get_options_page function to the executor with a few more arguments, and we are iterating over the tickers in the valid_symbols list. Let's go over each argument.

- after_days=60 - Filters out options that have less than 60 days until expiration.
- first_chain=True - Only returns the first option chain that meets the expiration requirements.
- use_fuzzy_search=False - Turns off the fuzzy_search that is built into get_options_page. We already did the fuzzy_search asynchronously to filter out bad tickers before this stage, so there is no need to do it again.
- session=request_whaor - Just like the fuzzy_search section, we pass request_whaor as a session-like object to make proxied requests.
- page_not_found_ok=True - Returns None when a symbol's options data is not found, instead of raising an exception.

```python
        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)  # timeout if the response takes too long.
                if result:
                    call_chains.append(result.calls)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:  # We want to pass on exceptions.
                print("\n", exc)

        print()  # End progress bar with newline.
```

Again, this is pretty similar to the fuzzy_search section. We iterate over the futures as they complete and call the result method to get the values. This time, get_options_page returns an OptionsChain object containing both the call and put options data. After checking that the result exists, we append only the call options chain to the call_chains list. As before, we pass on any exceptions.
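To get a feel for get_options_page before running the full proxied batch, you can call it once on its own. A rough sketch, assuming the session argument is optional, using the calls attribute from above and the dataframe property covered in the next section:

```python
# One-off sketch: fetch a single symbol's call chain without the proxy pool.
from yfs import get_options_page

options = get_options_page("AAPL", after_days=60, first_chain=True, page_not_found_ok=True)

if options:  # None when page_not_found_ok=True and nothing is found.
    print(options.calls.dataframe.head())
```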
Panda's Magic

```python
options_watchlist = []

for chain in call_chains:
    dataframe = chain.dataframe
    otm = dataframe["in_the_money"] == False
    single_contract = dataframe[otm].head(1)
    options_watchlist.append(single_contract)

final = pandas.concat(options_watchlist, ignore_index=True)

final["expiration"] = final["expiration_date"].dt.date

final.sort_values(by="implied_volatility", inplace=True)

final.reset_index(inplace=True)

final.drop(
    columns=["index", "timestamp", "contract_name", "expiration_date", "in_the_money"],
    inplace=True,
)

print(final)
```

This is the final section. Here we'll use Pandas to clean, concatenate, and sort the final DataFrame. Here's the code. Let's break it down.

```python
options_watchlist = []
```

Now we have a bunch of call option chains from multiple symbols. We will store the first out of the money strike from each option chain in the options_watchlist.

```python
for chain in call_chains:
    dataframe = chain.dataframe
    otm = dataframe["in_the_money"] == False
    single_contract = dataframe[otm].head(1)
    options_watchlist.append(single_contract)
```

Next, we iterate over each call option chain and convert each OptionChain object into a DataFrame using the dataframe property. Then we keep only the rows where the in_the_money column is False and use the DataFrame's head method to get the first OTM contract. After that, the single OTM contract row DataFrame is appended to the options_watchlist.

```python
final = pandas.concat(options_watchlist, ignore_index=True)
```

We use the Pandas concat method to concatenate the options_watchlist of single row DataFrames into one DataFrame named final.

```python
final["expiration"] = final["expiration_date"].dt.date
```

Here we convert the expiration_date column from a DateTime to a Date and store it in a new column named expiration. This will help shorten up the output.

```python
final.sort_values(by="implied_volatility", inplace=True)
```

Now, we sort_values by implied_volatility because why not. ¯\_(ツ)_/¯

```python
final.reset_index(inplace=True)
```

Here reset_index is used to create a new index, since we just sorted the rows by implied volatility.

```python
final.drop(
    columns=["index", "timestamp", "contract_name", "expiration_date", "in_the_money"],
    inplace=True,
)
```

We drop a few columns to shorten up the output some more.

```python
print(final)
```

Finally, the DataFrame is printed to the console.
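The boolean mask plus head(1) trick is easiest to see on a toy frame. A standalone example with made-up data:

```python
# Toy illustration of the boolean mask + head(1) pattern.
import pandas

df = pandas.DataFrame(
    {"strike": [90.0, 100.0, 110.0], "in_the_money": [True, False, False]}
)

otm = df["in_the_money"] == False  # Mask keeps only out of the money rows.
print(df[otm].head(1))  # First OTM contract.
```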
Run the script

```
$ python3 main.py
```

Watch the show at asciinema.

```
    symbol contract_type  strike  last_price   bid   ask  change percent_change  volume  open_interest  implied_volatility  expiration
0     CMTL          call    17.5        1.50  0.00  0.00    0.00           None     6.0          318.0                3.13  2021-01-15
1      ONB          call    15.0        0.75  0.00  0.00    0.00            NaN    19.0          519.0                6.25  2020-12-18
2       VZ          call    60.0        1.30  1.27  1.32   -0.10          -7.14   414.0        35068.0               18.34  2021-01-15
3       PG          call   145.0        4.60  4.15  4.50    0.10           2.22    70.0         1014.0               20.01  2020-12-18
4      JNJ          call   150.0        3.92  3.90  4.05    0.17           4.53    73.0         3615.0               20.01  2020-12-18
..     ...           ...     ...         ...   ...   ...     ...            ...     ...            ...                 ...         ...
148    ACB          call     5.0        0.45  0.45  0.46   -0.15            -25   342.0         1935.0              116.41  2020-12-18
149    QTT          call     2.5        0.40  0.20  0.75   -0.10            -20    17.0          100.0              117.19  2021-01-15
150   LLNW          call     6.0        1.25  1.20  1.25   -0.10          -7.41    49.0         2775.0              125.78  2020-12-18
151   SANW          call     2.5        0.55  0.35  1.55    0.00           None     1.0            6.0              193.75  2021-02-19
152   BCLI          call    15.0        6.25  5.70  6.50   -0.65          -9.42    11.0          918.0              296.00  2020-12-18

[153 rows x 12 columns]
```

And the final output is a DataFrame of OTM call options sorted by IV.

I hope you had fun writing the script. It would be effortless to modify it to search multiple Twitter users or dump the options data to a database periodically. Thanks for reading.

CONTACT INFO

Discord = @dgnsrekt
Email = dgnsrekt@pm.me
Github = dgnsrekt
Telegram = dgnsrekt
Twitter = dgnsrekt
Tradingview = dgnsrekt

Full script

```python
from concurrent.futures import as_completed, ThreadPoolExecutor

from nitter_scraper import NitterScraper
import pandas
from requests_whaor import RequestsWhaor
from yfs import fuzzy_search, get_options_page

cashtag_list = []

with NitterScraper(port=8008) as nitter:
    for tweet in nitter.get_tweets("eWhispers", pages=1):
        if tweet.is_pinned:
            continue
        if tweet.is_retweet:
            continue
        if tweet.entries.cashtags:
            cashtag_list += tweet.entries.cashtags
        print(".", end="", flush=True)  # Simple progress bar.
    print()  # End progress bar with newline.

cashtag_list = sorted(set(map(lambda cashtag: cashtag.replace("$", "").strip(), cashtag_list)))

valid_symbols = []
call_chains = []

MAX_THREADS = 6
MAX_PROXIES = 6

with RequestsWhaor(onion_count=MAX_PROXIES, max_threads=MAX_THREADS) as request_whaor:
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        futures = [
            executor.submit(fuzzy_search, ticker, session=request_whaor) for ticker in cashtag_list
        ]

        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)  # timeout if the response takes too long.
                if result:
                    valid_symbols.append(result.symbol)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:  # We want to pass on exceptions.
                print("\n", exc)

        print()  # End progress bar with newline.

        print("twitter cashtag count:", len(cashtag_list))
        print("validated symbol count:", len(valid_symbols))

        request_whaor.restart_onions()  # Fresh proxy pool.

        futures = [
            executor.submit(
                get_options_page,
                ticker,
                after_days=60,
                first_chain=True,
                use_fuzzy_search=False,
                session=request_whaor,
                page_not_found_ok=True,
            )
            for ticker in valid_symbols
        ]

        for future in as_completed(futures):
            try:
                result = future.result(timeout=60)  # timeout if the response takes too long.
                if result:
                    call_chains.append(result.calls)
                print(".", end="", flush=True)  # Simple progress bar.
            except Exception as exc:  # We want to pass on exceptions.
                print("\n", exc)

        print()  # End progress bar with newline.

options_watchlist = []

for chain in call_chains:
    dataframe = chain.dataframe
    otm = dataframe["in_the_money"] == False
    single_contract = dataframe[otm].head(1)
    options_watchlist.append(single_contract)

final = pandas.concat(options_watchlist, ignore_index=True)

final["expiration"] = final["expiration_date"].dt.date

final.sort_values(by="implied_volatility", inplace=True)

final.reset_index(inplace=True)

final.drop(
    columns=["index", "timestamp", "contract_name", "expiration_date", "in_the_money"],
    inplace=True,
)

print(final)
```