Do not use Selenium for web scraping

Published on: 15.12.2018

Disclaimer: This is written primarily from the point of view of the Python programming language ecosystem.

I have noticed that Selenium has become quite popular for scraping data from web pages. Yes, you can use Selenium for web scraping, but it is not a good idea.

Personally, I also think that articles teaching how to use Selenium for web scraping are giving a bad example of what tool to use for web scraping.

Why you should not use Selenium for web scraping

First, Selenium is not a web scraping tool. It is “for automating web applications for testing purposes”, and this statement is from the homepage of Selenium.

Second, in Python there is a better tool: Scrapy, an open-source web-crawling framework.

The intelligent reader will ask: “What is the benefit of using Scrapy over Selenium?”

You get speed, and a lot of it (not the amphetamine kind :-)): speed in development and speed in web scraping time.

There are articles with tips on how to make Selenium web scraping faster. If you use Scrapy, you do not have those kinds of problems in the first place, and you are faster.

The fact that these articles exist is proof (at least for me) that people are using the wrong tool for the job, an example of “When your only tool is a hammer, everything looks like a nail”.

For what should you use Selenium

I personally use Selenium only for web page testing.

I would try to use it for automating web applications (if there were no other options), but I have never had that use case so far.

Exception on when you can use Selenium

The only exception I can see for using Selenium as a web scraping tool is when the website you are scraping uses JavaScript to get or display the data that you need to scrape.

Scrapy does have a solution for JavaScript in Splash, but I have never used it; so far I have always found some workaround.

What to use instead of Selenium for web scraping

As you can guess, my advice is to use Scrapy.

I chose Scrapy because I spend less time developing web scraping programs (web spiders) and because execution time is fast.

I have found Scrapy to be faster in development time because of the Scrapy shell and the cache.

In execution, it is fast because multiple requests can be done simultaneously. This means that responses will not be delivered in the same order as the requests were made; I mention it only so that you are not confused when debugging.

What about Requests + Beautiful Soup

I used this combination in the past, before I decided to invest time in learning Scrapy.

Do not make the same mistake I did: development time and execution time are much faster with Scrapy than with any other tool I have found so far.

Last words

This is not a rant about using Selenium for web scraping. For non-production systems and for learning or hobby projects, it is fine.

I get it: Selenium is easy to start with, and you can see what is happening in real time on your screen. That is a huge benefit for people starting to learn web scraping, and it is important to have this kind of early morale boost when you are learning something new.

But I do think that all these articles and tutorials that use Selenium for web scraping should carry a disclaimer not to use Selenium in real life (if you need to scrape 100K pages in a day, it is not possible to do it with a single Selenium instance).

Starting with Scrapy is harder: you have to write XPath selectors, and looking at the source code of an HTML page to debug is not fun. But if you want fast web scraping, that is the price.

Conclusion

After you learn Scrapy you will be faster than with Selenium (Selenium just has a gentler learning curve). I personally needed a few days to get the basics.

Originally published at buklijas.info on December 15, 2018.
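To put the "100K pages in a day" figure from the last section into numbers, here is a rough back-of-the-envelope sketch. It only divides the seconds in a day by the page count; the function name `seconds_per_page` and the concurrency value of 16 are illustrative assumptions, not measurements of Selenium or Scrapy.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_per_page(pages_per_day: int, concurrent_requests: int = 1) -> float:
    """Average time one page may take (load + parse) to hit the daily target.

    `concurrent_requests` is how many pages are in flight at the same time.
    With 1, pages are fetched strictly one after another.
    """
    if pages_per_day <= 0 or concurrent_requests <= 0:
        raise ValueError("both arguments must be positive")
    return SECONDS_PER_DAY * concurrent_requests / pages_per_day


if __name__ == "__main__":
    target = 100_000  # pages per day, the figure used in the article

    # One page at a time: the whole round trip has to fit in this budget.
    print(f"sequential:         {seconds_per_page(target):.3f} s per page")

    # Hypothetical example: 16 requests in flight at once (assumed value).
    print(f"16 requests at once: {seconds_per_page(target, 16):.3f} s per page")
```

Fetching one page at a time leaves well under a second per page, around the clock and without a single retry. With several requests in flight at once, each individual request gets proportionally more time, which is why simultaneous requests matter so much at this scale.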
