Scrapy

Scrapy is a web crawling framework that divides the whole crawling process into small steps so that the process stays well organized!

Crawl data (spider.py) -> Rotate proxy or IP (middlewares.py) -> Clean data (items.py) -> Store data (pipelines.py)

With all the settings in settings.py.

"The biggest feature is that it is built on Twisted, an asynchronous networking library, so Scrapy is implemented using a non-blocking (aka asynchronous) code for concurrency, which makes the spider performance is very great." - Michael Yin

Selenium

Selenium is a free automated testing suite for web applications across different browsers and platforms. Although it was created for automated testing of web apps, it is really easy to apply to scraping websites! You just need to:

-> Download the Chrome, Firefox or other browser driver
-> Use its API to scrape websites.

Issues I faced using Selenium:

- Speed is quite slow.
- It needs quite a lot of memory if you want to build a multithreaded crawler to speed the process up.

Issues I faced using Scrapy:

- It is harder to debug.
- It is harder to connect to Tor if you are implementing Scrapy-Splash.
- There are fewer references to consult when you want to use Scrapy-Splash.

Sharing about my experiences:

At first, I learned Selenium because I needed to render JavaScript websites and it is much easier to learn and debug. When I first used Selenium, it satisfied all my needs, crawling all the web pages within the required time frame. Then I sped it up with multithreading and everything went really smoothly. Yeah, Really Smooth!

But one day, one particular website blocked me by implementing a Completely Automated Public Turing test to tell Computers and Humans Apart (Captcha). I was really stuck, but I was required to figure out a way to solve this problem. So, after I had tried every way I could find to solve the captcha, I thought: why not try another framework and see whether it can bypass the captcha?
Bang my head and hope something magical comes to my mind :(

At last I found the Scrapy framework, which not only solved my captcha problem but also got me started on a really powerful crawling framework!

The learning curve for Scrapy is much steeper than for Selenium, but it is definitely worth it based on the five points below:

- Write your crawler code in a much shorter Python script compared to Selenium.
- Crawl a lot faster than Selenium.
- If you are using Scrapy-Splash, there is a great Splash terminal rendered on localhost:8050 where you can try out your Lua script.
- Organize your crawler code in a really structured way so that you can attain maximum satisfaction :)
- Scrapy scales well if your project needs to crawl a lot of websites.

Scrapy-Splash is definitely worth trying out for rendering JavaScript-heavy websites, but there are far fewer learning resources for Scrapy-Splash than for Scrapy.

Here are some resources I find useful to learn Scrapy-Splash.

https://splash.readthedocs.io/en/stable/faq.html
(Chinese website) https://www.cnblogs.com/shaosks/p/6950358.html
(Chinese website) https://juejin.im/post/5afe47b3f265da0b767db40e

Here are some really useful resources to learn Scrapy.

https://doc.scrapy.org/en/latest/intro/tutorial.html
https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/
https://blog.michaelyin.info/scrapy-tutorial-5-how-create-simple-scrapy-spider/
https://python.gotrained.com/scrapy-tutorial-web-scraping-craigslist/

Here are some really useful resources to learn Selenium.

https://selenium-python.readthedocs.io/
https://www.softwaretestingmaterial.com/selenium-tutorial/
https://www.tutorialspoint.com/selenium

All resources are based on Python. Happy Learning!

If you are interested in more tutorials on Scrapy-Splash, Scrapy or Selenium, feel free to comment below! Feel free to reach out to me too :)
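
A side note on the non-blocking point from the Scrapy overview: the sketch below is not Scrapy or Twisted code. It swaps in Python's standard asyncio library and a pretend download (`fake_fetch`, the example.com URLs and the 0.1 s delay are all made up for illustration) just to give a rough feel for why waiting on many pages at once can beat fetching them one by one.

```python
import asyncio
import time

# Hypothetical URLs, used only as labels; no network request is made.
URLS = [f"https://example.com/page/{i}" for i in range(5)]
DELAY = 0.1  # assumption: pretend every page takes 0.1 s to respond


async def fake_fetch(url):
    # asyncio.sleep stands in for waiting on the network without blocking.
    await asyncio.sleep(DELAY)
    return f"<html>{url}</html>"


async def crawl_one_by_one(urls):
    # Wait for each page to finish before asking for the next one.
    return [await fake_fetch(url) for url in urls]


async def crawl_concurrently(urls):
    # Start every pretend request, then wait for all of them together.
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


def timed(crawl, urls):
    # Run one crawl strategy and measure how long it takes.
    start = time.perf_counter()
    pages = asyncio.run(crawl(urls))
    return pages, time.perf_counter() - start


sequential_pages, sequential_time = timed(crawl_one_by_one, URLS)
concurrent_pages, concurrent_time = timed(crawl_concurrently, URLS)
print(f"one by one: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both runs return the same pages in the same order; only the waiting overlaps in the second one. Real crawls also depend on the target site, your proxies and your concurrency settings, so treat this as intuition rather than a benchmark.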


Scrapy or Selenium?
