This guide explains how to install and set up Scrapy.
Scrapy is a Python web scraping library used to download and extract data from the internet. Scrapy is more than just a regular web scraper, though. It also doubles as a web crawler that can follow links, like a search engine does.
The best part is that it’s an all-in-one library: downloading, parsing, and crawling are all handled by Scrapy itself, so you don’t have to combine it with separate scraping libraries as some other web scrapers require.
The downside to all this is that Scrapy is fairly complex, both to learn and to install and set up. In this tutorial we’ll explain how to install Scrapy properly on a variety of IDEs and platforms.
Virtual Environment
Scrapy is a large package that comes with a lot of dependencies. These dependencies, or the particular versions Scrapy needs, may clash with packages you already have and cause problems in your Python installation.
For these reasons, it’s recommended that you create a virtual environment in which to download and install Scrapy. You can find many tutorials on how to do so online.
Whether you do so or not does not really affect anything else, including the way we, as programmers, use Scrapy. If you are using one, though, remember to activate the virtual environment before following any of the methods below.
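As a minimal sketch of the setup described above, Python's standard-library venv module can create the environment, and comparing sys.prefix with sys.base_prefix tells you whether the current interpreter is running inside one. The folder name scrapy-env and both helper names are illustrative choices, not something Scrapy requires.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir="scrapy-env", with_pip=True):
    """Create a virtual environment in env_dir (a hypothetical folder name)."""
    env_path = Path(env_dir).resolve()
    # with_pip=True bootstraps pip into the new environment so that
    # "pip install scrapy" can be run there afterwards.
    venv.create(env_path, with_pip=with_pip)
    return env_path


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
```

After creating the environment, activate it (for example, source scrapy-env/bin/activate on Linux and macOS, or scrapy-env\Scripts\activate on Windows) before running any of the install commands in this guide.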
Scrapy Dependencies
Being the large library that it is, Scrapy relies on several other libraries in order to function correctly. You may have to install these Scrapy dependencies separately, depending on your platform.
- lxml
- parsel
- w3lib
- twisted
- cryptography
You may not notice a missing dependency immediately, depending on which features of Scrapy you’re using. If you run into any missing module errors, you can use pip to install the missing library, for example cryptography or lxml.
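One way to find out which of the dependencies listed above are missing, before a missing module error shows up at runtime, is to ask Python whether each one can be located. This is only a sketch: missing_modules is a made-up helper name, and the module list simply mirrors the list above.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above.
SCRAPY_DEPENDENCIES = ["lxml", "parsel", "w3lib", "twisted", "cryptography"]


def missing_modules(names):
    """Return the top-level module names the interpreter cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SCRAPY_DEPENDENCIES)
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All listed dependencies were found.")
```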
Command Prompt
The simplest method, and the one most Python programmers use to install libraries, is the command prompt. The exact command can vary a bit from platform to platform, but if you’re using pip, use the following command.
pip install scrapy
You can also use pip to install any other Scrapy dependencies, if required. Installing Scrapy directly like this might not work on every system, in which case one of the platform-specific methods below may help.
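Because the exact command varies between platforms, one hedged option is to run pip through the interpreter you are actually using (the python -m pip form) and then confirm the result by looking up the installed version. The helper names below are illustrative, and importlib.metadata requires Python 3.8 or newer.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def pip_install_command(package):
    """Build a pip command bound to the currently running interpreter."""
    # sys.executable avoids accidentally calling a pip that belongs
    # to a different Python installation.
    return [sys.executable, "-m", "pip", "install", package]


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Install command:", " ".join(pip_install_command("scrapy")))
    print("Installed Scrapy version:", installed_version("scrapy"))
```

To actually run the install from a script, the command list can be passed to subprocess.run(..., check=True). If the version still prints as None afterwards, the install most likely went into a different Python installation or virtual environment.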
PyCharm
If you’ve been using PyCharm for a while, you’ll know that PyCharm has its own user-friendly way of downloading and installing Python libraries. You can use this method for Scrapy too!
The steps are detailed below for those who don’t know how.
1. Go to the File drop-down menu and click on the Settings option.
2. This should open the Settings window. Navigate to and open the drop-down menu for your Python project.
3. Click on Project Interpreter. You should now be looking at the same window as shown in the image below.
4. You can use the + and - buttons near the top right-hand corner to add or remove libraries. Clicking the + button will lead you to a new window where you can search for the library you want (Scrapy) and add it to PyCharm.
If you look closely at the image above, you can see we have Scrapy installed.
Anaconda
If you use the Anaconda distribution, you can install Scrapy from the conda-forge channel, which provides up-to-date packages for Scrapy and its dependencies on Linux, Windows, and macOS.
conda install -c conda-forge scrapy
Ubuntu
Scrapy is well supported on Ubuntu systems 14.04 and above. The first thing you need to do is install the following dependencies.
sudo apt-get install python3 python3-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
Once you’ve done that, you can install Scrapy using the command prompt method we mentioned earlier (pip).
Now that you’re done here, you can head back to our Scrapy Tutorial Series, where we’ll teach you its various uses and strengths.
Other Helpful Resources for Scrapy
If you’re still having some trouble, take a look at the official installation guide by the Scrapy team and see if it can resolve your issue.
If you are interested in hearing about an alternative to Scrapy, refer to this comparison article on Selenium versus Scrapy.
This marks the end of the How to install Scrapy guide. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the article content can be asked in the comments section below.