Scrapy can be a little daunting for beginners because it is a framework rather than a library. This means you need to learn how to set up projects, use Scrapy commands, maintain a settings file, and so on. In this tutorial, we will walk through the simple task of setting up your first Scrapy project using the “startproject” command.
How to start a Project in Scrapy
To begin using Scrapy, we need to set up a “project”. To do this, we can use the startproject command, which automatically creates a project folder for us with all the basic Python files required. You will understand the purpose of these files as you learn more about Scrapy.
First, open up the terminal and navigate to the directory where you want to create your Scrapy project.
D:\VSCode_Programs\Scrapy>
I will be creating the Project Folder in my D-drive, in a folder called “Scrapy”.
Now all you have to do is call the following command:
D:\VSCode_Programs\Scrapy> scrapy startproject project1
The name of our Scrapy project will be project1. You may pick a different name, as long as it begins with a letter and contains only letters, numbers, and underscores.
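Scrapy rejects project names that do not begin with a letter or that contain anything other than letters, numbers, and underscores. The sketch below is a rough approximation of that rule for checking a name before running the command. The helper name `looks_like_valid_project_name` is hypothetical and is not part of Scrapy; Scrapy also refuses names that clash with an existing Python module, which this sketch does not cover.

```python
import re

# Conservative pattern: a letter first, then letters, digits, or underscores.
# This is an assumption modelled on Scrapy's error message, not Scrapy's own code.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def looks_like_valid_project_name(name):
    """Return True if `name` fits the naming rule (hypothetical helper)."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Print each candidate name next to the result of the check.
    for candidate in ["project1", "my_scraper", "1project", "my-project"]:
        print(candidate, looks_like_valid_project_name(candidate))
```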
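To check that the command executed successfully, open up the project1 folder and make sure it contains a scrapy.cfg file and an inner project1 folder holding __init__.py, items.py, middlewares.py, pipelines.py, settings.py, and a “spiders” folder.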
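If you would rather check the folder from a script, here is a minimal sketch. It assumes the layout produced by Scrapy's default project template; the list `EXPECTED_FILES` and the function `missing_project_files` are illustrative names, not part of Scrapy, and the exact set of files may differ between Scrapy versions.

```python
from pathlib import Path

# Files the default `scrapy startproject` template is expected to create,
# relative to the outer project folder. `{name}` stands for the project name.
EXPECTED_FILES = [
    "scrapy.cfg",
    "{name}/__init__.py",
    "{name}/items.py",
    "{name}/middlewares.py",
    "{name}/pipelines.py",
    "{name}/settings.py",
    "{name}/spiders/__init__.py",
]


def missing_project_files(project_dir, name):
    """Return the expected files that are absent from `project_dir`."""
    root = Path(project_dir)
    expected = [entry.format(name=name) for entry in EXPECTED_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run this from the directory in which `scrapy startproject project1` was called.
    missing = missing_project_files("project1", "project1")
    print("All expected files found." if not missing else f"Missing: {missing}")
```

An empty list means every expected file was found.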
In order to begin writing Scrapy code, we need to create a spider. To do this, we will create a new file inside the “spiders” folder, e.g. myspider.py. Most of our Scrapy code and logic will be written inside this file.
Next, copy and paste the following code into the file. You now have a basic spider set up for the website “https://example.com”. However, since the parse function is empty, the spider doesn’t actually do anything yet.
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com/']

    def parse(self, response):
        # Your code here
        pass
Interested in learning more about Scrapy and the amazing things you can do with it? Follow our tutorial series on Web Scraping with Scrapy for more!
This marks the end of the Scrapy “startproject” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.