This tutorial explains how to use cookies in Scrapy.
A cookie is a small piece of data that a web browser stores on the user’s computer while they browse a website. Cookies were created to improve the user experience by remembering certain things about each user’s browsing activity.
When you return to a site you have visited before, the site can use cookies to work out what you might be looking for and adjust its content accordingly. A common example is the “recommended items” list that you’ll often see on shopping sites.
Since cookies are a significant part of the web, and are often required to access certain sites, Scrapy allows us to send cookies along with our requests. We’ll explore how to accomplish this in this Scrapy tutorial.
Sending Cookies with Requests
Cookies are meant to be sent right at the start, so they travel with the Request that is made to the website to begin the connection.
There are several ways to attach cookies to Scrapy requests. In this section we’ll explain a few of them.
Method #1
from scrapy import Request

request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'})
This is an example from the Scrapy docs, where a request object is created using a URL and cookies.
A request can carry multiple cookies, which are stored in a dictionary of key-value pairs. The key is the name of the cookie, and the value is the value you wish to pass. You’ll need to investigate the target site to find out the names of the cookies you need.
Method #2
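scrapy.Request(url=url, callback=callback, headers={'Cookie': my_cookies})
Alternatively, you can send the cookies through the Request headers. Here my_cookies is a single string in the name=value; name2=value2 format. There are several different methods of passing headers in Scrapy, and you can use cookies with any of them. Keep in mind that the Scrapy docs note that cookies set through the Cookie header are not considered by the CookiesMiddleware, so prefer the cookies parameter from Method #1 when you want Scrapy to keep track of them.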
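If your cookies start out as a dictionary, you need to turn them into a single header string before passing them this way. The following is a minimal sketch using only the standard library; build_cookie_header is a hypothetical helper name, and it assumes the names and values contain no characters that need escaping (such as semicolons).

```python
def build_cookie_header(cookies):
    """Join a dict of cookies into a "name=value; name2=value2" string.

    Hypothetical helper for illustration; it does no escaping, so it
    assumes simple names and values.
    """
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder cookie values, matching the earlier example.
my_cookies = build_cookie_header({"currency": "USD", "country": "UY"})
print(my_cookies)
```

The resulting string can then be passed as the value of the Cookie header in the request shown above.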
Method #3
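def request(self, url, callback):
    # Build the request, then attach the cookie before returning it.
    # value is a placeholder for the cookie value you want to send.
    request = scrapy.Request(url=url, callback=callback)
    request.cookies['cookie_name'] = value
    return request
This method is a thin wrapper around scrapy.Request that you define on your own spider. Scrapy does not call it automatically, so use it wherever you would otherwise create a scrapy.Request directly. Another benefit of having this wrapper around is that it gives you one place to add other request modifications, such as a custom User-Agent.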
Check which Cookies were sent/received
Just to be sure, it can pay off to check which cookies were actually sent with the request. We can do this with the following code.
def parse(self, response):
    # Cookies that were sent live in the request's "Cookie" header.
    cookies = response.request.headers.getlist("Cookie")
If you want the cookies that the website returned to you, you can do the following:
def parse(self, response):
    # Cookies set by the server arrive in "Set-Cookie" response headers.
    cookies = response.headers.getlist("Set-Cookie")
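The getlist call returns the raw header values as byte strings, one per Set-Cookie header, with attributes such as Path still attached. If you only want the cookie names and values, one option is to parse them with the standard library. The sketch below assumes that approach; parse_set_cookie is a hypothetical helper name, and the sample headers are invented.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(raw_headers):
    """Turn a list of raw Set-Cookie byte strings into a name -> value dict.

    Hypothetical helper for illustration; attributes such as Path or
    HttpOnly are parsed but not included in the result.
    """
    cookies = {}
    for raw in raw_headers:
        jar = SimpleCookie()
        jar.load(raw.decode("latin-1"))
        for name, morsel in jar.items():
            cookies[name] = morsel.value
    return cookies


# Invented sample headers, shaped like the output of
# response.headers.getlist("Set-Cookie").
sample = [b"sessionid=abc123; Path=/; HttpOnly", b"currency=USD; Path=/"]
print(parse_set_cookie(sample))
```

Inside a spider you would pass response.headers.getlist("Set-Cookie") to the helper instead of the sample list.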
Scrapy Cookies Settings
You can enable the COOKIES_DEBUG setting to see the back and forth transfer of cookies logged on screen. Simply set this setting to True in your settings.py file to begin.
COOKIES_ENABLED is another setting that controls whether cookies will be sent to the web server or not. By default this setting is True, however you can turn it off by setting it to False if you wish.
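As a quick illustration, the relevant part of a settings.py file might look like the sketch below. The values are just one possible configuration for debugging, not a recommendation for every project.

```python
# settings.py (fragment)

# Keep cookie handling switched on (this is already the default).
COOKIES_ENABLED = True

# Log the cookies sent in requests and received in responses.
COOKIES_DEBUG = True
```

Once you have finished debugging, setting COOKIES_DEBUG back to False keeps the log output shorter.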
To learn more about Scrapy and what it’s capable of, check out our Tutorial Series on Scrapy!