This tutorial explains how to use the logging feature in Scrapy.
Scrapy is a complex library that executes many simultaneous requests and events every second. In a program like this it becomes hard to debug any error that occurs, since locating the error in such a complex system isn't easy.
Luckily, Python offers us the logging library, which we can use with Scrapy to log system events as they occur. Once everything is logged, it's easy to go through it and locate the error (in case of a problem).
Logging Levels
- Debug : Used to give detailed information about the program while it's running. Useful for diagnosing errors or unexpected outcomes.
- Info : Used to confirm that things are working as expected.
- Warning : Used to indicate that something unexpected has happened, or that an error is likely to occur due to certain events.
- Error : Used to indicate a problem in the code. Usually raised when the program has become unable to carry out some function.
- Critical : The highest severity level. Used for serious errors, usually severe enough that the program may be unable to continue running.
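As a quick sketch of these levels in action, the snippet below (using an arbitrary logger name) emits one message at each level. Note that logging.basicConfig() is called first to lower the threshold to DEBUG, since by default Python's logging module only displays messages of level WARNING and above.

import logging

# By default only WARNING and above are displayed; lower the threshold to DEBUG
logging.basicConfig(level=logging.DEBUG)

logger = logging.getLogger('level_demo')

logger.debug("Detailed information, useful while diagnosing a problem")
logger.info("Confirmation that things are working as expected")
logger.warning("Something unexpected happened")
logger.error("The program was unable to perform some function")
logger.critical("A serious error; the program may be unable to continue")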
Logging Error Messages
The first thing we need to do is import the logging library. Next, we have to create an actual logger object, through which we'll access all the logging methods.
import logging
logger = logging.getLogger('mylogger')
In the above example we've done the first two steps. We can give the logger its own name by passing a suitable string into the getLogger() function. You can also choose to leave it blank.
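For instance, calling getLogger() with no arguments returns the root logger, while the common convention is to pass in __name__ so that each module gets a logger of its own:

import logging

# The usual convention: name the logger after the current module
logger = logging.getLogger(__name__)

# Calling getLogger() with no name returns the root logger
root_logger = logging.getLogger()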
In the below example we'll be logging messages of varying severity.
import logging

# Lower the threshold so that info messages are displayed as well
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger('mylogger')

# Info
logger.info("This is some random information")

# Warning
logger.warning("This is a warning")

# Critical
logger.critical("This is a critical error")
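Running this script should produce output roughly like the following (basicConfig's default format prints the level, the logger name, then the message):

INFO:mylogger:This is some random information
WARNING:mylogger:This is a warning
CRITICAL:mylogger:This is a critical error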
Scrapy Logging Example
Below is a simple example of how one would integrate the logging library with Scrapy.
Once you've created the logger, you can begin calling the logging methods anywhere within the Scrapy code.
import logging
import scrapy

logger = logging.getLogger('my_logger')

class ScrapySpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        logger.info('Parse Function called on %s', response.url)
The above code logs the URL of every response that the parse method is called on while crawling the quotes.toscrape.com web page. By default these messages go to the console; they are written to a file only if the LOG_FILE setting (covered below) is configured.
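Scrapy also provides a built-in logger on every Spider instance, so an equivalent version of the above spider can skip creating its own logger and use self.logger instead:

import scrapy

class ScrapySpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # self.logger is a logger named after the spider, provided by Scrapy
        self.logger.info('Parse Function called on %s', response.url)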
Logging Settings
- LOG_ENABLED : Takes a Boolean value of True or False to determine whether logging should be enabled.
- LOG_FILE : Takes a file path as its value, and stores all the logging messages within that file.
- LOG_FORMAT : Helps you choose what data is printed out in each log message. By default it's '%(asctime)s [%(name)s] %(levelname)s: %(message)s', which prints the time, name, level, and content of the log message.
- LOG_LEVEL : Determines the minimum level a message must have in order to be logged. The default value is DEBUG, which means messages of every level are logged. (Severity increases from DEBUG to CRITICAL.)
- LOG_DATEFORMAT : Determines the way the date is printed out in the log. The default format is '%Y-%m-%d %H:%M:%S'.
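Putting these together, here's a sketch of how these settings might look inside a project's settings.py file (the log file name scrapy_output.log is just a hypothetical example):

# settings.py -- a sketch combining the settings described above
LOG_ENABLED = True
LOG_FILE = 'scrapy_output.log'    # hypothetical file path
LOG_FORMAT = '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
LOG_LEVEL = 'INFO'                # log INFO and above, skipping DEBUG messages
LOG_DATEFORMAT = '%Y-%m-%d %H:%M:%S'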
This marks the end of the Scrapy Logging tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the article content can be asked in the comments section.