Logging

Note

scrapy.log has been deprecated alongside its functions in favor of explicit calls to the Python standard logging. Keep reading to learn more about the new logging system.

Scrapy uses Python’s builtin logging system for event logging. We’ll provide some simple examples to get you started, but for more advanced use-cases it’s strongly suggested to read its documentation thoroughly.

Logging works out of the box, and can be configured to some extent with the Scrapy settings listed in Logging settings.

Scrapy calls scrapy.utils.log.configure_logging() to set some reasonable defaults and handle those settings in Logging settings when running commands, so it’s recommended to manually call it if you’re running Scrapy from scripts as described in Run Scrapy from a script.
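For example, a script that doesn’t go through the scrapy command could apply those defaults with a single call (a minimal sketch; see the scrapy.utils.log module section below for details):

    from scrapy.utils.log import configure_logging

    # Set up Scrapy's logging defaults before any crawling code runs;
    # calling it with no arguments uses the built-in defaults.
    configure_logging()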

Log levels

Python’s builtin logging defines 5 different levels to indicate the severity of a given log message. Here are the standard ones, listed in decreasing order:

  • logging.CRITICAL - for critical errors (highest severity)
  • logging.ERROR - for regular errors
  • logging.WARNING - for warning messages
  • logging.INFO - for informational messages
  • logging.DEBUG - for debugging messages (lowest severity)

How to log messages

Here’s a quick example of how to log a message using the logging.WARNING level:

    import logging
    logging.warning("This is a warning")

There are shortcuts for issuing log messages on any of the standard 5 levels, and there’s also a general logging.log method which takes a given level as argument. If needed, the last example could be rewritten as:

    import logging
    logging.log(logging.WARNING, "This is a warning")

On top of that, you can create different “loggers” to encapsulate messages. (For example, a common practice is to create different loggers for every module.) These loggers can be configured independently, and they allow hierarchical constructions.

The previous examples use the root logger behind the scenes, which is a top-level logger where all messages are propagated to (unless otherwise specified). Using logging helpers is merely a shortcut for getting the root logger explicitly, so this is also an equivalent of the last snippets:

    import logging
    logger = logging.getLogger()
    logger.warning("This is a warning")

You can use a different logger just by getting its name with the logging.getLogger function:

    import logging
    logger = logging.getLogger('mycustomlogger')
    logger.warning("This is a warning")

Finally, you can ensure having a custom logger for any module you’re working on by using the __name__ variable, which is populated with the current module’s path:

    import logging
    logger = logging.getLogger(__name__)
    logger.warning("This is a warning")

See also

  • Module logging, HowTo: Basic Logging Tutorial
  • Module logging, Loggers: Further documentation on loggers

Logging from Spiders

Scrapy provides a logger within each Spider instance, which can be accessed and used like this:

    import scrapy

    class MySpider(scrapy.Spider):

        name = 'myspider'
        start_urls = ['https://scrapinghub.com']

        def parse(self, response):
            self.logger.info('Parse function called on %s', response.url)

That logger is created using the Spider’s name, but you can use any custom Python logger you want. For example:

    import logging
    import scrapy

    logger = logging.getLogger('mycustomlogger')

    class MySpider(scrapy.Spider):

        name = 'myspider'
        start_urls = ['https://scrapinghub.com']

        def parse(self, response):
            logger.info('Parse function called on %s', response.url)

Logging configuration

Loggers on their own don’t manage how messages sent through them are displayed. For this task, different “handlers” can be attached to any logger instance and they will redirect those messages to appropriate destinations, such as the standard output, files, emails, etc.
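As a plain stdlib illustration of that mechanism (not Scrapy-specific; the logger name and file name are arbitrary examples), a handler can be attached like this:

    import logging

    # Redirect everything logged through 'mycustomlogger' to a file,
    # in addition to whatever other handlers are configured.
    handler = logging.FileHandler('mycustomlogger.log')
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))

    logger = logging.getLogger('mycustomlogger')
    logger.addHandler(handler)
    logger.warning("This is a warning")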

By default, Scrapy sets and configures a handler for the root logger, based on the settings below.

Logging settings

These settings can be used to configure the logging:

  • LOG_FILE
  • LOG_ENABLED
  • LOG_ENCODING
  • LOG_LEVEL
  • LOG_FORMAT
  • LOG_DATEFORMAT
  • LOG_STDOUT
  • LOG_SHORT_NAMES

The first couple of settings define a destination for log messages. If LOG_FILE is set, messages sent through the root logger will be redirected to a file named LOG_FILE with encoding LOG_ENCODING. If unset and LOG_ENABLED is True, log messages will be displayed on the standard error. Lastly, if LOG_ENABLED is False, there won’t be any visible log output.

LOG_LEVEL determines the minimum level of severity to display; messages with lower severity will be filtered out. It ranges through the possible levels listed in Log levels.

LOG_FORMAT and LOG_DATEFORMAT specify formatting strings used as layouts for all messages. Those strings can contain any placeholders listed in logging’s LogRecord attributes docs and datetime’s strftime and strptime directives, respectively.

If LOG_SHORT_NAMES is set, then the logs will not display the Scrapy component that prints the log. It is unset by default, hence logs contain the Scrapy component responsible for that log output.
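To make this concrete, here is a rough sketch of how these settings might be combined in a project’s settings.py; the values are illustrative choices, not necessarily the defaults:

    # settings.py (illustrative values)
    LOG_ENABLED = True
    LOG_FILE = 'scrapy.log'        # write log output to this file instead of stderr
    LOG_ENCODING = 'utf-8'
    LOG_LEVEL = 'INFO'             # filter out DEBUG messages
    LOG_FORMAT = '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
    LOG_DATEFORMAT = '%Y-%m-%d %H:%M:%S'
    LOG_SHORT_NAMES = False        # keep full component names in log lines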

Command-line options

There are command-line arguments, available for all commands, that you can use to override some of the Scrapy settings regarding logging.
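For example, invocations along these lines should cover the common cases (myspider is a placeholder spider name); consult the command-line tool documentation for the exact options supported by your Scrapy version:

    scrapy crawl myspider --logfile=scrapy.log   # override LOG_FILE
    scrapy crawl myspider --loglevel=INFO        # override LOG_LEVEL (short form: -L INFO)
    scrapy crawl myspider --nolog                # set LOG_ENABLED to False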

See also

  • Command line tool, to learn more about the available command-line arguments.

Custom Log Formats

A custom log format can be set for different actions by extending the LogFormatter class and making LOG_FORMATTER point to your new class.

  • class scrapy.logformatter.LogFormatter
  • Class for generating log messages for different actions.

All methods must return a dictionary listing the parameters level, msg and args which are going to be used for constructing the log message when calling logging.log.

Dictionary keys for the method outputs:

  • level is the log level for that action; you can use those from the Python logging library: logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR and logging.CRITICAL.
  • msg should be a string that can contain different formatting placeholders. This string, formatted with the provided args, is going to be the long message for that action.
  • args should be a tuple or dict with the formatting placeholders for msg. The final log message is computed as msg % args.

Users can define their own LogFormatter class if they want to customize how each action is logged or if they want to omit it entirely. In order to omit logging an action the method must return None.

Here is an example of how to create a custom log formatter to lower the severity level of the log message when an item is dropped from the pipeline:

    import logging
    import os

    from scrapy import logformatter


    class PoliteLogFormatter(logformatter.LogFormatter):
        def dropped(self, item, exception, response, spider):
            return {
                'level': logging.INFO,  # lowering the level from logging.WARNING
                'msg': u"Dropped: %(exception)s" + os.linesep + "%(item)s",
                'args': {
                    'exception': exception,
                    'item': item,
                }
            }
  • crawled(request, response, spider)
  • Logs a message when the crawler finds a webpage.

  • download_error(failure, request, spider, errmsg=None)

  • Logs a download error message from a spider (typically coming from the engine).

New in version 2.0.

  • dropped(item, exception, response, spider)
  • Logs a message when an item is dropped while it is passing through the item pipeline.

  • item_error(item, exception, response, spider)

  • Logs a message when an item causes an error while it is passing through the item pipeline.

New in version 2.0.

  • scraped(item, response, spider)
  • Logs a message when an item is scraped by a spider.

  • spider_error(failure, request, response, spider)

  • Logs an error message from a spider.

New in version 2.0.

Advanced customization

Because Scrapy uses the stdlib logging module, you can customize logging using all features of stdlib logging.

For example, let’s say you’re scraping a website which returns many HTTP 404 and 500 responses, and you want to hide all messages like this:

    2016-12-16 22:00:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring
    response <500 http://quotes.toscrape.com/page/1-34/>: HTTP status code
    is not handled or not allowed

The first thing to note is the logger name - it is in brackets: [scrapy.spidermiddlewares.httperror]. If you get just [scrapy] then LOG_SHORT_NAMES is likely set to True; set it to False and re-run the crawl.

Next, we can see that the message has INFO level. To hide it we should set the logging level for scrapy.spidermiddlewares.httperror higher than INFO; the next level after INFO is WARNING. It could be done e.g. in the spider’s __init__ method:

    import logging
    import scrapy


    class MySpider(scrapy.Spider):
        # ...
        def __init__(self, *args, **kwargs):
            logger = logging.getLogger('scrapy.spidermiddlewares.httperror')
            logger.setLevel(logging.WARNING)
            super().__init__(*args, **kwargs)

If you run this spider again then INFO messages from the scrapy.spidermiddlewares.httperror logger will be gone.
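If you need finer-grained control than a level threshold, stdlib logging also supports filters. Here is a hedged sketch (the substring test and class name are just examples) that drops only the “Ignoring response” records from that same logger:

    import logging


    class IgnoringResponseFilter(logging.Filter):
        """Drop 'Ignoring response ...' records, keep everything else."""
        def filter(self, record):
            return 'Ignoring response' not in record.getMessage()


    # Attach the filter to the logger that emits those messages,
    # e.g. somewhere in the spider's __init__ method.
    logging.getLogger('scrapy.spidermiddlewares.httperror').addFilter(
        IgnoringResponseFilter()
    )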

scrapy.utils.log module

  • scrapy.utils.log.configure_logging(settings=None, install_root_handler=True)
  • Initialize logging defaults for Scrapy.

Parameters:

  • settings (dict, Settings object or None) – settings used to create and configure a handler for the root logger (default: None).
  • install_root_handler (bool) – whether to install the root logging handler (default: True)

This function does:

  • Route warnings and twisted logging through Python standard logging
  • Assign DEBUG and ERROR level to Scrapy and Twisted loggers respectively
  • Route stdout to log if LOG_STDOUT setting is True

When install_root_handler is True (default), this function also creates a handler for the root logger according to given settings (see Logging settings). You can override default options using the settings argument. When settings is empty or None, defaults are used.

configure_logging is automatically called when using Scrapy commands or CrawlerProcess, but needs to be called explicitly when running custom scripts using CrawlerRunner. In that case, its usage is not required but it’s recommended.
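As a rough sketch of that recommendation (the spider import path myproject.spiders is a hypothetical placeholder), a script using CrawlerRunner might call configure_logging() up front:

    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging

    from myproject.spiders import MySpider  # hypothetical spider module

    configure_logging()  # install Scrapy's default root handler
    runner = CrawlerRunner()
    d = runner.crawl(MySpider)
    d.addBoth(lambda _: reactor.stop())  # stop the reactor when the crawl ends
    reactor.run()  # blocks until the crawl finishes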

Another option when running custom scripts is to manually configure the logging. To do this you can use logging.basicConfig() to set a basic root handler.

Note that CrawlerProcess automatically calls configure_logging, so it is recommended to only use logging.basicConfig() together with CrawlerRunner.

This is an example of how to redirect INFO or higher messages to a file:

    import logging
    from scrapy.utils.log import configure_logging

    configure_logging(install_root_handler=False)  # keep Scrapy from installing its own root handler
    logging.basicConfig(
        filename='log.txt',
        format='%(levelname)s: %(message)s',
        level=logging.INFO
    )

Refer to Run Scrapy from a script for more details about using Scrapy this way.