How to run a Scrapy scraper multiple times simultaneously on different input websites, each writing to a different output file?


Does anyone know how I could run the same Scrapy spider over 200 times on different websites, each with its own output file? Usually in Scrapy you indicate the output file when running from the command line, by passing -o filename.json.

There are multiple ways to do this:

  • Create an item pipeline that writes to a configurable destination, and pass that destination as a spider argument, e.g. scrapy crawl myspider -a output_filename=output_file.txt. output_filename then becomes an attribute of the spider, so you can access it from a pipeline like:

    class MyPipeline(object):
        def process_item(self, item, spider):
            # output_filename was passed via -a and set as a spider attribute
            filename = spider.output_filename
            # ... write the item to that file here ...
            return item  # pipelines must return the item (or raise DropItem)
  • You can also run Scrapy from within a Python script, and handle the output items there yourself.