Scraping the web

Links to the notes for my web scraping workshop, in HTML and IPython notebook formats.

  1. Preparation (notebook | html)
  2. HTML (notebook | html)
  3. Scrapy (notebook | html)
  4. Scraping (notebook | html)
  5. Recursion (notebook | html)

If you only want to read along with the instructions, the HTML files are enough. Once you start writing your own web scrapers, however, it is very useful to be able to test and experiment with Python statements interactively. To do so, download the notebook version (save it to disk), launch Jupyter/IPython, and open the saved file from within it. This attaches the notebook to a 'kernel' running Python, which lets you run and test code snippets.
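As a first check after opening a notebook, a cell like the one below can confirm that the kernel is running Python and whether Scrapy is importable. This is a minimal sketch, not part of the workshop notes: the helper name `kernel_report` is hypothetical, and it only checks that a package can be found, not which version is installed.

```python
import sys
import importlib.util


def kernel_report(packages=("scrapy",)):
    """Return the kernel's Python version and, per package name, whether it is importable."""
    # find_spec returns None when a top-level package cannot be found.
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return sys.version.split()[0], found


version, found = kernel_report()
print("Python", version)
for name, ok in found.items():
    print(f"{name}: {'installed' if ok else 'not found'}")
```

If Scrapy shows up as "not found", it is installed in a different Python environment than the one the kernel is using.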

Note: any relative file paths you use in these code snippets are resolved starting from the directory in which you saved the notebook.
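The short sketch below shows how a relative path is resolved in practice. It assumes the usual Jupyter behaviour of starting the kernel in the notebook's own folder; `page.html` is a hypothetical file name used only for illustration, and the file does not need to exist.

```python
import os
from pathlib import Path

# In a notebook, the kernel's working directory is normally the folder
# the notebook file was opened from.
notebook_dir = Path(os.getcwd())
print("Working directory:", notebook_dir)

# "page.html" is a hypothetical file name. A bare relative name like this
# is resolved against the working directory.
target = Path("page.html").resolve()
print("Relative 'page.html' points to:", target)

# Building the path explicitly from the working directory gives the same location.
explicit = (notebook_dir / "page.html").resolve()
print("Same location?", target == explicit)
```

If a file you saved cannot be found, printing `os.getcwd()` like this is a quick way to see where the notebook is actually looking.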