Webscraping by Example: An introduction to BeautifulSoup

11:15 AM - 12:10 PM on July 16, 2016, Room CR5

Stevie Slotterback

Audience level:


This introductory tutorial covers the core features of BeautifulSoup, a popular HTML parser for Python. We will walk through the basic functions and data structures that make up the package, then apply them to automate data extraction tasks on the Buildings Information System (BIS) published by the New York City Department of Buildings.


The BeautifulSoup Python package is a useful tool for automating the extraction of data from web-based sources. One such source, the Buildings Information System (BIS) from the New York City Department of Buildings, reliably serves a rich data set in a straightforward format. In this tutorial, we will demonstrate how to use BeautifulSoup's features to automate data extraction from the BIS database.
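
To give a flavor of the kind of extraction the tutorial covers, here is a minimal sketch that parses a small table of label/value pairs with BeautifulSoup. The HTML snippet, the `property-profile` id, the class names, and the `extract_fields` helper are all hypothetical and chosen for illustration; they are not the actual BIS markup, which would need to be inspected before writing a real scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; real BIS pages are laid out
# differently and must be inspected before adapting this sketch.
SAMPLE_HTML = """
<html><body>
<table id="property-profile">
  <tr><td class="label">Borough</td><td class="content">Manhattan</td></tr>
  <tr><td class="label">Block</td><td class="content">1234</td></tr>
  <tr><td class="label">Lot</td><td class="content">56</td></tr>
</table>
</body></html>
"""


def extract_fields(html):
    """Return a dict mapping each row's label cell to its value cell."""
    # "html.parser" is the parser bundled with the Python standard library.
    soup = BeautifulSoup(html, "html.parser")

    # find() returns the first matching Tag, or None if nothing matches.
    table = soup.find("table", id="property-profile")
    fields = {}
    if table is None:
        return fields

    # find_all() returns every matching descendant tag.
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            # get_text(strip=True) returns the cell text without
            # surrounding whitespace.
            label, value = (cell.get_text(strip=True) for cell in cells)
            fields[label] = value
    return fields


fields = extract_fields(SAMPLE_HTML)
print(fields)
```

In a real scraper the HTML would come from an HTTP response rather than an inline string, but the parsing steps (build the soup, locate a tag, iterate over its children, read the text) stay the same.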