Webscraping by Example: An introduction to BeautifulSoup

11:15 AM - 12:10 PM on July 16, 2016, Room CR5

Stevie Slotterback

Audience level: novice

Watch: https://www.youtube.com/watch?v=5U702pICY8k

Description

This is a basic tutorial on the features of BeautifulSoup, the popular HTML parser for Python. In this tutorial, we will cover the basic functions and data structures that make up the BeautifulSoup package. We will then apply this knowledge to automate some data extraction tasks on the Buildings Information System (BIS), published by the New York City Department of Buildings.

Abstract

The BeautifulSoup Python package is a useful tool for automating extraction tasks on web-based data sources. One such source, the Buildings Information System (BIS) from the New York City Department of Buildings, provides consistent access to a rich data set in a straightforward format. In this tutorial, we will demonstrate how to use BeautifulSoup's features to automate data extraction from the BIS database.
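As a minimal sketch of the kind of extraction described above (not the talk's own code), the following assumes the `beautifulsoup4` package is installed and parses an invented HTML snippet laid out as a label/value table. The markup, the `profile` table id, the `label`/`value` classes and the `extract_profile` helper are all hypothetical; real BIS pages use their own markup, which you would inspect before writing selectors. The example also shows the three core objects: the `BeautifulSoup` root, `Tag`, and `NavigableString`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML for illustration only; this is NOT the actual BIS markup.
SAMPLE_HTML = """
<html>
  <head><title>Property Profile (sample)</title></head>
  <body>
    <table id="profile">
      <tr><td class="label">Address</td><td class="value">123 Example Street</td></tr>
      <tr><td class="label">Borough</td><td class="value">Manhattan</td></tr>
      <tr><td class="label">Stories</td><td class="value">5</td></tr>
    </table>
  </body>
</html>
"""


def extract_profile(html):
    """Return a dict mapping each label cell's text to its value cell's text.

    Hypothetical helper: assumes rows of <td class="label"> / <td class="value">.
    """
    # The BeautifulSoup object is the root of the parse tree.
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first matching Tag (or None if nothing matches).
    table = soup.find("table", id="profile")
    record = {}
    if table is None:
        return record
    for row in table.find_all("tr"):
        label = row.find("td", class_="label")
        value = row.find("td", class_="value")
        if label and value:
            # get_text(strip=True) joins the cell's strings, trimming whitespace.
            record[label.get_text(strip=True)] = value.get_text(strip=True)
    return record


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(type(soup).__name__)               # BeautifulSoup
    print(type(soup.title).__name__)         # Tag
    print(type(soup.title.string).__name__)  # NavigableString
    print(extract_profile(SAMPLE_HTML))
```

Keying on the label text rather than on row position keeps the extraction working if rows are reordered, which is a common reason to prefer it when scraping tabular profile pages.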