Make data cleansing fun again with Pandas

04:00 PM - 04:25 PM on July 16, 2016, Room CR6

Joe Hooper


This talk provides a fast-paced introduction to cleansing text data with Python. We'll cover specific Pandas tools as well as strategies for managing the cleansing process. Topics include: slicing/splitting/joining, transforming text, working with missing values, supplementing with additional sources, and dealing with duplicates.


Cleaning data isn't everyone's favorite task, but Python can ease the pain. Whether you are preparing data to migrate into a database-driven application or for analysis and visualization, a strong data cleansing toolkit will save you time and improve the quality of your work. With the tools and strategies covered in this talk, you'll be a data cleansing powerhouse in no time.

Planned topics include:

- Making effective use of Pandas features to slice, split, and join data to support data cleansing
- Text value transformations using conditional logic, mappings, and regular expressions
- Filling and supplementing missing values
- Merging data sources and dealing with unmatched rows
- Dealing with duplicates: criteria-based deduplication, merging duplicates, and handling conflicts
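
To give a feel for the planned topics, here is a minimal sketch of how a few of them might look in Pandas. It is not taken from the talk itself: the `customers` and `regions` tables, their column names, and the `state_map` dictionary are all invented for illustration, and the steps shown are just one reasonable way to approach each task.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
customers = pd.DataFrame({
    "name": ["  alice SMITH", "Bob Jones ", "bob jones", "Carol  White"],
    "state": ["oh", "Ohio", None, None],
    "phone": ["614-555-0101", None, "(614) 555 0102", "6145550103"],
})

# Text transformation: trim, collapse repeated whitespace, title-case.
customers["name"] = (
    customers["name"]
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)

# Mapping: normalize spelling variants of a state to one code.
# Note that values missing from the mapping become NaN.
state_map = {"oh": "OH", "ohio": "OH"}
customers["state"] = customers["state"].str.lower().map(state_map)

# Regular expression: keep only the digits of each phone number.
customers["phone"] = customers["phone"].str.replace(r"\D", "", regex=True)

# Criteria-based deduplication: keep the first row seen for each name.
first_only = customers.drop_duplicates(subset="name", keep="first")

# Merging duplicates: take the first non-missing value per column,
# so information spread across duplicate rows is combined.
deduped = customers.groupby("name", as_index=False).first()

# Filling missing values with an explicit placeholder.
deduped["state"] = deduped["state"].fillna("UNKNOWN")

# Supplementing from a second source and flagging unmatched rows.
regions = pd.DataFrame({"state": ["OH"], "region": ["Midwest"]})
merged = deduped.merge(regions, on="state", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched[["name", "state"]])
```

Here `drop_duplicates` would keep the first "Bob Jones" row, which has no phone number, while the `groupby(...).first()` variant combines the state from one duplicate with the phone number from the other. Which behavior is appropriate depends on how conflicts between duplicates should be resolved.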