Dplython: Intuitive Data Analysis, Funky Python

01:45 PM - 02:10 PM on July 16, 2016, Room CR6

Chris Riederer

Audience level: intermediate
Watch: https://www.youtube.com/watch?v=4YAcwCe1mAE

Description

The goal of data analysis is to write code that manipulates data to give us answers. Ideally, we could translate our questions into code as quickly as we can think of them! Dplython is an open-source Python library (inspired by R's "dplyr") that improves productivity by constraining analysis to a core set of the most common data manipulation operations. By mapping the way we think about typical tasks onto functions, dplython moves data analysis closer to "speed-of-thought." In this talk, I'll describe the core ideas behind dplython, present a tutorial on using it for data analysis, and give a technical peek at lazy evaluation in Python, built with operator overloading.

Abstract

Dplython is a new way of writing data analysis (based on R's "dplyr") on top of Python's pandas library. The goal of dplython is to make data analysis more natural by mapping common data manipulation tasks onto functions that can be easily composed into larger analyses. In dplython, users express data analysis with a small set of typical data manipulation operations. Data gets piped through these operations, resulting in a natural syntax that is easy to both write and read. Dplython relies on writing down operations that will be carried out at a later point, which requires an object that represents stored computation: a cool lazy evaluation object that will interest intermediate and advanced Python users. In this talk, I'll describe the core ideas behind dplython, present a tutorial on using dplython for data analysis, and give a technical peek at lazy evaluation in Python, built with operator overloading.
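To make the "stored computation" idea concrete, here is a minimal sketch of a lazy expression object and a pipe operator built with operator overloading. It is not dplython's implementation: every name in it (Expr, Verb, X, sift, mutate) is defined in the snippet purely for illustration, and a plain list of dicts stands in for a pandas DataFrame so the example runs with the standard library alone.

```python
import operator


class Expr:
    """Illustrative deferred expression: stores a function of a row, runs it later."""

    def __init__(self, fn):
        self.fn = fn  # fn maps a row (a dict) to a value

    def evaluate(self, row):
        return self.fn(row)

    def _binary(self, other, op):
        # Build a new Expr instead of computing anything now.
        return Expr(lambda row: op(self.evaluate(row), _value(other, row)))

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __truediv__(self, other):
        return self._binary(other, operator.truediv)

    def __gt__(self, other):
        return self._binary(other, operator.gt)

    def __lt__(self, other):
        return self._binary(other, operator.lt)


def _value(obj, row):
    # Constants such as 1 are used as-is; Exprs are evaluated against the row.
    return obj.evaluate(row) if isinstance(obj, Expr) else obj


class _Columns:
    """Attribute access such as X.price yields a deferred column lookup."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return Expr(lambda row: row[name])


X = _Columns()


class Verb:
    """Illustrative pipeline step; `rows >> verb` applies it via __rrshift__."""

    def __init__(self, fn):
        self.fn = fn

    def __rrshift__(self, rows):
        return self.fn(rows)


def sift(condition):
    # Keep the rows for which the deferred condition evaluates to true.
    return Verb(lambda rows: [r for r in rows if condition.evaluate(r)])


def mutate(**new_columns):
    # Add columns whose values are computed per row from deferred expressions.
    def apply(rows):
        return [
            {**r, **{k: e.evaluate(r) for k, e in new_columns.items()}}
            for r in rows
        ]

    return Verb(apply)


# Made-up sample data, only to exercise the sketch.
rows = [
    {"cut": "Ideal", "carat": 0.5, "price": 1000},
    {"cut": "Good", "carat": 1.5, "price": 6000},
    {"cut": "Premium", "carat": 2.0, "price": 9000},
]

result = rows >> sift(X.carat > 1) >> mutate(price_per_carat=X.price / X.carat)
```

Nothing is computed when `X.price / X.carat` is written; the division only runs once `mutate` feeds each row to the stored expression. That delay is what lets an expression be written before the data it applies to has reached that step of the pipeline.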