Push Data, Pull Data, Present Data

10:00 AM - 10:25 AM on July 16, 2016, Room CR4

Adrian Cruz

Audience level:
intermediate

Description

Data plays an important role in much of what we do. Whether you are analyzing visitor traffic, monitoring for fraudulent transactions, or deciding whether a certain color will appeal to your users, data is your best friend in helping you find your way.

Abstract

It's morning, you settle in, check your dashboards, and it looks like there is an increase in load showing up in some of your web server logs. What happened? You're about to deploy code that will hopefully fix some issues; how will you know that things worked well? The design team is thinking about changing some of the site icons; do your users like seeing big icons or small icons on your site? These scenarios are all too common, and the one thing that helps you answer them is your data.

Pushing data is typically easy. If you're tracking events on a website, you'll probably want to know a lot about clicks, URL referrals, and user sessions. If you're curious about the number of downloads your users make per day, you'll probably have some data that you can aggregate into a daily sum. Your data can be small or large or anything in between, but making it available is the most important piece.
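
As a minimal sketch of the per-day download count mentioned above, assume each download event is recorded as a (timestamp, user) pair. The record shape, the sample data, and the function name `downloads_per_day` are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from collections import Counter
from datetime import datetime


def downloads_per_day(events):
    """Count download events per calendar day.

    `events` is an iterable of (ISO-8601 timestamp string, user id) pairs.
    This record shape is an assumption made for illustration.
    """
    counts = Counter()
    for timestamp, _user in events:
        # Reduce the full timestamp to its calendar day, e.g. "2016-07-15".
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[day] += 1
    return dict(counts)


# Hypothetical sample events.
sample_events = [
    ("2016-07-15T09:12:00", "alice"),
    ("2016-07-15T17:40:00", "bob"),
    ("2016-07-16T10:05:00", "alice"),
]
daily = downloads_per_day(sample_events)
```

Once events are pushed in a consistent shape like this, the same grouping could be done in whatever store holds them.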

Pulling data can be a bit more complex. Do you have a small amount of data that you're just pulling from a relational database? Or are you processing data through Hadoop or Spark? Data is what you want; how you pull it depends on your architectural needs.
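
For the small-data case, pulling from a relational database can be as simple as one aggregate query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `page_views` table, its columns, and the `top_referrers` helper are hypothetical stand-ins, not the schema used in the talk.

```python
import sqlite3


def top_referrers(conn, limit=3):
    """Return (referrer, hits) rows, most frequent referrer first."""
    cursor = conn.execute(
        "SELECT referrer, COUNT(*) AS hits "
        "FROM page_views "
        "GROUP BY referrer "
        "ORDER BY hits DESC, referrer ASC "
        "LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Hypothetical table and rows, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (url TEXT, referrer TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("/home", "google.com"),
        ("/pricing", "google.com"),
        ("/home", "twitter.com"),
    ],
)
result = top_referrers(conn)
```

The same question asked of much larger data would likely move to a system such as Hadoop, Spark, or a columnar store, but the shape of the query stays similar.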

Presenting data is a simple task, but are you presenting the correct story? Whether you are presenting your web traffic or your user behavior data, you'll want to present it in the way that best tells the story you want to tell.

Push data, pull data, present data; these are your main tasks in a typical cycle of product development and analysis. We built out a fairly quick data pipeline using Airflow, a workflow framework made by Airbnb. We push a lot of data so we can make good data-driven business decisions. Pulling and presenting data have gone hand in hand for us. We use Google's BigQuery as a fast, columnar data store on which we build dashboards to visualize our data. This talk will shed light on what a typical push-pull-present cycle looks like, illustrated with real-world examples.