Simple Serverless ETLs in AWS

02:15 PM - 03:10 PM on July 17, 2016, Room CR4

Ryan Tuck

Audience level:
intermediate

Description

A brief introduction to using AWS services to design simple and powerful Python-based ETLs without ever SSHing into a server. Learn how to run code in response to events with Lambda, and how to configure a cron schedule, notifications, metrics collection, and a database, both in the browser and with boto3, AWS's Python SDK.

Abstract

The rise of cloud computing platforms such as AWS has enabled developers to build awesome tools and rich applications with little to no DevOps experience. This lets data engineers, data scientists, and developers at large focus more on writing code and less on managing servers. This talk will give a simple overview of how to configure Python-based ETLs around AWS Lambda using boto3, and highlight the benefits of going serverless.
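As a rough illustration of the kind of boto3 configuration involved, the sketch below attaches a recurring schedule to an existing Lambda function. It is a minimal sketch under stated assumptions, not the talk's actual code: the function names `rate_expression` and `schedule_function`, the `-schedule` rule naming convention, and the target `Id` are all hypothetical. The two clients are passed in so the logic can be exercised without AWS credentials.

```python
def rate_expression(minutes):
    """Build a schedule expression such as 'rate(5 minutes)'."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    # The rate syntax uses the singular unit for a value of 1.
    unit = "minute" if minutes == 1 else "minutes"
    return "rate({} {})".format(minutes, unit)


def schedule_function(events, lambda_client, function_name, function_arn, minutes):
    """Run an existing Lambda function every `minutes` minutes.

    `events` and `lambda_client` are expected to behave like
    boto3.client("events") and boto3.client("lambda").
    """
    rule_name = function_name + "-schedule"  # hypothetical naming convention

    # 1. Create (or update) the scheduled rule.
    rule = events.put_rule(
        Name=rule_name,
        ScheduleExpression=rate_expression(minutes),
        State="ENABLED",
    )

    # 2. Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # 3. Point the rule at the function.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

With boto3 installed and credentials configured, this could be called as `schedule_function(boto3.client("events"), boto3.client("lambda"), "my-etl", my_etl_arn, 60)`, where `"my-etl"` and `my_etl_arn` are placeholders for a function that has already been deployed.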

The end result of this talk will be a simple ETL that retrieves records from an API and inserts them into a database. We'll set the job to repeat at a given interval, enable email notifications, view metrics like invocation and error counts, and expose an API endpoint for running the job on demand. We'll show how to do all of this in the browser via the AWS console, and how to configure deployment using AWS's Python SDK, boto3.
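To make the shape of such a job concrete, here is a minimal sketch of a Lambda-style handler that retrieves records and inserts them into a database. Everything specific in it is an assumption for illustration: the `API_URL` endpoint, the `id`/`name` record fields, the `records` table, and the use of SQLite as a stand-in for whatever database the talk uses. The `fetch` and `conn` parameters are hypothetical hooks that let the handler run locally without network access.

```python
import json
import sqlite3
from urllib.request import urlopen

API_URL = "https://example.com/api/records"  # hypothetical endpoint


def extract(url=API_URL):
    """Fetch a JSON list of records from the API."""
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transform(records):
    """Drop records without an id and tidy up the (assumed) name field."""
    return [
        (int(r["id"]), str(r.get("name", "")).strip())
        for r in records
        if "id" in r
    ]


def load(conn, rows):
    """Upsert rows into the (assumed) records table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO records (id, name) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


def lambda_handler(event, context, fetch=extract, conn=None):
    """Entry point: Lambda passes `event` and `context`; the rest are test hooks."""
    if conn is None:
        # In-memory SQLite stands in for a real database connection here.
        conn = sqlite3.connect(":memory:")
    rows = transform(fetch())
    return {"loaded": load(conn, rows)}
```

Locally, the handler can be exercised with a stub in place of the API call, for example `lambda_handler({}, None, fetch=lambda: [{"id": 1, "name": "a"}])`. Keeping extract, transform, and load as separate functions means each step can be tested on its own before the job is deployed.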

As a data engineer who learned to love tools like Ansible and nginx after being inadvertently thrown into the world of DevOps, I've come to appreciate how simple (and affordable) infrastructure becomes when you build on AWS's services. I believe cloud computing services will continue to lower the barrier to entry for developers who want to harness the power of the cloud without a traditional DevOps skillset, and this talk aims to provide a proof of concept that it can be done.