We're All Database Engineers (WADE)

02:15 PM - 03:10 PM on July 17, 2016, Room CR6

Adrian Kramer

Audience level: intermediate
Watch: https://www.youtube.com/watch?v=8cR1lSdic0Q

Description

WADE is a distributed database framework that provides strong consistency and high throughput using chain replication in lieu of the traditional primary/backup model. Quite unlike traditional databases, WADE is a framework: programmers implement the storage interface themselves and write custom query or update functions in Python that the database executes itself, avoiding the dreaded read-modify-write cycle that degrades performance. Since the programmer implements the storage layer, it can be hand-customized to the use case at hand with tools such as LevelDB, RocksDB, or whatever the programmer desires. WADE, in turn, takes control at the networking layer, handling replication, message forwarding among nodes, and fault tolerance in the event of failure, as long as the end user implements a few simple functions.
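
To make "chain replication" concrete, here is a minimal single-process sketch of the general technique, not WADE's actual code or API. Writes enter at the head of a chain of replicas and are forwarded node by node to the tail; a write counts as committed once the tail has it, so reads are served by the tail. The class names `ChainNode` and `Chain`, and the idea of passing an update function to `update`, are hypothetical, and direct method calls stand in for network messages.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical update signature (not WADE's API): current value (or None) -> new value.
UpdateFn = Callable[[Optional[Any]], Any]


class ChainNode:
    """One replica in the chain; keeps its data in a plain dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.successor: Optional["ChainNode"] = None

    def replicate(self, key: str, value: Any) -> Any:
        # Store the value locally, then forward it to the next replica.
        self.data[key] = value
        if self.successor is not None:
            return self.successor.replicate(key, value)
        # Only the tail reaches this line: the write is now committed.
        return value


class Chain:
    def __init__(self, names: List[str]) -> None:
        self.nodes = [ChainNode(n) for n in names]
        # Link each replica to the next one: head -> ... -> tail.
        for prev, nxt in zip(self.nodes, self.nodes[1:]):
            prev.successor = nxt

    def update(self, key: str, fn: UpdateFn) -> Any:
        # Writes enter at the head, which runs the update function once
        # and forwards the result, so every replica stores the same value.
        head = self.nodes[0]
        return head.replicate(key, fn(head.data.get(key)))

    def query(self, key: str) -> Optional[Any]:
        # Reads go to the tail, which only ever holds committed writes.
        return self.nodes[-1].data.get(key)


chain = Chain(["a", "b", "c"])
chain.update("pageviews", lambda v: (v or 0) + 1)
chain.update("pageviews", lambda v: (v or 0) + 1)
print(chain.query("pageviews"))  # → 2
```

In this sketch the head evaluates the update function and forwards only the resulting value, which keeps replicas identical even if a function were non-deterministic; whether WADE forwards values or operations is not stated here.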

Abstract

At Chartbeat, we process 300,000 events per second, measure the attention of some 10 million concurrent users across our network, and aggregate statistics about 50 billion page views a month on many of the top news publishing sites in the world today. Finding a database to ingest and serve this data is difficult: few offer a performant and cost-effective way to store and query our data at scale. Additionally, the complexity of many of these databases makes it very difficult to reason about their performance and debug issues without outside support.

Many of you would call us crazy, but these problems led our CTO, Wes Chow, to consider writing our own database to handle the unique characteristics of our data. But WADE is not a database; it is a framework. It does not provide a storage layer, a query language, or even an interface to query or update it. Instead, the programmer implements the storage interface herself (or keeps it in memory! it's up to you!) and provides a set of query and update functions that live on, and are executed by, nodes in the database. What WADE provides is the hard part of distributed databases: replication, consistency, horizontal scalability, fault tolerance, and recovery.
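
The division of labor described above (programmer-supplied storage plus functions that run on the node) might look roughly like the sketch below. It is an illustration under assumptions, not WADE's interface: `Storage`, `InMemoryStorage`, `Node`, `register_update`, `register_query`, and the example key `/article/42` are all hypothetical names. The point it shows is that an update runs next to the data, so a client sends one request instead of a get followed by a put.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional


class Storage(ABC):
    """Hypothetical storage interface that the programmer implements."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]: ...

    @abstractmethod
    def put(self, key: str, value: Any) -> None: ...


class InMemoryStorage(Storage):
    """Simplest backend: a dict. A LevelDB or RocksDB binding could sit here instead."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value


class Node:
    """A hypothetical node that owns a storage backend and runs registered functions."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage
        self.updates: Dict[str, Callable[..., Any]] = {}
        self.queries: Dict[str, Callable[..., Any]] = {}

    def register_update(self, name: str, fn: Callable[..., Any]) -> None:
        self.updates[name] = fn

    def register_query(self, name: str, fn: Callable[..., Any]) -> None:
        self.queries[name] = fn

    def call_update(self, name: str, key: str, *args: Any) -> Any:
        # Read, modify, and write all happen on the node, so the client
        # never round-trips the current value over the network.
        new_value = self.updates[name](self.storage.get(key), *args)
        self.storage.put(key, new_value)
        return new_value

    def call_query(self, name: str, key: str, *args: Any) -> Any:
        return self.queries[name](self.storage.get(key), *args)


node = Node(InMemoryStorage())
# Track distinct visitors to a page without shipping the set to the client.
node.register_update("add_visitor", lambda cur, uid: (cur or frozenset()) | {uid})
node.register_query("num_visitors", lambda cur: len(cur or ()))
for uid in ["u1", "u2", "u1"]:
    node.call_update("add_visitor", "/article/42", uid)
print(node.call_query("num_visitors", "/article/42"))  # → 2
```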

WADE is a developer’s best friend. Simplicity, correctness, and debuggability are first-class concerns to an extent not found in commercial databases. These principles dictate the design and selection of algorithms, making it comfortable for an application engineer to customize and debug the system in the field. WADE allows (scratch that, wants) you to pop open the jalopy's hood and see what's going on as it flies over potholes down the freeway.

This talk covers a mix of real-world use cases and the database theory and algorithms that underpin the system, and demonstrates a simple but fault-tolerant and scalable in-memory key-value store.
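
For a rough idea of what "simple but fault-tolerant and scalable" can mean for an in-memory key-value store, the toy below is a sketch under assumptions and not the store demonstrated in the talk. The hypothetical `ShardedKV` class hashes each key onto one of several chains (horizontal scalability) and keeps several replicas per chain; when a replica fails it is spliced out, and reads continue from whichever replica is now the tail (fault tolerance).

```python
import zlib
from typing import Any, Dict, List, Optional


class ShardedKV:
    """Toy in-memory store: keys are hashed onto several chains,
    each chain being an ordered list of replicas (plain dicts)."""

    def __init__(self, num_chains: int, replicas_per_chain: int) -> None:
        self.chains: List[List[Dict[str, Any]]] = [
            [{} for _ in range(replicas_per_chain)] for _ in range(num_chains)
        ]

    def _chain_for(self, key: str) -> List[Dict[str, Any]]:
        # crc32 is stable across processes, unlike Python's salted hash().
        return self.chains[zlib.crc32(key.encode()) % len(self.chains)]

    def put(self, key: str, value: Any) -> None:
        # Apply head-to-tail; the write is committed once the tail has it.
        for replica in self._chain_for(key):
            replica[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Read from the tail of the chain that owns the key.
        return self._chain_for(key)[-1].get(key)

    def fail_replica(self, chain_index: int, position: int) -> None:
        # Splice a crashed replica out; its neighbours become adjacent.
        chain = self.chains[chain_index]
        if len(chain) == 1:
            raise RuntimeError("cannot remove the last replica of a chain")
        del chain[position]


store = ShardedKV(num_chains=4, replicas_per_chain=3)
for i in range(100):
    store.put(f"user:{i}", i)
# Lose the tail of every chain; its predecessor already holds every
# committed write, so it can take over as the new tail.
for c in range(len(store.chains)):
    store.fail_replica(c, -1)
print(store.get("user:7"))  # → 7
```

Modulo hashing keeps the sketch short, but it remaps most keys whenever the number of chains changes, and a real system would also have to rejoin repaired replicas and handle failures in the middle of an in-flight write.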