The Talent500 Blog

How Airbnb Developed a Key-Value Store for Managing Petabytes of Data

Airbnb’s operations hinge not only on raw data but also on derived data, which is essential for personalizing user experiences. Derived data is information computed from large offline datasets by tools like Apache Spark, or from real-time event streams carried by systems such as Apache Kafka. It plays a critical role in tailoring services to user activity and preferences.

However, accessing and managing this data efficiently presents its own challenges. The underlying system must be reliable enough to guarantee uninterrupted service. It must be highly available, so that data stays accessible even when individual servers fail, and it must scale with the growing data demands of a platform as large as Airbnb. Low latency is also vital, since users expect immediate responses.

To address these challenges, Airbnb developed Mussel, a key-value store designed to serve derived data reliably and with low latency. This article walks through the architecture of Mussel and explains how Airbnb engineered it to manage petabytes of data.

Evolution of Derived Data Storage at Airbnb

Mussel was not the first solution employed by Airbnb for storing derived data; rather, it represents the culmination of several earlier attempts. Below is an overview of the key stages in this evolutionary process:

Stage 1: Unified Read-Only Key-Value Store

Initially, Airbnb faced numerous technical hurdles in managing derived data effectively. Existing tools such as MySQL, HBase, and RocksDB fell short in meeting critical requirements including:

In response to these challenges, the engineering team created HFileService in 2015. This custom solution was built on HFile, the file format that underlies HBase and is modeled on Google’s SSTable.
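To make the SSTable idea concrete, here is a minimal sketch of an immutable, sorted key-value table with binary-search lookups. It is an illustration of the general technique only, not HFile’s actual on-disk format or HFileService’s code; the class name `SortedTable` and the sample keys are hypothetical.

```python
import bisect


class SortedTable:
    """Immutable, sorted key-value table, loosely in the spirit of an SSTable.

    Hypothetical illustration: real HFiles add blocks, indexes, and
    compression on top of this basic idea.
    """

    def __init__(self, items):
        # Sort once at build time; the table is never modified afterwards.
        pairs = sorted(items.items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key, default=None):
        # Binary search over the sorted keys for a point lookup.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def scan(self, start, end):
        # Range scan over [start, end); cheap because keys are sorted.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


# Build the table once from a batch of data, then serve reads from it.
table = SortedTable({"user:3": "c", "user:1": "a", "user:2": "b"})
print(table.get("user:2"))
print(table.scan("user:1", "user:3"))
```

Because the table is built once and never mutated, it suits a read-only store fed by batch jobs: a fresh table replaces the old one rather than being updated in place.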

The architecture operated as follows:

Despite resolving several issues, HFileService had notable limitations:

Stage 2: Real-Time and Derived Data Store (Nebula)

In its second phase, Airbnb introduced Nebula to bridge the gap between batch-processed and real-time data access.

Nebula incorporated several enhancements over HFileService:

However, Nebula also faced challenges:

Mussel Architecture

In 2018, Airbnb’s engineering team developed Mussel to overcome the limitations encountered with previous systems. Its architecture was specifically designed for enhanced scalability and performance.

Key Features of Mussel’s Architecture

  1. Partition Management with Apache Helix: Mussel increased the shard count from 8 to 1024 to accommodate growing data needs. Apache Helix automated shard management, balancing shards across servers dynamically without manual intervention.
  2. Leaderless Replication with Kafka: Utilizing Kafka as a write-ahead log ensured consistent recording and replication of updates across shards while allowing any node holding a shard replica to handle read requests.
  3. Unified Storage Engine with HRegion: Mussel replaced DynamoDB by extending HFileService to serve both real-time and batch data from a single storage engine, HRegion from HBase. HRegion buffers recent writes in an in-memory MemStore and organizes data on disk as an LSM tree, which enabled more advanced query support and efficient data organization.
  4. Bulk Load Support: Two types of bulk load pipelines, Merge Type and Replace Type, run from the data warehouse to Mussel via Airflow jobs. The loading process was optimized to import only incremental changes rather than reloading entire datasets daily.
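The features above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Mussel’s implementation: the hash function, the names `shard_for`, `Replica`, `Shard`, and `bulk_load`, and the rule that incoming values win on conflict in a merge are all hypothetical. A plain Python list stands in for a Kafka partition, and Helix’s shard-to-server assignment is not modeled.

```python
import hashlib

NUM_SHARDS = 1024  # shard count quoted in the article


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class Replica:
    """One copy of a shard; applies log entries in order."""

    def __init__(self):
        self.data = {}
        self.offset = 0  # next log position this replica will apply

    def catch_up(self, log):
        # Apply every entry this replica has not seen yet.
        for key, value in log[self.offset:]:
            self.data[key] = value
        self.offset = len(log)


class Shard:
    """Writes go to an append-only log; replicas consume it independently."""

    def __init__(self, replica_count=3):
        self.log = []  # stand-in for a Kafka partition used as a write-ahead log
        self.replicas = [Replica() for _ in range(replica_count)]

    def write(self, key, value):
        self.log.append((key, value))

    def read(self, key, replica_index=0):
        # Any replica may serve a read; it can be stale until it catches up.
        return self.replicas[replica_index].data.get(key)


def bulk_load(existing, incoming, mode):
    """Return the table contents after a bulk load (hypothetical semantics)."""
    if mode == "merge":
        merged = dict(existing)
        merged.update(incoming)  # assumption: incoming values win on conflict
        return merged
    if mode == "replace":
        return dict(incoming)  # previous contents are dropped entirely
    raise ValueError(f"unknown bulk load mode: {mode}")


shard = Shard()
shard.write("listing:42", "cabin")
stale = shard.read("listing:42", replica_index=1)   # replica has not consumed yet
shard.replicas[1].catch_up(shard.log)
fresh = shard.read("listing:42", replica_index=1)   # now visible on this replica
```

In this model a read issued right after a write can miss the new value until the chosen replica has consumed the log, which is the usual trade-off of log-based replication without read-after-write guarantees.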

Adoption and Performance of Mussel

Mussel has become integral to Airbnb’s infrastructure, supporting various services reliant on key-value storage with impressive performance metrics:

Conclusion

Mussel is the result of several iterations toward a robust key-value store that addresses challenges such as scalability and low latency. Its performance underlines its critical role in enabling data-driven services at Airbnb. Looking ahead, the engineering team plans to enhance Mussel further to support advanced use cases such as read-after-write consistency and auto-scaling.
