RocksDB: The Bedrock of Modern Stateful Applications

Apache Flink, Kafka Streams, and Apache Kvrocks use RocksDB as a persistent layer. Consider before embarking on the creation of your own data storage engine; use RocksDB.

Mark Andreev

May. 16, 24 · Review

Like (2)

Save

1.0K Views

RocksDB is an embedded database that manages (key, value) pairs in the disk by a single write process. Originally developed on top of LevelDB (by Google); Meta founded a database that in 10 years, is projected to become the default choice for storage in application mediums.

Adopters

Facebook

MyRocks is a MySQL storage engine that was built on top of RocksDB
MongoRocks is a Mondo DB storage engine that was built on top of RocksDB
Dragon is a distributed graph query engine
LogDevice is a distributed data store for logs

Apache Samza uses RocksDB for State Management in streaming aggregates

Yahoo

Sherpa is a distributed data store
CockroachDB is a database with PostgreSQL PostgreSQL-compatible interface that uses RocksDB as a storage engine
Apache Flink for state management in data streaming aggregates
Kafka Streams / KSQL for state management in data streaming aggregates
TiKV for the storage engine
Uber for task queue
Apache Doris for metadata management

You can other adopters on the RocksDB wiki page about users and use cases.

Usage Cases

RocksDB supports only one writer process (not thread) and multiple read-only focused instances. Secondary instances (not named as read-only instances) support read-only mode with dynamic catch-up updates from the primary replica that are triggered by the scheduler or some event. They can be run on different hosts which allows for it to distribute a read workload across nodes. But at the same time, secondary instances don't support snapshot-based read and tailing iterators.

The main operations that supported are:

Get and MultiGet
Put
Delete and DeleteRange
Write and WriteBatch
CompactRange and CompactFiles
Iterator over records

That makes this database key value-oriented. With different value type support including large files with BlobDB extension that allows to store effectively large files in separate data files (as LOBs in PostgreSQL).

Apache Kvrocks (compatible with Redis protocol) describes how they implemented Redis-related data structures on top of RocksDB here.

Apache Kafka Streams use RocksDB for state management. For example, KTable which represents a stream's snapshot uses KeyValueStore for data management that is implemented by RocksDBStore. But at the same time, KTable does not necessarily materialize as a local store because KTable may persist to a topic. This change was introduced by KAFKA-2856 (commit).

TiKV is a highly scalable and low latency key-value database that uses RocksDB for managing data in disk because RocksDB is mature and high-performance. They mention prefix bloom filter, event listener and ingest external file capabilities when describing the advantages of RocksDB. Other features such as data distribution across nodes, transactions, and leader elections are implemented by TiKV.

For a long time, CockroachDB used RocksDB as a local disk-backed KV store that provides efficient writes and range scans to enable performant SQL execution (SIGMOD 20). But at the same time, they rewrite this solution in Go because it allows them to tune better for their workload (Pebble). So they started with a boxed solution, and only then replaced it with their own.

Your Adoption

I recommend that you start with simple solutions like SQLite or H2 that support SQL and relations. It might be easier to start and support because the community has a lot of expertise.

When you face a performance barrier like a wall that is not breakable by a previous solution it might be time to use tools like RocksDB. It will require more investigation and deeper understanding because in some cases you might go to source code instead of StackOverflow.

But at the same time, I suggest you ignore your own data format before RocksDB adoption because it might be hard to overperform this solution and it might take less time to adopt RocksDB than to do it yourself.

Database RocksDB

Opinions expressed by DZone contributors are their own.

RocksDB: The Bedrock of Modern Stateful Applications

Apache Flink, Kafka Streams, and Apache Kvrocks use RocksDB as a persistent layer. Consider before embarking on the creation of your own data storage engine; use RocksDB.

Adopters

Facebook

LinkedIn

Yahoo

Usage Cases

Your Adoption

Partner Resources

Related

Trending

RocksDB: The Bedrock of Modern Stateful Applications

Apache Flink, Kafka Streams, and Apache Kvrocks use RocksDB as a persistent layer. Consider before embarking on the creation of your own data storage engine; use RocksDB.

Adopters

Facebook

LinkedIn

Yahoo

Usage Cases

Your Adoption

Related

Partner Resources