RocksDB: The Bedrock of Modern Stateful Applications

Apache Flink, Kafka Streams, and Apache Kvrocks use RocksDB as their persistence layer. Before you set out to build your own storage engine, consider using RocksDB.

By Mark Andreev · May. 16, 24 · Review

RocksDB is an embedded database that stores (key, value) pairs on disk and is written to by a single process. Meta built it on top of Google's LevelDB, and in about 10 years it has grown into a common default choice for storage inside applications.

Adopters

Facebook

  • MyRocks is a MySQL storage engine that was built on top of RocksDB
  • MongoRocks is a MongoDB storage engine that was built on top of RocksDB
  • Dragon is a distributed graph query engine
  • LogDevice is a distributed data store for logs

LinkedIn

  • Apache Samza uses RocksDB for State Management in streaming aggregates

Yahoo

  • Sherpa is a distributed data store

Others

  • CockroachDB is a database with a PostgreSQL-compatible interface that long used RocksDB as its storage engine
  • Apache Flink uses RocksDB for state management in streaming aggregates
  • Kafka Streams / KSQL use RocksDB for state management in streaming aggregates
  • TiKV uses RocksDB as its storage engine
  • Uber uses RocksDB for a task queue
  • Apache Doris uses RocksDB for metadata management

You can find other adopters on the RocksDB wiki page about users and use cases.

Use Cases

RocksDB allows only one writer process (a process, not a thread) alongside multiple read-oriented instances. Secondary instances (which are distinct from read-only instances) serve reads and can dynamically catch up with the primary; the catch-up is triggered by a scheduler or by some event. They can run on different hosts, which lets you spread a read workload across nodes. However, secondary instances do not support snapshot-based reads or tailing iterators.

The main supported operations are:

  • Get and MultiGet
  • Put
  • Delete and DeleteRange
  • Write and WriteBatch
  • CompactRange and CompactFiles
  • Iterator over records
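To make the shape of these operations concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the RocksDB API or any of its bindings: the class name `ToyKV` and all method names are assumptions chosen to mirror the list above, and compaction is left out because it has no meaning for an in-memory dictionary.

```python
import bisect


class ToyKV:
    """Toy in-memory model of an ordered key-value store (NOT RocksDB itself)."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order, so iteration is ordered
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def multi_get(self, keys):
        return [self._data.get(k) for k in keys]

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def delete_range(self, begin, end):
        # Removes every key in [begin, end); the end key is exclusive.
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            del self._data[k]
        del self._keys[lo:hi]

    def write(self, batch):
        # A batch is a list of ("put", key, value) or ("delete", key) tuples.
        # Validate first so a malformed batch changes nothing, then apply in order.
        for op in batch:
            if op[0] not in ("put", "delete"):
                raise ValueError(f"unknown operation: {op[0]!r}")
        for op in batch:
            if op[0] == "put":
                self.put(op[1], op[2])
            else:
                self.delete(op[1])

    def iterate(self, start=b""):
        # Yields (key, value) pairs in key order, from the first key >= start.
        i = bisect.bisect_left(self._keys, start)
        for k in list(self._keys[i:]):
            yield k, self._data[k]
```

The point of the sketch is the interface: single-key reads and writes, range deletes, grouped writes, and ordered iteration are enough to build higher-level structures on top.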

This makes RocksDB a key-value-oriented database. Values can be of different kinds, including large ones: the BlobDB extension stores large values efficiently in separate data files (similar to LOBs in PostgreSQL).

Apache Kvrocks (compatible with the Redis protocol) documents how it implements Redis data structures on top of RocksDB.
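As an illustration of the general idea, the sketch below maps a Redis-style hash onto flat keys in a plain Python dict. The key layout (a 4-byte length prefix, the hash name, then the field) and the helper names `field_key`, `hset`, `hget`, and `hgetall` are assumptions made for this example; this is not the encoding Kvrocks actually uses.

```python
import struct


def field_key(name: bytes, field: bytes) -> bytes:
    # Hypothetical layout: 4-byte big-endian length of the hash name,
    # then the name, then the field. The length prefix keeps the hash
    # "ab" with field "c" apart from the hash "a" with field "bc".
    return struct.pack(">I", len(name)) + name + field


def hset(store: dict, name: bytes, field: bytes, value: bytes) -> None:
    store[field_key(name, field)] = value


def hget(store: dict, name: bytes, field: bytes):
    return store.get(field_key(name, field))


def hgetall(store: dict, name: bytes) -> dict:
    # In an ordered store this would be a prefix scan with an iterator;
    # a plain dict has no ordering, so this sketch filters all keys instead.
    prefix = field_key(name, b"")
    return {
        key[len(prefix):]: value
        for key, value in sorted(store.items())
        if key.startswith(prefix)
    }
```

Each field becomes its own record, so updating one field does not rewrite the whole hash, and reading the whole hash becomes a scan over one key prefix.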

Kafka Streams uses RocksDB for state management. For example, a KTable, which represents a snapshot of a stream, manages its data through a KeyValueStore that is implemented by RocksDBStore. However, a KTable is not necessarily materialized as a local store, because it may be persisted to a topic instead. This change was introduced by KAFKA-2856.

TiKV is a highly scalable, low-latency key-value database that uses RocksDB to manage data on disk because RocksDB is mature and high-performance. When describing the advantages of RocksDB, the TiKV team mentions prefix bloom filters, event listeners, and external file ingestion. Other features, such as data distribution across nodes, transactions, and leader election, are implemented by TiKV itself.
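For readers unfamiliar with the first of those features, the following is a minimal sketch of the idea behind a prefix bloom filter: only a fixed-length prefix of each key is hashed, so a prefix scan can skip data that definitely holds no matching key. The class `PrefixBloom`, its parameters, and the use of SHA-256 are assumptions for illustration; RocksDB's real filter implementation differs.

```python
import hashlib


class PrefixBloom:
    """Toy Bloom filter over fixed-length key prefixes (illustrative only)."""

    def __init__(self, prefix_len, num_bits=1024, num_hashes=3):
        self.prefix_len = prefix_len
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, prefix):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + prefix).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_key(self, key):
        # Only the first prefix_len bytes of the key are recorded.
        for pos in self._positions(key[: self.prefix_len]):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain_prefix(self, prefix):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(prefix[: self.prefix_len])
        )
```

A filter like this never gives a false negative, but it can give false positives, so a `True` answer still requires reading the underlying data.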

For a long time, CockroachDB used RocksDB as a local disk-backed KV store that provides efficient writes and range scans to enable performant SQL execution (SIGMOD 20). Later, the team rewrote this layer in Go as Pebble, because that allowed them to tune it better for their workload. So they started with an off-the-shelf solution and only later replaced it with their own.

Your Adoption

I recommend that you start with simple solutions like SQLite or H2 that support SQL and relations. They might be easier to adopt and maintain because the community has a lot of expertise with them.

When you hit a performance wall that the previous solution cannot break through, it might be time to use tools like RocksDB. This will require more investigation and a deeper understanding, because in some cases you may have to read the source code instead of Stack Overflow.

At the same time, I suggest that you do not design your own data format before trying RocksDB: it might be hard to outperform, and adopting RocksDB might take less time than building a storage engine yourself.
