Techniques for Chaos Testing Your Redis Cluster

This article explores a few techniques to create chaos testing scenarios on a Redis cluster and uncover potential weaknesses in a controlled way.

By Rahul Chaturvedi · Jun. 06, 24 · Tutorial

For large-scale, distributed systems, chaos testing becomes an essential tool. It helps uncover potential failure points and strengthen overall system resilience. This article delves into practical and straightforward methods for injecting chaos into your Redis cluster, enabling you to proactively identify and address weaknesses before they cause real-world disruptions.

Set Up

  • Create a Redis cluster. You can follow "Create and Configure a Local Redis Cluster" (see References) to set one up locally before taking it to production.
  • Generate load on the cluster. You can use memtier_benchmark or any other load-generation framework.
  • Inject the chaos scenarios below into the cluster to test its performance and recovery. If the results do not meet your expectations, apply fixes and repeat the tests to confirm that the fixes work, ultimately enhancing the reliability of your cluster.
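As a rough sketch of the load-generation step, the wrapper below starts memtier_benchmark in cluster mode against one node. The function name `run_load`, the `MEMTIER` variable, and the thread, client, and ratio values are illustrative assumptions rather than settings from the original setup; tune them to your workload.

```shell
# Hypothetical helper: drive a mixed SET/GET load with memtier_benchmark.
# MEMTIER can be overridden to point at a different binary.
MEMTIER="${MEMTIER:-memtier_benchmark}"

run_load() {
  load_host=$1   # any node of the cluster, e.g. 127.0.0.1
  load_port=$2   # e.g. 30001
  load_secs=$3   # how long to keep the load running, in seconds
  # --cluster-mode makes the tool follow the cluster's slot layout;
  # --ratio=1:10 means one SET for every ten GETs (an assumed mix).
  $MEMTIER --server="$load_host" --port="$load_port" \
    --cluster-mode \
    --test-time="$load_secs" \
    --ratio=1:10 \
    --threads=2 --clients=10
}

# Example (runs in the background while you inject chaos):
#   run_load 127.0.0.1 30001 120 &
```

Keeping the load running in the background while a scenario is injected makes latency spikes and throughput drops visible in the benchmark's own report.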

Let's explore a few techniques below to create chaos test scenarios.

Promote Replica to Primary (Failover)

Cluster Failover

Run the CLUSTER FAILOVER command on a replica to promote it to primary; the original primary then becomes its replica.

Here’s What Happens Under the Hood

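Once the command is invoked, the replica asks the primary to stop processing client queries. The primary replies with its current replication offset, and the replica waits until it has applied everything up to that offset, so that its data matches the primary's state. The replica then obtains a new configuration epoch from a majority of primaries, broadcasts the new configuration, and begins serving as the new primary. The original primary receives the update, redirects its clients to the new primary, and transitions to a replica role.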

Before the failover, the Redis node with ID 2b570b9c76127bdf38955ea7181ff8f8bbe62cdf (port 30001) is a replica of the node with ID aa24dc9d601a2ae348e4902ed8b38a08f915f21c.

After the command is invoked, this node (2b570b9c76127bdf38955ea7181ff8f8bbe62cdf, port 30001) has become the primary, and the original primary (node ID aa24dc9d601a2ae348e4902ed8b38a08f915f21c) has become the replica.

Under normal circumstances, clients connected to the cluster should not experience any issues, because replicas typically lag only slightly behind the primary. However, if you inject a failover and observe latency spikes or decreased throughput, investigate the root cause. Such symptoms can indicate bottlenecks in your cluster that require further optimization.
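The failover described above can be scripted so that the role change is checked automatically. This is a minimal sketch: the function names, the `REDIS_CLI` variable, and the 10-second default timeout are assumptions made for illustration. It relies on the third field of each CLUSTER NODES line holding the node's flags (for example `myself,master`).

```shell
# REDIS_CLI can be overridden, e.g. to add -h <host> or authentication options.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Read CLUSTER NODES output on stdin and print the role ("master" or "slave")
# of the node that produced it, i.e. the line whose flags include "myself".
my_role() {
  awk '$3 ~ /myself/ {
         if ($3 ~ /master/) print "master"
         else if ($3 ~ /slave/) print "slave"
       }'
}

# Hypothetical helper: ask the replica on the given port to take over, then
# poll once per second until it reports itself as master or the timeout
# (default 10 seconds) expires.
failover_and_wait() {
  fo_port=$1
  fo_timeout=${2:-10}
  $REDIS_CLI -p "$fo_port" cluster failover
  fo_waited=0
  while [ "$fo_waited" -lt "$fo_timeout" ]; do
    if [ "$($REDIS_CLI -p "$fo_port" cluster nodes | my_role)" = "master" ]; then
      echo "port $fo_port is now master after ~${fo_waited}s"
      return 0
    fi
    sleep 1
    fo_waited=$((fo_waited + 1))
  done
  echo "port $fo_port did not become master within ${fo_timeout}s" >&2
  return 1
}

# Example: failover_and_wait 30001
```

Running this while the load generator is active gives a rough upper bound on how long the role switch takes from the client's point of view.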

Remove a Replica

In this scenario, we remove a replica node so that it is not available for any operation. There are two kinds of removal: soft and hard.

Soft (Temporary) Replica Removal

In this case, we just stop the replica node. It becomes unavailable, but it is still part of the cluster topology.

We can use the following command to stop:

Shell
 
redis-cli -p <port> shutdown


Running CLUSTER NODES afterwards shows the replica node in a “fail” state, which indicates that the node is not available although it is still part of the cluster topology.
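To check for the “fail” state from a script, the CLUSTER NODES output can be filtered on its flags field. The helper name `failed_nodes` is a hypothetical one chosen for this sketch.

```shell
# Read CLUSTER NODES output on stdin and print "<node-id> <address>" for every
# node flagged "fail". The exact comparison deliberately skips "fail?" (PFAIL),
# which only means that a single node suspects the failure.
failed_nodes() {
  awk '{
         n = split($3, flags, ",")
         for (i = 1; i <= n; i++)
           if (flags[i] == "fail") { print $1, $2; break }
       }'
}

# Example: redis-cli -p 30001 cluster nodes | failed_nodes
```

An empty result shortly after the shutdown usually just means the other nodes have not yet agreed on the failure, so it is worth polling for a few seconds.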

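To bring the node back, start the server again with the same cluster settings it was originally launched with; started with only a port, the server would come up as a standalone instance. The flags below assume a node launched from the command line with a per-port cluster config file, so adjust them to your setup.

Shell
 
redis-server --port <port> --cluster-enabled yes --cluster-config-file nodes-<port>.conf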


Hard (Permanent) Replica Removal

In this case, the replica is removed from the cluster itself, hence the name hard removal. We can use the CLUSTER FORGET <node_id> command for this. The command removes the supplied node ID from the node table of the node on which it is run. To remove the node from the cluster completely, run the command on every remaining node, as shown below. Each node blocks the forgotten ID from being re-added for only 60 seconds, so the loop has to reach all nodes within that window; otherwise, gossip from a node that still knows the ID will re-introduce it.

Shell
 
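# Placeholders: fill in the ports of all remaining nodes and the ID to remove
ports="<list of ports>"   # space-separated
node_id="<node_id_of_the_node_to_be_removed>"
for port in $ports; do
  # Run the CLUSTER FORGET command on each remaining node
  redis-cli -p "$port" cluster forget "$node_id"
done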
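After the loop has run, it is worth confirming that no node still lists the removed ID. The sketch below does that by scanning each node's CLUSTER NODES output; `ports_still_knowing` and the `REDIS_CLI` variable are hypothetical names used only for this example.

```shell
# REDIS_CLI can be overridden, e.g. to add -h <host> or authentication options.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Print every port whose node table still lists the given node ID.
# No output means the ID is gone from all nodes that were checked.
ports_still_knowing() {
  psk_id=$1
  shift
  for psk_port in "$@"; do
    # awk exits 0 only if some line starts with the node ID
    if $REDIS_CLI -p "$psk_port" cluster nodes |
         awk -v id="$psk_id" '$1 == id { found = 1 } END { exit !found }'; then
      echo "$psk_port"
    fi
  done
}

# Example: ports_still_knowing <node_id> 30001 30002 30003
```

If a port shows up here more than a minute after the removal, the ID was most likely re-introduced by gossip and the CLUSTER FORGET loop needs to be repeated.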


Remove a Primary

Following the same steps as above to remove a replica, we can also remove a primary node. This can be done through soft removal (where the node is marked as failed but remains part of the cluster topology) or hard removal (where the node is completely removed from the cluster and its topology) as stated above.

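The key difference is that removing a primary triggers one of its replicas to take over as the new primary, provided that the primary has a healthy replica and a majority of the primaries is still reachable to authorize the failover.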
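Before removing a primary, it helps to know which replicas are candidates to take over. In CLUSTER NODES output, the fourth field of a replica's line holds the ID of the primary it replicates, so a small filter is enough. The name `replicas_of` is a hypothetical helper for this sketch.

```shell
# Read CLUSTER NODES output on stdin and print "<node-id> <address>" for each
# replica of the given primary (matched on field 4, the primary's node ID).
replicas_of() {
  awk -v primary="$1" '$4 == primary { print $1, $2 }'
}

# Example: redis-cli -p 30001 cluster nodes | replicas_of <primary_node_id>
```

An empty result means the primary has no replica, so stopping it would leave its slots uncovered instead of triggering a failover.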

Special Chaos Scenario When Both Replica and Primary Are Removed

This is a special chaos scenario designed to test the reliability of your system and the behavior of different clients when both the replica and primary are removed. You can follow these steps to create this scenario.

  • Update the redis.conf file so that the cluster stays available even when some of the key slots are not covered. To do that, set the following option to “no” in redis.conf:

Shell
 
cluster-require-full-coverage no


  • Remove the replica using the CLUSTER FORGET command as described above, so that it is removed from the cluster topology.

  • Stop the primary node using the following command to keep it in the cluster topology with a "fail" status. This will cause clients to continue sending requests to the node, providing an opportunity to test cluster stability and observe client behavior based on their versions in this chaos test scenario.

Shell
 
redis-cli -p <port> shutdown
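To observe the effect of this scenario, the CLUSTER INFO reply from a surviving node can be inspected. The sketch below extracts single fields from that reply; `cluster_info_field` is a hypothetical helper name. With cluster-require-full-coverage set to no, one would expect `cluster_state` to stay `ok` as long as a majority of primaries is reachable, while `cluster_slots_fail` becomes non-zero for the slots owned by the stopped primary; treat that expectation as an assumption to verify on your own cluster.

```shell
# Read CLUSTER INFO output on stdin and print the value of one field.
# The reply uses CRLF line endings, so carriage returns are stripped first.
cluster_info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Example:
#   redis-cli -p 30002 cluster info | cluster_info_field cluster_state
#   redis-cli -p 30002 cluster info | cluster_info_field cluster_slots_fail
```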


Conclusion

We have explored a few straightforward techniques for creating chaos scenarios on a Redis cluster to test cluster stability and client behavior in those situations. However, please exercise caution, as these operations and commands are risky. Only perform them in test environments, ensure safeguards are in place, and execute them in a controlled manner.

References

Create and Configure a Local Redis Cluster

Redis Documents

Memtier Benchmark
