Slow Receivers in a Distributed Management System

Slow receivers explained

The system can quarantine the slow receiver and so isolate the rest of the system from its ill effects. Senders could then use a store-and-forward model for updates to that receiver. Applying interleaved updates from multiple publishers becomes an issue in a system where all publishers are equal peers; where a given piece of information has a single publisher, store and forward works well. Another option is to introduce the notion of data ownership, which lets the slow receiver apply updates from the owner of the data without worrying about updates from other nodes.
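
As a rough illustration of how quarantine, store and forward, and data ownership could fit together, here is a minimal Java sketch of a sender-side channel. The names (`Update`, `StoreAndForward`, `setOwner`, `quarantine`, `release`) are hypothetical and not taken from any product. An in-memory queue stands in for a disk-based store, and a map stands in for the receiver's copy of the data.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** One update to a key, tagged with the node that published it (hypothetical type). */
final class Update {
    final String key;
    final String publisher;
    final String value;

    Update(String key, String publisher, String value) {
        this.key = key;
        this.publisher = publisher;
        this.value = value;
    }
}

/** Sender-side view of one receiver; updates are stored while it is quarantined. */
class StoreAndForward {
    // Assumption: an in-memory queue stands in for a disk-based store.
    private final Queue<Update> backlog = new ArrayDeque<>();
    // Maps each key to the node that owns it.
    private final Map<String, String> owners = new HashMap<>();
    // Assumption: this map stands in for the receiver's copy of the data.
    private final Map<String, String> receiverState = new HashMap<>();
    private boolean quarantined;

    void setOwner(String key, String node) {
        owners.put(key, node);
    }

    void quarantine() {
        quarantined = true;
    }

    /** Deliver immediately when healthy; store the update while quarantined. */
    void publish(Update update) {
        if (quarantined) {
            backlog.add(update);
        } else {
            apply(update);
        }
    }

    /** The receiver has recovered: forward the backlog in publish order. */
    void release() {
        while (!backlog.isEmpty()) {
            apply(backlog.remove());
        }
        quarantined = false;
    }

    /** Only updates published by the key's owner are applied; others are ignored. */
    private void apply(Update update) {
        if (update.publisher.equals(owners.get(update.key))) {
            receiverState.put(update.key, update.value);
        }
    }

    String valueAt(String key) {
        return receiverState.get(key);
    }

    int backlogSize() {
        return backlog.size();
    }
}
```

In this sketch, if `nodeA` owns `price` and both `nodeA` and `nodeB` publish while the receiver is quarantined, only the updates from `nodeA` are applied on release, so the receiver never has to reconcile interleaved writes from peers.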

A less desirable option is for the system to do nothing and run at the speed of the slow receiver. If the problem is temporary, the slow receiver comes out of that mode and performance of the system improves.

So the options for dealing with slow receivers come down to the following:

  • Quarantine the slow receiver until it recovers. Store and forward messages to disk-based mechanisms and let the slow receiver continue.
  • The slow receiver drops messages, catches up, and fires appropriate notifications to connected applications and clients.
  • Alert the system administrator about the slow receiver so remedial action can be initiated.
  • Drop messages to slow receivers, let them continue, and alert system administrators.

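
The second option in the list above (drop messages, catch up, notify) can be sketched as a bounded inbox on the receiver. This is a minimal illustration under stated assumptions: `DroppingInbox`, `offer`, `catchUp`, and the `DATA_LOSS:` marker are hypothetical names, and dropping the oldest message first is one possible choice, not something the article prescribes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Bounded inbox for a slow receiver: drops the oldest message when full. */
class DroppingInbox {
    private final int capacity;
    private final Deque<String> pending = new ArrayDeque<>();
    private int dropped;

    DroppingInbox(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Accept a message, discarding the oldest pending one if the inbox is full. */
    void offer(String message) {
        if (pending.size() == capacity) {
            pending.removeFirst();
            dropped++;
        }
        pending.addLast(message);
    }

    /**
     * Drain the surviving messages. If anything was dropped since the last
     * call, the first entry is a loss notification so that connected
     * applications and clients can react (for example, by refetching data).
     */
    List<String> catchUp() {
        List<String> out = new ArrayList<>();
        if (dropped > 0) {
            out.add("DATA_LOSS:" + dropped);
        }
        out.addAll(pending);
        pending.clear();
        dropped = 0;
        return out;
    }
}
```

With a capacity of 2 and four messages offered, the receiver would see a loss notification for two messages followed by the two most recent messages. Whether the notification should also go to a system administrator is a policy decision outside this sketch.
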
Slow Receiver Support in an Enterprise Data Fabric (EDF)
Above we discussed scenarios that cause slow receivers in a distributed data management system. An Enterprise Data Fabric (EDF) detects slow receivers by collecting statistics on network activity across the system. Because it is an active data management platform, the EDF can be configured to make decisions about slow receivers in real time. These decisions can be based on which applications share data in the fabric and how much data consistency they need across applications. They can also be based on the roles that different applications play in the fabric and on how critical it is to get data to each application while a receiver is slow. More importantly, an EDF can shorten time-to-market in customer deployments and can be configured to maximize application performance when a slow receiver appears. Because data sharing and event delivery in the fabric are built on a highly available platform that supports transparent failover and recovery, customers can deal with slow receivers and still deploy applications with Extreme Transaction Processing capabilities. This can be done even when sections of the fabric slow down, without losing any data, and it lets customers adapt to network vagaries in real time while maintaining data consistency.
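
To make the idea of detection from collected statistics concrete, here is a minimal Java sketch. It assumes the statistic of interest is per-receiver acknowledgment latency, smoothed with an exponential moving average. The class `SlowReceiverDetector`, its methods, and the threshold and smoothing parameters are all hypothetical; a real EDF may collect different statistics and apply different rules.

```java
import java.util.HashMap;
import java.util.Map;

/** Flags receivers whose smoothed acknowledgment latency exceeds a threshold. */
class SlowReceiverDetector {
    private final double thresholdMillis;
    private final double alpha; // weight of the newest sample, in (0, 1]
    private final Map<String, Double> avgAckMillis = new HashMap<>();

    SlowReceiverDetector(double thresholdMillis, double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.thresholdMillis = thresholdMillis;
        this.alpha = alpha;
    }

    /** Fold one latency sample into the receiver's moving average. */
    void recordAck(String receiver, double ackMillis) {
        avgAckMillis.merge(receiver, ackMillis,
                (old, sample) -> (1 - alpha) * old + alpha * sample);
    }

    /** A receiver with no samples yet is not considered slow. */
    boolean isSlow(String receiver) {
        Double avg = avgAckMillis.get(receiver);
        return avg != null && avg > thresholdMillis;
    }
}
```

Smoothing means a single late acknowledgment does not necessarily trigger a quarantine, and a receiver that recovers drops back below the threshold after a few fast acknowledgments. The result of `isSlow` could then feed whatever policy the deployment has chosen for that receiver's role.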

Conclusion
A distributed data management system is a complex entity and deploying one in a production environment requires careful planning and analysis. Since we're dealing with temporal data and data consistency, it's important to have a good understanding of the network environment in which the application operates.

Every distributed system has to have policies for dealing with slow receivers. These policies have to be crafted keeping in mind the load characteristics of the system, data consistency guarantees, data loss notifications, and system throughput requirements. Tuning the network to meet system objectives like throughput and latency has to be a part of the overall system design when you consider deploying an Enterprise Data Fabric.

Upfront capacity planning goes a long way toward avoiding unnecessary slowdowns and glitches in overall system performance. It should ensure that hardware resources such as network bandwidth, CPU, memory, and I/O capacity are adequate on the nodes that participate in the distributed system, and that network partitioning has been accounted for. It's also important to understand the congestion characteristics of the network to ensure that the system as a whole is geared to deal with burst traffic and temporary unavailability. System redundancy, disk usage, and the number of applications or instances that compete for resources on a system are all factors to plan for in order to prevent slow receiver problems in an otherwise smooth-running system.

It's also a good idea to ask what support your distributed data management vendor has in their offering to deal with slow receivers. When it comes to dealing with slow receivers in a distributed data fabric, it's a question of "when" rather than "if."

More Stories By Sudhir Menon

Sudhir Menon is the director of engineering for GemStone Systems and works closely with various development teams (both onsite and offshore) working on the GemFire Enterprise Data Fabric. With over 17 years of cutting-edge software experience with marquee firms like GemStone, Intel, EDS, and CenterSpan Communications, he is one of the key architects for the GemFire Enterprise Data Fabric. His expertise in distributed data management spans multiple languages (Java, C++, and .NET) and multiple platforms, and he has architected and developed network stacks for the last 10+ years.
