Concluding Remarks - Gossip-based Protocols for Large-scale Distributed Systems

Gossip protocols have recently generated a lot of interest in the research community. The overlays that result from these protocols are highly resilient to failures and high churn rates. The underlying paradigm is clearly appealing for building large-scale distributed applications.

Our contribution is to factor out the abstraction implemented by the membership mechanism underlying gossip protocols: the peer-sampling service. The service provides every peer with (local) knowledge of the rest of the system, which is key to having the system converge as a whole towards global properties using only local information.

We described a framework to implement a reliable and efficient peer-sampling service. The framework itself is based on gossiping. This framework is generic enough to be instantiated with most current gossip membership protocols [5, 62, 63, 88]. We used this framework to empirically compare the range of protocols through simulations based on synthetic and realistic traces as well as implementations. We point out that these protocols ensure local randomness from each peer’s point of view. We also observed that, as far as the global properties are concerned, the average path length is close to the one in random graphs and that clustering properties are controlled by (and grow with) the parameter H. With respect to fault tolerance, we observe a high resilience to high churn rates and particularly good self-healing properties, again mostly controlled by the parameter H. In addition, these properties mostly remain independent of the bootstrapping approach chosen.

In general, when designing gossip membership protocols that aim at randomness, following a push-only or pull-only approach is not a good choice. Instead, only the combination results in desirable properties. Likewise, it makes sense to build in robustness by purposefully removing old links when exchanging views with a peer. This situation corresponds in our framework to a choice for H > 0.
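The effect of choosing H > 0 can be illustrated with a minimal sketch of a view merge, as it might be applied by both peers after a push-pull exchange. This is an illustration under stated assumptions, not the framework's actual definition: the representation of a view as a mapping from peer address to link age, the view size parameter `c`, and the function name `merge_views` are all hypothetical, and details such as excluding a node's own address are omitted.

```python
import random


def merge_views(own, received, c, H, rng):
    """Merge two views and trim the result to at most c entries.

    own, received: dict mapping peer address -> age of the link
    (hypothetical representation; larger age means an older link).
    H: number of oldest links to drop on purpose (assumed meaning of H).
    rng: a random.Random instance used for the final random trimming.
    """
    merged = dict(own)
    for addr, age in received.items():
        # On duplicates, keep the freshest (lowest-age) entry.
        if addr not in merged or age < merged[addr]:
            merged[addr] = age

    # Purposefully remove up to H of the oldest links, but never
    # shrink the view below the target size c.
    drop_old = min(H, max(len(merged) - c, 0))
    oldest_first = sorted(merged, key=lambda a: merged[a], reverse=True)
    for addr in oldest_first[:drop_old]:
        del merged[addr]

    # Remove random entries until the view fits into c slots.
    while len(merged) > c:
        del merged[rng.choice(sorted(merged))]
    return merged


if __name__ == "__main__":
    own = {"a": 5, "b": 1, "c": 9}
    received = {"c": 2, "d": 0, "e": 7}
    print(merge_views(own, received, c=3, H=2, rng=random.Random(1)))
```

With H = 0 the trimming is purely random, so links to failed peers can linger in views; with H > 0 the oldest links, which are the most likely to point to departed peers, are discarded first.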

Regarding other parameter settings, it is much more difficult to come to general conclusions. As it turns out, tradeoffs between, for example, load balancing and fault tolerance will need to be made. When focusing on swapping links with a selected peer, the price to pay is lower robustness against node failures and churn. On the other hand, making a protocol extremely robust will lead to skewed indegree distributions, affecting load balancing.

To conclude, we demonstrated in this extensive study that gossip membership protocols can be tuned to both support high churn rates and provide graph-theoretic properties (both local and global) close to those of random graphs so as to support a wide range of applications.

Chapter 3 Average Calculation

As computer networks increase in size, become more heterogeneous and span greater geographic distances, applications must be designed to cope with the very large scale, the poor reliability and, often, the extreme dynamism of the underlying network. Aggregation is a key functional building block for such applications: it refers to a set of functions that provide components of a distributed system access to global information including network size, average load, average uptime, location and description of hotspots, etc.

Local access to global information is often very useful, if not indispensable, for building applications that are robust and adaptive. For example, in an industrial control application, some aggregate value reaching a threshold may trigger the execution of certain actions; a distributed storage system will want to know the total available free space; load balancing protocols may benefit from knowing the target average load so as to minimize the load they transfer.

In this chapter we elaborate on the aggregation protocol we introduced in Section 1.3. As mentioned there, the class of aggregate functions we can compute is very broad and includes many useful special cases such as counting, averages, sums, products and extremal values. The protocol is suitable for extremely large and highly dynamic systems due to its proactive structure: all nodes receive the aggregate value continuously, thus being able to track any changes in the system. The protocol is also extremely lightweight, making it suitable for many distributed applications including peer-to-peer and grid computing systems. We demonstrate the efficiency and robustness of our gossip-based protocol both theoretically and experimentally under a variety of scenarios including node and communication failures.

3.1 Introduction

In this chapter, we focus on aggregation, which is a useful building block in large, unreliable and dynamic systems [89] (see also Section 1.3). Aggregation is a common name for a set of functions that provide a summary of some global system property. In other words, they allow local access to global information in order to simplify the tasks of control, monitoring and optimization in distributed applications. Examples of aggregation functions include network size, total free storage, maximum load, average uptime, location and intensity of hotspots, etc. Furthermore, simple aggregation functions can be used as building blocks to support more complex protocols. For example, the knowledge of average load in a system can be exploited to implement near-optimal load-balancing schemes [61].

We distinguish reactive and proactive protocols for computing aggregation functions. Reactive protocols respond to specific queries issued by nodes in the network. The answers are returned directly to the issuer of the query while the rest of the nodes may or may not learn about the answer. Proactive protocols, on the other hand, continuously provide the value of some aggregate function to all nodes in the system in an adaptive fashion. By adaptive we mean that if the aggregate changes due to network dynamism or because of variations in the input values, the output of the aggregation protocol should track these changes reasonably quickly. Proactive protocols are often useful when aggregation is used as a building block for completely decentralized solutions to complex tasks. For example, in the load-balancing scheme cited above, the knowledge of the global average load is used by each node to decide if and when it should transfer load [61].

We introduce a robust and adaptive protocol for calculating aggregates in a proactive manner. We assume that each node maintains a local approximation of the aggregate value. The core of the protocol is a simple gossip-based communication scheme in which each node periodically selects some other random node to communicate with. During this communication the nodes update their local approximate values by performing some aggregation-specific and strictly local computation based on their previous approximate values. This local pairwise interaction is designed in such a way that all approximate values in the system will quickly converge to the desired aggregate value.
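The pairwise scheme described above can be sketched as a small single-process simulation. This is only an illustration under several assumptions that are not taken from the text: the aggregate is the average, the pairwise update replaces both nodes' values with their mean, peers are drawn uniformly at random from the whole network (in a real deployment a peer-sampling service would play this role), cycles are synchronous, and there are no failures. The function name `gossip_average` and its parameters are hypothetical.

```python
import random


def gossip_average(values, cycles, rng):
    """Simulate gossip-based averaging over a list of initial node values.

    values: initial local values, one per node (at least two nodes).
    cycles: number of rounds; in each round every node initiates once.
    rng: a random.Random instance used for peer selection.
    Returns the list of local estimates after the given number of cycles.
    """
    estimates = [float(v) for v in values]
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two nodes")
    for _ in range(cycles):
        for i in range(n):
            # Select a random peer j different from i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Strictly local update: both nodes adopt the pair's mean.
            # The sum of all estimates is unchanged by this step.
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = mean
            estimates[j] = mean
    return estimates


if __name__ == "__main__":
    # One node holds 100, the others 0: the true average is 1.0.
    start = [0.0] * 99 + [100.0]
    result = gossip_average(start, cycles=30, rng=random.Random(42))
    print(min(result), max(result))
```

Because each exchange preserves the sum of the two participating values, the global average is invariant while the spread of the estimates shrinks. Under the same assumptions, replacing the mean with `max` or `min` in the update step would make all nodes converge to an extremal value instead.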

In addition to introducing our gossip-based protocol, this chapter makes three contributions. First, we present a full-fledged practical solution for proactive aggregation in dynamic environments, complete with mechanisms for adaptivity, robustness and topology management. Second, we show how our approach can be extended to compute complex aggregates such as variances and different means. Third, we present theoretical and experimental evidence supporting the efficiency of the protocol and illustrating its robustness with respect to node and link failures and message loss.

In Section 3.2 we define the system model. Section 3.3 describes the core idea of the protocol and presents theoretical and simulation results of its performance. In Section 3.4 we discuss the extensions necessary for practical applications. Section 3.5 introduces novel algorithms for computing statistical functions including several means, network size and variance. Sections 3.6 and 3.7 present analytical and experimental evidence on the high robustness of our protocol. Section 3.8 describes the prototype implementation of our protocol on PlanetLab and gives experimental results of its performance. Section 3.9 discusses related work. Finally, conclusions are drawn in Section 3.10.
