Synthesis Lectures on Distributed Computing Theory
3 total works
Fault-Tolerant Agreement in Synchronous Message-Passing Systems
by Michel Raynal
Published 6 June 2010
Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. A previous book, Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems (published by Morgan & Claypool, 2010), was devoted to the problems created by crash failures in asynchronous message-passing systems.
The present book focuses on the way to cope with the uncertainty created by process failures (crash failures, omission failures, and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in the presence of failures, no non-trivial problem can be solved). These problems are consensus, interactive consistency, k-set agreement, and non-blocking atomic commit.
Being able to solve these basic problems efficiently and with provable guarantees allows application designers to give a precise meaning to the words "cooperate" and "agree" despite failures, and to write distributed synchronous programs with properties that can be stated and proved.
Hence, the aim of the book is to present a comprehensive view of agreement problems, algorithms that solve them and associated computability bounds in synchronous message-passing distributed systems.
Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. Considering the uncertainty created by asynchrony and process crash failures in the context of message-passing systems, the book focuses on the main abstractions that one has to understand and master in order to be able to produce software with guaranteed properties. These fundamental abstractions are the communication abstractions that allow the processes to communicate consistently (namely the register abstraction and the reliable broadcast abstraction), and the consensus agreement abstraction that allows them to cooperate despite failures. As they give a precise meaning to the words "communicate" and "agree" despite asynchrony and failures, these abstractions allow distributed programs to be designed with properties that can be stated and proved.
Impossibility results are associated with these abstractions. Hence, in order to circumvent these impossibilities, the book relies on the failure detector approach, and, consequently, that approach to fault-tolerance is central to the book.
Theory is what remains true when technology is changing. So, it is important to know and master the basic concepts and the theoretical tools that underlie the design of the systems we are using today and the systems we will use tomorrow. This means that, given a computing model, we need to know what can be done and what cannot be done in that model. Considering systems built on top of an asynchronous read/write shared memory prone to process crashes, this monograph presents and develops the following fundamental notions: universal constructions, consensus numbers, distributed recursivity, the power of the BG simulation, and what can be done when one has to cope with process anonymity and/or memory anonymity. Numerous distributed algorithms are presented, the aim of which is to help the reader better understand the power and the subtleties of the notions that are presented. In addition, the reader can appreciate the simplicity and beauty of some of these algorithms.