Distributed Shared State Concurrency

The distributed shared state concurrency problem is the problem of maintaining the consistency of some shared state in the face of multiple concurrent processes running on a DistributedSystem.
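To make the problem concrete, here is a minimal, hypothetical sketch in Python. The store, key name, and amounts are illustrative assumptions, not taken from this page, and the two "nodes" are simulated deterministically in a single process. Each process reads the shared value, updates a local copy, and writes it back with no coordination, so one update is silently lost.

```python
# Hypothetical shared state, standing in for data visible to every node.
shared_store = {"balance": 100}

def read(key):
    return shared_store[key]

def write(key, value):
    shared_store[key] = value

# One problematic interleaving: both processes read before either writes.
a_local = read("balance")       # process A (node 1) sees 100
b_local = read("balance")       # process B (node 2) also sees 100
write("balance", a_local + 10)  # A deposits 10: store now holds 110
write("balance", b_local + 20)  # B deposits 20: store now holds 120, A's write is lost

print(shared_store["balance"])  # 120, although the two deposits should give 130
```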

This is a key open problem in computer science. Some claim there is no general solution for this problem. (Instead, it is often argued that we should abandon all hope of achieving GlobalConsensus on such systems, and instead use other concurrency models, such as MessagePassingConcurrency, and use algorithms that can tolerate a lack of consensus.) That caveat stated, there are many problems whose apparent solutions call for shared state concurrency in a distributed architecture -- or something that looks a lot like it.

Various ways to address the DistributedSharedStateConcurrency problem:

Is this classification right? I find it rather confusing that relational databases are set against all other database architectures, which for some reason also include directory services. It's also not entirely clear to me what is so special about NetworkFileSystem? as a way of addressing the DistributedSharedStateConcurrency problem that separates it from generic directory services. -- IgorLobanov?

It seems to me that we have two completely different (but not mutually exclusive) approaches to state management: centralized and distributed. This is not a strict dichotomy but rather a sliding scale. The scheme also classifies techniques by their logical structure rather than by their implementation details.

In the centralized approach, a single facility manages all shared state in a given distributed system. It acts as the single point of entry for every state-affecting operation and can therefore become a performance bottleneck in transaction-intensive distributed systems, since every transaction must go through the central facility.

OTOH, such a facility provides unmatched querying and analysis capabilities, because it holds all the state of the distributed system. Another advantage of these systems is their simplicity of development. The most radical example of this camp is the RDBMS.
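A minimal sketch of the centralized approach in Python, assuming one in-process coordinator thread stands in for the central facility and client threads stand in for other nodes. The `Coordinator` class, the `deposit` helper, and the key name are hypothetical. Every state-affecting operation is queued and applied strictly one at a time, which makes the design easy to reason about and also makes the coordinator a potential bottleneck; because it holds all the state, a global query is just one more operation.

```python
import queue
import threading

class Coordinator:
    """Hypothetical central facility: the only component that touches state.
    All state-affecting operations are queued and applied one at a time."""

    def __init__(self):
        self._state = {}
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        while True:
            op, reply = self._requests.get()
            if op is None:               # shutdown sentinel
                reply.put(None)
                return
            reply.put(op(self._state))   # ops run strictly one after another

    def submit(self, op):
        """Send an operation to the coordinator and block until it is applied."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, reply))
        return reply.get()

    def stop(self):
        self.submit(None)

def deposit(key, amount):
    def op(state):
        state[key] = state.get(key, 0) + amount
        return state[key]
    return op

coord = Coordinator()
clients = [threading.Thread(target=coord.submit, args=(deposit("balance", 1),))
           for _ in range(100)]
for t in clients:
    t.start()
for t in clients:
    t.join()

# Because it holds all the state, the coordinator can answer global queries.
total = coord.submit(lambda state: sum(state.values()))
coord.stop()
print(total)  # 100
```

In a real system the queue would typically be a network protocol and the coordinator a database server, but the single serialization point is the same.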

On the other end of our scale are truly distributed state management technologies. In such systems, state is usually maintained "in the air" through sophisticated caching and replication between nodes. No central coordinating facility is needed, because the distributed system manages its state automatically. Such systems usually scale very well, often almost linearly, but they place very high demands on the skills of their developers and especially their architects. LindaTupleSpaces and DistributedCache? are examples of such technologies.
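As a minimal sketch of the distributed end of the scale, the Python below uses a grow-only counter, a simple state-based CRDT. This is a swapped-in technique, not one named on this page, chosen because it shows replicated state converging with no central facility. The `GCounterReplica` class, node names, and amounts are hypothetical, and the nodes are simulated in a single process.

```python
class GCounterReplica:
    """One node's replica of a grow-only counter (a state-based CRDT).
    Each node increments only its own slot; merge takes per-node maxima."""

    def __init__(self, node_id, node_ids):
        self.node_id = node_id
        self.counts = {n: 0 for n in node_ids}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] += amount   # purely local, no coordination

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Commutative, associative and idempotent, so replicas can exchange
        # state in any order, any number of times, and still converge.
        for n, c in other.counts.items():
            self.counts[n] = max(self.counts.get(n, 0), c)

nodes = ["n1", "n2", "n3"]
replicas = {n: GCounterReplica(n, nodes) for n in nodes}

# Concurrent updates on different nodes, with no central facility involved.
replicas["n1"].increment(10)
replicas["n2"].increment(20)
replicas["n3"].increment(5)
print([r.value() for r in replicas.values()])  # [10, 20, 5]

# Gossip-style exchange: every replica merges every other replica's state.
for a in replicas.values():
    for b in replicas.values():
        a.merge(b)
print([r.value() for r in replicas.values()])  # [35, 35, 35]
```

The catch matches the paragraph above: convergence holds here only because the data type was designed so that merges commute, and arbitrary shared state does not get that property for free.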

What do you think of this classification scheme? We can now dig down into each approach to consider its subcategories, and also discuss the pros and cons of each approach that haven't been mentioned yet. Any thoughts?

See also AnoteOnDistributedComputing

Last edited December 12, 2006