Understanding Distributed Systems

(Moved from FutureOfObjects)

Michael, I'm only an egg with respect to distributed systems. Please explain why a distributed object system looks different from the undistributed version of the same system. --RonJeffries

Ron, a major consideration in distributed systems is that crossing the boundary between processes is much more expensive than communication between objects within the same process. Even though proxies make it appear that local and remote objects are the same, in reality it ain't so. Minimizing the impact of this is a major component of designing a distributed system. Also, in a distributed system you have to deal with the possibility that suddenly an object you were communicating with just isn't there any more. Processes and communication lines can go down. Finally, there is the problem that you have to figure out how to "start" the system -- setting up communication between multiple parties on multiple machines is a non-trivial exercise.

Share and Enjoy.
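
To make the "suddenly it isn't there any more" point concrete, here is a minimal Python sketch. Everything in it is hypothetical: Account, AccountProxy, RemoteUnavailable and the link_up flag are illustration names, and the "network" is just a boolean rather than a real transport. The proxy has the same interface as the local object, yet the caller needs a failure path that the local version never required.

```python
class RemoteUnavailable(Exception):
    """Raised when the (simulated) remote process cannot be reached."""


class Account:
    """Plain in-process object: a call returns a value, nothing else can happen."""

    def __init__(self, balance):
        self._balance = balance

    def balance(self):
        return self._balance


class AccountProxy:
    """Hypothetical proxy exposing the same interface as Account.

    The link is simulated with a flag; a real proxy would marshal the call
    and send it to another process.
    """

    def __init__(self, target):
        self._target = target
        self.link_up = True

    def balance(self):
        if not self.link_up:
            raise RemoteUnavailable("peer process or line is down")
        return self._target.balance()


def show_balance(account):
    """Caller code that has to handle a failure mode the local object never had."""
    try:
        return f"balance: {account.balance()}"
    except RemoteUnavailable:
        return "balance: unknown (remote side unreachable)"
```

With a local Account the except branch is dead code. With the proxy it is reachable at any moment, which is the part the identical interface hides.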


Yabbut, how does the design vary because of this? I see that there are new objects, like the proxies and such. I see that there's some surrounding commit/abort stuff. That seems like elaborating the design to me. But people seem to talk about the design changing rather than just elaborating. Am I missing something? I feel I must be ... --RonJeffries

I couldn't help but comment. This ties exactly into what I just added to WhatIsAnalysis. I'd argue that the Analysis model doesn't change, but the design does. --GregVaughn

You could take it to be just an "elaboration" but the problem is that the "elaboration" ends up dominating the rest of the design. In InterfacesLast and FacadesAsDistributedComponents I talk about some of the "elaboration" that you have to put on top of your design. In most cases I do end up doing at least a preliminary design & implementation of the core objects as if they are non-distributed, but then the whole distribution mess starts taking over and overwhelming the rest of the project.

F'r instance, we once did an apprentice program where we were building an options trading system prototype. The first week finished the domain model in a non-distributed way. The following THREE weeks were devoted to handling distribution details. That was mostly spent modeling and implementing tough problems caused by distribution.


Three weeks, sounds about right. But when you were done how much had the domain component changed because of it? Or were you able to use the OpenClosedPrinciple to keep domain classes closed, and the distribution details as additions to them (rather than modifications of them)? -- JohnClonts

John: The domain classes didn't change much at all. The changes we did make were inconsequential to the design (minor implementation changes). However, there was a ton of gunk added around them. --KyleBrown

AnoteOnDistributedComputing at http://www.sunlabs.com/techrep/1994/abstract-29.html makes a strong case against local-remote transparency in distributed computing. The gist of it is summed up in the first line of the abstract:

"We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space."

- StephenPetschulat

I remember a presentation by SteveJobs about "Portable Distributed Objects" (done in ObjectiveCee during his NeXt phase). There were assurances that developers wouldn't have to write "networking gunk" and that things would be a lot simpler. You might get to hide the implementation details of the distributed code, but you just can't hide the nature of an unreliable communications medium or the timing differences. Reminds me of RMI today...

Seconded. The real use of the distributed object stuff, like RPC before it, is that it cuts down a lot of the networking, event-handling, marshalling, etc. code you used to have to write, and both client and server code get much simpler. Unfortunately, the marketing story tends towards "you don't have to think about distribution any more". --SteveFreeman

Has anyone on wiki used ObjectSpace's Voyager for real things recently? Anyone from Novell, Galileo or Tivoli out there, for example?

Voyager's a remarkably cool 'non-intrusive agent orb' for Java which now aims to become a Java-centric 'rosetta stone for distributed objects' (of the CORBA/DCOM/RMI varieties). That's the marketing puff.

The reality of using Voyager 1.0 in anger though for a major extranet system late in 1997 was very encouraging. It enabled the creation of a remarkably solid, scalable and fault tolerant server system very quickly. (I was at some distance in version 1.0 scepticism mode at the time I fully admit.)

Voyager makes you very aware of distribution - but a lot of tiresome housekeeping stuff (even in RMI) is taken out. So you can prototype functions thinking pure Java objects, try different distribution strategies and get much quicker feedback on the impact on the measurable qualities you're interested in.

I'd be interested in recent experience with it in the light of comments here and in FutureOfObjects. --RichardDrake

The domain classes didn't change much at all. The changes we did make were inconsequential to the design (minor implementation changes). However, there was a ton of gunk added around them. --KyleBrown

This is more consistent with my experience and expectation. There will be lots of work associated with distributed, but I don't expect the design to change that much. That's why I asked the original question. Still listening. --rj

Since round trips are expensive, APIs that may not be the most elegant are sometimes the most efficient. I might have a nice clean, fine-grained interface if the objects aren't distributed. As soon as distribution beckons, some refactoring of the code is pretty much in order. Since TheSourceCodeIsTheDesign, I'd argue that the design is changing. -- AlanFrancis
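
A small Python sketch of the round-trip argument, under stated assumptions: CountingLink is a made-up stand-in for a network link that does nothing except count calls, and Customer with its accessors is an invented example, not code from any system discussed here. It shows why a coarser, less elegant call gets added once a client is remote.

```python
class CountingLink:
    """Stands in for a network link; all it does is count round trips."""

    def __init__(self):
        self.round_trips = 0

    def call(self, func, *args):
        self.round_trips += 1
        return func(*args)


class Customer:
    def __init__(self, name, street, city):
        self.name, self.street, self.city = name, street, city

    # Fine-grained accessors: pleasant to use in-process.
    def get_name(self):
        return self.name

    def get_street(self):
        return self.street

    def get_city(self):
        return self.city

    # Coarse-grained call added for remote clients: one trip, all fields.
    def get_summary(self):
        return {"name": self.name, "street": self.street, "city": self.city}


def label_fine_grained(link, customer):
    """Build an address label with one remote call per field."""
    name = link.call(customer.get_name)
    street = link.call(customer.get_street)
    city = link.call(customer.get_city)
    return f"{name}, {street}, {city}"


def label_coarse_grained(link, customer):
    """Build the same label with a single remote call."""
    summary = link.call(customer.get_summary)
    return f"{summary['name']}, {summary['street']}, {summary['city']}"
```

Both functions produce the same label; the first costs three trips over the link and the second costs one. Whether adding get_summary counts as a design change or an elaboration is exactly the question being argued on this page.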

Clearly not every change to the source code is worthy of being called a change to the design. We need to see what a design would look like with, and without distribution, and how they'd vary.

GemStone/S keeps all the objects, looking just like objects, in the (server) database. The attached clients are written in the same language, and they can talk to forwarders, and to replicates. There are efficiency considerations in using either*. Forwarders are good for few messages / big execution. Replicates are good for lots of messages / little updating. GemStone lets you determine, for any object, the level of faulting that will occur.
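
The forwarder/replicate tradeoff can be sketched as a toy cost count in Python. This is emphatically not GemStone's API: Server, Forwarder and Replicate are hypothetical classes, and the only "cost" modelled is a round-trip counter, assuming one trip per forwarded message, one trip to fetch a replica, and one trip per write-back.

```python
class Server:
    """Toy server: holds state and counts round trips made against it."""

    def __init__(self):
        self.round_trips = 0
        self.state = {"price": 10}


class Forwarder:
    """Every message is sent to the server (assumed: one round trip each)."""

    def __init__(self, server):
        self.server = server

    def get(self, key):
        self.server.round_trips += 1
        return self.server.state[key]

    def set(self, key, value):
        self.server.round_trips += 1
        self.server.state[key] = value


class Replicate:
    """State is copied to the client once; reads are local, writes go back."""

    def __init__(self, server):
        self.server = server
        server.round_trips += 1          # one trip to fetch the copy
        self.local = dict(server.state)

    def get(self, key):
        return self.local[key]           # no network involved

    def set(self, key, value):
        self.local[key] = value
        self.server.round_trips += 1     # write the change back
        self.server.state[key] = value
```

Under these assumptions a hundred reads cost a hundred trips through the forwarder and one through the replicate, while frequent updates erase the replicate's advantage.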

These adjustments have essentially nothing to do with the design of the application, in my experience. Since these are exactly the tradeoffs in every distributed situation, as far as I can see, I am still wondering why we think the design changes when you go distributed. --rj

Ron, you keep emphasising design. I'm not sure what you mean by it. Your idea of design is not mine. To me at least a part of design is deciding how to partition your application across multiple servers and processes in order that it be performant. The Gemstone system has warped your mind to make you think that distributed programming is easy. I love the Gemstone idea - it solves certain distributed problems easily. But, try to imagine how Gemstone is implemented efficiently. That should help! -- StevenShaw

* GemStone makes this a bit trickier by replicating objects into your client space even if you say you only want forwarders. By doing this behind the scenes, GemStone cuts off a corner of the efficiency space, making some solutions inaccessible. Wonderful.

I think that a system like Gemstone/S which allows for the option of transparent replication of objects to the 'client', including automatically and transactionally saving that state back to the server, is quite a bit beyond what is assumed as part of a distribution architecture. -- AlanKnight
Yup. I've told many people that if they want to know what EnterpriseJavaBeans and distributed programming will look like in 5 years, look to see what Gemstone/S provided five years ago.


Most of the toolkits (RMI, CORBA, etc.) that enable distributed computing barely scratch the surface of helping developers solve the problems surrounding such a system. JimWaldo's paper (referenced above) is a great intro to those problems:

Those are the big issues that surround distributed systems that immediately come to mind... some of the solutions to the above problems include RPC (like RMI & CORBA), MessageQueuing (MqSeries), TransactionProcessing (CICS, TuxedoMonitor and the lot), GroupCommunicationSystems (Isis, SoftwiredIbus, Horus, etc.) and TupleSpaces (LindaTupleSpaces, JavaSpaces, Voyager, etc.)

I love this area of computing.. very far-reaching, and it fits well with object-orientation (DistributedObjects).

I find that the Partial Failure problems are far and away the biggest issue, given the parlous state of most if not all middleware technologies today. Latency is a fact of life: you know it's there and live with it. The concurrency stuff is well researched, with literature increasingly available, if not as well understood as it might be. But the failure issues just don't seem to have given rise to products or improved consciousness in the same way. In my experience the best bet for communicating things that matter is the guaranteed messaging technologies, but using these typically means adopting a workflow metaphor as opposed to a distributed object model.


People like writing tests because it's a big fad now, but I've rarely seen anyone create a test environment that is at least as unfair as the real world is, and this is what's needed to familiarize a team with failures. I've learned a lot by using a psycho-killer process that picks processes from a set and kills them at random (throttled; you can pointlessly overdo this) during live environment testing to see how things turn out. On projects where the system was physically distributed across a LAN, I went a step further and randomly restarted, purposely crashed, or disconnected network components (and it's important to note that you can't simulate the reality of a network for testing by simply virtualizing it -- try replacing "virtual" with "pretend" and you may find enlightenment on more than just distributed testing...). This kind of thing cannot replace case analysis -> test-writing -> coding and is not part of development methodology, but taking these sorts of steps after we thought our systems were already pretty solid was extremely useful. Introducing an element of deliberate malice which doesn't exist within the optimistic psychology of primary development is critical to simulate the real world, IMO, because nothing is colder, more uncaring or crueler to your ideas than the real world.
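
A rough Python sketch of the psycho-killer idea, not the tool described above. The function name kill_at_random and all of its parameters are hypothetical; the caller supplies the process handles and whatever actually terminates one (for example a wrapper around os.kill). It also assumes a killed process leaves the candidate set, which a setup with automatic restarts would want to change.

```python
import random
import time


def kill_at_random(processes, kill, rounds, pause_seconds=0.0,
                   rng=None, sleep=time.sleep):
    """Each round: pick a victim at random, kill it, then pause (the throttle).

    processes      -- handles of the candidate processes (any objects)
    kill           -- callable that terminates one handle; supplied by the caller
    rounds         -- upper bound on the number of kills
    pause_seconds  -- delay between kills, so the test is not pointlessly overdone
    rng, sleep     -- injectable for repeatable runs and fast tests
    Returns the victims in the order they were killed.
    """
    rng = rng or random.Random()
    alive = list(processes)
    victims = []
    for _ in range(rounds):
        if not alive:
            break                      # nothing left to kill
        victim = rng.choice(alive)
        kill(victim)
        alive.remove(victim)           # assumption: killed processes stay dead
        victims.append(victim)
        sleep(pause_seconds)
    return victims
```

Passing a seeded random.Random makes a nasty run reproducible, which helps once the killer has found something worth debugging.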


In addition to Stu's very good list there is:
  1. Constraining consumers. In a distributed system it is very easy for the number of clients to grow arbitrarily, so the server has to be able to constrain consumers in order to preserve its real-time behavior, memory, bandwidth, etc.
  2. Constraining data exchanges. You just can't send 10MB of data to someone as you can in a nondistributed system. In a nondistributed system you could just send a pointer (or whatever). In a distributed system you have to set up cursor-like protocols between many consumers and producers and deal with failures that can occur.
  3. Keeping distributed state machines in sync across arbitrarily complex failure scenarios. Most interesting distributed entities are in relationships with other distributed entities. If someone you are in a relationship with goes down or up, you have to react to that and possibly resync with them or do other things. And of course others are dependent on you, so this can get extremely complex. If you have not dealt with this it's hard to imagine.
  4. Having hot standbys and other high availability issues complicate things to an even greater degree.
  5. Guaranteed delivery is an illusion. Machines can go down forever and you have to deal with the dynamic reconfiguration this entails.
  6. Latency and detecting failure complicate code quite a bit. There's no real way to know what timeouts should be used, how many times you should retry, etc.
  7. Lack of programmers who understand distributed programming. Especially because distributed programming tends to bring in threads, queuing, mutual exclusion, etc.
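
Items 2 and 6 can be illustrated with a minimal cursor-like protocol in Python. Producer, fetch, read_all and the max_chunk and max_attempts values are all hypothetical names and numbers chosen for the sketch. A dropped connection is simulated by a ConnectionError; no real network is involved.

```python
class Producer:
    """Holds a large result and hands it out in bounded chunks."""

    def __init__(self, items, max_chunk=3):
        self._items = list(items)
        self._max_chunk = max_chunk

    def fetch(self, cursor, limit):
        """Return (chunk, next_cursor); next_cursor is None when nothing is left."""
        limit = min(limit, self._max_chunk)   # the producer constrains the consumer
        chunk = self._items[cursor:cursor + limit]
        next_cursor = cursor + len(chunk)
        if next_cursor >= len(self._items):
            next_cursor = None
        return chunk, next_cursor


def read_all(fetch, limit=100, max_attempts=3):
    """Drain a producer, retrying a failed fetch from the last good cursor."""
    received, cursor = [], 0
    while cursor is not None:
        for attempt in range(max_attempts):
            try:
                chunk, next_cursor = fetch(cursor, limit)
                break
            except ConnectionError:
                if attempt == max_attempts - 1:
                    raise                     # give up; the caller has to decide
        received.extend(chunk)
        cursor = next_cursor
    return received
```

Note that max_attempts is a guess, as item 6 says it must be. The sketch also sidesteps the harder cases in item 3, such as the producer's data changing between fetches.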

In a real system distributed programming is nothing like "normal" programming, especially when considered from the systems perspective. If you can just look at tiny parts it may look sorta normal, but that's like looking at a cell and thinking you understand the whole organic being.


Maybe we're all missing the boat here. What we really want is DistributedDataCentralizedProgramming.


See also DistributedComputing, MessageQueuing, JavaSpaces, LindaTupleSpaces, InSearchOfClusters, MessagePassingConcurrency

Last edited July 10, 2012.