Please take a look at the recently created Flow-Based Programming Wiki at http://www.jpaulmorrison.com/cgi-bin/wiki.pl , and add your name to the Visitors page.
What is now known as "Flow-Based Programming" - and also as DataflowProgramming - was invented/discovered in 1969-71 by PaulMorrison at IBM Canada. An early implementation of these concepts has been in continuous use at a major Canadian bank since around 1975, running some very large batch applications, but it has remained largely unknown in the industry as a whole, although it has recently started to attract more attention as a result of PaulMorrison's 1994 book (ISBN 0-442-01771-5) and his web site http://www.jpaulmorrison.com/fbp/ .
Flow-Based Programming is an approach to developing applications, not in terms of the old von Neumann paradigm, but based on the concept of multiple asynchronous processes communicating by means of data streams. An application is viewed as a system of data streams being transformed by processes, rather than a single "dumb" actor doing one thing at a time until all the data is processed. In addition, no process knows who its neighbours are - communication is indirect, via connections defined externally to the process modules or components. This allows components to be "black boxes", so that complex applications can often be built without writing any new code at all. Although this concept is similar to ideas current in the area of distributed and parallel systems, it has not generally been recognized that it is also an extremely productive approach to improving programmer productivity and application maintainability.
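To make the black-box idea concrete, here is a minimal sketch (my own illustration, not code from any FBP implementation; the component names are invented) using Python threads and bounded queues - each component sees only its own ports, and the network definition wires them together externally:

```python
import threading
import queue

END = object()  # sentinel marking end-of-stream

def reader(out_port, data):
    # Source component: emits each item downstream, then closes the stream.
    for item in data:
        out_port.put(item)
    out_port.put(END)

def upcase(in_port, out_port):
    # Transform component: a "black box" that knows nothing about its neighbours.
    while (item := in_port.get()) is not END:
        out_port.put(item.upper())
    out_port.put(END)

def collect(in_port, result):
    # Sink component: gathers whatever arrives on its input port.
    while (item := in_port.get()) is not END:
        result.append(item)

# The network definition: connections are declared outside the components,
# so the same components can be rewired without changing their code.
c1, c2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)  # bounded buffers
result = []
processes = [
    threading.Thread(target=reader, args=(c1, ["alpha", "beta"])),
    threading.Thread(target=upcase, args=(c1, c2)),
    threading.Thread(target=collect, args=(c2, result)),
]
for p in processes:
    p.start()
for p in processes:
    p.join()
print(result)  # ['ALPHA', 'BETA']
```

Note that `reader`, `upcase` and `collect` could be rearranged, or reused in another network, purely by changing the wiring section.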
In some ways this resembles DataCentricThinking.
Absolutely, FBP is a data-centred approach to programming - data should be central to data processing. I find it interesting that, even in the early days of DP, people were noticing that business programming seemed surprisingly complex. I am convinced this came from having to force data processing into a VonNeumannArchitecture.
FBP would seem to be an example of a ServiceOrientedArchitecture - or an infrastructure enabling such an architecture. For a high-level diagram of an interactive application, see SchematicOfBrokerageApp.
An interesting design decision in DataflowProgramming systems is whether to provide bounded or unbounded message buffers primitively:
- In the ActorsModel, unbounded buffers are primitive, and bounded buffers can be constructed by use of continuations and "serializers".
- In the 'pure' ActorsModel there are no synchronous sends. In such a system it is impossible to construct bounded buffers except by having all actors in a system report to a grand-central-processor actor that essentially controls communication flows between actors - an approach that cannot be readily used to compose disparate actor-based services. Hybrid ActorsModel systems (e.g. using SendReceiveReply along with holding a reply across receives) can be used to implement bounded buffers but cannot simultaneously support transactions.
- Supposing bounded buffers are only needed within an actor-configuration system, it is conceivably possible to achieve them by defining some of the services surrounding the actors - such as the rules governing the scheduler.
- In CommunicatingSequentialProcesses and in the system described in FlowBasedProgrammingBook, bounded buffers are primitive, and unbounded buffers can be constructed by explicitly storing messages in queue objects.
Being able to support both is an important test of expressiveness for a dataflow model.
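For illustration (this is just Python's standard library, not any of the systems discussed above), both flavours exist as primitives in `queue.Queue`, and the bounded form provides the back-pressure that CSP-style systems rely on:

```python
import queue

unbounded = queue.Queue()         # no maxsize: put() never blocks, the buffer just grows
bounded = queue.Queue(maxsize=2)  # put() blocks once two items are in flight

for item in ("a", "b"):
    bounded.put(item)
print(bounded.full())             # True: a third blocking put() would suspend the sender

try:
    bounded.put("c", block=False)
except queue.Full:
    print("sender would block here")  # back-pressure propagates upstream
```

Conversely, an unbounded buffer can be built from a bounded one by inserting an intermediary process that drains its input port into a local list, which is the construction described for CSP above.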
I think it is related to CommunicatingSequentialProcesses - see that page for a comparison between that model and FlowBasedProgramming.
The FBP concept, or more usually a subset of it, has been discovered independently many times (..., for example), but for some reason none of these has really taken off. One possible reason is that each lacks some key features - in CSP's case, probably the concept of ports, which are necessary to create truly reusable components (unless Hoare introduced them after I read his paper).
CSP "channels" are equivalent to ports.
Also an FBP network is a directed graph, so FBP is a highly visual PatternLanguage.
I'm one of those blissfully ignorant re-discoverers, so when I saw this page some time back I was pretty happy, and immediately bought and read your book. It's quite interesting, although some of the terminology took adjusting to, since I'm from the microprocessor/UNIX/systems programming world much more than from that of DP/mainframes. -- DougMerritt
Thanks, Doug! Please feel free to contribute to the wiki - I'd like to broaden the discussion...
One (very) simplistic genre of DataflowProgramming is the Unix filter pipeline. Such pipelines are used quite extensively, too, by experienced shell programmers.
FBP, and DataflowProgramming languages in general, are closely tied to the style of FunctionalProgramming that depends heavily on FunctionalComposition, especially in lazily evaluated environments. This holds even to the degree that I'd claim FBP (at least all examples I've seen of it) is isomorphic to a subset of this FunctionalComposition style, namely that in which all underlying data structures are lists.
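As a small illustration of that claim (my own sketch, in Python rather than a lazy functional language), components can be written as lazily evaluated list-to-list functions and composed; nothing runs until the downstream end demands data:

```python
def numbers(limit):
    # Source: lazily yields 0, 1, ..., limit-1.
    for n in range(limit):
        yield n

def squares(stream):
    # Transformer: a list-to-list function, evaluated on demand.
    for n in stream:
        yield n * n

def evens(stream):
    # Filter: another list-to-list function.
    for n in stream:
        if n % 2 == 0:
            yield n

pipeline = evens(squares(numbers(10)))  # composed, but nothing has run yet
out = list(pipeline)                    # demand from downstream drives evaluation
print(out)  # [0, 4, 16, 36, 64]
```

Each stage here corresponds to an FBP component, and the generator composition plays the role of the connections.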
Yes, my book (it's also on my web site) draws parallels between FBP and FunctionalProgramming - see http://www.jpaulmorrison.com/fbp/recurs.htm . However, FBP seems more powerful as the connections constitute a DirectedGraph, including loops (carefully!), plus the components run asynchronously, so the connections are BoundedBuffers.
The functional style abstracts away whether the evaluation is parallel or sequential, and you can form circular graphs in languages with LazyEvaluation, such as in the following definition of the FibonacciSequence:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
One other point I should have made is that in FBP what flows across the connections is streams of structured data objects (actually their handles), which have a well-defined lifetime - from creation to destruction. These resemble Linda's "tuples" (see LindaLanguage) in that they exist in a space of their own, like TupleSpace. However, unlike in Linda, these data objects are not retrieved associatively.
This, too, is a property of lazy lists -- the ability to hold complex data structures. So essentially, FBP is like programming with lazily-evaluated functions from lists to lists. Not that this value-type restriction wouldn't be useful, probably... but in effect, FBP is a subset of lazy functional programming.
I agree in the case of components - I have a chapter on this in my book, with the logic of a couple of FBP components expressed in functional notation - see http://www.jpaulmorrison.com/fbp/recurs.htm . I think FBP network definitions are a bit different, though - they are basically lists of connections. However, parts of a network can definitely be expressed functionally - see the new Wikipedia article http://en.wikipedia.org/wiki/Flow-based_programming .
The semantics of an asynchronous call are much more difficult than those of a synchronous one. In my experience, business applications built in this style tend to have more bugs and produce much bigger maintenance overhead than similar applications using synchronous calls. The variety of error conditions that the coder must think about explodes. Handling transaction rollbacks across components is nearly impossible (technically possible, but practically too difficult), so database inconsistencies become commonplace. So, there are cases where asynchronous communication is necessary, but I prefer to avoid it whenever possible.
Alexey, take a look at my wiki and/or web site. We wrote extremely complex programs, with a much lower bug rate, with a lifetime that has, so far, extended over more than 30 years, processing millions of transactions per day! As you will see, we are not just doing forks and joins - FBP is a whole architecture and methodology. There are very good reasons why FBP programming results in more reliable and maintainable code. BTW We also developed techniques for doing transaction checkpointing - there is some discussion of this in the FBP wiki - maybe you would like to contribute...
Some 15 years ago, I (a postgrad student then) worked with people from organizations like the Computing Center of the Academy of Sciences, USSR. In that context, any paradigm was possible and most were even easy. My recent experience is quite different, as you can see. What I reported above is not a subjective opinion - it's a hard-learned lesson. The problem is, most commercial programmers are not motivated to learn anything new in depth. They only read stuff about a specific technology.
Alexey, I'm afraid I don't get your point... 15 years ago works out to around 1989. At that point we had been using this technology for about 20 years (and I had been in the IT business for 30 years) - programs built in the early 70's using this technology are still working, helping to run a bank, which as you know requires very high reliability. What does your comment about commercial programmers refer to?
Now I am reading what I wrote, and it looks like a free flow of consciousness, too :) Let me try to put it in a clearer way:
1. Complex paradigms are both powerful (in the hands of masters), and dangerous (in the hands of "commodity programmers"). Most of the people writing and maintaining business apps [that I have seen recently] are the latter kind. They don't get along very well with complex paradigms. All this stuff about 15 years ago vs today is about that distinction between masters and "commodity programmers".
2. As I said above, an asynchronous call is IMHO a more complex paradigm. The moment you have it, you must think about off-line error handling, error correction, retries, and all kinds of other interesting issues. Whereas in the case of synchronous calls, all these things are a lot easier, and even when you fail to deal with some error, the container (Tuxedo, EJB or something else) usually saves you from yourself, so that the whole thing ends up rather gracefully. As a side benefit, you get a nice log entry on the caller which tells you the story from both sides of the interface - easy debugging.
3. In my current job, I have seen at least twice as many "interesting" bugs per function point in anything that used asynchronous communication as in anything built any other way.
4. Something else not mentioned before - it is difficult to write automated integration tests for any flow that involves async communication, too.
I am not denying that it can be a good choice in some situations, just questioning the assertion that it can be safer and less bug prone. My experience says otherwise.
Yes indeed, but you are talking about "asynchronous communication". This page is about "Flow Based Programming". You seem to be assuming they are identical topics. They are not. I've read Paul's book, and what it describes is not particularly similar to any of the widely used systems I've heard of. So you should not just assume that your experience applies; maybe it does, maybe it doesn't. Paul thinks it does not apply (thus the disagreement), and I'm inclined to agree with him. Read the book and see. -- DougMerritt
Interesting discussion! I now see that part of the problem is different types of "complexity": FBP may seem more complex because it has multiple processes, but a large conventional program can be very complex in a different way, because of the timing constraints on data use and modification imposed by the sequential structure. After a bit, the fact that there are multiple processes sort of recedes into the background, and one just finds FBP very easy and natural. FBP is a clear example of that over-worked term "ParadigmShift". You have to change the way you think of an application to something more like a collection of machines communicating by means of objects travelling on conveyor belts. People who have built up lots of experience with conventional programming, but not to the point where they are frustrated with it, are the ones who seem to have the most trouble with FBP! -- PaulMorrison
A related point - I think this came up in the discussions about the ActorsModel - people like Alexey worry about multiple processes accessing the same data, which I agree is fraught with perils. FBP sort of turns that idea upside-down: no FBP programmer would do that, except in the case of static (or almost static *) data, as s/he knows that the asynchronism makes such logic unpredictable. Similarly to the ActorsModel (I believe), the only data an FBP process can see is a) local (working storage) data, b) data packets created or received by that process and not sent on or discarded, or c) static data or data in a very restricted global facility (call it "almost static") *.
The almost static data referred to above (*) occurs in a situation that comes up quite often in business applications: a table is read in from disk by process A, and stored where processes B and C can access it, but not modify it. Processes B and C are constrained so they don't start until A terminates or sends them an explicit signal. If B or C have a need to modify the table, this mechanism is not appropriate. -- PaulMorrison
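A rough sketch of this pattern (my own illustration; the table contents are invented), using an explicit signal so that B and C cannot start reading before A has finished loading:

```python
import threading

table = {}                       # loaded once by A, then read-only for B and C
table_ready = threading.Event()  # the "explicit signal" from A

def process_a():
    # Process A: reads the table in (here hard-coded, rather than from disk).
    table.update({"rate": 0.05, "fee": 10})
    table_ready.set()            # safe to read from now on

def process_bc(name, results):
    # Processes B and C: constrained not to start until A signals.
    table_ready.wait()
    results[name] = table["rate"]  # read-only access; no modification

results = {}
threads = [threading.Thread(target=process_a)] + [
    threading.Thread(target=process_bc, args=(n, results)) for n in ("B", "C")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["B"], results["C"])  # both read 0.05
```

As noted above, this only works because B and C never modify the table; if they needed to, a different mechanism would be required.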
A very interesting bit of history: DougMcIlroy was very interested in what amounted to FlowBasedProgramming very early on, starting from an interest in coroutines, and his continued insistence led directly to the implementation of the less-powerful but more manageable Unix pipes by Ken Thompson.
In 1964 (at Bell Labs, but well before Unix efforts began) DougMcIlroy wrote a memo that said in part: "We should have some ways of connecting programs like garden hose - screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also." Quoted on Dennis Ritchie's site at http://cm.bell-labs.com/cm/cs/who/dmr/mdmpipe.html .
A related bit of the history of pipes and how they finally appeared in Unix in 1972 (but which doesn't make as clear DougMcIlroy's interest in more full-fledged FlowBasedProgramming) can be found at http://cm.bell-labs.com/cm/cs/who/dmr/hist.html#Pipes . DougMcIlroy's interest and advocacy does not give him historical precedence over PaulMorrison, of course, but it is still interesting.
-- dm (umm... DougMerritt, not DougMcIlroy)
This is another example of what I've heard called "steam-engine time": when it's time for the steam-engine to get invented, it shows up all over! Conway's famous article appeared in '63, but I don't believe coroutines caught on until quite a while later. I had run into coroutines quite early on, but IIRC the first descriptions of them were expressed in terms of "A calls B and B calls A", which seems to violate one's sense that subroutines ought to call and return in LIFO sequence. (I don't have Conway's paper in front of me.) In the late '60s, I sort of accidentally implemented what I later came to call coroutines, using suspend/resume and bounded buffers, rather than calls, which worked very well, and solved a number of problems that had been bothering me about the VonNeumannArchitecture. I published a paper on it in the IBM Systems Journal in 1978, and shortly afterwards got a nice letter from the Unix guys, saying basically "Funny thing...". IBM also published a letter from them and my response to it in the correspondence section of the Systems Journal - I think in the following issue or the one after that. In it I said, as I have said elsewhere, that FBP processes communicate using streams of structured objects, rather than streams of characters as UNIX does. I know the ancestor of that idea too - in the mid-60s IBM was working on a simulation language (I don't think it was ever sold commercially), where the objects travelled between nodes in a network - they called them "entities" - for instance, think of cars travelling between facilities in a service station such as pumps, carwash, air, etc. This seemed like a neat way to build programs...! -- PaulMorrison
Some years ago, I wrote a data-stream processing library in Perl. To use it, you'd attach processor objects together, similar to a Unix pipeline. Each object could have zero or more DataSource interfaces and zero or more DataSink interfaces. The objects would connect themselves together as you specified, a DataSource connecting to the appropriate DataSink, and vice-versa, in a doubly-linked network.
Then you'd tell the network to process the data, starting at each output. (This was, BTW, in the days before Perl threads.) The output object knew which data it needed, so it would request it from the appropriate upstream DataSources, whose objects would request the data they needed from their upstream objects, and so forth. This continued until the request reached an object that could access raw data, such as a FileSource, which would then send a block of data to all of its DataSinks (or respond that it had no more data). Each of these objects would then process the data and send their results further down the network. The process repeated until all the data was processed.
The design proved flexible and suitable for working with megabytes of data. (Though I'm sure it could have been suitable for gigabytes, in certain cases, if we had had gigabytes of data to process.) The AchillesHeel was that if the processing network was arranged such that some data came through before it could be used, the data would have to be cached, which used up RAM. Sometimes, lots of RAM. For example, if you wanted to count the number of bytes and prefix the output with that byte-count, you'd have to cache all the bytes while they were being counted. (This would be done in a concatenation object.) Then after all the data was counted, the counter object could send the actual count down the stream, and then the cached data could be sent down after. It went something like this:
$input = infile("foo.dat");
$count = count_bytes($input);
$counted_data = cat($count, $input);
$output = outfile($counted_data, "counted-foo.dat");
In FBP we treat this as a design problem. It's much easier to see with a graphical notation. I don't see this as an Achilles heel! See the discussion of "storage deadlock" in http://www.jpaulmorrison.com/fbp/deadlock.htm.
Good point. So we could do something like:
$buffered_input = disk_FIFO($input);
$counted_data = cat($count, $buffered_input);
In this case, $buffered_input could even be another infile("foo.dat"), and it would easily avoid memory overflow if the data is large.
Yes, that is what I mean! In our applications, you could very often do this by just writing the stream to disk using a disk writer, which would then trigger a disk reader to read the same file and send it on. Of course, you can have multiple instances of this in a single network as long as they use different files.
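In Python terms (a hypothetical analogue of the Perl sketch above, with invented helper names), the second-reader arrangement looks something like this - the file itself serves as the FIFO, so nothing is cached in RAM:

```python
import os
import tempfile

def infile(path, chunk=4096):
    # Stream a file lazily, one block at a time.
    with open(path, "rb") as f:
        while block := f.read(chunk):
            yield block

def count_bytes(stream):
    return sum(len(block) for block in stream)

# Set up some sample data (23 bytes).
path = os.path.join(tempfile.mkdtemp(), "foo.dat")
with open(path, "wb") as f:
    f.write(b"hello, flow-based world")

count = count_bytes(infile(path))   # first reader: consumes the whole stream
out_path = path + ".counted"
with open(out_path, "wb") as out:
    out.write(str(count).encode() + b"\n")  # prefix: the byte count
    for block in infile(path):              # second reader: replays the same file
        out.write(block)
print(count)  # 23
```

The two `infile(path)` calls correspond to the disk writer/disk reader pair described above: the stream is traversed twice rather than held in memory.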
What do you think of the work being done in OzLanguage? Oz is (for those unfamiliar with it) a MultiParadigmProgrammingLanguage which includes extensive support for a dataflow model (actually, several different dataflow models). See ConceptsTechniquesAndModelsOfComputerProgramming for more info on Oz.
At first glance, it certainly looks fascinating! It seems to combine a number of important programming technologies - I'll be interested in how they all fit together. At the risk of sounding like an ogre, though, how does it perform?! Can it support high volume business applications? See also some of the comments in the OzLanguage page.
Postscript: I still believe the correct direction is to use a coordination structure tying together modules which can be written in conventional languages or mini-languages - perhaps even independent interpreters. This means that the various languages don't have to be overloaded - indeed some of the components don't need to be TuringComplete, e.g. Sort parameters, SQL, etc. This idea works even better if the coordination "language" is visual.
It seems like this page is incomplete without a reference to PrographLanguage! I found the ParadigmShift involved in Prograph's FlowBasedProgramming to be a mind-expanding experience.
What about LabView?
An exciting topic and discussion! Let me add a biological perspective on FBP; hopefully it may be helpful when discussing FBP. Nobody programs a brain explicitly - the environment does. And the environment is (structured) signals. A signal triggers a chain response - data flows through a layered pipeline and gets processed at each layer (like, e.g., a lens's 'logic' collects all rays from an image to a focal point), doing (sophisticated) signal processing. Where is the logic? Somewhere - it does not matter. And how does the signal/data *find* the proper piece of logic related to the processing of that signal or its segment? Good question, indeed. Well, if logic is *distributed* in space, the only way for the signal to find the proper logic is to be "associated" with the proper piece of logic. Thus, the control flow gets specified by the signal/data and its structure. The association architecture mentioned (that is, all the data-logic associations) is a kind of built-in interpreter, decoding the signal/pattern and directing the proper segments of the signal to the associated logic. [Imagine a signal f(t) decoded by the "interpreter" into its frequency components, which are then sent to the proper frequency-dependent logic for processing.] In other words, the logic already sits in the right locations, and the signal gets interpreted by the system in terms of associated logic and gets redirected properly. The output may be integrated into the input signal - a powerful feedback, thus making the control highly flexible.
Does it make any sense? -- Val
These ideas remind me more of LindaLanguage, as the former retrieves data associatively, rather than by using "tram-lines" as FBP does. I am just not convinced Linda has the "rigor" to do large-scale data processing - but I'm willing to be convinced otherwise. However, another idea we had with FBP was to have network nodes that would accept logic modules (think DLLs) to customise their logic. A stream of data could then be preceded by the logic to process it. Still, if one could add self-organizing abilities to FBP, it would be pretty interesting... :-)
Another idea along these lines would be to allow certain parts of the network to adjust their multi-threading levels based on traffic requirements. -- PaulMorrison
Would you see FlowBasedProgramming as a useful/suitable stepping stone to the implementation of a future WebServices-based ServiceOrientedArchitecture? How would you organize a "proof of concept" of this approach?
Sorry, David, didn't see your post! I am not familiar with SOA concepts, but I'm wondering if a brokerage app we built a few years ago using the Java implementation of FlowBasedProgramming is along the lines you have in mind. A very high-level flow is shown at http://www.jpaulmorrison.com/cgi-bin/wiki.pl?BrokerageApplication . -- PaulMorrison
Feb 15 '04: I've been doing things with a tcl/tk package I wrote years ago which allows sort of what the title of this page suggests, called BWise: blocks with pins and wires which house functions, and various ways to make the blocks 'run' in sequence automatically, all with real-time feedback and programmability (in tcl). See http://wiki.tcl.tk/bwise and, for examples, http://wiki.tcl.tk/8565 . Let me know what you think!
That's pretty impressive, and the animation is neat! We did a simple animation of data chunks travelling through a FlowBasedProgramming network, and we found it got the ideas across better than any amount of talking! I keep thinking about putting some animation into DrawFBP - http://www.jpaulmorrison.com/cgi-bin/wiki.pl?DrawFBP - but haven't yet gotten around to it. A couple of questions: do you support subnets (or zoom in/zoom out)? Does the infrastructure support asynchronous operation, or is the model purely synchronous? Are you familiar with LabView? How does BWise compare with it? -- PaulMorrison
Thank you. Was that animation done with BWise? Well, BWise was based on various ideas and concepts and packages from the past. In the Network Theory Section (of Delft Uni, EE), people were programming this sort of stuff; Khoros (version 2.0 is still around, but commercial), AVS (a commercial visualization package with flow programming) and Ape (another flow-based package) were all around over a decade ago, and were usable.
However, most were not trivial to program, the ideas behind the block flow were sort of hidden, and they would, with the exception of the Khoros heart maybe, be very bulky (and in the case of the Network Theory Section software, too buggy and crash-prone to be useful). So I set out at some point to do only the essence of block-wise programming, with as direct and 'simple' a Tk interface as possible, in the interpreted tcl language, so that all things can be reprogrammed, seen and changed 'live'. The 'scheduler' that is most used is based on the net_funprop procedure, which starts from one block, recomputes its output, and passes the changes through the network in the most efficient way, calling each block function exactly once.
There are also functions which use, for instance, a 'flood' algorithm, which also works but is less efficient, and sometimes requires inputs to remain fixed. Also, I've done some things with 'triggered' blocks, which is only briefly described in the wiki pages, I think.
LabView is very bulky, too, I think; I've only quickly looked at the demo version because, major difference, it's quite expensive... A lot of measurement and other equipment can be interfaced pretty rapidly with LabView, and IIRC it has a lot of analysis functions like FFTs and such built in. BWise is more a bare-bones idea, and I thus far haven't had any commercialization ideas; it started as a tool, part of software (synthesizer) sound-synthesis research for myself, and to prove that such a package could be small and reliable enough, even with a realtime user interface.
Theo, I guess I put in my question about subnets while you were answering my previous questions! -- pm
This looks the same as FunctionalReactiveProgramming, which has seen a couple of interesting implementations recently.
I suspect rules-based systems, or at least their 'trigger' definitions that are based on observations of data and events, would be extremely useful in hooking into external sources of data, streams, and events. One can also observe other such 'rules', or abstract over rules by having variables for what systems are being observed.