On Decomposing Systems

Short WikiName for a seminal paper by DavidParnas in 1972 titled "On the Criteria To Be Used in Decomposing Systems into Modules".

See: If anybody knows where the actual source-code is, please let us know. Thanks

Huh?

The computer programming listing/text source-code that was used in this study. You know, IF-statements, functions, variable declarations, WHILE-loops, etc. I'm sure the two approaches compared are not the only way to program it.
Various comments:
As a RelationalWeenie, I see the paper as justifying the need for a DataBase of some kind (probably a NimbleDatabase in this case) rather than hand-building your own arrays and linked lists of objects/nodes to house and manage the data. Parnas' favored solution looks like a mini-database (non-relational) of sorts to me. However, the paper lacks specifics about the application requirements, so it is hard to get into a point-by-point comparison. I suppose this is kind of an ink-blot test where OO fans see ADT-like encapsulation and database fans see it as promoting databases to hide and reduce reinvention of the innards of indexing, querying, and sorting. This paper probably won't settle any HolyWar. -- top
This is a great paper. This doesn't mean it settles any HolyWar. Papers can only start holy wars, not settle them. -- RalphJohnson

It is not a great paper because:


While 'top' is certainly entitled to his 'opinion', I urge all who have not read this paper to do so for themselves. For one thing, the paper dates to 1972. It is therefore not surprising that it does not provide an empirical comparison of OO vs anything. OO was still a research project at the time, as indeed was most of modular programming. It is a paper about principles and the exercise of professional judgment, it is a seminal paper, and it is quite possible that it will be far less opaque to others than it was to 'top'. In any case, until some magic methods appear which can be applied without professional judgment, whereupon we can all look forward to permanent unemployment as programming becomes fully automated, it may be useful to peek inside the minds of some of the greats who have come before us. We could even come to realize that the exercise of professional judgment is part of any non-trivial discipline, and embrace it.

-- MarcGrundfest

Perhaps there's a more fitting topic already, but heavy "fully automated programming" is a pipe-dream in my opinion (short of human-like AI) because of the inherent fuzziness of requirements and the impact of not understanding the longer-term implications of certain info organization decisions. Software developers are not just coders, but also a (hopefully) skilled liaison between machine (literal logic engines) and humans in order to make the link between the two as effective and practical as possible. I doubt a non-human liaison will understand the human side of things sufficiently.

First, are we to fail to make a distinction between creating requirements and generating code? If we make the distinction, then we have automatic programming already. If we do not, then we may yet become 'programmers' who do only requirements analysis. The context above is that 'top' appears to claim that the principles of design upon which we rely are not 'scientific' because they are not objective. In practice, this means that we need to apply professional human judgment. I deny the claim that the application of judgment is a disqualification for 'Science' (strong claim) and then bypass the issue to suggest that even if true we need to learn, understand and apply these principles (weak claim) and that the paper cited is a great paper because it allows us (most of us) to do that. I am not claiming that human judgment can be removed from the practice of software development in either the strong or weak form, and did not intend to create that impression. In fact my point is the opposite.

-- MarcGrundfest

You are mischaracterizing my viewpoint. I don't claim that software engineering is inherently beyond science; only that the necessary science has yet to be done and that it won't be easy to "do it right". The existing material tends to focus on the low-hanging fruit; but to get a better sample of the tree will require higher samples and more samples. -- top

OK, but what I am saying here is that it does not matter at all whether it is Science or not. If it's not Science and never can be (strong form), or if it could be but is not yet (weak form), then in either case we must learn and apply the principles in this paper. The more you think they are not Scientific (yet) the more judgment is needed, then the more you need to understand the paper and not dismiss it. If we are to become a Science (weak form), we will do so by studying the greats who have come before. If we can never become a science (strong form), then we must still learn and apply the principles in this paper to have any hope of doing our jobs. I am sorry if you think otherwise, but I am not all that concerned - I am concerned that others have the opportunity to read the paper and judge for themselves. I do not agree with your characterization of the paper, but I do not dispute your right to hold your views. Neither do I think it's a fruitful use of our time to debate the issue. I am content to allow the paper to stand on its own merits (or lack thereof) and I trust others to form their own opinion.

-- MarcGrundfest

The devil is usually in the details, and without source-code we don't have sufficient details. How can one test competing theories if the skull is locked away in somebody's drawer?

I guess I will just have to trust the other readers to have better luck. Most of them seem to manage. Of course, you could be right - but once again I am not all that concerned. What I really do not understand is why you seem to care. If, as you say, the paper is not a great paper, and offers little of value, how can it possibly bother you that others disagree? I mean, if I thought the rest of the world was wrong I would look for a way to leverage that to my advantage, not give away the store. You are truly a mystery, top. Now I care only that others judge for themselves. Do they have that right - the same right you treasure so for yourself? Can we at least agree that they have the right to believe it's a great paper even if you do not?

-- MarcGrundfest


Without seeing the actual source-code, I find it hard to verify or comment on the write-up alone. If you believe the write-up alone is sufficient, we'll just have to AgreeToDisagree and move on. I'd really like to see if modern tools/techniques can give us more choices and better designs.

Now are we agreeing that others have the right to have a different opinion of the paper? Or are we agreeing about the fact that the paper is in dispute? We already know that the paper is in dispute. What I want to know is why you think that others can't disagree on the paper unless they argue with you first. I am ok if you want to think it's a bad paper. I am ok if you want to think that others are fools if they think otherwise. I am ok if you think you have the right to tell others it's a bad paper (which you have in your original comment). I do have a problem when you can't say the words "I am ok if you think it's a good paper, and I think it is a good thing for other people to read it and decide for themselves."

Can you write back, as a show of good faith as follows and I will do the same (in fact, I will do it first):

I agree to let others decide for themselves whether this is a good paper or not. I trust their judgment and recognize their right to do so - the same right that I have. -- MarcGrundfest

I made my comments based on the limited information available. It's like commenting on Jupiter using an Earth-bound telescope. I'd like to launch a probe to Jupiter called "Actual-Source-Code I" to verify Earth-based observations and/or challenge it with new theories based on closer observations. We already know that Earth-based observations are not giving us a very clear view. But just as astronomers still commented on Jupiter before the days of probes, one can still comment on the paper without a source-code probe. Nobody was fired over that. If probes don't exist, they don't exist. We do the best with what's available. But it cannot qualify as a "great paper" without source code in my book. Unlike astronomers and probes, the lack of code is author/publisher sloth, not a technical limitation. You want the "great" badge, you need the code. -- top

I do not know what you are talking about. If I am very lucky I never will. -- MarcGrundfest

The feeling is mutual. I have no ability or desire to stop others from commenting on the paper. I am merely cautioning against over-interpreting incomplete information. (I've since tried to improve the above paragraph to hopefully make it clearer.)

[It shouldn't come as a shock that there's a good reason for not including the source. It's irrelevant, and in 1972 you didn't include irrelevant things that would increase the size of the paper by an order of magnitude.]

Why would it be irrelevant? There's probably infinite ways to code to the spec. Why should incarnation X be representative of all other possible implementation incarnations of a given approach "type"? For example, some algorithms are greatly simplified by use of associative arrays or tree arrays over positional arrays. Back in 1972, associative arrays were uncommon.

{Good heavens. The paper no more needs to show source code than a blueprint of a house needs to show individual nails.}

[It's irrelevant because it doesn't matter which of those infinite ways it's coded, as long as it's one of those infinite ways that matches the decomposition described.]

Please tell me that the following quote is NOT tied to actual code/language:

The formats of control blocks used in queues in operating systems and similar programs must be hidden within a "control block module." It is conventional to make such formats the interfaces between various modules. Because design evolution forces frequent changes on control block formats such a decision often proves extremely costly. [page 1056]

Back in 1972, it was common to include the record "shape" and/or schema in each using module or routine. We generally don't need to do this in modern systems. We can address only the cells relevant to the routine if need be. And database "views" can hide some of the more extreme schema changes.

[Sure thing. That quote is NOT tied to actual code/language. It's tied to a property that a particular piece of code may or may not have. With regards to the rest of what you said, you write code that doesn't have that property. Good for you. Doesn't change the consequence of writing code that does have that property. Therefore, that too is irrelevant.]
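For readers who want the "property" in the quoted passage made concrete, here is a minimal Python sketch of a "control block module". It is an illustration only: the names (make_block, owner_of, next_to_run) and the three-field layout are assumptions invented for this example, not anything taken from the paper.

```python
"""Sketch of a 'control block module' that hides the record format.

All names and the field layout are hypothetical illustrations.
"""

# The layout is a private decision of this module. Today a block is a tuple
# (priority, owner, state); callers are expected never to index into it.
_PRIORITY, _OWNER, _STATE = 0, 1, 2


def make_block(owner, priority=0):
    """Create a control block; callers treat the result as opaque."""
    return (priority, owner, "ready")


def owner_of(block):
    return block[_OWNER]


def priority_of(block):
    return block[_PRIORITY]


def state_of(block):
    return block[_STATE]


def with_state(block, state):
    """Return a copy of the block in a new state."""
    fields = list(block)
    fields[_STATE] = state
    return tuple(fields)


def next_to_run(queue):
    """A using routine: pick the ready block with the highest priority.

    It depends only on the accessor functions, not on the tuple layout.
    """
    ready = [b for b in queue if state_of(b) == "ready"]
    return max(ready, key=priority_of) if ready else None
```

If the layout later changes (say, a new field is added in front), only the functions in this module need editing; next_to_run is untouched. Whether that saving is worth the extra functions in a given domain is exactly what is being argued on this page.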

[[Persons attempting to understand the above should see TopMind.]]

How about if you have a question about the above, please ask. Jeez!

[What makes you think double square brackets had a question?]

AdHominem attacks are unprofessional.

{It is telling that you consider a simple reference to your chosen nom de guerre to be an AdHominem attack.}

The paper fails on the grounds of GoodMetricsProduceNumbers. For example, it commonly uses the pattern of "if you do X, then Y is easier, but harder than if you did Z" and so forth. However, it says nothing about the probability of X. It matters little to the real world whether X makes Y easier or harder if Y rarely happens. If it's free to avoid something, then please do it, otherwise use the higher probability scenarios to decide.

Even if they used "guessed" probabilities based on surveys, it would have more ties to the real world. And readers could plug in numbers that better fit their domain if need be if formulas or probability trees with variables are given. Probability is paramount to deciding between design options when maintenance is a key focus, and many keep ignoring this. SoftwareDevelopmentIsInvesting.

OOP fans often use this paper to suggest that one should wrap all access to data-structures in OOP or at least ADT accessors on the grounds that it "hides implementation". But what if implementation is not likely to change that often (it depends on domain)? Accessors can add code bloat and code bloat slows down reading and changing of code in many cases. It creates eBureaucracy. Thus, if the probability of changing the underlying implementation is low, then the "bloat tax" overrides the benefits of reducing the cost of implementation overhaul. Plus, what seems like an "implementation" is often really an interface. If one uses SQL to talk to the "data structures", that's not an implementation, it's an interface. And, it's one that's more flexible than OOP accessors, which often end up reinventing CollectionOrientedVerbs and expressions for each of the domain nouns involved, which is a violation of OnceAndOnlyOnce. Accessors are just not powerful enough to be flexible. A bigger concept is missing from ADT's and OOP encapsulation. It may be cheaper to reinvent the module wheel than reinvent collection idioms. In 1972 they didn't really know how to package common collection-oriented idioms. In that scenario, wrappers may have made sense. But again, the real answer to that and related claims lies in the actual future change requests. But since we cannot know the future when designing, it's all back to probability (above), and this paper does diddly squat to address that.
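The "bloat tax versus change probability" argument above can be written as a small expected-cost comparison. The sketch below is only one possible way to frame it; the function names and every number in the usage note are made-up assumptions for illustration, not measurements.

```python
def expected_cost_with_wrappers(wrapper_cost, p_change, change_cost_wrapped):
    """Up-front cost of the wrappers plus the expected cost of a later change."""
    return wrapper_cost + p_change * change_cost_wrapped


def expected_cost_without_wrappers(p_change, change_cost_direct):
    """No up-front cost, but a change (if it happens) touches more code."""
    return p_change * change_cost_direct


def break_even_probability(wrapper_cost, change_cost_direct, change_cost_wrapped):
    """Probability of change above which the wrappers pay for themselves.

    Returns None when wrappers never save anything on a change.
    """
    saving = change_cost_direct - change_cost_wrapped
    if saving <= 0:
        return None
    return wrapper_cost / saving
```

With hypothetical costs of 10 units for the wrappers, 100 units for an unwrapped implementation overhaul and 20 units for a wrapped one, the break-even point is 10 / (100 - 20) = 0.125. Below that probability the unwrapped design is cheaper on expectation; above it the wrapped one is. Readers can plug in estimates from their own domain.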

Conclusion

We have tried to demonstrate by these examples that it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart. We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing. To achieve an efficient implementation we must abandon the assumption that a module is one or more subroutines, and instead allow subroutines and programs to be assembled collections of code from various modules. [pg. 1058]

Yes, "likely to change" is important, but they didn't prove what's likely to change, and/or tied their estimate to a specific implementation and/or language, which the reader can't see. Show the actual change in the actual code and the customer request scenarios used to estimate probability, and I might perhaps start to agree with their faulty generalization.

[It's not surprising they didn't prove what's likely to change. What's likely to change, itself, undergoes changes. By not tying their conclusion to a particular set of changes, or a particular implementation, they can produce a conclusion that is actually meaningful. If they had tied it to a particular implementation, then it would only apply to that particular implementation. Since I'm unlikely to ever need exactly that implementation, it would be unlikely that the result would ever apply.]

I only half agree. If your "implementation" allows you to add new features or behaviors with minimal effort, then there may be net benefits to it. For example, if your "implementation" is a relational database (assuming for the moment that is even an "implementation"), then you can fairly easily get new collection-oriented behaviors/idioms (COBI) with very little new coding. If you use pure encapsulation and/or ADT's, then you could possibly have to reinvent many COBI's from scratch if you swap implementation.

One may have to do say 10 times the coding, for example, to switch from SQL as the primary underlying processing mechanism to C positional arrays. Most businesses with custom software are NOT going to go down that route. Obsessing on swap-ability is meteor insurance. If you are using a high-level language and it's working well for you, it's stupid (poor resource balancing) to dump it for a lower-level language and/or wrap everything behind methods/ADT's just "in case" you swap in the future. See DecisionMathAndYagni for ways to apply estimates to such.

In other words, the power of sticking to a powerful and flexible implementation may exceed the benefits of implementation swap-ability. The encapsulation/ADT crowd often overhypes the benefits of implementation swap-ability as a general goal. I agree it is a characteristic to strive for, but not necessarily a top priority for many domains. The only way to determine that is through informed domain analysis and trade-off weighing.

And, 1972 didn't have enough high-level tools available for this to be an issue to consider. - t

[[Do you know why we call this paper a seminal paper? Do you think that the lack of tools in 1972 may be explained by the practices that Parnas is critiquing? If not, then how frustrating it must be for you. I take great satisfaction in that observation.]]

Who knows? An alternative universe doesn't exist to test. Given linked lists and positional arrays as one's only real choices for collection and attribute management, it's probably a step up. I applaud the paper for presenting alternatives to consider, but the actual selection requires (at least) economics and investment-like decision tree analysis. (Such trees are also found in GameTheory.)

[[So trees are not collections, hash tables are not collections, bit vector sets are not collections... Wow! Just how do you think RDBMS's are implemented? On second thought, let's just call that a rhetorical question...]]

The change patterns for SystemsSoftware are generally different than custom applications software. As far as this paper, we don't have enough info to know who the target audience is. Is it a kit to be sold/used by hundreds of different organizations? Is it a single in-house project? Will it be boxed and sold to thousands? - t

[[We know that you are not the target audience - it's a research paper. We know that your failure to generalize is debilitating beyond redemption, so why do you insist on misinforming others? Can't you let the rest of the world go on its merry way without your interference? Does it bother you so much that others are quite capable of reading, understanding, and applying the lessons of Parnas?]]

No no no, I meant the target audience of the word-processing system described in the paper, not the target audience of the paper itself. I don't understand how you confused the two because the context was a software product. And I'll pick probability trees over Parnas any day. They are closer to the ideal of GoodMetricsProduceNumbers. Parnas was too chicken to get into number-land. And I am not entirely dismissing the paper. It's good as a design suggestion candidate, but NOT as a universal rule. Absolute claims require stronger evidence. - t

[[For the love of god, NO SUCH CLAIM IS MADE. This is the purpose of the paper in his own words:

Usually nothing is said about the criteria to be used in dividing the system into modules. This paper will discuss that issue and, by means of examples, suggest some criteria which can be used in decomposing a system into modules. Page 1 col 2 Par 3 under the Introduction.

That's it!!!]]

{Your accusation that "Parnas was too chicken to get into number-land" might have some weight if you were known for your rigorous statistical analyses, especially if they regularly punched holes in both intuitive and logical conclusions. As you're not, your claim is naught but hot air. Parnas is demonstrably worth reading. You are not.}

It's not my burden to put numbers on Parnas' claims. As far as my claim that excessive reliance on wrappers can make a system less flexible due to the anti-collection-orientation of API's, I'm merely putting it out there as something to ponder, consider, and discuss, similar to what Parnas did in the paper. If you instead treat Parnas' paper like a Bible or Koran, then we won't get anywhere, will we?

And again, I couldn't put numbers on the probabilities without knowing the environment of the software. ItDepends. As far as systems I've built and/or maintained that used an RDBMS as the primary "data structure", the probability of tossing an RDBMS and going to something like files or linked lists is roughly around 1% per year. Switching to another vendor's RDBMS is roughly 4% per year (and heavy wrappers would have resulted in insignificant reduction of code change anyhow because of the high number of single or low-use SQL sections, if reasonable factoring of repeating concepts was already done.). And, I do apply FutureDiscounting to my decision calculations at a rate of around 10% to 20% a year, depending on the biz at hand. - t
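One way the yearly probabilities and FutureDiscounting rates mentioned above could be combined is sketched below in Python. The model is an assumption made for illustration: the change happens at most once, with the same independent probability each year, and its cost is discounted back to today. The function name and the cost figure in the usage note are hypothetical.

```python
def discounted_expected_cost(annual_prob, cost, discount_rate, years):
    """Present value of the expected cost of a one-time change.

    annual_prob   -- chance the change happens in any given year, if it has
                     not already happened
    cost          -- cost of the change when it happens
    discount_rate -- yearly discount rate, e.g. 0.10 for 10%
    years         -- planning horizon in whole years
    """
    total = 0.0
    not_yet = 1.0  # probability the change has not happened before this year
    for year in range(1, years + 1):
        p_this_year = not_yet * annual_prob
        total += p_this_year * cost / (1.0 + discount_rate) ** year
        not_yet *= 1.0 - annual_prob
    return total
```

For example, discounted_expected_cost(0.04, 100.0, 0.10, 10) estimates what a vendor switch costing a hypothetical 100 units is "worth" today under a 4% yearly chance and a 10% discount rate. That figure can then be compared against the up-front cost of insulating the code from the switch.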

{No one is asking you to put numbers on Parnas's claims. We don't need numbers to grasp the simple way he points out a strategy that is intuitively obvious once it has been clearly articulated. However, it would give your claims much more credibility if you were to practice what you preach. How many times did you find "excessive reliance on wrappers" or "anti-collection-orientation of APIs" anywhere in his paper? How do you quantify "excessive"? How did you obtain "3%", "10%", "20%" and "1%"? No, don't answer that -- I know where you got them: You pulled them out of your ass in order to produce a thin (and meaningless) veneer of quantification.}

{And why would use or not of DBMSes have anything remotely to do with the conclusions of Parnas's paper? It's about modularization in general, and the concept is as obviously applicable whether a DBMS is used or not. DBMSes are irrelevant here.}

Well, it's not obvious to me. The API's are too low level, making one likely have to reinvent collection-oriented idioms. Any savings from "hiding the implementation details" is possibly counteracted by having to reinvent those idioms. That is obvious to me. Collection-orientation is power, and 1972-style API's are not going to yank that power from my cold, dead fingers without numerical justification. API's and wrappers are often just too damn primitive. The paper is fighting over (comparing) 1960's versus 1970's technology when I'm familiar with 1980's technology that kicks both those decades' asses. It's comparable to a paper (tablet?) that compares sticks to rocks for starting fires. I gotta fucking cig lighter here, dude! Well okay, slight exaggeration; a book of matches. - t

[[The only reason you have your lighter, matches, whatever, is because of the principles described in the paper. This is a fact. Interface does not mean API. Declarative languages are not possible without Modularization. Orthogonality follows nearly directly from it. You are in fact not familiar with anything other than the first tool you learned to use, and you will stay that way, god bless. Meanwhile, why are you pissing all over this WIKI? I promise no one wants you to learn anything. We just want to be able to learn from people like Parnas. Your only impact is to obstruct that goal. You have no hope of debunking anything and will not repeal all of computer Science, so it must be that your purpose is to obstruct others. Now how do you think we should deal with that? Let me guess --- you think we should let you piss where you please and nominate you for a Nobel prize for your insight.]]

Re: "The only reason you have your lighter matches whatever is because of the principles described in the paper." - Like I mentioned above, SystemsSoftware has a different change profile than custom software. Parnas' suggestions may indeed be applicable to building an RDBMS engine (such as Oracle's) from scratch. But that doesn't necessarily extrapolate to software that uses an existing RDBMS. PickTheRightToolForTheJob. You are implying a universal truth.

And Parnas' paper has not graduated to real science yet. It's only "interesting speculation" at this stage. GoodMetricsProduceNumbers and Parnas has none. Show specifically, using representative scenarios outside of SystemsSoftware, how it reduces finger steps, eye steps, and/or brain steps. I'm just the messenger. Insult the messenger all you want, but the fact remains that the science hasn't hatched yet. And database-related tools were not my first tools. You got that wrong too, insultboy.

[[No one cares. We have addressed at length what Parnas is trying to do and why. You know nothing about science. You know nothing about Art. You serve no purpose here. If that really insults you, you can change your behavior. Since you don't, you clearly are not insulted, and in fact are proud of your work product. Maybe you should self-publish a book so others who fully appreciate your genius will have a chance to worship you. Lord knows lack of content and accuracy will not impede your efforts, and you may even win a real book contract. You are here to badger an audience you could not gather on your own merits. This makes you a Troll. Period.

There is one thing that you have taught me -- the wiki experiment is not viable in the face of full open access. This is a lesson that Usenet has long taught and I fear the implications. The wiki form is even more vulnerable as it is impossible to filter. In future all successful wikis will require access control in the same way that all doors require locks. I hope you are proud of your accomplishment. I am curious as to how long this wiki will last as the Troll of admission increases.]]

Only when you pass along someone else's opinion are you "just a messenger". At the moment, your words make you a liar.

[[This is only true if he does not believe what he is saying. I am of the opinion that he does.]]

[Another really sad aspect of top's rant here is that Parnas only mentioned one metric, and he presents all the information necessary to easily produce the numbers for that metric in the paper.]

[[There are no sad aspects, only tragic consequences for the purpose and goal of the wiki concept without access control. All that is necessary for this to become TOP's wiki is for the community to move on, which is why real communities die as well. Eventually, as a neighborhood declines, those with the means to do so move on. It is ever thus. I still hope that the grand experiment can continue -- but I would not bet in its favor.]]

Resorting to insulting Top doesn't help.

It's done out of wiki tradition. -- top


Domain-Specific Language

The second half of the conclusion that talks about order independence probably refers to parts of the paper that give hints of creating a DomainSpecificLanguage (such as an API), which I generally agree with to a point. The DSL in this paper resembles a text processing API.

However, if the DSL involves heavy use of collection-oriented idioms, then I tend to leave that to the database query language. This way, I don't have to create APIs around bunches of collection-oriented idioms to avoid the kinds of problems mentioned above.

But unlike encapsulation purists, I treat DSL/API's as helpers, not wrappers.

Note that I sometimes mix "pure" wrappers and direct database access. For example, if we only want to do something with a limited set of paragraphs in a text processor, then we could have:

  doFoo(glob, bar, "x > (a + b) and z <> 7");

The third parameter is a portion of a WHERE clause. This way I don't have to create a formal accessor for many variations of the "doFoo" process which only vary by the filter criteria. (Some have argued for single-use accessors for every filter combo on this wiki, but I couldn't re-find that debate.)
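A runnable Python sketch of the same idea, using the standard sqlite3 module, might look as follows. The table name (paragraphs), its columns, and the "upper-case the body" action are assumptions invented for the example; the original doFoo is not specified beyond the call above.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table of paragraphs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE paragraphs "
        "(id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, x INTEGER, z INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO paragraphs (id, a, b, x, z, body) VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 2, 5, 0, "one"), (2, 1, 2, 2, 0, "two"), (3, 1, 2, 9, 7, "three")],
    )
    return conn


def do_foo(conn, where_fragment, params=()):
    """Upper-case the body of every paragraph matching the caller's filter.

    where_fragment is spliced directly into the SQL text, so it must come
    from trusted code and never from end-user input. Literal values supplied
    at run time should be passed through params as '?' placeholders.
    Returns the number of rows changed.
    """
    sql = "UPDATE paragraphs SET body = upper(body) WHERE " + where_fragment
    return conn.execute(sql, params).rowcount
```

Usage would be along the lines of do_foo(make_demo_db(), "x > (a + b) and z <> 7"). One helper covers every filter variation, at the price that callers now know the column names. That is the trade-off between helpers and wrappers discussed above.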

-- top


On Implementation Hiding

I'd like to point out that making the implementation flexible (easy-to-change) is more important than "hiding the implementation" in my observation (in my domain at least). The value of hiding the implementation is often greatly exaggerated. It is a characteristic to strive for, all else being equal, but not at the expense of implementation flexibility. - t

That makes a lot of sense if you're writing, say, business reporting applications. They're typically small, relatively simple, usually maintained by one individual, and undergo almost constant change that needs to be implemented yesterday. In that case, the syntactic infrastructure used to maintain implementation hiding is mostly an obstacle with little value. At least, it is in your own code. In reusable libraries and the like, it ensures that implementation internals can be improved from version to version without the risk of breaking your code.

It doesn't make sense if you're developing, say, large-scale enterprise applications that involve dozens of developers and complex functionality that is relatively stable. If you expose implementation details, there's nothing to prevent your fellow developers from placing dependencies on your internal machinery. That makes it difficult and error-prone for you to make changes.

And an RDBMS can often serve as the interface between modules for bigger systems. One way to avoid the pitfalls of big applications is to not make big applications. DivideAndConquer. Use the Nile model: small villages around the Nile that trade and communicate via the Nile. The DB is the "interface spec" for each app group. This will avoid a lot of the collection-orientation idiom interface repetition common with ADT-style API's.

The interface then becomes more substantial, focusing more on the domain instead of setA, getA, deleteA, sortA, findA; setB, getB, deleteB, sortB, findB; setC, getC, deleteC, sortC, findC, etc. That is an ugly, bloated, busy-work style. Kill it, don't promote it. (See InterfaceFactoring.) Parnas opened Pandora's Repetitious Box. He should be slapped, not praised. (True, it may be useful for building the base of higher-level abstractions/tools, such as RDBMS, but to extrapolate has been a mistake.) And it's a lot easier to debug a domain model in the RDBMS than in RAM because you can use existing tools to study, sift, and report on it. Encapsulation generally gets in the way of that.

As far as your comment on scaling; the documents, lines, words, characters in Parnas' example are all data structures, or at least can be represented as data structures. If we are going to have millions of those with hundreds of workers working on them at the same time, I'd much much much rather manage them all with a RDBMS than with RAM ADT's. (Me and power users.) - t
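The complaint about setA/getA/sortA/findA repeated for every noun can be illustrated with a small Python sketch in which one set of collection verbs is written once and shared by every domain noun. The Collection class and its method names are hypothetical; it is an in-memory stand-in for the kind of factoring being described, not an RDBMS and not anything from the paper.

```python
class Collection:
    """One set of collection verbs shared by all domain nouns (hypothetical)."""

    def __init__(self, rows=()):
        self._rows = [dict(r) for r in rows]

    def add(self, **fields):
        """Store one row given as keyword fields."""
        self._rows.append(fields)

    def find(self, predicate=lambda row: True, sort_by=None):
        """Return copies of the matching rows, optionally sorted by one field."""
        hits = [dict(r) for r in self._rows if predicate(r)]
        if sort_by is not None:
            hits.sort(key=lambda r: r[sort_by])
        return hits

    def delete(self, predicate):
        """Remove the matching rows and report how many were removed."""
        before = len(self._rows)
        self._rows = [r for r in self._rows if not predicate(r)]
        return before - len(self._rows)


# The same verbs serve lines, words, or any other noun.
lines = Collection()
lines.add(doc=1, number=2, text="second line")
lines.add(doc=1, number=1, text="first line")

words = Collection()
words.add(line=1, text="first")
```

Here lines.find(sort_by="number") and words.find(lambda r: r["text"] == "first") reuse the same code, where a per-noun accessor style would need separate sortLines, findWords, and so on. The cost is that callers see field names directly, which is the point the encapsulation side of this page objects to.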

That being said, the OO/ADT style does have its place, but that place is not everywhere.

FacePalm

Personal insults? That's all you got? Slacker.

Troll.

Name-calling. Nice. All that education and all you have is name-calling? You are just table- and C.O.P.-ignorant. Parnas is obsolete. Use numeric metrics to prove I'm a troll, or shut the [bleep] up, you arrogant bell-bottomed 70's-hugger. I'm confident a document processor built with a RDBMS would score equal or higher on most representative numeric maintenance metrics thrown at it compared to the ADT/API/OO version.

This is largely because new requirements can use the existing collection-oriented features of the DB to avoid having to reinvent them, or at least simplify the creation of them for each activity. If it's a one-off need, for example, you can often just write a throw-away query in one sitting. The encapsulation-heavy approach would usually be much more coding via explicit loops and explicit conditions and explicit marshaling (see ViewingAlgorithmsAsCollectionProcessing). This will be more code per new requirement, knocking the score over. Yes, encapsulation will help with certain things, but not near enough. It results in a meta language or intermediate language that is just too low-level, resulting in too much nitty-gritty work.
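To make the "throw-away query versus explicit loops" comparison concrete, here is a small Python sketch that answers the same one-off question both ways: the length of the longest word on each line. The data, table layout, and function names are invented for illustration, and a toy this small cannot settle which style scores better on real maintenance metrics.

```python
import sqlite3

# Hypothetical sample data: (line number, word)
WORDS = [(1, "on"), (1, "decomposing"), (2, "systems"), (2, "into"), (2, "modules")]


def longest_word_per_line_loop(words):
    """Explicit-loop version: walk the data and track a maximum per line."""
    result = {}
    for line_no, word in words:
        if len(word) > result.get(line_no, 0):
            result[line_no] = len(word)
    return sorted(result.items())


def longest_word_per_line_sql(words):
    """Query version: load the rows and let GROUP BY do the collection work."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (line_no INTEGER, word TEXT)")
    conn.executemany("INSERT INTO words VALUES (?, ?)", words)
    rows = conn.execute(
        "SELECT line_no, MAX(LENGTH(word)) FROM words "
        "GROUP BY line_no ORDER BY line_no"
    ).fetchall()
    conn.close()
    return rows
```

Both functions return the same list of (line number, length) pairs. In a system where the words already live in a database, only the SELECT statement would be new code for this requirement, which is the saving being claimed above.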

And the DB approach could be used with multiple languages, something behavior-centric paradigms have problems with.

You, my friend, are afraid of science and praise ArgumentFromAuthority. That would make YOU the troll, not me. Kill me with science and numbers, not insults.

If you think I am wrong, is it that you think my pet techniques would score lower, or that the metrics I propose are not sufficient to gauge practical utility? If the second, would you toss the metrics out, or suggest others in addition to them? - t
See EncapsulationIsNotInformationHiding, PerceptionOfChange, OoConflictsWithCollectionOrientation, TopMind
CategoryEncapsulation?, CategoryInfoPackaging
