Rocket Analogy Problem

This is about a "rocket" analogy I sometimes use to describe a possible relationship between theorists, practitioners, and/or users. However, there seems to be a problem translating this analogy into software engineering terms for yet-identified reasons. A rocket is used here because it is a complex tool with lots of parts, not unlike software.

Suppose you are the CEO of UPS (package delivery) in the year 2070 and are trying to select a rocket brand to deliver packages (unmanned) to the new space colonies in orbit around Earth and on the moon.

The UPS managers and staff are not rocket experts. However, they created a list of criteria that will be use to evaluate the rocket brands. These criteria include rocket price, speed, fuel economy, accuracy, safety, pollution, etc. (There's already a more-detailed list somewhere on this wiki. ToDo: find it.)

Even though the UPS staff are not rocket engineers, they can measure the effectiveness of rockets without having to be rocket engineers and theorists. This is largely because most of their criteria either use numbers or could be restated to use numbers which can be verified without having to be rocket experts. GoodMetricsUseNumbers.

The numbers are generally related to issues that are important to UPS. If the rockets are not accurate, then packages are late or lost, for instance. You don't have to be a rocket scientist to know that your ship never arrived at or near the space-port. If they have to hop on their space scooter to go find the off-target load, they can use their GPS odometer to measure how far away it is. If the GPS is broken or not reliable, then the fuel costs needed to retrieve the way-ward load can be the metric. The emphasis here is that the metrics are close to the heart of the domain.

If a rocket pollutes too much, it may hurt UPS's public image. UPS doesn't need to be rocket scientists to check for pollution. They can buy detection kits and/or hire environmental specialists.

True, it would probably be helpful to have a rocket expert on staff; and in practice a big company would probably hire at least one. However, because rockets are expensive, management is still going to want objective criteria, such as the numeric metrics listed. There is still going to be a "communications interface" between the rocket experts and the decision makers.

Use the analogy of a shipping company much smaller than UPS that cannot afford an in-house rocket expert if it simplifies the scenario for you. Or consider a company that uses rockets on a limited basis.

I don't see why software should be different from rockets in this regard. Yet some imply it is somehow different. I don't get it. If somebody has a personal or pet theory that says that language/tool/paradigm/technique X is "better than" other language/tool/paradigm/techniques (LTPT), they should be able to present their case in terms of numerical tests that LTPT users can relate to.

The GoldenHammer "vendors" shouldn't expect the LTPT user to become a theorist (general or an individual's pet theory). Yes, it would be nice, but if we apply the rocket scenario, it is not realistic. Many of us merely use "rockets", we don't build them.

If a candidate rocket company told UPS, "forget such numeric metrics, just trust us because we are far smarter than you about rockets", they'd be shown the door pretty quickly.

A similar situation would happen if the metric was too far removed from UPS's direct needs/business, even if it was numeric. "Trust us, the Heisenbooti Zero Load Zenith Vibration metric is the best metric there is".

It's true that the weights on each measurement may be subject to debate, but we are not even at that stage yet. For example, how much rocket pollution may affect UPS's public image and/or how much a bad public image on pollution would affect their shipment sales may be highly debatable. However, at least the measurements give a starting point for that discussion. Different customers may apply their own weights based on their experience or personal preferences. It's not really the rocket-builders job to tell the customer which factors are more important. (If they do, hopefully they have a clear justification beyond "just trust me".)

So, what's going on with software engineering?


"Perhaps the fact that it's difficult to perform repeatable tests with humans in the mix - you can't be sure that Test Group A has the same skillset and levels as Test Group B. Since programming has social aspects, a team of smart jackasses using Methodology A might be outperformed by a team of cooperative average guys using Methodology B" --PeteHardie

I have suggested that WetWare plays a large role in software design success, but have been sharply rebuffed by some, who appear to believe that the "truth" of software engineering is largely outside of the human mind, that is, principles of "pure" math and logic. See ArgumentByElegance. Software is written and read by humans, not Vulcans, gods, nor math robots (assuming the alternatives produce the same or domain-equivalent output such that we are not counting the CPU). --top

PageAnchor: End_Users

Saying a layman can compare the end-product (rockets) does NOT imply he can compare (or even grok the differences between) the production methodologies (rocket-building-tools). Maybe you can compare end-product software without being a software engineer. That doesn't mean you'll grok the relevant distinctions between languages, tools, paradigms, or techniques. You overgeneralize your ability to have an opinion on rockets to assert you will develop a reasonable opinion on LTPT if given good information. That is fallacy. To understand the information, you need to get an education, TopMind. You aren't bright enough to fake it.

This does bring up the issue of who will be doing the measuring. Or, more specifically, from whose perspective will we be creating metrics from? The end-users could count bugs, time to develop/fix, wages and/or hours spent by developers, etc. And it doesn't take a PhD to count source code bulk, if we do include that metric. (Although there can be disagreements over how to score code size, it's probably only an issue if there is a big difference. And it can be explored deeper if it's the final pivotal factor.) I don't believe there is a need to get into the realm of the developer's world at this stage. Measuring can still be from the end-user's perspective. After all, they are the ones paying for the software. Thus, your point is so far moot. We can still explore the developer's perspective as a convenient shortcut, but that's still not true road-testing. No matter how complicated the tool, the tool user still knows what they want as far as the outcome. True, sometimes consumers are oblivious to certain factors, but decent research can clue them in that it may be an issue. But ultimately they assign weights to the factors. We can create a scoring sheet with the scores of the metrics and leave the weights up to each consumer/user. The weights may be zero for some items. It's not up to the rocket builders to say that rocket pollution is less or more important to UPS than fuel cost, for example. -t

If a rocket engineer gets a specification for a certain payload, vibration, peak acceleration, pollution, noise, performance, cost profile, that engineer can select from among models and methodologies that allow him to effectively predict and control these properties. Of course, that control doesn't necessarily result in an optimal rocket - in part because the specification may be ridiculous (users want magic ponies), in part because BadCodeCanBeWrittenInAnyLanguage (a good tool in inexperienced hands), and in part because the best models/tools might be inadequate for the peculiar case (requiring some R&D and further model development - rocket science). This prediction and control is the property that matters when comparing LTPT. You can't compare LTPT by the end product except insofar as it is part of the end product (e.g. for user-extensible systems, continuous maintenance, runtime upgrade). Why would an educated and intelligent engineer pick one LTPT over another? because it lets that engineer more effectively predict and control properties relevant to the end-user that the other does not. The end-user's specification is relevant because there is no universal "better than" ("better than for what?" any intelligent engineer would ask). But there is no direct relationship between the end-user product metrics and the tool user's LTPT metrics. It doesn't make sense to compare rocket-building methodologies in terms of end-user specification metrics. To do so is a category error.

You have changed the scenario. We are not comparing custom software to custom software, but rather existing LTPT's, such as FP, OOP, bag-touples, set-touples, etc. -t

I described the actual scenario. Your scenario is non-sense based on an overgeneralization, ignorance, category error, and WishfulThinking.

My scenario is generally based on existing technology, or at least does not require anything revolutionary. If it's too difficult for you to make the leap, let me know and I'll search for a different scenario. -t

You are mistaken about your own scenario, TopMind. The 'bases' for your scenario are your ridiculous belief that "better than" has no context, your WishfulThinking that LTPT are compared in terms of end-user criterion, your overgeneralization that ability to compare rockets means ability to compare other things, and your fantastic ignorance at anything related to logic, reason, or computer science. Any references to existing technology (or even the revolutionary sort) are just red herrings, poor attempts to distract everyone from your lunacy.

I didn't say it has no context. I suggested the customer determine the context. The developers/constructors are too biased. They want stuff to make their job easier and/or pay more, but it is not necessarily the same as what the owners want. (That would be an interesting topic in itself. See AreWeBiasedTowardLaborIntensive.)

And no, you don't use logic and reason to show how X is more economical or better; you use questionable "evidence devices" such as ArgumentByElegance or long chains of (questionable) logic. GoodMetricsProduceNumbers, but your round-about logic is not numbers. If you want to attempt numbers for developer-driven metrics, though, I'd be happy to take a look. If you don't believe in numbers all-together, then we have an irreconcilable view of science and evidence. Which is it? -t

Why is Software Engineering somehow special that it doesn't "need" numbers-based metrics to claim a GoldenHammer? It does not get a Get-Out-Of-Science-Free card from the Monopoly board game. It is not a special or exempt discipline! It has to ride the long yellow bus just like the rest of the disciplines and deliver number-based tests. It behooves me why some of you think ArgumentByElegance or long chains of (questionable) logic are sufficient to claim GoldenHammer. Nonsense! --top

[You appear to think that logic is a metric. Where in the world did you come up with that? In addition, what makes you think that the other engineering disciplines don't have the same issues when they discuss their design methodologies? I can recall, when I was learning computer engineering, some discussions over various methodologies for improving the testability of the circuits. It wasn't that different from some of the discussions seen here. I suspect the biggest difference is that they don't have those discussions where anyone whose soldered a circuit board can join in. They're a little more exclusive than that.]

Where did I say that logic was a metric? I will agree that it usually can be "turned into" numeric metrics, or at least a practical approximation.

As far as being exclusive, creating a complex WalledGarden GoldenHammer claim does not make you immune to numbers. As far as circuit boards, I don't know enough about circuit boards to comment, but surely it would be possible to create metrics over testing times under different scenarios. If you get a set of scenarios that both sides agree is a decent representation (test case sample), then they can be tested against the time-to-test metrics. "If X broke, this design would take Y minutes to diagnose." The score sheet would look something like:

  1...|Bad Encoder..|...15%...|..12min...|30min.|.18min|
  2...|Bad Capacitor|...10%...|..22min...|18min.|..9min|

(Dots to prevent TabMunging)

The times are then multiplied by the frequencies, and we then get a score for each scenario. Of course there may be other metrics besides repair time, such as equipment cost.

Typical procurement documents and contracts used in the industry use "feature scoring lists" (for lack of a better term) of some sort with desired features, and weights and/or priorities. They are often called "deliverables", and for the rocket case we'd have the deliverables already mentioned above such as price, speed, fuel economy, accuracy, safety, pollution, etc. The owner(s) of the company generally gets to put the weights on each of the deliverables.

Software contacts are not much different. Engineers may have a say, but ultimately the owners pick and decide.

It's true there are things like database design, OS dependencies, etc. that should be considered for software, and these will generally require a technician to evaluate, but you'll often get a different answer from different technicians. And from my experience the owners or end-users generally ignore these if they like the user interface and/or kind of reports it generates, because that's what they look at. Expert opinions are considered second-class citizens to their preferences. (Perhaps in a close call, an expert opinion will sway them.)

In capitalism, the buyer is sometimes naive above about certain things, but we generally consider this a cost of competition. The Soviet-style was to have mostly technicians and experts sit around and think of the "most elegant" design, make it, and then pretty much force it on the users (largely because they have no choice). But this didn't work so well for most things. (DesignByCommittee, which is similar, has a shaky track record. Committees are fairly good at generating ideas, but don't seem to glue them together very well.)

The "naive buyer" problem thus appears to be a smaller problem than lack of competition. Capitalism also provides variety so that users can pick among multiple options. Sometimes "getting features right" is merely a happy accident, not from grand thinking. (It's roughly analogous to mutation-and-selection in evolution.)

In summary, end users and/or buyers calling the shots appears to be the most successful selection technique even if it's not perfect in all aspects (hopefully with advice from experts). Expert-first tool selection is generally inferior. If this news pisses off experts, so be it. Reality can be a b*tch.


That appears to WishfulThinking (particularly about "Soviet-style") and an example of one possible (probably problematic) project structure, rather than pervasive reality. There are indeed projects where all decisions are made by non-technical management (e.g., "We're a Microsoft shop, so we're using Visual Basic and nothing else!") but these are either pathological cases or relatively simple, small-scale, non-innovative projects where such decisions don't matter much. Successful projects, especially complex technical ones, can be killed by misguided edicts from above. I've seen a project killed at the starting gate by a naive boss's requirement that a complex, heavy-load, multi-user application be implemented solely in MS Excel. I saw another killed by a dictatorial manager's insistence that a new, small but business-critical, professionally-built application underpinned by a properly-normalised relational database -- about fifteen or so tables -- be re-written to use no more than one un-normalised table.

Healthy projects necessarily involve a collaboration of expertise, where domain experts state domain requirements, hardware experts weigh in on hardware, and SoftwareEngineering experts provide expertise on software engineering tools, and there's no expectation that (for example) the hardware experts should dictate choice of programming language, or the domain experts should dictate choice of programming paradigm -- beyond whatever effect one might have on the other, in light of the requirements -- because that would be outside their area of expertise. Choices then reflect a rational consideration of the best way to meet the requirements, as decided by collective expertise. There is some danger of DesignByCommittee, but the solution is to have a strong committee leader with some expertise in all areas. It's why healthy technology companies typically make technically-knowledgeable and technically-experienced people responsible for significant technical decisions.

I have already agreed that collaboration is a good idea. However, somebody will need to make the final choice.

The reliability of Excel-built "databases" can be measured. If they ask for evidence that Excel is unreliable as a database, one hopefully will be able to provide that beyond "just trust me". We need to make a distinction between "measured" and "measurable". Is the reliability of Excel measurable? Yes. Has it been measured and made publicly available? Not that I know of. Now if there is no existing study(s) to answer that question, then anecdotal evidence may be the next best thing. Sometimes it's not economical to get objective studies, even if it's possible to produce such, and going with anecdotal evidence is the most economical option.[1]

The EvidenceTotemPole would look something like:

Ideally one checks with multiple unconnected experts or similar companies/projects to get a variety of opinions. Asking about similar projects from other departments and/or orgs, both successful and unsuccessful is also a good way to get a feel for what works and what doesn't, and what to watch out for.

Sometimes the customer/owner is just stubborn, and they get the GIGO they deserve. Almost every executive probably at least once tried to get a Lexus-level project for a Yugo price.

I had a similar issue recently where a department boss didn't want to pay for converting from MS-Access to a real RDBMS for a fairly big Internet project because they wanted to spend the money on some other executive toy. I made sure my recommendation against staying with MS-Access was clear, as CYA, but in the end there's nothing I could do. It turned out that when something went wrong, he/she was pretty good at blaming other people, for they were in marketing: spin experts. Thus, he/she probably planned that if MS-Access did choke, they had a blame plan ready. However, after too many stupid decisions they couldn't cover with blame they finally got shuffled to somewhere else where they are not in charge of such decisions. Eventually idiots self-filter themselves, but the initial rides are a pain.

I agree that GoodMetricsUseNumbers, but sometimes measurement of ludicrousness is a pointless effort. Some bad ideas are akin to demanding chocolate hammers or car engines made of wood. We don't need to do stress analysis on chocolate hammers, or performance profiles of wooden engines, to reject them. Every aspect of trying to use Excel as a multi-user DBMS is similarly doomed to failure; so much so it will quickly break in testing if you can even be bothered to take it that far. Having to go beyond the obvious and generate numbers to support that view is evidence of managerial mistrust of IT expertise, which unfortunately does happen. I'm not sure (though I can easily speculate, but won't) why some senior managers will trust their auto mechanics, doctors, plumbers, accountants, lawyers or carpenters implicitly, but will second-guess or contradict their IT experts.

Also, some self-evident IT truths are very difficult (read: expensive and time-consuming) to turn into trustworthy metrics. For example, we know that reducing explicit state in programs -- whether through discipline in imperative programming, or use of idempotent operations and pure functions without side-effects, or use of ValueObjects, or all of the above -- makes them easier to write and more reliable because they're simpler and more provable. That's essentially obvious and uncontroversial, but demonstrating it via numerical metrics is very difficult.

This gets back to our usual "WetWare dispute" as seen in ArgumentByElegance and GreatLispWar. The "traditional" approach just better maps to how existing humans think. If enough architects and developers with your opinion are out there, form a company and kick everybody else's ass: IfFooIsSoGreatHowComeYouAreNotRich. Even elegant ideas should be field-tested.

Actually, that's your usual "WetWare dispute". I derive some amusement from debating with you about it, but otherwise I don't care. And you made a mistake, above: You wrote, "the 'traditional' approach just better maps to how existing humans think" when you clearly meant to write, "the 'traditional' approach just better maps to how I think."

I did form a company and kick everyone's ass. I started out creating alternatives to ExBase that were much, much better than ExBase and used them to competitive advantage. I retired in 2000, emigrated to another country and went into teaching and research.

This is perhaps similar to WebStoresDiscussion. A small group masters the ability to leverage high abstraction to beat the competition. However, as time goes on, the approach fails to catch on with the next generation of developers and/or large teams, who are either lower-skilled or mixed-skilled. One theory is that under ideal conditions or happenstance, the original small group is coincidentally or self-filtered-wise like-minded, and thus relate to each others style and code well. It's often the case that techniques that work for small tight-knit groups don't scale.

Sounds good to me. If it's a competitive advantage that relatively few of us can leverage, that's excellent. I hope tools that facilitate it for us continue -- as is the trend -- to be increasingly prevalent.

You haven't proven it a repeatable experiment (especially the team side).

I haven't proven what in a repeatable experiment?

That one can take a group of random HOF/FP or high-abstraction fans, throw them onto a team, and have instant success.


[1] Essentially, the cost of failure (risk) from having imperfect information (anecdotes only) is estimated to be lower than the cost of getting a good study done.

See also TraitsOfGoodScientificEvidence, BookStop
CategoryMetrics, CategoryEvidence, CategoryMetaphor, CategoryEconomics

View edit of August 7, 2014 or FindPage with title or text search