Rocket Analogy Problem

This is about a "rocket" analogy I sometimes use to describe a possible relationship between theorists, practitioners, and/or users. However, there seems to be a problem translating this analogy into software engineering terms, for as-yet-unidentified reasons. A rocket is used here because it is a complex tool with lots of parts, not unlike software.

Suppose you are the CEO of UPS (package delivery) in the year 2070 and are trying to select a rocket brand to deliver packages (unmanned) to the new space colonies in orbit around Earth and on the moon.

The UPS managers and staff are not rocket experts. However, they created a list of criteria that will be used to evaluate the rocket brands. These criteria include rocket price, speed, fuel economy, accuracy, safety, pollution, etc. (There's already a more-detailed list somewhere on this wiki.)

Even though the UPS staff are not rocket engineers, they can measure the effectiveness of rockets without having to be rocket experts or theorists. This is largely because most of their criteria either use numbers or could be restated to use numbers, which can be verified without rocket expertise. GoodMetricsUseNumbers.

The numbers are generally related to issues that are important to UPS. If the rockets are not accurate, then packages are late or lost, for instance. You don't have to be a rocket scientist to know that your ship never arrived at or near the spaceport. If staff have to hop on their space scooter to go find the off-target load, they can use their GPS odometer to measure how far away it is. If the GPS is broken or unreliable, then the fuel cost needed to retrieve the wayward load can be the metric. The emphasis here is that the metrics are close to the heart of the domain.

If a rocket pollutes too much, it may hurt UPS's public image. UPS staff don't need to be rocket scientists to check for pollution. They can buy detection kits and/or hire environmental specialists.

True, it would probably be helpful to have a rocket expert on staff; and in practice a big company would probably hire at least one. However, because rockets are expensive, management is still going to want objective criteria, such as the numeric metrics listed. There is still going to be a "communications interface" between the rocket experts and the decision makers.

If it simplifies the scenario for you, use the analogy of a shipping company much smaller than UPS that cannot afford an in-house rocket expert, or of a company that uses rockets only on a limited basis.

I don't see why software should be different from rockets in this regard. Yet some imply it is somehow different. I don't get it. If somebody has a personal or pet theory that says that language/tool/paradigm/technique X is "better than" other languages/tools/paradigms/techniques (LTPT), they should be able to present their case in terms of numerical tests that LTPT users can relate to.

The GoldenHammer "vendors" shouldn't expect the LTPT user to become a theorist (whether of a general theory or of an individual's pet theory). Yes, it would be nice, but if we apply the rocket scenario, it is not realistic. Many of us merely use "rockets"; we don't build them.

If a candidate rocket company told UPS, "forget such numeric metrics, just trust us because we are far smarter than you about rockets", they'd be shown the door pretty quickly.

A similar situation would happen if the metric was too far removed from UPS's direct needs/business, even if it was numeric. "Trust us, the Heisenbooti Zero Load Zenith Vibration metric is the best metric there is."

It's true that the weights on each measurement may be subject to debate, but we are not even at that stage yet. For example, how much rocket pollution may affect UPS's public image, and/or how much a bad public image on pollution would affect their shipment sales, may be highly debatable. However, at least the measurements give a starting point for that discussion. Different customers may apply their own weights based on their experience or personal preferences. It's not really the rocket-builders' job to tell the customer which factors are more important. (If they do, hopefully they have a clear justification beyond "just trust me".)
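
To make the scoring idea concrete, here is a minimal sketch (plain Python; the brand names, metric scores, and weights are invented purely for illustration) of how a customer could combine measured metric scores with their own weights:

  # Hypothetical measured scores per rocket brand (higher is better for
  # every metric here, just to keep the sketch simple; all numbers invented).
  scores = {
      "BrandA": {"price": 7, "accuracy": 9, "pollution": 4},
      "BrandB": {"price": 9, "accuracy": 6, "pollution": 8},
  }

  # Each customer supplies their own weights; a weight of zero drops a factor.
  ups_weights = {"price": 0.5, "accuracy": 0.4, "pollution": 0.1}

  def weighted_score(brand_scores, weights):
      # Weighted sum: the measurements come from the metrics,
      # the weights come from the customer, not the rocket builder.
      return sum(brand_scores[metric] * w for metric, w in weights.items())

  for brand, brand_scores in scores.items():
      print(brand, weighted_score(brand_scores, ups_weights))

The measurements stay objective; only the weights belong to the customer.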

So, what's going on with software engineering?

--top

"Perhaps the fact that it's difficult to perform repeatable tests with humans in the mix - you can't be sure that Test Group A has the same skillset and levels as Test Group B. Since programming has social aspects, a team of smart jackasses using Methodology A might be outperformed by a team of cooperative average guys using Methodology B" --PeteHardie

I have suggested that WetWare plays a large role in software design success, but have been sharply rebuffed by some, who appear to believe that the "truth" of software engineering lies largely outside of the human mind, that is, in principles of "pure" math and logic. --top


PageAnchor: End_Users

Saying a layman can compare the end-product (rockets) does NOT imply he can compare (or even grok the differences between) the production methodologies (rocket-building tools). Maybe you can compare end-product software without being a software engineer. That doesn't mean you'll grok the relevant distinctions between languages, tools, paradigms, or techniques. You overgeneralize from your ability to have an opinion on rockets to assert that you will develop a reasonable opinion on LTPT if given good information. That is a fallacy. To understand the information, you need to get an education, TopMind. You aren't bright enough to fake it.

This does bring up the issue of who will be doing the measuring. Or, more specifically, from whose perspective will we be creating metrics? The end-users could count bugs, time to develop/fix, wages and/or hours spent by developers, etc. And it doesn't take a PhD to count source code bulk, if we do include that metric. (Although there can be disagreements over how to score code size, it's probably only an issue if there is a big difference, and it can be explored more deeply if it turns out to be the final pivotal factor.) I don't believe there is a need to get into the realm of the developer's world at this stage. Measuring can still be done from the end-user's perspective; after all, they are the ones paying for the software. Thus, your point is so far moot. We can still explore the developer's perspective as a convenient shortcut, but that's still not true road-testing.

No matter how complicated the tool, the tool user still knows what they want as far as the outcome. True, sometimes consumers are oblivious to certain factors, but decent research can clue them in that a factor may be an issue. Ultimately, though, they assign the weights to the factors. We can create a scoring sheet with the scores of the metrics and leave the weights up to each consumer/user. The weights may be zero for some items. It's not up to the rocket builders to say that rocket pollution is less or more important to UPS than fuel cost, for example. -t

If a rocket engineer gets a specification for a certain payload, vibration, peak acceleration, pollution, noise, performance, and cost profile, that engineer can select from among models and methodologies that allow him to effectively predict and control these properties. Of course, that control doesn't necessarily result in an optimal rocket - in part because the specification may be ridiculous (users want magic ponies), in part because BadCodeCanBeWrittenInAnyLanguage (a good tool in inexperienced hands), and in part because the best models/tools might be inadequate for the peculiar case (requiring some R&D and further model development - rocket science). This prediction and control is the property that matters when comparing LTPT. You can't compare LTPT by the end product except insofar as the LTPT is part of the end product (e.g. for user-extensible systems, continuous maintenance, runtime upgrade). Why would an educated and intelligent engineer pick one LTPT over another? Because it lets that engineer more effectively predict and control properties relevant to the end-user that the other does not. The end-user's specification is relevant because there is no universal "better than" ("better than for what?", any intelligent engineer would ask). But there is no direct relationship between the end-user product metrics and the tool user's LTPT metrics. It doesn't make sense to compare rocket-building methodologies in terms of end-user specification metrics. To do so is a category error.

You have changed the scenario. We are not comparing custom software to custom software, but rather existing LTPTs, such as FP, OOP, bag-tuples, set-tuples, etc. -t

I described the actual scenario. Your scenario is nonsense based on an overgeneralization, ignorance, category error, and WishfulThinking.

My scenario is generally based on existing technology, or at least does not require anything revolutionary. If it's too difficult for you to make the leap, let me know and I'll search for a different scenario. -t

You are mistaken about your own scenario, TopMind. The 'bases' for your scenario are your ridiculous belief that "better than" has no context, your WishfulThinking that LTPT are compared in terms of end-user criteria, your overgeneralization that the ability to compare rockets means the ability to compare other things, and your fantastic ignorance of anything related to logic, reason, or computer science. Any references to existing technology (or even the revolutionary sort) are just red herrings, poor attempts to distract everyone from your lunacy.

I didn't say it has no context. I suggested the customer determine the context. The developers/constructors are too biased: they want stuff that makes their job easier, but that is not necessarily the same as what the owners want. (That would be an interesting topic in itself.)

And no, you don't use logic and reason to show how X is more economical or better; you use questionable "evidence devices" such as ArgumentByElegance or long chains of (questionable) logic. GoodMetricsProduceNumbers, but your round-about logic is not numbers. If you want to attempt numbers for developer-driven metrics, though, I'd be happy to take a look. If you don't believe in numbers altogether, then we have irreconcilable views of science and evidence. Which is it? -t

Why is Software Engineering somehow special that it doesn't "need" numbers-based metrics to claim a GoldenHammer? It does not get a Get-Out-Of-Science-Free card from the Monopoly board game. It is not a special or exempt discipline! It has to ride the long yellow bus just like the rest of the disciplines and deliver number-based tests. It baffles me that some of you think ArgumentByElegance or long chains of (questionable) logic are sufficient to claim GoldenHammer. Nonsense! --top

[You appear to think that logic is a metric. Where in the world did you come up with that? In addition, what makes you think that the other engineering disciplines don't have the same issues when they discuss their design methodologies? I can recall, when I was learning computer engineering, some discussions over various methodologies for improving the testability of circuits. It wasn't that different from some of the discussions seen here. I suspect the biggest difference is that they don't have those discussions where anyone who's soldered a circuit board can join in. They're a little more exclusive than that.]

Where did I say that logic was a metric? I will agree that it usually can be "turned into" numeric metrics, or at least a practical approximation.

As far as being exclusive, creating a complex WalledGarden GoldenHammer claim does not make you immune to numbers. As far as circuit boards, I don't know enough about circuit boards to comment, but surely it would be possible to create metrics over testing times under different scenarios. If both sides agree on a set of scenarios that is a decent representation (a test-case sample), then the competing designs can be scored against time-to-diagnose metrics: "If X broke, this design would take Y minutes to diagnose." The score sheet would look something like:

  ID..|..Descript...|Frequency|Scenario-A|Scen.B|Scen.C|
  ------------------------------------------------------
  1...|Bad Encoder..|...15%...|..12min...|30min.|.18min|
  2...|Bad Capacitor|...10%...|..22min...|18min.|..9min|
  Etc...

(Dots to prevent TabMunging)

The times are then multiplied by the frequencies and summed, giving an expected diagnosis time (a score) for each scenario. Of course there may be other metrics besides repair time, such as equipment cost.
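
For example, here is a minimal sketch (plain Python, using only the made-up numbers from the illustrative table above) of that frequency-weighted calculation:

  # Failure frequencies and diagnosis times (minutes) per design scenario,
  # taken from the illustrative table above.
  failures = [
      # (description,    frequency, scenario A, scenario B, scenario C)
      ("Bad Encoder",     0.15,      12,         30,         18),
      ("Bad Capacitor",   0.10,      22,         18,          9),
  ]

  # Expected diagnosis time per scenario: sum of frequency * time.
  for idx, name in enumerate(("A", "B", "C")):
      expected = sum(freq * times[idx] for _, freq, *times in failures)
      print("Scenario", name, "expected minutes:", round(expected, 2))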

--top
See also TraitsOfGoodScientificEvidence
CategoryMetrics, CategoryEvidence, CategoryMetaphor
