Normalization Repetition And Flexibility Discussion

A continuation of NormalizationRepetitionAndFlexibility because that is getting TooBigToEdit.


They [math and logic] are sufficient when dealing with already-formalized computational models, such as programming languages and CPUs. We know exactly how these things interact, down to the hardware level and up to the high-level programming languages - i.e. we can assert that the mathematical models of these are as close to perfection as any model can achieve, and the cases where they aren't constitute manufacturing flaws or compiler bugs.

Comparing hardware design to software design has always been problematic because they are different beasts. For one, design maintenance cost is usually a low-ranking factor in hardware, largely because the largest cost is manufacturing, not design. It is sort of comparable to software design in the late 50's: machine efficiency trumped code design by far. Model maintenance has since overshadowed model-execution-engine maintenance. This is not the case in manufacturing yet. It is still closely bound to the hardware. --top

The fact remains that software and programming languages are entirely defined and subject to complete and 100% correct mathematical analysis (internally, at least). [removed content-free rudeness]

Knowing this, we can, in fact, use math and logic for 80%+ of all observations. The only ones it can't touch are psychological and statistical observations over human users and politics (e.g. noting that humans lack perfect foresight, humans rarely both know exactly what they want and how to communicate it without wanting to later change their mind, and that humans make typographical mistakes and logic errors, and that every human ever observed does so - we've never, in fifty years, found a programmer that gets everything perfect on the first try, and noting that businesses prefer 'plug-in' programmers, and noting that most humans don't like learning many different languages creating StumblingBlocksForDomainSpecificLanguages). These observations are necessary to help shape policy and language-decisions, and they can't be diminished... but they also rarely need to be quantified (qualification is generally sufficient to make appropriate decisions - even if we DID quantify these things, there is simply NO place we could ever plug in the numbers).

If there is no dispute about the qualitative decisions, then of course there is no "problem" with qualitative approaches. But that does not help with disputes or conflicting choices.

Unless you can demonstrate or prove that quantitative analysis would do us one better (since you cannot just assume it), we are left only with the conclusion that disputes or conflicting choices are inevitable. What matters is that qualitative analysis gives us enough information to make a better-informed decision regarding properties of the language, not that everyone agrees that those are better properties. A language designer, even with perfect information on rates of typographical errors and the relative number of logic errors resulting from creation of on-the-fly variables in dynamic scope, will be unlikely to make a better design decision than a language designer who simply knows that such errors happen enough for people to complain about them.

Your own 'ideal' regarding the use of empirical observation in ComputerScience fields is one I consider entirely naive. You need to use more math and logic, and limit observational science to the places you can demonstrate it will help.

What is the alternative? Self-elected "experts" who dictate "standards"? (I doubt they would agree anyhow.) That is the dark-ages. You may like this state of affairs, but some find it primitive. We still have DisciplineEnvy.

No, the alternative is proper use of math and logic and qualitative analysis.

This partially means creation of tools and languages that allow better, cheaper, easier automated analysis of programs. While you and people like you squabble over such dark-ages nonsense as whether a change better 'optimizes psychology' or 'brings your programming closer to God', the hardcore logicians will create languages that provably make being 'correct' or 'safe' or 'secure' (by objective, qualitative measures) the 'easier' thing to do. ComputationTheory analysis of EssentialComplexity can help tell us what it is possible to do easily and how close we're getting.

And, perhaps by SapirWhorfHypothesis, we'll eventually pull some generation of young students out of the mental quagmire that is the dark ages of computer programming. We don't know what 'optimizing psychology' means, so we can't touch it... but if it means making people think in manners more suited to solve the problems they're faced with, THAT we can do because we CAN prove that certain approaches to problems have better properties (efficiency, correctness, opportunities for error, ability to detect error or prove correctness once the solution is written, etc.) than others. Indeed, that's what mathematics is all about.

ACM has even shied away from software design issues because they are very difficult to quantify, a move that generated a full mailbag of reader responses. As software becomes a bigger part of our lives, finding better ways to measure "better" becomes more and more important. Otherwise, charlatans and inadvertent MentalMasturbation will rule the coop.

Ah, you mean the people that shout "I am great!" and offer advice or opinions that lack any reasonable backing and preach faith-based beliefs regarding the future of computing. People like yourself, perhaps. Indeed, having them rule the coop would be terrible. I imagine the metaphorical 'dark ages' would go on and on and on...

If you're willing to let your DisciplineEnvy motivate you, perhaps you should stop thumbing your nose at academics and actually learn the discipline in which you claim expertise. Read papers on research and applied theory. Learn enough to have a decent comprehension of 'System F' and 'Y combinators' and other arbitrary bits of the common vernacular without having to look it up. Actually create and fully implement a statically typed language - even a simply typed one.

Sure, there will always be a bit of aesthetics, artistry, personalization, and changing requirements in software and HCI, just as there is for people building houses and then adding paint, porches, and swimming pools. But that doesn't make the plumbing, electricity, structural integrity, cost to build and maintain, security, resistance to damage from quakes or insects or water, heating, ventilation, and cooling, support for information access or cable television, etc. any less of a full and true engineering discipline. There is enough in software engineering to constitute as full an engineering discipline as any other engineer or architect practices. Aesthetics are important to being a successful architect, but so is actually getting the building up and keeping it there.

You've said you focus on business reports. That certainly puts you much closer to the 'artistry' side than me (who handles data and pubsub middleware, communications infrastructure, safety testing, etc.). I really don't feel much DisciplineEnvy.


I've never used psychological arguments as a metric, just to help illustrate a reason you can't casually dismiss a valid metric that you had been fastidiously ignoring.

What specific objective metric did I ignore?

Specifically, you ignore two entire classes of corrections required for post-column removal of the wide-table solution that simply don't exist in the narrow-table solution: those resulting from sub-case 1 and sub-case 4. I.e. you blinded yourself to problems so that you wouldn't have to think about them.

I suggest giving these metrics working names in this topic to avoid pronouns etc. Give the scenarios specific names and the metrics specific names.

Suggestion noted.

My metric is asymptotic cost of change per scenario, measured as both potential and necessary cost (potential >= necessary). In practice, this is closest to the volume of code that needs changing - not absolute numbers, but relative portions.

Please clarify. I find the above obtuse. How is it closer to "volumes of code"? How are you determining "necessary"? Are you sure it's necessary? I can't verify that without knowing what you are looking at in your mind's trekkian main screen.

A 'necessary' cost is one you always pay to achieve a specific purpose (in this case, to maintain a working application base). It is determined by the definition of necessary and logical analysis, especially of the trivial and obvious cases. E.g. if you remove a column from a table, it is necessary that you track down and remove explicit references to that column from application code and application queries if those queries and code are to continue correct operation.

A 'potential' cost is one that might not exist based on one's ability to discipline the code. E.g. if you avoid use of 'select *', there is no risk of paying the 'potential' cost of fixing fragile code that, while never referencing a particular column, breaks when that column is removed; similarly, if you can guarantee that all application code that touches the database is 100% robust and immune to breakage, you don't need to pay that potential cost.
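The distinction can be sketched concretely. The following is a hypothetical illustration in Python with sqlite3; the table t(a, b, d) and its contents are invented for the example:

```python
# Sketch of 'necessary' vs 'potential' cost of removing a column.
# The table t(a, b, d) is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, d INT)")
conn.execute("INSERT INTO t VALUES (1, 2, 3)")

# Explicit reference: if column 'd' is removed, fixing this query is a
# *necessary* cost -- it names 'd' and fails outright once 'd' is gone.
row = conn.execute("SELECT a, b, d FROM t").fetchone()

# Wildcard with positional access: this code never names 'd', yet it
# silently depends on the table's width and column order. Dropping 'd'
# would shift or remove index 2 -- the *potential* cost of fragile code.
wide = conn.execute("SELECT * FROM t").fetchone()
print(wide[2])  # works only while 'd' happens to be the third column
```

Disciplined code (e.g. accessing columns by name rather than position) can avoid paying the potential cost, which is exactly the distinction drawn above.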

The cost of having to edit code in one solution that would not need to be edited in the other is a great deal higher by this metric than is the cost of having to delete one additional "d," along with at least one explicit use of said 'd' in application code (indeed, the 'd,' is cheap and has no effect on asymptotic cost at all). You are, perhaps, used to thinking in 'finger-movements', so you attempt to translate what I'm saying into that; I suppose if you only have that hammer, everything looks to you like a nail.

That is measurable and separate from psychological issues. I am not claiming it should be the only metric; in fact I claim that psychological issues are the primary issues to be optimized. But we can't objectively measure these.

What does it even mean to "optimize" a "psychological issue"? How do you measure or determine whether you've accomplished it? Can you even demonstrate that it's possible, or are you possibly claiming that you want the impossible?

However, if somebody claims that it can be *objectively* shown that thin tables makes programmers more productive etc., they are obligated to show the objective metrics.

Technically, they're only obligated to offer proof (or what they consider sufficient evidence to convince a reasonable person of sufficient education whose mind is open to change; the ability to convince fools, children, and stubborn fundamentalists is not required). That is what BurdenOfProof means. Objective metrics are just one possible means of achieving such proof. You might believe they're the 'best' metric, but I disagree; I happen to believe that objective metrics THEN require you to prove that you were measuring the right thing in the first place (e.g. what does 'productivity' mean? how do you measure it? how do you distinguish experience from methodology?).

I happen to find logical case analysis to be a stronger and more accurate proof mechanism in most cases. I don't attempt to prove 'productivity', but there is a very wide slew of objective properties that can be analyzed in this manner (including code change analysis after a change, resistance to programmer typographical fault (% chance of locating it at compile time), ability for an application to continue operation in a disrupted network, etc.).

I am not obligated to show objective metrics because I believe psychological factors to be the most important.

If you ever claim that 'optimizing psychological factors' provides ANY objective benefit (including productivity, average increased programmer satisfaction, improved readability (unlikely), etc.), then you ARE obligated to show objective metrics. Similarly, if you ever make a claim that says doing any particular thing helps optimize psychological factors, that is ALSO an objective claim that requires objective evidence.

At the moment, your claim is about equivalent to: "I claim programming should be optimized to get us closer to God." I.e. at the moment, you're not obligated to provide objective evidence. BUT, at the moment, your claim is an infantile fancy and wish that has no real meaning whatsoever.

And, psychology is by definition "subjective". Make sense?

Psychology isn't, as a study (even of individuals), wholly subjective. But I can see why you'd think so. Keep in mind that even psychology has its soft (clinical, talk therapy, etc.) and hard (memory analysis, reaction and response, behaviorism) divisions.

You cannot obligate somebody to objectively prove subjectivity. That's like dividing by zero in obligation-land.

You are obligated in reasonable debate to be willing to meet your burden of proof for ANY claim you make. You are obligated, in reasonable debate, to NOT make claims you cannot or are unwilling to prove. So, if as you say, your claims are purely subjective, then you should not be making them - it's like a fundamentalist shouting his faith-driven beliefs on a corner without a scrap of evidence to back up a claim.

In short, if you make objective claims, you need to show objective metrics. The actual choice of metrics is initially yours. But if I can counter them with objective metrics of my own, such as number of statements that need changing, I will. This does not mean I am endorsing a given metric as being important, only pointing out the there are objective metrics that support my point of view. If you want to demonstrate that the weights of my objective metrics should be less than the weights of your objective metrics, be my guest. I would happily welcome such.

Actually, you have burden to demonstrate that your 'counter' metric is a 'valid' counter. As is, your choice of 'counter' metrics thus far has been to drop some classes of problems from one side of the equation to make it balance in the other direction.

The cost of changing a volume of code internally is, in post-deployment situations, often less expensive than achieving the capability to change it (which may require contacting all sorts of people who have written queries for the database).

Agreed. This is why I encourage CodeChangeImpactAnalysis via scenarios if we want objective metrics.

(... Which happens to be exactly what I had started doing before being utterly sidetracked by you.)

In pre-deployment (or non-deployment situations), the cost of changing each volume of code and performing all unit-tests and passing everything is pretty much the sum of the costs (albeit not minor). I do assume that queries are written to go along with application or other code that utilize them. As a general consequence of that, fixing the queries themselves is a tiny fractional cost of fixing the application code, firing up the appropriate development environment, and testing the change via any unit tests or analysis.

Please clarify. I don't know why this would be the case. Code is code. I see no reason to rank SQL code changes lower than the app code changes.

Perhaps you misunderstood; I am not ranking SQL changes lower OR higher than application code changes. However, in all my experience, fixing and testing changed application code - and even firing up the development environment to run the unit tests and the rest - generally requires far more time, tweaking, and testing than merely deleting a column from a query. I.e. code is code, but there is (in my experience) usually a lot more app code to change for each piece of SQL code, and said app code is (again, in my experience) more difficult to fix. The lowest ratio I've ever seen is ~50:50, where the SQL code and app code were about equal in size and change impact difficulty (e.g. just deleting "d," from the SQL, and just deleting "print(d)" from the application), and it typically only gets worse from there. Your experiences may be different. Have you often encountered situations where changing the queries constitutes more than fifty percent of the work of changing the related application code?

Not that it is pivotal; that changes to queries constitute only a 'fractional' cost of the total code change would remain true nonetheless, and thus fixing the query code will never have an asymptotic cost effect unless there are cases where you need to fix queries without touching the application code (which could happen with views, I suppose).

When you keep tooting your horn about a tiny savings in a tiny fractional cost of the total change, I keep rolling my eyes and yelling at you; it's penny-wise and pound-foolish. The relative potential cost of having to change application code that never even touched 'd' but breaks anyway is far, far greater than any such savings.

You have not clearly shown any objective "biggies" yet. A one-eyed man is a king in the land of the blind. You are not clear about what exactly you are counting.

I have clearly shown two objective "biggies" for removal and addition of columns (sub-case 1 and 4), but perhaps only the other people with at least one eye who actually face the evidence and analyze can see it. I can't seem to help you with your inability or unwillingness to take the effort to comprehend. Even after many attempts to explain, you keep coming back with: "I still don't get it."

A similar scenario: just because you don't understand a proof of, say, the Halting problem, doesn't mean it wasn't proven. Technically, it is your job to comprehend any attempt at proof well enough to say why it is invalid or unsound, or to ask specific questions for clarification. Repetitions of "You have not shown X" actually put burden of proof on you to prove said claim (that I "have not shown X").

So, please provide your evidence that the issues I have raised are not valid.

Your proofs are cryptic. A drunk perler could have done a better job. I am not obligated to spend 3 days to decipher your mess. I'll even show you how to document it better once I figure out what you were talking about.

Pffft. You just want information to magically be formatted just for your brain to absorb regardless of essential complexity or your weakness at formal reasoning and math. You doom yourself to ignorance with almost every decision you make. Not my problem. I'm no mental slouch, but I'm also no grand genius; one thing I did learn is that doing my homework thoroughly is a very effective way to learn - it often takes me weeks to grasp a concept, showers in the morning, dreams at night, music off in the car, lunch break after lunch break, pencil and paper in hand sketching out scenarios and testing an idea to eventually make it 'click'. You give up before mere 'days' are up if not hours, and you probably quit after similar lack of effort when confronted with the various concepts that would have helped you comprehend what I was saying as I first said it. I may as well be explaining geometry proofs to a person who barely groks the difference between areas and volumes. I suppose I no longer need to wonder why you can't keep up in conversations with people that actually think, learn, and understand for weeks or months before they talk - you aren't stupid; you're just choicefully, due to your arrogance, uneducated. No wonder you've begun to seek magical solutions to make the world and domains you work in simpler, starting with 'EverythingIsRelative'.

Grow wiser, TopMind. Become an EternalStudent instead of a HostileStudent. If you have some extra time, consider taking a few college courses that will challenge you, both to learn something and to knock that ego of yours to a size you can better manage. Grab a book on type theory or category theory and read it. Actually do the exercises at the end of each chapter instead of lazily and arrogantly pretending you could if only you wanted to - and don't fear being wrong so much that you can't stand to test your answers and face truth.

I'm not your damned student. You should learn why science is important so that you don't mistake clever ideas for actual results. The problem is you, not me. --top

And don't hesitate. At the moment you have nothing at all to contribute - not unless you can figure out some objective and formal approach to 'optimizing psychology'. For now, I'm going to stop acknowledging you exist until such a time as you open your mind, kick your ego into submission, and start using your real name. I waste too much of my time explaining stuff to you when you either aren't ready to understand it or simply don't want to try.

"I'm not your damned student." --top

{Yes, but you should be.} "You should learn why science is important so that you don't mistake clever ideas for actual results." --top

{And you, Top, should learn why the other components of science -- logic, mathematics, theory, and models -- are important so you don't mistake personal opinion for theoretical foundations.}

They are only ingredients, NOT the muffins that come out of the oven. You have a problem understanding this.

"The problem is you, not me." --top

{No, it's you.}

[No, it's me. --bottom]


Just to bring some closure to this, if change counts suggested in CodeChangeImpactAnalysis were done for both table styles, do you feel that thin tables would score noticeably or significantly better? Myself, I am skeptical. I believe it would be roughly even. --top

[It doesn't make sense to call them roughly even. If the normalized solution scores roughly even on change counts, then it has already won - because normalized tables can be queried more modularly, are easier to maintain, and carry a whole bunch of other advantages (history repeats itself: flat files versus databases) - so it isn't even, even if you say it is roughly even. Maybe a bit hard for you to grok. It's like saying that if I have two choices of cars and they both cost the same money, they are even - except one of them has better organizational compartments on the front dash and in the trunk. Which one to pick? They are even, surely.]

To be perfectly clear: for post-deployment refactoring, my opinion is that narrow table solutions will score much better in some situations, some of which have been described. Further, they'll score no worse in all other situations. The combination of these is strictly in favor of the narrow-table solution for CodeChangeImpactAnalysis. Additionally, narrow-table solutions offer greater flexibility with regards to meta-data and a wide array of other features that cannot be effectively achieved in the wide-table solutions. These additional features affect CodeChangeImpactAnalysis for all cases where these features suddenly become desirable, and they further affect the basic value of the solution (more options, more flexibility, at no significant cost). The main costs the narrow-table solution has is a practical requirement for a few additional optimizations, and a practical requirement for more advanced TableBrowser utilities.

Most of these are unsubstantiated "brochury" claims in my opinion. Maybe one of these days I'll go about documenting change metrics in more detail than our first effort to see if there really is a numeric advantage. At this point, I couldn't find any after reviewing your example a third time. You appear to be shifting toward psychological assumptions but not realizing it. If I am simply too dumb to understand your writing, as you flamefully suggest, then it will remain that way and I'll have to reinvent the wheel to see for myself. (And maybe teach you documentation techniques in the process.) Further, asterisking provided at least a few areas of objective numerical advantages, and thin-tabling makes asterisking difficult. --top

You requested an opinion in closure, you offered yours, and you received mine. I have already indicated why your point on 'asterisking' was a non-advantage at best and a strict disadvantage at worst in the scenarios we covered, and this is not the place to re-issue your arguments.


Let's review:

 // Snippet A - wildcarding
 qry = select * from ...; // 1
 print(qry.d); // 2
 ...

 // Snippet B - explicit column
 qry = select ...d... from ...; // 3
 print(qry.d); // 4
 ...

If column "d" and all references to it are removed, then snippet "B" needs to change statements 3 and 4, while "A" only needs to change statement 2. Thus, wildcarding objectively reduces the number of statements that need changing for this scenario. A similar situation plays out for adding. (If this is not the place to re-issue my arguments, then I'm open to suggestions about where the place is.)

Snippet A does not offer any algorithmic cost advantage over Snippet B,

"algorithmic cost advantage" is not the metric being applied here. (Not sure what it means anyhow.)

...and you've not covered all wildcarding scenarios;

No, because it is focusing on a specific scenario on purpose. If we did this full-out, we'd have lots of scenarios.

...you've once again ignored the issue:

 // Snippet A.2 - wildcarding
 qry = select * from ...; // 5
 ... application code without 'd' ... // 6

Given that fragile application code can and does exist, statement 6 can break even though 'd' is not used, thus objectively increasing the potential cost-of-change to the entire set of queries using 'select *' rather than just those queries with application code that uses column 'd'.
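The fragility claim can be sketched concretely. The following is a hedged, hypothetical illustration in Python with sqlite3 (the table t(a, b, d) and the fixed-width unpacking style are invented for the example) of application code that never mentions 'd' yet breaks when 'd' is removed:

```python
# Statement 6 analogue: app code that never touches 'd' but breaks when
# 'd' is dropped, because 'select *' couples it to the table's shape.
# Table t(a, b, d) and the unpacking style are hypothetical.
import sqlite3

def fetch_wide(conn):
    return conn.execute("SELECT * FROM t").fetchone()   # statement 5

def app_code(row):
    a, b, d = row        # statement 6: fragile fixed-width unpacking
    return a + b         # note: 'd' itself is never used

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, d INT)")
conn.execute("INSERT INTO t VALUES (1, 2, 3)")
assert app_code(fetch_wide(conn)) == 3

# Simulate removing column 'd' (rebuilding the table instead of using
# ALTER TABLE ... DROP COLUMN, for portability across SQLite versions):
conn.execute("CREATE TABLE t_new AS SELECT a, b FROM t")
conn.execute("DROP TABLE t")
conn.execute("ALTER TABLE t_new RENAME TO t")

try:
    app_code(fetch_wide(conn))   # breaks, though 'd' was never referenced
except ValueError:
    print("statement 6 broke after dropping 'd'")
```

An explicit-column query in statement 5 would have confined the breakage to code that actually names 'd'; with the wildcard, every such fixed-width consumer is exposed.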

Perhaps, but that's a different scenario. And, not a common one in my experience.

And you shouldn't re-issue your arguments... what ever happened to 'OnceAndOnlyOnce' and 'DontRepeatYourself' that you were, in CriticizeDiplomatically, offering as reasons that particular forms of discussion are 'bad style'? Please resist such hypocrisy.

I was talking about name-calling there, not examples. And I restated in a slightly different way because it appeared to be disputed or not understood. This is proper in my book. In fact I've encouraged it around here. Besides, I was just helping 'bring some closure to this'; I certainly don't plan to get involved in a repeat performance of your argument again and again because you keep feeling need to repeat yourself. But if you must, certainly don't fire up new arguments or repeat old ones (complete with structure) 'in review' or 'to bring some closure'; it is remarkably poor style, and somewhat impolite if intentional.

I have a right to change my mind about closure. I was hoping for a simple "yes" answer about the metrics based on prior statements, but got an unexpected response, so had to change.

Your excuses don't make it better style or any more polite.

I find it a valid "excuse". And to be frank, it is too minor a thing to nit about in my opinion. I've suppressed about 90% of the complaints I wanted to make about you because it would become a nagfest if I did otherwise. For example, your usage of "excuse" is inflammatory and unnecessary. But I didn't mention it (except as an example later). It appears to me what you are doing is trying to find something to complain about, anything, as revenge for things that upset you in prior debates. I cannot read minds and so don't know for sure. But, that's my working guess. --top

EditText of this page (last edited July 9, 2010) or FindPage with title or text search