Improve Databases Or Else

A discussion that grew out of FearOfAddingTables about alleged proper ways of designing databases and application type systems that are not being followed.
Page name changed from goodDatabasesAreForGeniuses to ImproveDatabasesOrElse. Still arrogant.. but took away "genius".

I command everyone to improve the software industry through force, by using threats in the page title. I think imperatively.

The problem today (year 2008) is that a genius (or really really competent person) can barely create a strong, reliable, and well thought out database with the tools available. Someone less of a genius has it worse - people throw everything into one table, use nulls, lack foreign keys, use plain text files, XML databases (pfff), multi valued databases (pfff), and other crap.

We need to have a migration path available.. so that the unreliable databases like config files and unnormalized tables (myIsam) can become better relvars (tables). A semi genius would have built a good reliable database on day one, if he had the proper tools available and if the current tools didn't permit him to do such silly things with his database. We need to take it further and make it accessible to even the laymen to build databases that are really relational and properly designed. Documentation and websites for the products needs to educate users instead of providing just hype and incorrect information.

The good news is that most data stored in poor form can actually be normalized.. and there really are partial migration paths popping up (such as products inproving their DB's each year). But that isn't enough - some of it is getting worse, or it is improving at a slow pacee. It isn't the end of the world, but it could be a lot better. The relational model.. most people JustDontGetIt or even know of it.

Database designers, engineers, and laymen need to be encouraged with proper products, docs, and websites (and education, it is all related).

Today, people maintain unnormalized designs, abuse nulls, are afraid to create foreign keys. They are lazy or they simply don't have time at that moment in attention. Or they see it as too much overhead and too much work, because our current databases are heavy handed bastardized SQL. Our tools encourage us not to design databases properly - and what we are encouraged to do, is what we do (even smarter competent people are still encouraged by their tools).

Main problems: current databases permit us to be lazy, permit bad design, allow unconstrained data, have poor type systems, and permit sloppy integrity (if any).

Maybe what we need is something like TrueRelationalFileMaker? or TrueRelationalBento? with TrueRelationalToPseudoRelationalMapping capabilities

That can help and also languages/systems inventions that make the database more accessible and easier to just use and design things correctly.. the database should be thought of as a data structure or variable right in our programs (relvar vs table). RelProject and TutorialDee seems like it is headed this direction.

Yes but TheThirdManifestoNeedsToRideaHypeWave? somebody has to to create an TrueRelationalToPseudoRelationalMapping that surfs right in the middle of the somene elses hype (or create its own) now, TheThirdManifesto was written a lot of years ago, and so far it has not created its own HypeWave?, and since we need one of those to force the market in to including this kind of technology in mainstream databases and frameworks... I think the best practical solution would be to add a TrueRelationalToPseudoRelationalMapping to RubyOnRails, or to AptanaJaxer? (with or without GoogleGears on the client side) (in fact adding it to AptanaJaxer? might be the best way since I rememeber reading somewhere in the Wiki that prototype based languages suffer from a much milder form of the ObjectRelationalMappingImpedanceMismatch?)... What do you guys think? I think it may be the only practical way to force the market in to ImproveDatabasesOrElse --LuxSpes

Personally, I think the best way to ImproveDatabasesOrElse is to create slicker, easier-to-use administration tools for true relational databases, and provide conventional application interfaces like ODBC and JDBC so they can be used by virtually any application development environment. At the same time, work should be done on creating integrated, easy-to-use application development environments that incorporate both client-side and DBMS functionality into a single, cohesive, unified whole.] -- DaveVoorhis

Use a product and create a company to educate people and offer proper documentation and education with the product. The problem is simply time, luck, and other such things. Looking at what Dataphor has done will be a good start.. and heavily researching RelProject and other things are on my to do list (along with reading as many PDF files from Darwen/Date et al).

{By migration, I don't mean keeping all backwards compatibility and offering the perfect migration path.. as that isn't possible - I mean we simply need to make it easy for someone to try the relational model (instead of just PDF files that explain why everyone is wrong, etc.) We need also to tell people that yes, they can export their existing data from their current SQL database, plunk it in a relational product and try it out - even if that means they have to change some of their current data (such as changing nulls to defaults). Data is migratable by its very nature. I do not mean keeping nulls, keeping SQL flaws, or even trying to follow Dataphor's path exactly.}

{One idea is for example to compete with SQLite.. offer a nice alternative to SQLite and say why don't you try our relational database that is embeddable and easy to use?. When people choose tools like SQLite they aren't looking for a huge setup like Oracle.. so easing them into the relational model first is what I meant by migration. i.e. not requiring them to just take all their huge Oracle databases that they've invested into and magically change over to a new paradigm. First, ease them into it and let them know about it by making a nice SQLite style product which many people can use without any fear. Tell them on the side that there is also a relational alternative to MySQL and Oracle planned, in addition to the SQLite style small starter database. The idea is not to just continually fight people with academic revolutoinary fire, as Fabian does.. but offer a real solution to get them going. It needs to be easy to setup, easy to use, or people won't convert/migrate/try it. For example, right now, all we have is a bunch of PDF files, and no actual product to prove the relational model can work. The easiest way to get people started is not to yell at them and tell them all their Oracle and MySQL databases need to be immediately converted over to the relational model. Rather we have to offer solutions to satisfy people's curiosity. Try this.. download this.. no .Net requirement, no existing database requirement, no hard stuff to setup, no hosting company required to try the DB, it just works. Might be hard to accomplish, but that is a good way to get people interested.''.}

At the risk of tooting my own horn, the RelProject is very close to this. It's trivially installed, requires no configuration, is easily embeddable in Java applications, and is entirely self-contained except for requiring a Java run-time environment. What it lacks are several things: I believe the primary target competitor should be Microsoft Access; the product should be a user-friendly desktop DBMS that works equally well as a DBMS server. This represents a significant opportunity, as there is really nothing like Access in either the OpenSource or commercial arena, and Access is often a beginner's introduction to database technology. Access also containts a bucket of flaws that could be addressed at the same time. -- DaveVoorhis
This is utter arrogant and insulting bullshit, not science. ArgumentFromIntimidation. I protest such accusations without real evidence and metrics. And, you are not only table happy, but topic happy. This belongs in FearOfAddingTables. I think somebody made this just to get my goat. --top Arrogant and insulting is swearing in this wiki and using BadWord. This was not intended to attack you Top.

Geniuses don't abuse nulls, but lazy people do.

An example genius design is something like: He gets rid of the nulls in a clever design.. but regular people just go ahead and abuse the nulls since they aren't genius enough to avoid them..

Instead of creating a table with some definitive descriptions in another relvar (table) like:

We have people instead doing:
  TRUE  (employed)
  FALSE (not employed)
  NULL  (not sure, self employed possibly, or not applicable, or not entered yet)

This abuses the NULL because it is available to us.

A genius would not abuse a null because he is already a genius and knows not to abuse a null.

         TYPE workStatus = (employed, unemployed, selfemployed)

The problem with the above enumeration though is that it is not something we can query and change as part of the database.. it is more like a BOOLEAN in some respects in that it is more fixed (but not as fixed, it is more like a byte or integer). But some times we don't need to query it. Pure relational wheenie wouldn't allow a enumeration type (which just maps to a byte or integer at an implementation level) since it could lead to three value logic.. but it may also be something that stops people from using nulls when they don't have to, and avoids creating several different tables as in Hugh's page.

Since not all people are geniuses, the idea is that we shouldn't give them too many chances to be silly.. we should hit them with a whip and ban them from doing silly things. NULLS give people too many chances to be silly. Date, Darwen and all have several articles that debate the usefulness of NULLS. Codd at one point (the original relational model maintainer) almost appeared to be okay with 3 valued and 4 valued logic in his later writings.. but I personally think that is because he never understood what an enumeration was.. in my opinion enumerations are superior to some silly 3 or 4 valued logic.. and in my opinion even skinny tables are better than NULLS

A genius like FabianPascal says to someone else that YouJustDontGetIt so and so's design is wrong wrong wrong wrong.

How do we make them get it, easier, then? How do we make database design something that people can get?

Geniuses can also multi task and see the multiple relvars at once or multiple tables with foreign keys at once.. while others JustDontGetIt and just see values, rows, columns, in the form of an HTML table.

Yeah right, and I suppose geniuses can use Go To's also because their huge multitasking brain can handle all those wild jumps. Indented blocks is for single-tasking limited brains. Enjoy your Go To's, dear genius. Forget relational even, and go pure graphs. Didn't they used to say to GraceHopper: "High-level languages are for girls, real men use assembler"? --top

A smart programmer uses the high level language and visualizes the Go To statements in his mind or in his assembler window only if he needs to dissect the code into what the CPU is really doing. High level languages were invented to ban people from Go To, just as banning NULLS bans people from doing bad things.

Big Change Pivoting on The Trivial

The problem with the skinny-table approach given is that table design hinges on null existence. If we change a column to "required" per business rules such that it can no longer be empty, the above suggestion would dictate changing the schema via adding or merging tables. This makes the schema anti-change-friendly. The existence or lack of a required column is too trivial to hang a design on. It's like designing a house layout based on hair length. "Required" should be a simple switch that one can toggle on and off without having to refactor the entire schema for up or down. --top

Is there a SkinnyTablePattern? on this wiki? I think it is a pattern (serious, it is). I assume you mean normalization causes skinny tables.. which it does.. but at the same time if you don't see them as tables and consider them just lists of stuff, tuples, etc., it isn't as bad that it is skinny (i.e. people fear creating more tables because that's too much overhead with too many ID fields? An enumeration type solves that issue but the problem is then you cannot query the actual enumeration and find out what enumerations are available.. but do we need to?)

It is not normalization, it is *excessive* normalization. Skinny tables are frighteningly close to NavigationalDatabases: pointer pasta city. "Relational" means "based on tables". Skinny tables are less table and more like lists. Don't call it "relational", but Listational or something. Lispers may dig it more.

[Well Top, foreign keys, relvars, et al. are to reduce the duplication, but I think it somewhat reinvents/overlaps the idea enumerations and other types and constraint forms. Creating a new skinny table just for something that could be defined as a type like below can get on the nerves:
  type bitmask = (bmRead, bmWrite);
As for lispers digging it.. I don't much adore lisp too much.. I understand lisp but I'm more of a modular/procedural guy as you know. I dig the relational model but I find it a bit complex when simple tasks need to be accomplished immediately (i.e. it is verbose and time consuming to create another relvar (table) just to emulate an enumeration). That's why I created this page because a genius with unlimited time and skills COULD potentially create the perfect database according to purism.. but what about compromises? ]
Data modeling is a discipline and a craft. Like any other, it requires practice to become skilled, and dedication to acquiring expertise and knowledge about the craft, including its history and theory. Alas, like any other discipline, it attracts its share of cowboys, crackpots and incompetents. Many excellent data modelers are not geniuses, but have simply made the requisite commitment to becoming skilled.

And personal preference also plays a role. Some people can grok sparse tables easier than large table-space with skinny tables, and vise versa. Everybody's psychology and physiology are different. What bothers some eyes, brains, and hands will not bother others and vise versa. I've personally found skinny tables annoying and wanted to bop the designer.

There's something quite frightening in the notion that your selfish personal preference might play so large a role in a field that is fundamentally about meeting user requirements, especially the implicit requirements of reliability, stability, consistency and accuracy. Likewise, a cabinetmaker might have a personal preference for jagged edges, exposed splinters and spindly legs; or a cook may favour the gamey stank of slightly-off meat; or a toy designer may like intricate and easy-to-swallow small parts. That does not mean such dangerous abberations should be foisted on naive and unsuspecting users! That you find "skinny tables annoying" -- as if it reflects some generally valid principle without any apparent reference to industry best practices or user requirements -- smacks of professional incompetence. I do wonder what kind of ill-conceived rubbish your users have been forced to endure. -- DaveVoorhis

I believe you to be an over-educated idealist/purist who wishes to compile the world up-front to clean out all the bad logic and bad data so that the world runs like clock-work. But reality is different. LifeIsaBigMessyGraph whether we want it to be or not. Our abstractions are merely tools to help us *force* some artificial sense onto chaos so that we at least have a frame of reference. Abstractions are just frames of reference, a common communication platform. But TheMapIsNotTheTerritory. Politicians, Bureaucrats, Owners, and Marketers don't give a flying shit about purity of concepts, and thus make up messy graphy rules with only teasing hints of reliable patterns and logic. Pretending there is some underlying sense or simple equation or GoldHammer? under all this chaos is just lying to yourself. You are a religious zealot trying to purge the heathens. I used to be like you, but I wised up. There is no GoldenHammer. --top

"I believe you to be an over-educated idealist/purist who wishes to compile the world up-front to clean out all the bad logic and bad data so that the world runs like clock-work." You say that like it's a bad thing. Ideally, yes, that's exactly what I'd like to do. Sometimes, in database-driven applications, it's even achievable. Surely that is preferable to your approaches, which (apparently) endorse not attempting to clean out any bad logic and bad data? Your peculiar advocacy of, for example, "type-less" databases and whatnot are exactly that. -- DV

You demonstrate it bad on a practical level, and I promise to consider it. Is that asking too much? Note that I am not the only "lite type fan" on this Wiki. The others have just been quiet lately. --top

Top, there are a lot of fans in this world.. the issue is.. 90 percent of the world doesn't know what they are doing and is lacking in genius skill.. hence the title of this page. Doing programming correctly is hard. Very hard. It's hard. That's why people choose sloppy solutions like what we are offered with mysql and sqlite (incomplete products that were created in a rush because we just don't have time for the perfect soltion). Using it a better or safer product is really hard when there aren't many to purchase or download, especially if it is not a mass product known by the industry. This is the problem with TheWest. Just look at Microsoft and see all the fans that support that funny company. Then look at all the haters. Or the fans that support the bizarre perl language, and all the haters. Millions of people are fans. Doesn't mean it is necessarily more right, just because of fanboyism that is there.

I don't think it is that simple. A lot of the less technically-gifted developers often have social skills that are very helpful in understanding and related to the user and managers. And also, globalism is shifting "static" services overseas, leaving rapidly-changing (dynamic) services to us. Under this, nimbleness is more important than purity. Change is our comparative advantage, not technology per se. Thus, there is a bigger need for flexible and cheap techniques. --top

And you don't see the need for agility being met by better or more fluid administration/development tools? It can only be met by reducing constraints? -- DV

[Cheap techniques, cheap programmers, cheap languages, cheap solutions like PHP and MySQL 3.0. That is what this world is coming to Top. It is sad. What we need to do, on the contrary, is smarten these people up who are doing cheap product programming with cheap laws and cheap fantasies]

And the good ones program circles around red-tape loving anal types bloaters. Up yours!

The weak/dynamic/loose typing database and application programmers write their own anal checks using something called a function, Top. A function checks to see whether things are valid. Sometimes you forget, and it should be automated.. by your system. That saves you time. It takes a lot of time for people to figure this out, that the amount of time they saved by not declaring things is wasted in writing the checks in their code which are more error prone when you are doing it.. versus a common tested automated system written by other smarter people. They are smarter than you and I (the relational database engineers and the type system engineers). Worse, many times these run time checks that you write (bug potential hazards) with your own code aren't reused across several apps and you end up reinventing the checks each time.

The entire purpose of placing integrity into the database and into the type system is so that you don't spend time writing these checks yourself in each application with functions like "if isNum(data) then error('this database column only accepts numbers') which would have been checked for you automatically by the DB if you'd enforced it with an integer column. People are too lazy, by nature, to write these checks for every data item.. and it bloats up your code. Irony: bloating your code is what you claimed a type system and a database with integrity checks would do. What happens is you end up bloating your application with the checks yourself, or worse yet, and most often, you simply don't write any checks at all or forget since you are human.

Furthermore, if you forget to filter data and someone injects a malicious string into your database (or even binary data) - this won't be protected in an integer column without integrity checks at the DB level. Since your database allowed strings (or blobs) to be input into an integer field (making zero sense), this can be a huge security hazard where people can take over your entire system or inject odd things into your websites. Worse that could happen with an integrity checking database is "not a valid integer, bye bye now". As a lot of database programming these days involves some form of internet, this means that in an insecure internet world.. integrity checks (automated ones especially) are always beneficial in the long run for programmers. If your application is being used by one single person as a throw away app - then go ahead and use a text file. Why bother with a database at all? --Lars

Perhaps "for a genius" means overengineered for job security for the overengineer. I thus coin a new term: "overengineer". --top

And that's why instead of bitching, I discuss such simpler solutions on this page as the enumeration.. which no one replies to since more people are interested in flaming instead of actually talking about implementing solutions to solve the problem. --Lars

Seems almost like a ConstantTable with some direct DB support.

A good many of the problems are caused by people who take authoritative evidence as Gospel without understanding the implications and variety of tradeoffs and the difference between subjective preferences and objective evidence. --top

Why don't you continue to use the products you are using? Do you have a purpose here? What exactly are you trying to stand up here and protect? Do you have a product to sell us or a product that you use that you wish us to know about which isn't relational, or something? i.e. what do you plan to gain or what do you plan to protect by being here and saying statements like the above? Is it that nothing needs to be improved in the industry, people just need to get more familiar with the products like ones you are using? In that case, once again, if you are happy, why do you even bother on this page? This page was made to discuss options on how to improve relational databases to make them easier to use.. as it appears only arrogant geniuses can currently use relational databases... otherwise people like Fabian wouldn't be so angry. The very idea that people are yapping instead of discussing solutions to the problem shows who wants to make progress and who wants to just continue being the way they are.

PageAnchor: Finding-Good-Choppers

An anecdote: I'm currently dealing with a system that has lots of skinny over-defined tables. One can tell that they got their abstractions wrong either due to not seeing the whole picture or via changes that came later. One cannot tell what caused the failure because the original authors are long gone. Everybody complains that its hard to get data, not just me. It is one reason why they are looking to replace it in a few years.

Perhaps if there were an army of database programmers around to help get stuff out and describe where everything is, it may be easier, but there are not. We are stuck with a big ball of tables. Does skinny tabling assume the army of experts will stay around?

Wider tables are at least easier to study when trying to figure out what is where because you don't have to hop from table to table. I'll take a wide-table mess over a skinny-table mess any day. It depends, if you system is only for OLAP (better to have big tables) or OLTP (better to have a very normalized database)

Maybe there is a "proper way" to do skinny tables right, but this is clearly not one of them. It appears the problem is that its hard to create divisions (classifications) of things that fit the domain well and stay fitting the domain. It is a form of LimitsOfHierarchies. Skinny tables create (at least) a 3-level hierarchy:

  dataabase -> table -> sub-table
Domain classification hierarchies are difficult to get right, perhaps impossible if the domain is not really tree-shaped. True, it's not really a hierarchy because one can have cycles via use of foreign keys, but the problems of shoehorning a domain that you do not fully understand or will change into these three items appears to be an artificial endeavor. Wider tables make fewer grouping assumptions, and thus flex more. If you want specific narrow views, then make them, but don't gum up the originals to get it. I believe in PutClassificationsIntoMetaData rather than hard-wiring them up front because almost nobody gets them right up-front, and even if you are the one genius who does get them right, nobody can really tell that up front without tons of onsite visits to prior installations, and even that assumes the list is has not filtered out the misses. People usually don't put their failures on a resume. Most of the time such an expensive background check is not worth it. I am not dismissing the possibility that there are good choppers out there, but I know of no sure way to find them. And, all of the skinny-table installations I've encountered didn't find them (or all their documentation was burned such that we never knew why they sliced and diced the way they did, and so it appears somewhat random). BetterOnlyIfDoneRight syndrome?
Or Else


See also FraudulentMindset

View edit of May 22, 2008 or FindPage with title or text search