As has been colorfully pointed out, in discussion of this page on other pages, clearly I misframed discussion of this topic, and so I intend to refactor this page. (Comments and such aiming at assisting with refactoring are of course welcome, no need to wait for completion.)
Meanwhile, the short version is: DatabaseIsRepresenterOfFacts
is something well-grounded in theory, whereas DatabaseIsRepresenterOfEntities
is not based in theory, it seems to be an "anything goes" pragmatic approach that includes DatabaseIsRepresenterOfFacts
as a subset (since, after all, anything goes) mentioned (but not clearly explained) by JoeCelko
(and very briefly on DatabaseIsRepresenterOfFacts
under "Database is a Filing Cabinet" and "The database is where dead objects live" subtopics, humorously-intended or not, and in the recent page DatabaseIsRepresenterOfEverything
, which I think is perhaps arguing for an ad hoc approach), and I'm unclear if he even exactly advocates it, as opposed to simply recognizing it as an approach often taken in the real world, flawed or not.
As CC has pointed out, this "anything goes" point of view destroys important properties of the relational model; the thing about the "anything goes" point of view is that they don't care, apparently
I personally do not morally
condemn ad hoc approaches, even though I believe they are inherently less sound than theoretically grounded approaches, on any topic, in part because it is very clear to me that theory is inherently always a subset of pragma. I.e. what we know with precision and accuracy is always a smaller set than what we know in a rough, fuzzy, pragma sense, as classically exemplified by the arts versus the sciences.
It is also IMHO a matter of historical fact that theoretical precision always grows out of fuzzy pragma, and that the latter is in fact always the starting point for the former, and that therefore fuzzy pragma should never be considered shameful...especially before it grows into precision (chemistry from alchemy roots), but to some extent, even afterward (medicines from folk botany).
Others may (well, clearly do
) feel differently. (I'm still wrestling with how to phrase such things; the reasons I have for being interested in any given topic are apparently often different than how people usually view the topic, which seems to lead to a rich source of misinterpretation/misunderstandings.)
this page to be interpreted as above from the beginning, but more than one person made it clear that my intention failed.
I see that my interest in ad hoc practices can (and has been) sharply misunderstood, something I didn't understand quite as clearly before now.
Part of the following page is simply paraphrases/quotes; that part I don't see a need to change; quotes are (bear with me here) relatively objective ("Person A said X" is true even if X is false). What needs to change is the interstitials. -- DougMerritt
Two major points of view about databases are DatabaseIsRepresenterOfFacts
(advocated by e.g. ChrisDate
, originally by DrCodd
, and locally by CostinCozianu
) and DatabaseIsRepresenterOfEntities
(more fully, entities, classes, and/or relationships between entities, advocated by e.g. JoeCelko
and other SQL standards committee (ex-) members, and apparently also by DrCodd
advocates the Fact approach because it is scientific, firmly grounded in theory, and disciplined, whereas the Entities approach is ad hoc ("based on intuition rather than principle") and typically suffers from lack of discipline.
On the other hand, ChrisDate
describes (pg721) a "well-understood body of data: all relationships known, all vars known as to whether dependent or independent variables, where the independent variables form an addressing scheme and the dependent variables are values in cells." ...but goes on to say that "most bodies of data are not
well understood", which sounds like the point of view of advocates of the "Entities" approach, which Date nonetheless deplores. (This is one of the few apparent inconsistencies with his argument, which is otherwise quite strong.)
: "The fundamental question is, what are you modeling in a table? Both DrCodd
take the position that a table is a collection of facts...once one states that a fact is true, it serves no useful purpose to state it again." From that point of view, databases should contain sets with no duplicate rows, not multisets/bags.
: "When the question came up in the SQL standards committee, we decided to leave duplicates in the standard. The example we used in committee discussions...was a cash register receipt with multiple occurrences of cans of cat food on it. This is how this came to be called 'the cat food problem' in the literature."
I'm not aware of anyone who says that DatabaseIsRepresenterOfFacts
is always the wrong approach, whereas advocates of DatabaseIsRepresenterOfFacts
say that it's the only correct approach and things like DatabaseIsRepresenterOfEntities
are just plain wrong, and that SqlLanguage
support for multisets rather than sets is just wrong.
Rather, supporters of DatabaseIsRepresenterOfEntities
tend to view the topic pragmatically, and may say that although it can be great to have only facts in a database, that real world needs are sometimes more complicated than that.
himself eventually decided that there was a real-world need for duplication, and added a "degree of duplication" operator to his model. ("RV-6 VIEW Updating." The Relational Model for Database Management: Version 2
, E.F. Codd) I'm not entirely clear if this meant he shifted from supporting "Of Facts" to "Of Entities" or not, but it seems to be implied.
explains that with the entities, classes, and/or relationships between entities approach, 'a duplicate row means more than merely one occorrence of an entity. This leads to a more object-oriented view of data where I have to deal with different fundamental relationships among "duplicates", such as:'
- ' Identity = "Clark Kent is Superman!" Only one entity, multiple expressions, not substitutable for each other (Clark doesn't fly until he changes to Superman)'
- ' Equality = "2+2=4" Only one entity, with multiple expressions that are always substitutable.'
- ' Equivalency = "You only use half as much Concentrated Sudso as your old detergent to get the same cleaning power!" Two distinct entities, substitutable both ways under all conditions.'
- ' Substitutability = "We are out of wine; would you like a vodka?" Two distinct entities, whose replacement for each other is not always in both directions or under all conditions. You might be willing to accept a glass of vodka when there is no wine, but you cannot make a wine sauce with a cup of vodka.'
The above is quoted or paraphrased from pg 18 of SqlForSmarties
, 2nd ed, JoeCelko
A related issue is the ClosedWorldAssumption
, which with DatabaseIsRepresenterOfFacts
means that not only is each relation in the DB a true predicate, but furthermore that all truths are present in the DB, so if a predicate is not listed as true, then it is definitely false.
For many things that's obviously attractive, and is part of the argument against NULLs; there isn't even a need for their direct base relvar representation if everything is either true or false but never e.g. unknown. (For derived relvars more discussion is apropos but OffTopic
Conversely, if someone believes that the ClosedWorldAssumption
is false for their database, and that an unlisted predicate doesn't imply anything, then....[to be continued]
Since there was a request for non-anonymity: This page was created by DougMerritt
. A careful reader will note that nowhere
do I offer an opinion on the subject, I merely quote what JoeCelko
said, figuring that was a fair balance to the ChrisDate
side of things.
I also quite carefully said that I'm not aware of anyone who thinks that DatabaseIsRepresenterOfFacts
is not a theoretical
authority, but he's an ex-member of the SQL standards committee and has written well-received books and columns, so I believe he's a perfectly good representative of the pragmatic
side of things (which may well be wrong; I never said otherwise).
A careful observer would note that today I made quite, quite extensive quotes from ChrisDate
on other pages; it's not that I think JoeCelko
is the last word.
As for my personal views, I have a strong interest in various forms of SemanticMapping
and such for application in AI, and in AI the "just the facts, m'am" approach has been explored for even longer than it has for business databases (since the 1950s, easily), and in that
domain it is known to have some apparent problems (as reflected by the glaring lack of strong AI today), although variants on it are still heavily pursued, e.g. by the Cyc/OpenCyc
project. So I'm interested in exploring alternatives, not in finding out what some (any!) authority thinks is the last word on the subject. AI is still a primitive field; there is no final answer in any aspect of it. And AI is not utterly different than the question of relational databases and business applications; relational DBs have in fact been used in AI projects, and conversely, principles of AI have been applied in the other direction.
Even more deeply, I apologize to no one for being interested in alternate approaches without even explaining myself.
For a contrary view on this topic and page, see page CostinCozianu
P.S. I wanted to create this page back in 2004 but I feared displeasing CC. It appears my fear was not unfounded.
It seems to me there is a rather straightforward dual in the possible uses of any Database. It is one that I mentioned briefly under DataManipulation
. Either a database is reflecting
an external world, or simulating
an internal one. Either way, the database is full of facts. Yet in reflection
, each fact is an accepted truth about a world (and changes to the database are for the purpose of remaining consistent with changes in the world), while for simulation
, each fact in the database actually creates
a truth about a world (and changing the database changes that world).
A dual (in Category Theory) exists when all the arrows are reversed. For reflection, the cause-of-fact arrows are from world into database; for simulation, they are from database into world. For reflection, the purpose-of-manipulations are for consistency with a changing world; for simulation, the they are to change the world. A few other sets of conceptual arrows also end up reversed.
Conceptually, this is similar to what is being said with DatabaseIsRepresenterOfFacts
(though, strictly speaking, a Database only ever carries facts, even if the facts are of the sort "there exists an object with the following properties..."; see WhatIsData
Agreed, and that's an interesting way to put things. -- DougMerritt