The RelationalModel was the first theoretically founded and well-thought-out DataModel, proposed by EfCodd in 1970. It has been the foundation of most database software and theoretical database research ever since.
It is ironic, however, that, largely because of historical circumstances, faithful implementations of it haven't yet succeeded in the marketplace. Early on, computers were thought not powerful enough to support it; later, users got used to the shortcuts, and compatibility with previous implementations became a good enough excuse. While database theory research has built upon the foundation of the relational model, the DBMS industry has yet to implement faithfully the ideas that Codd laid out in the 1970s.
Briefly, the relational model structures the logical view of data around two mathematical constructs: domains (i.e., data types) and relations. The name "relational" comes from "relation" in its well-established mathematical sense, although database theory extends the definition slightly.
A domain is simply a set of values, together with its associated operators. It is equivalent to the notion of a type in programming languages.
A relation over the domains $D_1, D_2, \ldots, D_n$ is simply a subset of their Cartesian product; the usual notation is $R \subseteq D_1 \times D_2 \times \cdots \times D_n$. An element of the Cartesian product is called a tuple. A database is a collection of relation-valued variables (aka RelVars: variables whose value at any point in time is a relation), together with the set of integrity constraints that the data must satisfy.
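A minimal Python sketch of these definitions, assuming small finite domains so that the Cartesian product can actually be enumerated (a real DBMS would never materialize it). The domain names, the `RelVar` class and the sample constraint are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical finite domains (their operators are left out for brevity).
EMPLOYEE_IDS = {1, 2, 3}
DEPARTMENTS = {"Sales", "R&D"}

# D1 x D2: every possible (employee, department) pair.
CARTESIAN = set(product(EMPLOYEE_IDS, DEPARTMENTS))

# A relation over D1, D2 is any subset of that product.
works_in = {(1, "Sales"), (2, "R&D"), (3, "R&D")}

def is_relation_over(r, *domains):
    """True if r is a subset of the Cartesian product of the given domains."""
    return set(r) <= set(product(*domains))

def one_department_per_employee(r):
    """Hypothetical integrity constraint: no employee appears twice."""
    emps = [e for (e, _) in r]
    return len(emps) == len(set(emps))

class RelVar:
    """Hypothetical relvar: its value is always a relation satisfying its constraints."""

    def __init__(self, domains, constraints, value=()):
        self.domains = domains
        self.constraints = constraints
        self.value = frozenset()
        self.assign(value)

    def assign(self, value):
        value = frozenset(value)
        if not is_relation_over(value, *self.domains):
            raise ValueError("value is not a relation over the declared domains")
        for check in self.constraints:
            if not check(value):
                raise ValueError(f"constraint {check.__name__} violated")
        self.value = value  # only reached if every check passed

works_in_var = RelVar([EMPLOYEE_IDS, DEPARTMENTS], [one_department_per_employee], works_in)
```

Assigning `works_in | {(1, "R&D")}` to `works_in_var` raises `ValueError`, since employee 1 would then be in two departments, and the relvar keeps its previous value.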
To make programming easier, a named perspective was introduced. Each domain in the definition of a relation is associated with a string label, called a column name. A column is then the association between a column name and a domain, and a relation header is a set of columns. A tuple becomes a mapping from each column in the relation header to a value, and a relation is a set of tuples that all conform to the relation header. Because column names are unique within a relation header, the positional ordering of the mathematical definition becomes inessential, and each data value in a tuple can be identified by its column name. This is mainly a programming convenience; the two definitions are essentially equivalent.
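A small Python sketch of that equivalence, assuming a named tuple is modelled as a frozenset of (column name, value) pairs so that it is hashable and carries no column order. The header and column names are hypothetical, and domains are left implicit.

```python
# Hypothetical column names for a two-column relation.
HEADER = ("emp_id", "dept")

# Positional form: values are identified by their position in the tuple.
positional = {(1, "Sales"), (2, "R&D")}

def to_named(header, rel):
    """Map each positional tuple to a set of (column name, value) pairs.

    A frozenset is used so the named tuple is hashable and has no column order.
    """
    return {frozenset(zip(header, t)) for t in rel}

def to_positional(header, rel):
    """Inverse mapping: read values back by column name, in the header's order."""
    return {tuple(dict(t)[c] for c in header) for t in rel}

named = to_named(HEADER, positional)

# Swapping the column order (and the values with it) gives the same named relation.
swapped = to_named(("dept", "emp_id"), {(d, e) for (e, d) in positional})
```

Here `swapped == named` holds, and `to_positional(HEADER, named)` gives back the original positional relation, so neither view loses information.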
Besides the structure of data, the relational model also defines the means for data manipulation (relational algebra or relational calculus) and the means for specifying and enforcing data integrity (integrity constraints).
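As a rough illustration of the algebra side, here is a Python sketch of three relational algebra operators (selection, projection and natural join) over named tuples stored as frozensets of (column, value) pairs. The helper names and sample relations are hypothetical, and other operators such as union, difference and rename are omitted.

```python
def rel(*rows):
    """Build a relation (a set of named tuples) from dicts."""
    return frozenset(frozenset(row.items()) for row in rows)

def select(r, pred):
    """Restriction: keep the tuples for which pred holds."""
    return frozenset(t for t in r if pred(dict(t)))

def project(r, cols):
    """Projection onto cols; duplicates vanish because the result is a set."""
    return frozenset(frozenset((c, v) for c, v in t if c in cols) for t in r)

def natural_join(r, s):
    """Combine every pair of tuples that agree on all common column names."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[c] == du[c] for c in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

# Hypothetical sample relations.
emp = rel({"emp_id": 1, "name": "Ann", "dept": "Sales"},
          {"emp_id": 2, "name": "Bob", "dept": "R&D"},
          {"emp_id": 3, "name": "Cy", "dept": "R&D"})
dept = rel({"dept": "Sales", "floor": 1},
           {"dept": "R&D", "floor": 2})

# Names of employees whose department is on floor 2.
on_floor_2 = project(select(natural_join(emp, dept), lambda t: t["floor"] == 2), {"name"})
```

Every operator takes relations and returns a relation, which is what lets expressions like `on_floor_2` be composed freely.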
Those are the basics of the relational model. Despite its apparent simplicity, the relational model is very rich and powerful, and it is a wonderful tool for real software engineering as well as theoretical research.
A RelationalDatabase as implemented today (with tables, rows, and SQL as the query language) is much more complicated and less powerful than what a database should be under the RelationalModel. Tables and rows aren't equivalent to relations and tuples, because SQL DBMSes offer little or no support for user-defined data types and because tables are bags (multisets, which may contain duplicate rows), not sets. What is good enough varies with the complexity of the problem you are facing, and for some problems the way current SQL DBMSes implement the relational model becomes really annoying. It is, unfortunately, one of the many cases of SoftwareEngineeringVsComputerScience.
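To make the bag/set difference concrete, a small Python sketch, assuming an SQL table is modelled as a list of rows (a bag) and a relation as a Python set. The function names and data are hypothetical; `bag_project` mimics what a `SELECT` without `DISTINCT` does.

```python
from collections import Counter

# A table as an SQL DBMS may hold it: a bag of rows, where duplicates are allowed.
rows = [("Ann", "R&D"), ("Bob", "R&D"), ("Cy", "Sales")]

def bag_project(rows, idx):
    """Like SELECT without DISTINCT: one output row per input row."""
    return Counter(tuple(row[i] for i in idx) for row in rows)

def set_project(rows, idx):
    """Relational projection: the result is a set, so duplicates collapse."""
    return {tuple(row[i] for i in idx) for row in rows}

# Projecting onto the department column (index 1).
as_bag = bag_project(rows, [1])   # ("R&D",) occurs twice, ("Sales",) once
as_set = set_project(rows, [1])   # each department occurs exactly once

# Without a declared key, a table can even hold the same row twice;
# as a relation (a set) that row would be present only once.
dup_table = rows + [("Ann", "R&D")]
```

With bags, the result of a query depends on how many copies of a row happen to exist, which has no counterpart in the relational model.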
Advantages of the relational model:
- It has been extensively studied and proven in practice, and it is based on a formal theoretical model. Almost everything known about it has been proven as a mathematical theorem. Its data manipulation paradigm is based on first-order logic and fully supports DatabaseIsRepresenterOfFacts.
- It offers an abstract view of data. It was among the first major applications of abstraction as a way to manage software complexity: it separates the logical structure of data from the physical structure of data storage.
- It offers a declarative interface (relational calculus) for specifying data manipulation, which is translated into an efficient (sometimes the most efficient) implementation, given a physical data layout and within reasonable heuristic limits.
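A rough Python analogy for the declarative point, assuming a set comprehension stands in for a tuple relational calculus query: the comprehension only states the condition a result must satisfy, while `hash_semijoin_plan` (a hypothetical name) is one of many equivalent procedural plans an optimizer might choose. Python itself just runs the comprehension as nested loops, and real optimizers choose plans using cost estimates over the physical layout, which this sketch does not model.

```python
# Hypothetical sample data: EMP(emp_id, name, dept) and DEPT(dept, floor).
emp = {(1, "Ann", "Sales"), (2, "Bob", "R&D"), (3, "Cy", "R&D")}
dept = {("Sales", 1), ("R&D", 2)}

# Calculus style: say WHICH names qualify, not how to find them.
# { name | (id, name, d) in EMP and exists (d2, fl) in DEPT: d2 = d and fl = 2 }
declarative = {name for (_id, name, d) in emp
               if any(d2 == d and fl == 2 for (d2, fl) in dept)}

def hash_semijoin_plan(emp, dept):
    """One possible evaluation plan: filter DEPT first, build a hash set of
    qualifying departments, then scan EMP once (a semi-join plus projection)."""
    floor_2 = {d for (d, fl) in dept if fl == 2}
    return {name for (_id, name, d) in emp if d in floor_2}
```

Both produce the same answer; the point of the declarative interface is that the user writes the first form and the system is free to run something like the second.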
Drawbacks:
- It's never been fully, faithfully implemented. This is by far its biggest handicap.
- In spite of its simplicity, you'll likely find lots of developers, architects, DBAs, book authors and committees who have no clue about it but pretend that they do. After all, it would be quite embarrassing for someone to admit that he doesn't know what the RelationalModel is.
These two drawbacks usually feed a vicious circle: DBMS vendors react to what the market demands and spend money and time implementing purportedly useful extensions, which are in fact not only less useful than a true implementation of the relational model would be, but actually harmful. These extensions tend to be generically called Object/Relational features. The most glaring example is Oracle RDBMS, which introduced "objects", "references" and "collections", together with other essential accompanying features like the cool-sounding IS DANGLING condition. IS DANGLING is supposed to rhyme with data integrity. Project managers, CTOs and other staff can read some very nice brochures and PDFs, and can even be entertained by cool sales movies on O/R features. IS DANGLING hasn't made it into any marketing presentation yet.
References:
- EfCodd's initial paper is available at http://www.acm.org/classics/nov95/
- For a substantial exposition of the relational model and the reasons it is so powerful and valuable for modern software technology, see AnIntroductionToDatabaseSystems.
- For the more mathematically inclined, FoundationsOfDatabases is an absolutely delightful read, although the connection between the theory and its practical engineering value is to some extent left as an exercise for the reader.
- For the layman software engineer who won't spend money on database theory, http://www.brcommunity.com/cgi-local/x.pl/commentary/b006.html [BrokenLink] and the subsequent articles should be convincing enough to make him buy a book (a free subscription to the site is required, but won't hurt).
- Any standard textbook on databases (like ElmasriAndNavathe, O'Neil and O'Neil, Ullman). Each of these books tends to elaborate more on some aspects while taking shortcuts on others.
For funny examples from really unexpected sources, you can read books like "Oracle XXX, The Complete Reference", where XXX is 7, 8 or 8i, straight from Oracle Press.
The most widespread misconception about the relational model is that "relational" refers to primary keys and foreign keys that represent a "relation" or "relationship" "relating" rows in different tables. This is not the case: relation is a well-established mathematical concept, and the relational model builds upon the mathematical properties of relations.
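For instance, the ordinary "less than" relation on a three-element set is a relation in exactly this sense: a subset of D x D, with no primary or foreign keys anywhere. A minimal Python sketch; the set `D` is an arbitrary example.

```python
from itertools import product

D = {1, 2, 3}

# "Less than" on D, as a mathematician defines it: the set of pairs (a, b)
# from D x D for which a < b holds. No keys, no other tables involved.
less_than = {(a, b) for (a, b) in product(D, D) if a < b}

# Each tuple in the relation states a true fact, e.g. (1, 3) means "1 < 3".
```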
See also DbDebunk
CategoryRelationalDatabase
FebruaryZeroNine