I am not sure relational by definition excludes tree-friendly operations. Some vendors have added tree-friendly operations.
Question: Did DrCodd
exclude tree operations from ever being added to root relational operators? If he did, can he be "overridden"? Can tree-friendly operators be derived from the original set? It does not look possible to me.
No Dr. Codd most certainly did not exclude anything. Actually, by ChrisDate's definition a proper relational database should include a transitive closure operator. Which is all you need to operate on trees. SQL 99 goes further and provides WITH RECURSIVE syntax, that is more general than transitive closure, and it is even implemented partially in the current IBM DB2 version (8.1).
There are various ways to represent trees in relational tables. The simplest is a "parentID" to refer to the key of the parent. (Root has zero or null parent.) Another technique is to have a path column where paths are separated by a special character such as a dot. But navigating these requires some string parsing and comparing. There are yet other approaches which I will try to provide links to later.
In addition to the methods mentioned above, there is also the "nested set model". This model allows retrieval of an entire subtree in one step, without any joins. Of course, it has its own set of problems too.
Possible tree-friendly operators
(entity, parentColumnName, parentID, orderType,...) - return all children under parentID as named in parentColumnName. orderType is the traversal type, such as left shallow, left deep, right shallow, right deep (do we need another ordering parameter?). Original relational operations did not define ordering of result sets, so how to deal with this is controversial. Perhaps just have it add an extra virtual column that provides a sequence, and maybe also indent level. If you later want to sort on that info, you can. A null or zero parent ID would mean start at the top. Here is a fuller set of possible parameters:
- Entity - The table, virtual or real to be traversed. (Note that joins can be performed to produce a virtual table out of many. The virtual table just has to satisfy the necessary parent-pointing column convention.)
- parentColumnName - The parent indicator, as described above.
- parentID - The "node" at which to start
- orderType - The traversal order qualifier, as described above.
- nodeSequenceColumn - Column name to put the node sequence number into. If this is not supplied, then perhaps a default can be used, such as "nodeSequence". Or perhaps this should be replaced with a SEQUENCE() function.
- levelColumn - Column name to put the level value into. Zero for the top-most, 1 for the nest level, etc. If not supplied then no such column is created. Perhaps this can be replaced with a LEVEL() function.
- orderClueColumn - A column used to determine the order of traversal if there are multiple candidates. If not supplied, then primary key is used.
- criteria - like a WHERE clause
The node sequence will not be a unique value, for there may be many nodes at the same branch and level. For ordering of those nodes under the same branch and level, use the node sequence in conjunction with other operators. See TopsQueryLanguage
for some possibilities. It may require two passes through orderBy operator, one before and one after.
I realize this is a lot of parameters. If you can think of a simpler way, I am all ears.
Maybe something similar can be devised for graph traversal.
TheThirdManifesto, which can be considered in many respects the definitive documentation of the RelationalModel, defines a transitive closure operator in the RelationalAlgebra for this purpose. (...which I noticed after I wrote this, is mentioned at the top of this page. Muy bad.) I believe practical application of this low-level operator is still a matter of some research, however.
, it may be argued that such is not in the spirit of declarative languages.
Why is a transitive closure operator not in the spirit of declarative languages?
In declarative form perhaps it would look the same. I am not sure what kind of interface you have in mind. It is any kind of recursion that I feel doesn't fit the flavor. To me, recursion is "how", not "what". (However, the difference may be a matter of interpretation.)
(I second that opinion - recursion and transitive closures are mechanisms, not declarations of intent. E.g. with the basic 'Select' operator, you're saying "I want things from this table that meet these constraints". However, with transitive closures and recursion and the sort of pointer-hopping you typically see with trees, you end up saying: 'take the set R, apply transitive-closure to R, apply join, apply filter to result, return result'. Constraint-based programming is the only 'true' declarative programming. You should just be saying: 'Viewing this table as representing a set of trees with root-nodes described by property R and [... more view-stuff here that could be associated with table in meta-data ...], give me all all node-triples where the first node is a root of a tree and the two sub-nodes are descendents where the left node has properties X and the right node has properties Y', or even just 'Give me all root nodes of trees that have one descendent with properties X and another (different) descendent with properties Y'. I.e. you name the constraints, and the query processor figures out how to fulfill it. This isn't a description in terms of what to do (e.g. transitive closures and filters and joins) to find those requirements, but rather just what you want at the end.)
In the RelationalModel described in TheThirdManifesto, a unary transitive closure operator is defined. Should a single, self-contained operator be considered declarative or imperative? Does a series of linearly-executed declarative statements, as is frequently used in SQL, represent imperative programming or declarative programming? Is the RelationalModel best leveraged via an imperative language, a functional language, or a declarative language? Is it possible to unambiguously define clear boundaries between these? I think these are partly philosophical questions, and partly subjects for further empirical research.
That said, I am increasingly of the belief that effectively representing and manipulating graphs & trees is outside the RelationalModel, and would be better handled by a "graph algebra" and associated structures built for the purpose and embedded in DBMSes along with complete implementations of the RelationalModel, plus operators for (where appropriate) converting between relations and (sub)graphs and/or trees. However, I freely admit this is idle speculation.
A series of linearly executed
declarative statements is imperative in nature.
- Not necessarily. A declarative representation may have an imperative equivalent, but does not require it. See toward the bottom of TqlChainedJoin for more on this.
- Yes. Necessarily. It isn't the presence of an "imperative equivalent" that determines whether an expression is imperative; rather, any expression that something is to be executed is imperative in nature. All declarative statements that communicate that action should be taken are imperative in nature... but it's the expression, not the communication, that is discussed here. Your example of TqlChainedJoin is not a counter; it is not a series of declarative statements to be executed linearly.
- Well, let's deal with the classification of "linearly executed declarative" on a case-by-case basis as encountered and see if it really has been "imperatively polluted" or just looks that way.
It is possible to define clear boundaries between imperative, functional, and declarative expressions
, but language necessarily crosses these boundaries. That is, a particular language expression might describe action vs. calculation vs. result, but working with that expression will demand the others: describing results of actions, describing results of calculations, performing actions in order to complete calculations, calculating actions, planning (result-of-action), and constraint (result-of-calculation)). For example, to describe what you want, you'll need to describe an abstraction. Abstractions are equivalent to patterns, which are equivalent to functions over propositions in a logic language. Thus, describing what you want will require describing a calculation. Always. Even "find me all dogs" requires calculation of sensory-data or object-data against the pattern associated with "dog", and I described said pattern with said word.
Relational model aside, the ideal way to express a query
is declarative: describe what you want to learn from that data (e.g. "which breeds of dogs are most common Paris?" or "what hair-care products are popular in Rome?"), and whatever database management system you're using will evaluate that expression of what
you want to learn, will compute an efficient mechanism on how
to accomplish that goal, and then will proceed to accomplish this for you. Ideally, one can also subscribe to something like a query, allowing for both immediate feedback and updates when things change ("keep me up-to-date on the life of DaveVoorhis
") which ultimately requires that the database keep data about what it's already told me (or that I accept a lot of redundant data).
You express as growing opinion that "effectively representing and manipulating graphs & trees is outside the RelationalModel"
. With this, I half-agree. It is very important to remember that the relational model was designed for representing data
, with each tuple representing a datum
. Representing values
is completely outside the domain of the RelationalModel
. Values can include trees and graphs; those sorts of values should be tuple-attributes in the RelationalModel
. However, when representing data about things that are by nature
hierarchical (e.g. fact:
"the U.S. Fish and Wildlife Service is part of the United States Department of the Interior") then this would be within the domain of the RelationalModel
- the representation of data. If the RelationalModel
is weak when working with hierarchies (and it is) this represents a real weakness in the model.
I would like to see what the declarative competitors are to relational before dismissing relational for tree and graph handling. Is the argument that no satisfactory declarative approach can be found for trees & graphs, or that only relational, and not all of declarative space, is the problem?
- "It is very important to remember that the relational model was designed for representing data, with each tuple representing a datum. Representing values is completely outside the domain of the RelationalModel." That's a curious pair of sentences. Could you provide a reference in any of the standard texts on the RelationalModel (e.g., TheThirdManifesto or AnIntroductionToDatabaseSystems) that confirms your assertions? I'm not saying you're wrong, because I think I know what you're attempting to convey, but your application of the terminology appears to differ from the usage with which I'm familiar. -- DaveVoorhis
There are no good competitors. You'd know about them if there were. The closest you can get is in research on knowledge representation and data-mining, but nothing in those fields is ready for the sort of workhorse labor the RelationalDatabase
s receive. Of course, "There's nothing better" is a fine argument for continuing to use a weak model, but it's hardly a defense of the model. A weakness indicates something to fix, perhaps by creating a competitor, or perhaps by fixing or adding to the language.
To perform proper declarative statements over graphical and hierarchical data (where it represents real data) requires a language that understands such things as hierarchical or graph relations both within the current table and across tables. This sort of thing must be expressed in the meta-data associated with the database, much like 'primary key' and 'foreign key' are at the moment. To perform proper operations over recursively structured values requires proper pattern-description languages for them, which are functional in nature. Both of these things should be fixed in whatever becomes the next generation of database technology. I don't imagine knowledge-representation and the associated inference engines getting into the DBMS arena (for actual management of data) for at least a full generation beyond that, and beyond even that is recognizing patterns in the structural and temporal associations between real data items and queries upon them.
Rather than wait 2 generations, there are at least two good reasons to work on refining relational declarative traversal:
- A sufficient competitor is a long way off, as you agreed.
- A good many apps are not likely to be pure traversal-oriented, but rather a mix of "graphiness" and traditional queries. Even though the additions may not be the ideal traversal technology, it does allow apps to live in both worlds in an integrated way fairly well. The New Thing may not do the "traditional" stuff well, making mixing difficult (limiting it to very narrow niches).
The first steps I mentioned (Both of these things should be fixed
) are refinements to the relational model (a little) and the query language and language processor (a lot). It also wouldn't hurt Relational to shift the query language up a declarative level (so you're declaring what you want, and the query processor figures out which tables to join, access, etc.). OTOH, that would add a mountain of a burden to the DBMS.
- The "declarative level" isn't a "Relational" issue, because query languages aren't a component of the RelationalModel. The RelationalModel defines tuples, relations, and an algebra, and that's essentially the lot. It's unfortunate that the model and query language design (which usually means some reference to SqlFlaws) are so often conflated. I would be curious to know what refinements you suggest to the model, however. -- DaveVoorhis
- One could argue that a mathematical expression is both declarative and could also serve as a query language. Early relational languages were indeed more math-like, but IBM decided to COBOL-tize it to make it (hopefully) more marketable.
- Mathematic and functional expressions (ignoring typing and computability) describe a value, not a consequent. Think about it. What did you 'declare' with a statement like: '1+1' or 'f(a,b)^2' or 'table1 joined with table2 joined with table3'? You declared nothing; you just described something that can be calculated by programmatic application of formally defined rules. Ideal declarative expressions describe consequents, and the programming style that most correctly embodies them is constraint-based, not functional programming. This is well understood by people who have actually worked with both sorts of languages. E.g. instead of 'given a,b, find f(a,b)^2', you say 'give me one pair (a,b) such that f(a,b)^2=81'. You describe the consequent (the value (a,b) and its properties), not the expression that evaluates to it. (Note: the declarative axis with imperative programming describes goals for state (or effect) rather than how to obtain it.) The best known of constraint-based languages are Prolog and Maude. It'd be well worth figuring out how to express queries in a more declarative manner than existing languages currently provide.
- But is it safe to say that the distinction is probably blurry and/or continuous? No. It is safe only to say that there is a singularity point of contact between the two expressive forms: when one desires a value that is exactly the result of calculating a functional expression (e.g. x = 1+1). In this sense, you can consider constraint-based declarative programming to be superior (in expression) to functional programming. Every functional expression qualifies as a constraint-based expression, but there is no natural way to transform or consider constraint-based expressions as functional ones (e.g. give me one pair (a,b) such that f(a,b)^2 = 81; this might or might not be optimizable by attempting to solve for (a,b), but optimization is beyond the issue of expression.)
- The "declarative level" is an issue for Database Management Systems in general, so it is an issue for RDBMS. The RelationalModel does define a bit more than that regarding the use of keys and consistency constraints, though I guess that depends on which iteration of the model and DrCodd's work you're looking at. Anyhow, you say "it's unfortunate that the model and the query language design are often conflated", but I say "tuples, relations, and an algebra are a language, and don't you forget it". SQL has its flaws, and while some of them don't come from the RelationalModel, some certainly do, including the difficulty seen with RelationalAndTrees. If you wish to fix the flaws, you need to get down to modifying that language... perhaps not so much on the representation side (saying how different things relate to one another), but on the algebra side (saying how to connect things together).
See Also: RelationalDatabase