Relational And Trees

I am not sure relational by definition excludes tree-friendly operations. Some vendors have added tree-friendly operations.

Question: Did DrCodd exclude tree operations from ever being added to root relational operators? If he did, can he be "overridden"? Can tree-friendly operators be derived from the original set? It does not look possible to me.

No Dr. Codd most certainly did not exclude anything. Actually, by ChrisDate's definition a proper relational database should include a transitive closure operator. Which is all you need to operate on trees. SQL 99 goes further and provides WITH RECURSIVE syntax, that is more general than transitive closure, and it is even implemented partially in the current IBM DB2 version (8.1).

There are various ways to represent trees in relational tables. The simplest is a "parentID" to refer to the key of the parent. (Root has zero or null parent.) Another technique is to have a path column where paths are separated by a special character such as a dot. But navigating these requires some string parsing and comparing. There are yet other approaches which I will try to provide links to later.

In addition to the methods mentioned above, there is also the "nested set model". This model allows retrieval of an entire subtree in one step, without any joins. Of course, it has its own set of problems too.

Possible tree-friendly operators

traverse(entity, parentColumnName, parentID, orderType,...) - return all children under parentID as named in parentColumnName. orderType is the traversal type, such as left shallow, left deep, right shallow, right deep (do we need another ordering parameter?). Original relational operations did not define ordering of result sets, so how to deal with this is controversial. Perhaps just have it add an extra virtual column that provides a sequence, and maybe also indent level. If you later want to sort on that info, you can. A null or zero parent ID would mean start at the top. Here is a fuller set of possible parameters:

The node sequence will not be a unique value, for there may be many nodes at the same branch and level. For ordering of those nodes under the same branch and level, use the node sequence in conjunction with other operators. See TopsQueryLanguage for some possibilities. It may require two passes through orderBy operator, one before and one after.

I realize this is a lot of parameters. If you can think of a simpler way, I am all ears.

Maybe something similar can be devised for graph traversal.

TheThirdManifesto, which can be considered in many respects the definitive documentation of the RelationalModel, defines a transitive closure operator in the RelationalAlgebra for this purpose. (...which I noticed after I wrote this, is mentioned at the top of this page. Muy bad.) I believe practical application of this low-level operator is still a matter of some research, however. -- DaveVoorhis

In QueryTraversalVersusRecursion, it may be argued that such is not in the spirit of declarative languages.

Why is a transitive closure operator not in the spirit of declarative languages? -- DV

In declarative form perhaps it would look the same. I am not sure what kind of interface you have in mind. It is any kind of recursion that I feel doesn't fit the flavor. To me, recursion is "how", not "what". (However, the difference may be a matter of interpretation.)

(I second that opinion - recursion and transitive closures are mechanisms, not declarations of intent. E.g. with the basic 'Select' operator, you're saying "I want things from this table that meet these constraints". However, with transitive closures and recursion and the sort of pointer-hopping you typically see with trees, you end up saying: 'take the set R, apply transitive-closure to R, apply join, apply filter to result, return result'. Constraint-based programming is the only 'true' declarative programming. You should just be saying: 'Viewing this table as representing a set of trees with root-nodes described by property R and [... more view-stuff here that could be associated with table in meta-data ...], give me all all node-triples where the first node is a root of a tree and the two sub-nodes are descendents where the left node has properties X and the right node has properties Y', or even just 'Give me all root nodes of trees that have one descendent with properties X and another (different) descendent with properties Y'. I.e. you name the constraints, and the query processor figures out how to fulfill it. This isn't a description in terms of what to do (e.g. transitive closures and filters and joins) to find those requirements, but rather just what you want at the end.)

In the RelationalModel described in TheThirdManifesto, a unary transitive closure operator is defined. Should a single, self-contained operator be considered declarative or imperative? Does a series of linearly-executed declarative statements, as is frequently used in SQL, represent imperative programming or declarative programming? Is the RelationalModel best leveraged via an imperative language, a functional language, or a declarative language? Is it possible to unambiguously define clear boundaries between these? I think these are partly philosophical questions, and partly subjects for further empirical research.

That said, I am increasingly of the belief that effectively representing and manipulating graphs & trees is outside the RelationalModel, and would be better handled by a "graph algebra" and associated structures built for the purpose and embedded in DBMSes along with complete implementations of the RelationalModel, plus operators for (where appropriate) converting between relations and (sub)graphs and/or trees. However, I freely admit this is idle speculation. -- DaveVoorhis

A series of linearly executed declarative statements is imperative in nature.

It is possible to define clear boundaries between imperative, functional, and declarative expressions, but language necessarily crosses these boundaries. That is, a particular language expression might describe action vs. calculation vs. result, but working with that expression will demand the others: describing results of actions, describing results of calculations, performing actions in order to complete calculations, calculating actions, planning (result-of-action), and constraint (result-of-calculation)). For example, to describe what you want, you'll need to describe an abstraction. Abstractions are equivalent to patterns, which are equivalent to functions over propositions in a logic language. Thus, describing what you want will require describing a calculation. Always. Even "find me all dogs" requires calculation of sensory-data or object-data against the pattern associated with "dog", and I described said pattern with said word.

Relational model aside, the ideal way to express a query is declarative: describe what you want to learn from that data (e.g. "which breeds of dogs are most common Paris?" or "what hair-care products are popular in Rome?"), and whatever database management system you're using will evaluate that expression of what you want to learn, will compute an efficient mechanism on how to accomplish that goal, and then will proceed to accomplish this for you. Ideally, one can also subscribe to something like a query, allowing for both immediate feedback and updates when things change ("keep me up-to-date on the life of DaveVoorhis") which ultimately requires that the database keep data about what it's already told me (or that I accept a lot of redundant data).

You express as growing opinion that "effectively representing and manipulating graphs & trees is outside the RelationalModel". With this, I half-agree. It is very important to remember that the relational model was designed for representing data, with each tuple representing a datum. Representing values is completely outside the domain of the RelationalModel. Values can include trees and graphs; those sorts of values should be tuple-attributes in the RelationalModel. However, when representing data about things that are by nature hierarchical (e.g. fact:"the U.S. Fish and Wildlife Service is part of the United States Department of the Interior") then this would be within the domain of the RelationalModel - the representation of data. If the RelationalModel is weak when working with hierarchies (and it is) this represents a real weakness in the model.

I would like to see what the declarative competitors are to relational before dismissing relational for tree and graph handling. Is the argument that no satisfactory declarative approach can be found for trees & graphs, or that only relational, and not all of declarative space, is the problem?

There are no good competitors. You'd know about them if there were. The closest you can get is in research on knowledge representation and data-mining, but nothing in those fields is ready for the sort of workhorse labor the RelationalDatabases receive. Of course, "There's nothing better" is a fine argument for continuing to use a weak model, but it's hardly a defense of the model. A weakness indicates something to fix, perhaps by creating a competitor, or perhaps by fixing or adding to the language.

To perform proper declarative statements over graphical and hierarchical data (where it represents real data) requires a language that understands such things as hierarchical or graph relations both within the current table and across tables. This sort of thing must be expressed in the meta-data associated with the database, much like 'primary key' and 'foreign key' are at the moment. To perform proper operations over recursively structured values requires proper pattern-description languages for them, which are functional in nature. Both of these things should be fixed in whatever becomes the next generation of database technology. I don't imagine knowledge-representation and the associated inference engines getting into the DBMS arena (for actual management of data) for at least a full generation beyond that, and beyond even that is recognizing patterns in the structural and temporal associations between real data items and queries upon them.

Rather than wait 2 generations, there are at least two good reasons to work on refining relational declarative traversal:

The first steps I mentioned (Both of these things should be fixed) are refinements to the relational model (a little) and the query language and language processor (a lot). It also wouldn't hurt Relational to shift the query language up a declarative level (so you're declaring what you want, and the query processor figures out which tables to join, access, etc.). OTOH, that would add a mountain of a burden to the DBMS.

See Also: RelationalDatabase, QueryTraversalVersusRecursion, RelationalTreesAndGraphsDiscussion, RelationalTreesAndGraphsDiscussionTwo
CategoryDataStructure, CategoryRelationalDatabase

View edit of June 22, 2013 or FindPage with title or text search