Relational Trees And Graphs Discussion

Different ways to provide different structure "types" in relational, or whether it should be done.


Things to consider:


Based on a discussion in MagicEverythingMachine and [insert later].

You do agree that tree-oriented relational operators can be added to a relational operation set (perhaps call it "extended relational"), correct? Suppose the tree operations only have to know what the "parent key" is.

And you do agree that the "domain math" can be made separate from the relational system, as long as certain criteria are met and we are not overly concerned about "machine efficiency" (for the sake of discussion). Correct? Thus, there's no problem with comparing imaginary numbers since "imaginaryX = imaginaryY" is processed by the domain math engine and not the relational engine. The relational engine only has to know whether expressions evaluate to True or False (or perhaps whether equality is met).

Thus, there is no logical reason whey tree operations cannot be added and keys cannot be imaginary numbers. Agreed?

Eh? That is a very NonSequitur argument. I agree that relational can be extended with tree operators (transitive closures and such), and I certainly agree that keys can be imaginary numbers, but you have utterly failed to argue or defend any useful point in defense of your policy that tree values and such 'should' be represented in relations.

In pages such as CrossToolTypeAndObjectSharing, what we asked is the ability to use trees in joins and 'where' clauses by structural equality, and perform other "domain math" on trees. You insist these trees should be represented in relations but, so far, you have consistently and persistently proven unable to demonstrate how one might go about supporting "tree domain math" with "tree-oriented relational operators".

The 'best' argument you've ever offered in defense of your policy - that trees-nested-in-cells can't "play in relational reindeer games" - is bogus: there is no reason that "domain math" operators cannot be extended to return relations in much the same way as relational aggregation operators return non-relations. And that's the most logical argument you've presented, albeit predicated on a clearly invalid assumption that the only tables available to relational operators are those initially in the database.

I'm tired of your fallacy, sophistry, HandWaving, and NonSequitur arguments on this subject. If you had the education and intellectual capacity to recognize valid points when you stumble across them, I'd tell you to come back when you have one. As is, I can't even offer that suggestion, because I strongly suspect I'll get more of your usual piffle.

[Top, I want to create attribute values that can be complex numbers, polynomials, trees, and graphs. I wish to be able to query and manipulate them with the same ease that I can manipulate integers and strings. This is not achieved with "tree oriented relational operators". This is achieved with an appropriate user-defined type system.]
As far as how to join on the equality value of trees or branches, the "domain type system" could tie into the relational system (having tree operations also) to calculate equality. It would create a dependence between the two systems, but would at least provide the necessary functionality. We'd be sacrificing some independence as our "cost". (And I'd still like to see a real-world scenario for such because there may be a more direct way to achieve it.) -t

... or we could just support trees as domain types and relations as domain types allowing full independence and functionality but sacrificing Top's pet principle as our "cost".

Hard-borders between "types" has downsides, especially in volatile domains. IS-A is less flexible than HAS-A.

[Nonsense. You make this stuff up as you go, don't you?]

Projection again. I've given requirements at the top. If your approach is free of down-sides, then show how it satisfies these.

No, you are indeed speaking nonsense, especially regarding IS-A vs. HAS-A (which is an inheritance relationship issue but not a concern for types or structured DomainValues in general). And the requirements you presented at the top are far from complete. I'll add a few more things to consider.

Re: "Query specification - ability to describe, say, a tree to match against as part of a query vs. entering a new tree into a table and using its surrogate ID as part of a query."

Variables aren't the values they reference, doubly true when mutable. It isn't a matter of perspective.

Thus, how is one measuring this?

The above was explicit on that point: use a query that involves comparing against a tree described for just that query, see if it involved entering a new tree into a table and using its surrogate ID as part of the query. If you need an example rather than just an answer on how to measure it, consider:
 'myStuff WHERE treeA = tree(1,tree(2,3))'). 
vs.
 'newTreeSurrogateID = makeNewTree("tree(1,tree(2,3))");
  myStuff WHERE tops_handwavy_treecomp_op(treeA,'=',newTreeSurrogateID)'

vs. performing 'makeNewTree' procedurally on application side or adding some features to support describing tables with autonum variables built in.

In all the latter cases, one ends up with a bunch of new 'nodes' in the dedicated tree table upon ending the query. This is a significant qualitative difference, and is reasonably among the "things to be considered".

That appears to be almost entirely a linguistic issue. If you note on SMEQL's opening page, it allows virtual tables, including nested virtual tables. Also, it leaves things like comparison operators to the "domain math" interpreter. Thus, the comparison system/syntax is up to the domain expression processor, not SMEQL. One could use BrainFsck in the WHERE clause (or equiv.) if they wanted to. You appear to be mixing up mostly unrelated topics. If one wanted to add shortcut syntax for tree specification, they can. SMEQL doesn't prevent it. (I agree that some kind of name-space management or dummy-name-generation technique may have to be specified.) -t

They aren't unrelated. Why not? Because you violated independence of 'domain math' and 'relational' by forcing complex domain values out of the little 'domain slot' provided by relational. Virtual tables won't help all that much here, especially given you can't get away with reusing those surrogate IDs in a virtual table. Give the problem a real try rather than using words and HandWaving to pretend it away.

Yes, I already admitted that one would have to violate *some* independence already. The heavier we want to integrate, the more we have abandon independence. That's life. It's a matter of balancing fundamental trade-offs.

No... in this case, "That's top's pet principle demanding sacrifices that we really don't need to make."

You face a similar problem with "types". If they share/integrate well, then they lose some or all of their "type-ness". And I'm not sure what surrogate ID's you are referring to. -t

What the? How does supporting effective integration and composition of types make them lose "type-ness"? As far as "surrogate IDs" I refer to those "foreign keys into the 'TREE_TABLE'" or whatever table you are using to maintain complex tree or graph structures.

I don't know what this is in reference to. -t

Perhaps a line of reasoning would clarify it for you: I'm (slowly) working on a pseudo-code illustration of joining on tree equality. But before that, as far as how "embedded" trees could work, my approach would be similar to how TqlColumnTable's work: the parameter is really a table, but in different clothing. A "shortcut" syntax is created to generate a dummy or virtual table, sort of like an anonymous function in FP. Or it could be a pre-processor command.

For example, suppose we did this to SQL:

  SELECT * FROM $tableShortcut("tree","1,2,(3,4,5)")

Before the SELECT statement executes, the "tableShortcut" pre-processor or "generator" function command "runs" and sends the two parameters to a special function that may resemble:

  function tableShortcut(env[], params[]) {
    var indic = params[1];  // process indicator
    var shortcut = params[2];  // shortcut syntax
    var rt = env['result_table'];  // result temp table name
    if (indic=='tree') {
       while(row = treeParse(shortcut) {
          runSql("insert into % (%,%,%)", rt, row[1],row[2],row[3]);
       }  // we assume 3 columns here to simplify the example
    } 
  }

We could make a dedicated tree command, but this approach is more flexible and can be used with graphs or what-not. If your domain is tree-heavy, then perhaps a dedicated function would be warranted.

Can you clarify above the target schema, where all the surrogate keys are being produced and forwarded into other parts of the tree, and perhaps use a slightly less trivial input example? Also, it is unclear by looking how 'rt' is returned; is that name special in this MyFavoriteLanguage pre-processor?

As far as "rt", I've used different languages that returned system-generated temporary file names. I don't know how they worked, I just know that such technology exists. Incrementing a sequential number is one approach. In this case it could keep track of the temp names and clean them up after the end of the query or query set. (Since they are virtual, there may not be a disk-cleanup step anyhow.) I'm assuming our "kit" provides that service, but we could have a dedicated function:

  rt = makeTempTableName();
As far as how to turning a string expression into a data-oriented tree, there are many ways to do it and I don't think we need to get into that here. Get a compiler/interpreter book (BookStop, he he).

Saying "there are many ways to do it" still leaves among the "things to consider" which approaches can be achieved effectively and the development of a language to actually achieve it. However, I'll humor you on the issue for now; I prefer you focus on demonstrating the join for tree equality.

Parsing text to tree structures is a "solved" problem. I don't know why that step seems to bother you a bit. Is it how the parent-ID's are computed?

Keep in mind that we're doing a comparative analysis here, not a capability analysis. You say parsing text to tree structures can be done, and that's true, and relevant, but it is only a partial answer. There are a number of properties that make big differences in terms of performance and convenience. These include: (a) how are fresh surrogate keys obtained and associated with nodes?

(b) can nodes with fresh surrogate keys be entered all at once or must they be entered one at a time?

(c) do nodes need to be entered in any particular order?

(d) how are reference loops handled? (e) how many message trips between application and database to enter tree? (f) if tree is temporary for query, does it dirty database (leave 'data' behind if not explicitly deleted)? (g) how much needs to be rewritten per type or per variation on a structured DomainValue type, to support the parsing? (h) do you support variables in structured values filled as part of the query from a nested query? (j) how will errors be detected and reported?

Now, I keep bringing this point up because I believe this is a point of severe, almost crippling, weakness in your approach relative to supporting structured DomainValues in a FirstClass manner. Data entry is required on a very regular basis as part of queries, for updates, for inserts, and for deletes, so there is no denying its relevance. I feel you keep trying to sweep this issue under the rug called TuringCompleteness (which is completely unfair unless you also sweep your presumed advantages under the same rug), so I keep pulling it back out from under that rug, dusting it off, and making certain it is clearly visible.

That said, I'm willing to put that big issue aside for the moment while you work on another big issue I have with your approach: joins for structural equality between DomainValues.

I'm gonna take my sweet time just to make you keep asking.

Another entry for ObjectiveEvidenceAgainstTopDiscussion?

Finally maybe you found one with merit ;-P


(Based on list above)

Here's my list: the cons include:

1. Construction of trees since one needs to associate each node with surrogate IDs, the cons include deletion of trees since such garbage collection becomes the job of application code.

One can add a tree deleter to the "tree kit". Whether it's part of the DB or not is mostly a political/economic decision, not a root paradigm/philosophical issue. What does this have to do with "surrogate ID's"?

Regarding your first point: adding a tree deleter doesn't make the problem go away. The problem is figuring out when to apply such a device and what can safely be deleted, especially when dealing with "sharing" of nodes. If you attempt to solve this problem generally, you'll learn you need to mark some tables as garbage-collected and others as 'root' - or at least that was my experience. This division of tables as collected or root only enforces the idea that GC is a dividing "practical" property/requirement between DomainValue and WhatIsData.

As far as: "what does this have to do with surrogate IDs", well, there were two separate 'cons'. The only reason you're confused is because you butchered the original format of the list (by somehow merging the first two entries and the second two entries) when reformatting it for your response. The first 'con' with your approach is that you need to associate each node in the tree with a surrogate a IDs, by which I refer to the typically auto-numbered 'key' for each node. The second con is that you need to explicitly manage garbage collection. The third con is "any case where a 'tree' or part of one might need to be described as part of a query...". The fourth con is "comparison and equality and join operations between tree structures". I haven't a clue how you kludged it so badly.

2. Any case where a 'tree' or part of one might need to be described as part of a query in the same way we describe integers and strings in queries, the cons include comparison and equality and join operations between tree structures.

I will agree fixing such mucks up the "separation of domain math from relational engine" ideal, but giving up that ideal to make this possible will not result in anything significantly less that what you propose. (I agree a tree-only DB may make expressing some such things easier, but the flip side is that you don't get to integrate them with relational tools/operations without reinventing the wheel. It's an inescapable trade-off between specialization and generalization. Specialized things do specialized things better and generalized things do generalized things better. That should be a no-brainer.)

Nobody here has argued in favor of a "tree-only DB", so the rest of your argument in that line seems to be a StrawMan.

As to whether a specialized system might do a specialized thing better, that seems intuitive but intuition is often incorrect. Related: InventorsParadox. Example: the recent re-implementation of OCaml concurrency synchronization primitives via a generic message-passing facility in Haskell (the result was more flexible, more composable, and far more efficient).

3. the cons include asymmetry between operating over simple types and working with trees and the need for workarounds.

Relational has DiscontinuitySpikes in the dichotomy between "cells" and relations. That I will agree to. But ADT's have it with regard to sharing information inside the types between types. It's a trade-off.

DomainValue types in Relational may not be mutable. As I understand it, only when working with 'mutable' abstract data types does the dichotomy you speak of exist.

4. the cons include significant extra complexity to specify that a schema is keyed on the structure of a tree, the cons probably include a great deal more.

Example?

I'll give an example problem:

 CREATE TABLE my_table (
    // field1 TREE - oh, bleep, that doesn't work
    field1 INT // foreign key into 'tree_node' table
    field2 STRING
    PRIMARY KEY (field1); // oh, bleep, that isn't quite enough!    
    // I need a relation where the primary key and indices are
    // over the structure of a tree, not its surrogate ID (field1)
 );

Compare:
 CREATE TABLE my_table {
   field1  {type T: tree(left:T right:T node:INT) | leaf}
   field2 STRING
   PRIMARY KEY (field1);
 }

If you don't believe your approach doesn't have significant extra complexity, then make your example (the top one) as simple as mine.

I focus more on usage simplicity, not declaration simplicity. If you crank the "definitional simplicity" weighting all the way up and all the other factors all the way down, then perhaps what you suggest here would be simpler.

But to achieve usage simplicity, "tree" is simply one of many possible views of a "structure". I want a relativity engine, not hard IS-A's. Encapsulation hurts sharing, plain and simple. You have not been able to escape that. And in general, nothing stops one from putting pretty syntax on top of a relational system to simplify common domain-specific things. If you do X often, then create a sub-language or API that makes X easier and shorter. Further, here's a declaration in DynamicRelational:

  // declaration follows

There, nothing, nada, zilch. Assignments create columns and tables automatically. Dynamism doesn't require definitions. Thus, if you really want to reduce definition verbiage, then go dyno.

So, you failed to accomplish every single task I set forth ('key' on tree structure, 'index' on tree structure) and called it a victory for DynamicRelational? Well, you certainly have an impressive imagination. When you return to reality, perhaps you can argue something that is less delusional, and show me how to index on tree structure in DynamicRelational.

I don't even know 80% of what the hell you are talking about for last few days. Your tongue took a trip to the Bahamas. As far as dynamic, YOU gave "simplicity" as your criteria. As best I could interpret your little metric, I bettered your example. So what's the problem?

Perhaps you should focus on understanding. Instead, you toss a few insults then offer opinions when you, by your own admission, don't understand 80% of what was said. Offering an opinion on any subject before you know the subject is an act of idiocy. Let's collect a dozen or so semi-realistic semi-real-world examples and test it against those. Your examples tend to be contrived, seemingly as tools of exaggeration to magnify some real or perceived weakness. While perhaps a useful tool probe things, it's often a poor test of the tool/idea in reality. -t

How about you prove you can solve the 'contrived' example that has been pared down to its simplest form before you take on anything more challenging? Your attitude seems to be "if it ain't CBA, it ain't realistic", which I find petty and narrow-minded.

However, a few realistic examples: (a) a database tracking SCTP messages allowing queries over sub-chunk data. (b) MetaModelling? databases where one is representing both data and inference rules (this is an especially big field), (c) a database of databases where each inner database represents a possible future (used in AI), (d) mathematics number-crunching databases where one wishes to perform joins over pairs of matrices or graphs that meet particular combined functional properties, (e) sets as identifiers (e.g. set of short strings) while still needing to join on sub-features like matching all but one element, (f) associative memory databases, where indexing and key access on graph structures is critical, (g) problem-solution tables in functional memoization, which requires the key be a problem consisting of a function and some arbitrary values, (h) behaviors databases for units in a video-game where one needs both variants, conditionals, structure, (i) source transforms and equality descriptions for patterns in data-driven optimizers, (j) a hardware database supporting drivers that needs to describe and classify hardware in terms of IO formats and the memory structure for DMI, (k) a hardware database that needs to contain complex-valued modifiers to optimizations or usage rules for specific pieces of hardware - putting in a string is option, but makes ad-hoc queries more difficult.

That's a short list of semi-realistic semi-real-world examples, many of them quite realistic (seeing as I've needed them but had to go other routes like OOP). Regardless, until you can solve a simple, contrived example, I am seriously not going to believe you'll solve anything more complex. My impression right now is that TopMind is looking for any excuse to close his eyes and ignore the white elephant already stomping all over his ideals. Well, that 'contrived' white elephant is still there, smashing your arguments to pieces. It doesn't need any help from the ten elephants (a)-(k) listed above.

I've actually compared trees before by serializing them into text and then comparing the text for equality. Fancy relational or type systems are not needed to do that. I agree it may not work in all cases, especially large trees, but for a specific little side problem, it did the job. -t

Arguments of that form, where you appeal to completeness and complex workarounds as though it were a point in favor of your current system, are remarkably unconvincing.

Complex? It depends on what tools are already available. That was not a reason to build a full tree kit/DB from scratch. I already had a serializer in that case.

Continued at RelationalTreeEqualityExample and RelationalTreesAndGraphsDiscussionTwo.


See Also: RelationalAndTrees
MarchZeroNine

EditText of this page (last edited August 20, 2009) or FindPage with title or text search