Relational Weenies Embrace Oo

Some RelationalWeenies still deprecate all things ObjectOriented. This typically appears to derive from some fundamental misunderstandings about ObjectOriented approaches: The RelationalModel and ObjectOriented programming can be highly complementary. The RelationalModel provides an effective way to manage collections of instances -- i.e., it replaces the usual container classes & instances with a clean, powerful, provable, optimisable, composable model -- whilst ObjectOriented techniques can effectively define attribute types (especially those instantiated as immutable ValueObjects) and manage imperative (e.g., application) code.

Perhaps those who categorically reject OO would have less of an objection if they better understood the differences between using ObjectOriented approaches to: These days, the typical RelationalWeenie who accepts OO: -- DaveVoorhis
One problem is that user-defined types are more difficult to use and share by multiple languages. Existing RDBMS tend to offer a limited set of types, but at least most apps can use them without too much cajoling and complex adapters. Types don't cross language boundaries very easily, and until somebody "solves" this, composing complex types with simpler built-in types has generally been smoother. In fact, I think RDBMS should not support a direct Boolean type, just use 1 and 0 integers, because Booleans are fairly difficult to share across different languages. For example, some languages do not have a "null" Boolean. (Related: CrossToolTypeAndObjectSharing.) I'd welcome experiments with complex user-defined types in RDBMS, but am skeptical they will succeed at this point. -top

I remember when similar scepticism was expressed about computers, "high level" (i.e., not assembly) languages, structured programming, PCs in business, GUIs, object orientation, C++, the Web, Java, DotNet, and so on...

From a sharing stand-point? Problems with X and problems with X sharing may be apples and oranges. Regardless, ideas should be adopted because they prove themselves in sufficient pilot projects of a relevant/comparable niche, not merely because "new equals good".

Indeed. However, a number of us who have implemented complex applications which require complex types and complex code -- and have struggled to shoe-horn these into existing tools including DBMSes, current application languages, and the mechanisms that painfully integrate these -- can easily envision better ways of doing things. Implementing them is merely (!) a matter of time and effort.

I agree that existing RDBMS are limited, but the solution is not necessarily judicious use of "types". There may be many different kinds of solutions (including extending relational). Without specifics, I cannot make recommendations here.

I don't recall asking you to make recommendations. It's notable that DateAndDarwen explicitly do not propose extending the RelationalModel, but they do go into considerable detail on what form a RelationalModel-friendly type system should take. Codd was clear that attributes must have types (i.e., attribute values belong to domains), but not explicit about what form the type system should take.

As argued in DoesRelationalRequireTypes, I suggest that the type system ("domain math") be considered a logically separate system (as far as efficiency allows). Thus, the "types" would not be something the relational engine has to concern itself with as long as the domain math has the necessary hooks (interface requirements). But an unfinished issue is when "cell values" have expandable structures in them, not mere scalars or fixed-quantity sub-components/sub-slots, like phone numbers. Such structures would gum up what relational is about, especially if one has to use domain-specific accessors to manipulate such structures. This is the "encapsulation problem" again. -t

I'm curious: What do you think "relational is about", such that it would "gum up" with complex user-defined types? Also, I'm not clear what you intend to gain by separating the type system from the "relational engine", other than introducing the apparent, uh, "encapsulation problem" you've noted above. However, sharing complex types is highly attainable using ideas suggested by mechanisms already in existence, such as JSON, YAML, and the techniques employed by CORBA, WebServices, et al.

[The idea that 'domain math' can be separated from relational is well accepted. In an RDBMS, of course, efficiency is an issue (especially for cluster-based queries - likeness, relative distance, etc.) so an RDBMS may need a lot more 'knowledge' of the domains it is expected to index. The idea that variable-sized structures or domain accessors "gum up" relational seems to be Top's belief alone, and seems to be based on little more than his distaste for types.]

Indeed, Codd's original paper shows that in terms of the abstract RelationalModel, domains (types) exist and need to support (minimally) tests for value equality, but are otherwise irrelevant. Practical DBMS implementations, however, are a different case. I note that Top's apparent distaste for types is a peculiar one; his writings suggest that despite his protestations, he actually likes types (or he'd be an assembly language or FORTH programmer) -- he just likes using them the hard way.

I agree that implementation practicalities complicate clean logical separation. But over time, more powerful hardware and practice in tuning open-source component interfaces often allow us to move incrementally closer to the ideal. As far as the benefits of heavy type usage, until I see it helping for my domain, I'll remain a type-lite fan. (It may indeed help in other domains. I won't argue that. Best tool for the job.) -top

Top: Relational is mostly about operations on sets of tuples that return sets of tuples. The scalars, or simpler types ("fixed types") are the elements of the tuples. Collections are managed via relational, not custom functions with ADT wrappers (encapsulation). Encapsulation neuters relational. It is very difficult to have in a given collection both encapsulation and "access" to the relational system. It forces a hard decision that is difficult to just back out of when requirements for the collection change. In our common example, a "stack" is no longer a "stack" if the nodes within are available to relational operators. In relational the relationship between operators and operands tends to be viewed as complex (potentially many-to-many), not tightly bound (one-to-one or hierarchical). This reflects the set theory philosophy in relational and nested little state-machines in ADT/objects.

PageAnchor: stack_example

RE: apply a function, and, behold! you have something other than a stack

How about an illustration of some kind for the million-node report? (Something tells me you will plan to do so only after my join example.)

The stack report is a contrived example, of course (fundamentally, a stack isn't a DomainValue unless you're relating stacks to things as opposed to relating elements-of-stacks to things). But, supposing your goal is an 'X-ray' of a million-node stack into a million-element relation, I'll offer an example using TotalFunctionalProgramming to guarantee termination. But you could drop that syntactically guaranteed termination and translate the following easily enough into OO using 'pop' 'peek' 'isEmpty' and a FunctorObject instead of an HOF:

 define StackFold = 
    {fn Fold: push(X,S) => {fn: HOF => {fn: Seed => {Fold S HOF {HOF X Seed}}}}
            | empty     => {fn: HOF => {fn: Seed => Seed}}
            } // implemented to take advantage of TailCallOptimization
 define SequentialUnion = {fn: Elt => {fn: (Counter Rel) => (s(Counter) {union &[(Counter Elt)] Rel})}}
 define Report = {StackFold MillionNodeStack SequentialUnion (zero &[])}

The above is a left-fold, but switching to a right-fold would allow one to leverage LazyEvaluation rather than TailCallOptimization.
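For readers who don't follow the notation, here is a rough Python sketch of the same fold, written along the 'pop' / 'peek' / 'isEmpty' lines mentioned above. The names (push, stack_fold, sequential_union, report) are illustrative only. A plain loop stands in for TailCallOptimization, which Python lacks, termination is not syntactically guaranteed, and the accumulator set is mutated rather than unioned immutably, purely to keep the sketch linear-time.

```python
# Hypothetical sketch: a persistent stack as nested pairs, folded into a relation.
EMPTY = None  # the empty stack


def push(x, stack):
    """Return a new stack with x on top; the old stack is shared, not copied."""
    return (x, stack)


def is_empty(stack):
    return stack is None


def peek(stack):
    return stack[0]


def pop(stack):
    return stack[1]


def stack_fold(stack, hof, seed):
    """Left fold from the top of the stack down; a loop replaces tail calls."""
    acc = seed
    while not is_empty(stack):
        acc = hof(peek(stack), acc)
        stack = pop(stack)
    return acc


def sequential_union(elt, state):
    """Add the tuple (counter, elt) to the relation and advance the counter."""
    counter, rel = state
    rel.add((counter, elt))  # assumption: mutate in place instead of a fresh union
    return (counter + 1, rel)


def report(stack):
    """'X-ray' a stack into a relation of (position, element) tuples."""
    _, rel = stack_fold(stack, sequential_union, (0, set()))
    return rel


small_stack = push("c", push("b", push("a", EMPTY)))
print(sorted(report(small_stack)))  # [(0, 'c'), (1, 'b'), (2, 'a')]
```

The top of the stack gets position 0, matching the order in which the fold above consumes elements.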

I'd need clear goals to run further comparisons. The elements can be shared (useful if they are large strings or whatever). Other problems, like asking for all stacks that contain a particular element, would need the same sort of domain-specific indexing support necessary to ask for all strings containing specific words. That particular issue can be solved by generalizing the indexing mechanisms to support HOFs on what aspects of DomainValues need to be indexed, and can be written in the same functional language as the above.

Not sure I follow your notation here. But it looks like you are inventing an FP query language. And searching for a particular element would not be an uncommon request. One cannot know up front all possible tasks/operations asked of a non-trivial collection. And the user of the data has to learn your different little query language here. And these are just a basic WHERE and ORDER-BY operation at this point. Why do we need 2 different query languages to do the same thing?

  SELECT * FROM stackNodes WHERE color='red' ORDER BY sales_region

That is by no means equivalent. If I had stacks as DomainValues, that means I probably have hundreds, if not millions, of different stacks, and the operation you describe would return nodes from every single one of them. Now, perhaps that is your goal; it could be done with functional+relational easily enough if it is, but we really need to compare systems that are functionally equivalent (i.e. produce the same information, even if it is represented differently) before we can reasonably compare non-functional properties.
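The mismatch can be seen in a small sketch using Python's sqlite3 module. The stack_id and position columns are assumptions added for illustration (the query above names only color and sales_region), and the rows are made up. With more than one stack stored, the unqualified query returns matching nodes from all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stackNodes ("
    " stack_id INTEGER, position INTEGER, color TEXT, sales_region TEXT)"
)
conn.executemany(
    "INSERT INTO stackNodes VALUES (?, ?, ?, ?)",
    [
        (1, 0, "red", "west"),
        (1, 1, "blue", "east"),
        (2, 0, "red", "north"),
        (2, 1, "red", "east"),
    ],
)

# The query as written: red nodes from every stack in the table.
all_red = conn.execute(
    "SELECT * FROM stackNodes WHERE color='red' ORDER BY sales_region"
).fetchall()

# Restricted to a single stack via the (hypothetical) stack_id column.
stack1_red = conn.execute(
    "SELECT * FROM stackNodes WHERE color='red' AND stack_id=?"
    " ORDER BY sales_region",
    (1,),
).fetchall()

print(len(all_red), len(stack1_red))  # 3 1
```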

RE: not know up front all possible tasks/operations - that's fine. There is no issue with that. Functional and OO are both incredibly flexible when it comes to specifying and composing operations (though I'll amend that: traditional OO really needs a companion language for constructing object configurations ideally with support for DependencyInjection, whereas functional composes readily without that extra effort).

RE: user of the data has to learn your different little query language here - the user needs to learn the representation of the stacks and how to perform domain math operations over them in any case; in your case they need to learn how to join on equal stacks and how to handle data entry, and they will have a tough time performing operations that involve more than one node at a time (e.g. return just the stacks with three red-nodes followed by three blue-nodes).
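As a sketch of that last query, assume each stack is held as a single value (here a Python tuple of node colours, top first) in a relation of (owner, stack) tuples, and that "followed by" means a contiguous run. The relation, its attributes, and the helper has_run are all hypothetical.

```python
def has_run(stack, pattern):
    """True if `pattern` occurs as a contiguous run of nodes in `stack`."""
    n = len(pattern)
    return any(stack[i:i + n] == pattern for i in range(len(stack) - n + 1))


PATTERN = ("red",) * 3 + ("blue",) * 3

# A relation of (owner, stack) tuples; each stack is one cell value.
stacks = {
    ("alice", ("green", "red", "red", "red", "blue", "blue", "blue")),
    ("bob", ("red", "red", "blue", "blue", "blue")),
    ("carol", ("red", "blue", "red", "red", "red", "blue", "blue")),
}

# An ordinary restriction whose predicate is a piece of domain math.
matching = {(owner, s) for (owner, s) in stacks if has_run(s, PATTERN)}
print(sorted(owner for owner, _ in matching))  # ['alice']
```

With one row per node, the same question would typically need a self-join across six adjacent positions, or something equivalent.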

RE: these are just a basic WHERE and ORDER-BY operation at this point - How so? Please clarify/justify this claim.

RE: Why do we need 2 different query languages to do the same thing? - I'll return the question to you: Why, in Top's approach, do we need 2 different languages for testing equality between and otherwise handling DomainValues (based on whether they belong in a "cell" or not)? As far as "needing 2 query languages", the idea is to support the RelationalModel of data while also allowing queries to restrict, aggregate, and perform other operations based on domain math.

In any case, I suspect you're imagining the use of stacks as containers for 'data', but in the sense of DomainValues, stacks (according to the RelationalModel, as opposed to TableOrientedProgramming) shouldn't be treated any differently from short strings or integers. They don't contain data - that is, individual nodes say nothing about the world. Instead, it is the relationship between stacks and other DomainValues that is data. One needs the ability to operate over selected stacks the same way you'd operate over integers. Your approach fails to do so with even a modicum of convenience.

Being able to view a stack as a "cell" element is not really the problem here. The real problem is *only* being able to view it as a cell element.

I agree, that would be a problem if one's hands were tied and one had no mechanism to view stacks in other ways. Fortunately, it isn't a real problem, because it isn't a problem at all.

To get a flexible system we also want to use our existing query operators and DB infrastructure on them (elements of the collection) without having to code these features by hand for each and every new "collection type" or copy them in and out of various "containers" to use those container features. You can have your cell view as long as you don't hide "the structure" from the collection-oriented side of the DB. Any given record/node should be able to belong to a million dedicated structure "types" if need be. Think set theory: any given node can be a member of lots of different sets. -t

[So what if an item can belong to lots of different sets? The approach you are arguing against doesn't prevent the items in the stack from belonging to different stacks. In fact, in exactly the way an item can belong to different sets, it can belong to different stacks. And nothing about the approach prevents an item from belonging to a stack, set, or any other collection. However, your approach does prevent a cell view. Each and every time the "cell" as a whole needs to be accessed, we have to explicitly manage that structure.]

RE: "To get a flexible system we also want to use [...]" - Top, you have either mixed up the goal with the strategy, or you have been willfully negligent in your failure to recognize how other strategies on the table for achieving flexible systems accomplish the same goal. Either way, it's irritating. GOAL: We want an efficient, convenient, and ad-hoc flexible system that supports DomainValues from a variety of domains (possibly to better allow CrossToolTypeAndObjectSharing). MANY STRATEGIES:
  SELECT myFoldLikeOp(...) WHERE foo='bar' AND blah=7

Strategies need to be evaluated. Not all of them actually achieve their goals. For example, I don't believe Top's approach offers efficiency or convenience, and I don't believe one can justify a claim that it offers more flexibility than the other strategies. I don't even view it as easier to implement: InventorsParadox seems to favor the broader strokes offered by ZF SetTheory or CategoryTheory. Thus, (between this and other observations from SMEQL/TableOrientedProgramming) I evaluate it as: looks more promising than SQL, looks less promising on almost every measure than other options on the table.

RE: "You can have your cell view as long as you don't hide the structure" - this is another statement of strategy, not goal. The goal you aim to achieve here is composition (which supports ad-hoc flexibility). Achieving this goal doesn't require that "structured" values be directly exposed to relational operators (though various approaches, including yours and the one derived from ZF SetTheory, allow that); it only requires the ability to view a value as a relation and perform relational operations over DomainValues. All three of the above strategies allow one to compose relational operators with cell DomainValues; the OO and CategoryTheory approaches simply make the composition indirect through an extra primitive. (Related: PrimitivesAndMeansOfComposition.)

RE: "Any given record/node should be able to belong to a million dedicated structure "types" if need be. Think set theory..." - this is a statement of goal, and does not imply your strategy. As noted above, there is no issue with the same node or value existing in many different structures (stacks, lists, trees, etc.). If the goal is even greater efficiency, such that representations are shared, that goal is favored by all approaches except yours, Top, and actually counts as a point against your approach. Sharing of structured values is performed quite easily, with O(1) time on construction by interning of large and otherwise structured values through an intermediate hashtable (though I'll amend: some special efforts are needed for even-more-optimal interning of large strings and lists, usually involving a behind-the-scenes representation strategy called ropes). Relevantly, for ZF and CategoryTheory/FP approaches, this can happen entirely behind the scenes, applying to all types and values at once, requiring no special efforts, nor any explicit GarbageCollection, to achieve. However, Top's approach requires special application-side efforts to share nodes/values/etc. across structures anywhere near so optimally. E.g. if treeZ has treeX as its LHS and treeY as its RHS, one must design the data entry helper to search for treeX and treeY to see if they are already in the database. And then one must be careful to keep treeX and treeY around for as long as treeZ is around. Cascading deletes can help here, but only if explicitly noted, and they'd require some fairly complex specifications; another approach is application-side nightly GarbageCollection (which is okay, I suppose, since I didn't list continuous service among the goals, though I probably would for my own work).

It's a bad argument form to raise a goal in defense of a strategy unless one can both justify the goal, and justify that the strategy is the best option to achieve said goal, at least from among the listed alternatives; raising a goal in defense of a strategy without said justification indicates that the speaker isn't considering other strategies that have been presented to him, which indicates he is not listening, which everyone else reasonably finds rude and irritating. It is also bad discussion form to raise a strategy as though it were a goal, especially if the speaker should be aware of the alternative strategies from prior discussion, because it indicates the speaker really is not listening, which everyone else reasonably finds rude and irritating. Can you please, Top, try to be more aware of what you are presenting as goals vs. strategies? Keep it straight. Consider how each goal holds up in other strategies that have been presented. It would lead to more civil discussion. Seriously.

You are HandWaving, claiming that Z-foo or whatnot solves everything magically without demonstrating it and without trade-offs. My suggestion builds on an existing and common tool, RDBMS. Why toss out something that's tested in the market-place for some obscure untested academic math? I agree that such deserves experimentation, but for the shorter term the safer path is to build on what's out there unless you can show some magic way to share collection-orientation and have hard-walled "types" at the same time. I suspect they are conflicting goals, that EverythingIsRelative will hit face-first against encapsulation, but you are welcome to demonstrate otherwise. -t

[User defined types are obscure academic math? I'm fairly certain that more people are comfortable with UDTs than with RDBMSs, especially since I see more poorly used RDBMSs than UDTs in spite of seeing many more UDTs than RDBMSs. As for tossing out tested products, it should be noted that the parts that RDBMSs currently do well are being kept; only the parts where the market-place has found RDBMSs lacking are being tossed. It should be noted that you have also noticed that lack, as you have proposed another solution to fix it. It should also be noted that your approach (special operators for particularly popular structures) has been tried repeatedly and so far hasn't shown any significant success on that front.]

[As for that "magic way": you define equality operators on your "hard-walled" types. You define a fold operator on collection types. That's it. Supply those to the RDBMS and it can now use those types in the same manner it uses integers, strings, etc. An example has even been supplied on this page, without the goals conflicting, so your suspicions are suspect.]
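A rough sketch of that recipe, with a toy in-memory "engine" standing in for the RDBMS. The Stack type supplies only equality (plus a matching hash) and a fold; the engine functions select_eq and unnest are written without knowing anything else about it. All names here are hypothetical, and a list of dicts stands in for a relation.

```python
class Stack:
    """A 'hard-walled' type: the engine sees only ==, hash, and fold."""

    def __init__(self, items=()):
        self._items = tuple(items)  # top of stack first

    def __eq__(self, other):
        return isinstance(other, Stack) and self._items == other._items

    def __hash__(self):
        return hash(self._items)

    def fold(self, fn, seed):
        acc = seed
        for item in self._items:
            acc = fn(item, acc)
        return acc


def select_eq(relation, attr, value):
    """Restrict: keep rows whose attribute equals `value` (uses only ==)."""
    return [row for row in relation if row[attr] == value]


def unnest(relation, attr):
    """View each collection-valued cell as rows, using only its fold."""
    out = []
    for row in relation:
        rest = {k: v for k, v in row.items() if k != attr}

        def step(item, acc, rest=rest):
            acc.append({**rest, "pos": len(acc), "item": item})
            return acc

        out.extend(row[attr].fold(step, []))
    return out


orders = [
    {"customer": "ann", "pending": Stack(["c", "b"])},
    {"customer": "bo", "pending": Stack(["a"])},
]
print([r["customer"] for r in select_eq(orders, "pending", Stack(["a"]))])
print(unnest(orders, "pending"))
```

Once unnest has produced ordinary rows, any existing restriction or ordering can be applied to them without further knowledge of Stack.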

[''As for the costs, you have to define, at most, two operators per type; the RDBMS has to be implemented to allow arbitrary UDTs; and you have to have some way of communicating the types between applications and the RDBMS. Your approach requires that the operators are implemented every single time they are used.]

Where did I put that limit?

[It requires that diverse applications agree on how the exposed structure is to be used. It requires database writes whenever one wishes to search for a particular collection.'']

Security in relational between operators and operands would be via something like an AccessControlList or update constraints, not IS-A modeling. A "push" or "pop" stored procedure could work on any or no tables depending on its configuration and security access lists. Put another way, relational tends to use a subtractive approach to associate operators and operands, while ADTs/classes use an additive approach. It is comparable to a Cartesian join being the default join, with filters (WHERE) supplied to provide any limits we want to place on it. -t

And the JSON/YAML comments generally look like issues already raised in CrossToolTypeAndObjectSharing. -t
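The subtractive association between a "push" procedure and tables, as described above, might look something like the following sqlite3 sketch. The deny list PUSH_DENIED, the table names, and the (seq, payload) column layout are all assumptions made for illustration; a real RDBMS would use its own stored-procedure and permission mechanisms.

```python
import sqlite3

# Hypothetical deny list: push applies to every table except these.
PUSH_DENIED = {"audit_trail"}


def push(conn, table, payload):
    """A generic 'push' usable on any table with (seq, payload) columns."""
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    # The name is checked against real tables before being interpolated.
    if table not in known or table in PUSH_DENIED:
        raise PermissionError(f"push not permitted on {table}")
    conn.execute(
        f"INSERT INTO {table} (seq, payload) "
        f"SELECT COALESCE(MAX(seq), 0) + 1, ? FROM {table}",
        (payload,),
    )


conn = sqlite3.connect(":memory:")
for name in ("undo_log", "work_queue", "audit_trail"):
    conn.execute(f"CREATE TABLE {name} (seq INTEGER, payload TEXT)")

push(conn, "undo_log", "first")
push(conn, "undo_log", "second")
push(conn, "work_queue", "job")
print(conn.execute("SELECT * FROM undo_log ORDER BY seq").fetchall())
```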
EditHint: Merge relevant sections of OopBizDomainGap and ComputationalAbstractionTechniques.
See HierarchicalRelational

Last edited July 8, 2010