Many To Many Challenge

Let's get down and dirty with a very simple example. Here's the SQL definition that matches the information model used for Unix users and groups. A quite trivial many to many relationship. Let me see how many lines of code it takes you, and at what complexity cost, to represent it as an ObjectModel of "live objects", in your OO LanguageOfChoice, and having the same builtin functionality that any decent SQL database will offer you from only these statements. -- CostinCozianu (Oh, by the way, you may also try the EjbTernaryRelationshipExample if you have time.)
The Challenge (copied from previous discussion, and tightened up slightly) Disqualifications

The challenge was intended to demonstrate that SQL has useful functionality that is not readily available, nor trivially implementable in OO environments. This is in contrast to claims of OO developers (DbasGoneBad, PrevalenceLayer, etc) that if we could only get SQL databases out of the picture then we'd have all milk and honey. Since many books on OO talk quite a lot and quite without substance about OO "relationships" (aggregation, composition, ownership, directionality, etc), and since I know from experience that managing OO relationships is a frequent source of bugs in OO programs, I put this challenge with the intent to make some OoWeenie?s out there painfully aware of the limitations of their own tools. In particular this page debunks a pervasive misconception (see quotes in ObjectRelationalPsychologicalMismatch) that RelationalModel is not as good at representing "relationships" as object models. Which claim should be judged ridiculous on the face of it, but it is amusing how many "important names" in the OO camp make it. -- CostinCozianu

I would restate the above as: This challenge was intended to demonstrate that RDBMS's have useful functional that is not readily available, nor trivially implementable, in OO environments.

And you know what? I would agree with you. Implementation of a DBMS (relational or otherwise), or something with equivalent functionality, in any environment is a non-trivial problem. As is pointed out by others below, much of this challenge deals with persistence and transaction semantics, an issue with is (or ought to be) orthogonal to relational-vs-whatever else.

Unfortunately, as SQL generally isn't useful outside the context of a database, an apples-to-apples comparison (SQL vs OOPL) is difficult. I suppose another apples-to-apples comparison that could be undertaken is SQL+RDBMS vs OOPL+OODBMS, but the requirements above seem to exclude use of any canned solutions for transaction semantics and persistence.

Given that most OO environments don't come with a database engine built-in, nothing here is unsurprising.

I think further clarification of the challenge is in order. I have no idea what is being asked for. It appears to be how complex a job would it be to rewrite Oracle from scratch. What is being asked in this "challenge"?

Definitely not to rewrite Oracle from scratch. Only what is written above. Realize one of the much touted and advertised ObjectModel. More simple than that one can hardly asked for. Since a lot of developers complain they have an "impedance mismatch with their databases, I'm asking for them to show what they're capable of if they leave the database aside. Imagine you write with an magical VM that persists everything that's needed automatically, no more need for databases OO or otherwise. So put your code where your mouth is.

After lengthy discussion, it turns out the challenge is to write a significant portion of Oracle from scratch. The requirements are to provide all of the behavior a standard RDBMS provides through SQL. To meet those requirements one must address sharing, concurrency, transactions, etc. It is a straw man. Anything you submit that isn't a full blown RDBMS will be dismissed as not meeting the requirements.

No, the challenge was/is to realize a simple many to many object model. Of course a well implemented, production level object model should support concurrency because a majority of today's systems are concurrent and multi-user to begin with. Operations like lookup and remove should not take inordinately amount of time (i.e. should be better than linear). I didn't ask for full relational algebra support, nor for query optimizers, cache and log management recovery, etc. So no, I don't think a many to many relationship management is anywhere near a full DBMS implementation.

Read the challenge again, especially this phrase:

having the same builtin functionality that any decent SQL database will offer you from only these statements.

It excludes no RDBMS functionality. Any solution that is not an RDBMS is disqualified.

Ok, I was imprecise. I meant functionality with regards to managing the many to many relationship. Kind of obviously the statements were related to setting up the basic many to many relationship, and I even said somewhere but it got heavily refactored who knows where it is that the clients should e one statement away from create/read/delete of user, group or user-group pair. Now, are you happy with the challenge definition?

When you said concurrency was part of the requirement you lumped on most of an RDBMS. With concurrency comes multi user support, which brings security, caching, scalability and who knows what along with it. I suspect that it will be easier for you to list the features of an RDBMS you don't expect than it will to list those you do.

What I wanted to show is that the most widely used OO platforms don't offer even the ready made functionality so needed for realizing what in OO book is called (quite imprecisely) OO relationships, and user would have top go through considerable pain to implement what should be a basic feature of any ObjectModel. The language models that they go through are not even particularly helpful in this regard. So often times, disregarding other considerations (like relational algebra, query optimization, declarative data integrity) users will resort to RDBMS to implement and sustain a basic feature of their ObjectModels for which they do not have proper support in their OO platform of choice. -- CostinCozianu

Why should relational algebra be a basic feature of every object model? In the range of problems OO languages are used to solve, only a small slice require relational algebra. You won't find it in most OO programs because fast ad hoc queries aren't that important to most OO programs. The apps that need fast ad hoc queries usually use RDBMSs to provide that service. We don't "resort" to them, we chose them. It makes sense to build servers that specialize in fast ad hoc queries. Most OO apps don't use them because they don't need ad hoc queries, fast or slow.

This is turning into a classic YagniAndDatabases HolyWar.

Perhaps someone could write a UnitTest or AcceptanceTest that would define the problem? A couple of WikiZens asked for tests way back when this discussion began on DbasGoneBad and the response (paraphrased) was that there was no need for tests, as the SQL was correct by implication. -- StevenNewton

I was one of the WikiZens asking for UnitTests because my instinctive response (like many programmers) is to write code. Then I realized the challenge doesn't call for implementation details. No-one is volunteering to show how a given RDBMS implements this. The original challenger only showed how a user tells the RDBMS to create the tables. That's all any other solution needs to show. -- EricHodges

Since when UnitTest have been declared and glorified the SineQuaNon? of SoftwareEngineering ? Just recently KentBeck declined to write unit tests for the most trivial concurrent problem: mutual exclusion, justifying that he doesn't know how to do concurrent programming in XP/TDD settings. You want to take the challenge , be my guest. Complaining about the lack of unit tests is in my opinion whining and nothing more. What is required is written above, I think it is quite clear. And if you look at the four SQL statements I posted it should be clear enough for any software engineer worth its title at least for a start. Ask for relevant clarification if needed, but I'm not gonna write any tests for you, this is your job. -- Costin

Unit tests are on code are they not? How can you have a unit test on code that is not written? How can you write the code before their are requirements so you know what to write. Unit tests can't be requirements.

Yes, they can. (A failing unit test means I haven't yet implemented the requirement. See NeverWriteaLineOfCodeWithoutaFailingTest). But this is beside the point. Likewise, if you want to see source for an rdbs, go find an open source rdbs (MySQL is, isn't it?). I believe I understand any frustration described by those in the relational crowd... are we denying that there are rdbs's that understand sql? But, in any case, may this count as the user story documenting the requirement: A User can belong to several Groups. A Group may contain several Users. The user should be able to make an arbitrary query of any aspect of this relationship while the program is running without needing to consult a programmer to do it. -- cwillu

That user story is different from the original challenge. The challenge was for OO programmers to show an ObjectModel. There was no requirement for users to issue ad hoc queries without consulting programmers. To do that we'd have to write something like SQL.

I think some folks want to argue that it is harder to write an application than it is to use one. If that's the goal of this challenge you'll get no argument from me. It is harder to implement an RDBMS and SQL interpreter than it is to use them.

-- EricHodges

"having the same builtin functionality that any decent SQL database will offer you from only these statements." Until I've seen some of the same builtin functionality implemented, I'm unsatisfied. I'm an Object Oriented Guy. I'd really hate to 'win' this on a technicality. Indeed, it is harder to implement a database... it's interesting that the first several attempts at the challenge were doing naive implementations of just that. In any case, I don't think this is going anywhere until that's been provided, so look's like I've got work to do. -- cwillu

I'm confident that relational tables can be (and have been) implemented in Java. -- EricHodges

Perhaps, but what use is that over RDBMS written in C or C++ for native speed? Or the fact that they are already written. Why reinvent the wheel?

I'm not arguing in favor of reinventing the wheel. Please use the wheel; I do. I was only showing a Java solution to this challenge. -- EricHodges

So is this violent agreement?

Perhaps. I think there's an unspoken agenda in the challenge.

I'm not sure I understand this page. Is the question in essence something other than "Is there a language in which this problem can be solved in fewer lines than in SQL?"

Solved within an order of magnitude, I think.

Perhaps a RelationalLanguage better than SQL.

I'm sure OO is useful for implementing SQL. So what's the problem?

Can be done in assembler or BrainFsck also.

I don't get what this challenge is about either. Most of the functionality can be implemented in a couple pages of Python, but it's obviously more complicated than the SQL to create the schema. But so what? Using an RDBMS every single time you need a many-many relationship is dumb. Yes, an OO API on top of this users/groups model would probably end up implementing DatabaseVerbs (Top, are you listening?).

I used to use NimbleDatabase products and creating a table was a snap. Oracle may be verbose and formal, but that does not mean that all of relational or table-land must also be. See SimplifyingRdbms.

What's up with this disjunction between relational theory and OO? You can model relationships between objects just as easily as you can relationships between anything else.

I will believe it when I see it. Generally in OO you end up reinventing a lot of database wheels and encapsulation conflicts with the idea of a central and consistent collection-handling system. (There are other topics on this, I'll include them as I encounter them.)

More to the point, I use OO to model 'behaviors', while I use databases and SQL to model 'relationships'. The SQL to perform the queries is all well and good but it's useless in a vacuum - the code that implements behavior dependent on the relationships returned from the query is what I care about most. I'd really love an OO API that implemented relational algebra, preferably without transforming it into SQL behind the scenes. Anyone have one? -- ChrisMellon?

In my observation, many of the things that OO proponents call "behavior" can be transformed into more data-driven techniques. They just don't know how or don't want to. A large part of OO stuff appears to be CollectionOrientedVerbs done the hard way. RDBMS simply factor collection-oriented commonalities into one coherent spot/interface/convention. In other words, "reuse". -- top

From my reading of your various contributions, I get the impression that you aren't interested in implementing anything that your RDBMS doesn't already do, so I don't think you and I will agree on what I consider "behavior". What's wrong with implementing CollectionOrientedVerbs? An RBDMS is not a catch-all solution! The need to search an array does not imply the need for a complete database solution!

Balderdash! I just know when and how to use a RDBMS to reduce the grunt-work that has to eventually be done in app code.

That's fine, and I like to think that I do too. The difference is that I don't leap to an RDBMS as my first solution any time I need a data structure. I also tend to implement things that are much lower-level than you do (from my understanding), like GUI widgets, and have to care about things like algorithmic complexity and memory usage.

Higher level abstractions can indeed be memory hogs at times. I won't dispute that. I would rather let the machine slave away than me. MooresLaw will eventually favor my approach in more and more areas. But, I am not sure what you mean by "algorithmic complexity".

Certain implementations and certain data structures have different performance characteristics - you could model a stack by using a raw C array, but it's really wasteful, independent of Moores law. Pushing everything off to a database means you're stuck with the performance characteristics of your database (a very complicated thing to figure out, generally) rather than being able to select a model efficient for the task at hand. Consider the performance difference between a simple linked-list stack implementation vrs implementing a stack in terms of relational tables. Moores law isn't a magic bullet, especially as processor speeds start topping out and performance increases come in other areas, like parallelism.

Often the database is GoodEnough. If there are performance problems and index tweaking does not help, then you isolate them and explore alternative implementations. If you have to only do this to 10% of the app, then the out-of-the-box collection handling still results in a net improvement in terms of code simplicity and consistency. If you are in a domain where 50% or more needs to be reworked away from the database, then maybe a database is not the right tool for the job. Perhaps this should be moved to AreRdbmsSlow.

Incidentally, I don't believe that you could implement the more complicated Windows NT user/group/capability model using only SQL (there's some Oracle extensions that would let you). Anyone want to take a shot?

See AccessControlList. Once the schemas are set up, the rest is just run-of-mill CrudScreen work. -- top

I should have been more specific, but it really was an aside. You can model the relations just fine, I don't think you can implement the logic(behavior) without resorting to vendor-specific SQL extensions or post-processing the query in some other language.

I never said the DB does the entire thing. It is about DivideAndConquer. If I can get the DB to do 80% of the work, then I only have to do the last 20% in app code. On a few occasions it will be 20%/80% (reversed proportion), but I still win on the total. There is a similar discussion under IsDeclarativeLessExpressive.

I understand that (and I believe that most people actually do the same thing, arguments over what is more efficient aside), but it goes to what the challenge purports to prove - of course you can model a schema very quickly. Implementing the lookups (not done for this challenge, note) adds to the complexity, and in some cases isn't possible, at least using currently available tools. When I implemented a security model (ACL based in concept, slightly different model than NTs), I wasn't able to write a query that could implement a yes/no answer to whether or not a user could access a specific resource, because I couldn't model the collapsing of allow/deny records without using Oracles decode() functions. Might just be that I didn't know enough about SQL, of course, I'm a much better OO programmer than I am a relational one. I've also written a hand-rolled version in Python (which did some neat stuff with weakrefs to keep the cost of deletes cheap) which I far prefer from an elegance point of view - I really, really, really hate code generation as strings.

I agree that SQL can be an annoying relational language and should be overhauled. However, you seem to be falling into the trap of wanting to do the whole thing in a single query. Often that is not possible and one must do further processing. Often the query will reduce the size of what is being operated on in the least. If you cannot do that, then often it is a sign of bad schema design, although I cannot inspect your particular scenario. But even so, other techniques such as server-side cursors may be in order as a last resort. -- top

Perhaps the dialog topic should be shifted from 'database' to 'relational model'. 'Database' usually implies a heavyweight monster program that needs a dedicated datacenter and three specialized dba, whereas 'relational model' implies only an interface to a, well, relational model, which could be easily added as a library to your language of choice, for instance C++ Views At this point, the discussion diverges between how easy is to extend a language to support a new paradigm (relational model), in which OO might help, and how useful a 'behaviour is more important than data' tenet of OO evangelists view really is, IMHO not much. -- cristipp

I believe the problem is that we want to believe that "ObjectOriented" is the only way (or that Relational is the only way) the truth is that "BalancingParadigms?", or "MixingParadigms" or the "MiddleWay" is the only real way... we need an ObjectRelationalMixer, an language that can many capabilites (ObjectAndRelationalAndFunctionalAndProcedural?). Relational is not the best way for all problems... Object Oriented is not the best way for all problems... Procedural is no the best way for all problems... but Procedural and Object Oriented have already become friends... even Functional is already a friend of Procedural and ObjectOriented... so why not make Relational a friend of ObjectOriented? Let's have ConceptuallyOrientedProgramming!

I think part of the disagreement is about *when* to use what. A RelationalWeenie will often accuse the OO'er of trying to hide relational behind OOP instead of using relational directly where it is allegedly more appropriate or natural. In other words, sometimes direct relational (such as SQL) is the best way to express part of a given problem. It is an issue of mixing versus hiding. I don't think a RelationalWeenie will balk at using an OO API to talk to the file system or FTP. But using OO API's to search, sort, filter, and join bunches of attributes is another matter.

And the industry has made it easier for the RelationalWeenie to complain, because they have assumed that DatabasesHaveToBePersistent and RelationalOperationsAreOnlyForDatabases, and alse DatabasesAreAnOutOfProcessThing?, and by doing that, they limit the power of the local language (Why do I need to go and store my data in to a foreign program just to be able to apply a RelationalOperator? to it?)... It should be possible to perform RelationalOperations? on local data structures... and I think we are seeing the beginning of that, thanks to .NET and the LINQ ( LanguageIntegratedQueryProject )

ManyToManyChallenge | ManyToManySolutions | ManyToManyDiscussion | ManyToManyReflections?

View edit of July 25, 2007 or FindPage with title or text search