costs time and money. Some feel that these costs represent a problem that partially comprises the ObjectRelationalImpedanceMismatch
costs schedule time on a software development project that is already using GemStone
as an application server, due to the (substantial) additional effort required to:
costs runtime response time due to the additional inter-process communication, network I/O, object instantiation, and processing that is required to execute SQL statements that move DomainObject
state between the RDBMS and the address space of the process running the software written in the ObjectOrientedProgrammingLanguage
. This runtime response time cost has been anecdotally observed by some, with caveats, to be anywhere from twice to an order of magnitude more, for certain applications and use cases, than the runtime response time observed for the same applications and use cases running on top of GemStone
costs money. Capital expenditures are required to procure additional third-party products if the project is already using an application server that has ObjectOrientedDatabaseManagementSystem
capabilities, such as GemStone
. Recurring labor expenditures are required to staff the additional development efforts of administering an RDBMS and developing and maintaining schema and mappings.
The potential benefits that offset these costs include:
- the possibility of maintaining DatabaseApplicationIndependence;
- SQL-based access to the data that comprises the state of the objects; and
- greater confidence in vendor viability.
In support of these conclusions RandyStafford
offers the following anecdotal evidence from his professional experience:
For whatever reason, I have, over the course of my professional experience, seen a number of DomainModel
s that are simply not amenable to ObjectRelationalMapping
because of poor runtime response time performance, and because of prohibitively high development-time effort to map them.
In 1992, I worked for Ascent Logic Corporation (http://www.alc.com
), vendor of the CASE tool RDD-100, which is written in the SmalltalkLanguage
. RDD has an incredibly complex, meta-modeling oriented DomainModel
of about 150 deeply inherited types and very large, deep, heterogeneous aggregation hierarchies whose hundreds or thousands of nodes must all be in memory simultaneously to support its rich drag-and-drop diagram editors. Ascent Logic spent some $1.5M, which is an expensive project in a capitally-funded startup, trying to get RDD to store its data in Sybase over a homegrown mapping layer (this was before the existence of any commerically-available ObjectRelationalMapping
layers). They scrapped the project after only being able to achieve an object retrieval rate of one object per second. They now ship the product on GemStone
In 1998, I directed the development of Synxis Corporation's (http://www.synxis.com
) hotel reservation system named Synxis Agent. We had a requirement to build an "Attribute Inventory System", in which hotel rooms are not described by "GDS code", e.g. KNS for "King Non-Smoking", but by attributes - e.g., "not on first floor", "away from ice machine", etc. And of course we had to be able to answer availablity queries, and book reservations for rooms, using this system. I had a guy working for me who came up with a brilliant solution to the problem, based on classification theory from the philosophy literature. His solution had about two dozen types, nested in an inheritance hierarchy, that also participated in very deep heterogeneous aggregation hierarchies. Once again, these hierarchies had to be in memory to support availability queries and reservation bookings. We used GemStonej
to build this system because that was the fastest path to market (this was also a capitally-funded product development effort in a startup). That system has been in production since January 1999 and is currently taking thousands of reservations per week.
When we built the FoodSmart
example application at GemStone
in 1999, we intentionally chose a problem domain whose DomainModel
had similar characteristics (deep heterogeneous aggregation hierarchies). FoodSmart
can run in either a relational persistence mode (using an RDBMS) or an object persistence mode (using GemStonej
's OODBMS), according to a property that is read at startup time. I measured comparative response times for the application's use cases, including data loading use cases, and typically saw 2X better performance in object persistence mode (caveat: these were anecdotal observations, not a scientific study, and no effort was spent doing tuning and optimization in either persistence mode).
At my current client, Traveler's Express / Money Gram, I have seen even more dramatic response time performance differences between object persistence mode and relational persistence mode for an internal J2EE information system we're building - anywhere from 3X to an order of magnitude for database loading use cases (the same caveats apply) - and this system doesn't even have very deep heterogeneous aggregation hierarchies. On this project we have most definitely hit the ObjectRelationalImpedanceMismatch
- see TopLinkForJavaUsageExperiences
And there are more. These latter anecdotes don't even begin to address the additional man-months of effort, and capital expense for RDBMSes and ObjectRelationalMapping
layers, that are required to develop a relational persistence mode.
The conclusion I've reached is that there are certain types of DomainModel
s for which object persistence is clearly better than relational persistence. The interesting questions are what percentage of applications have such a DomainModel
, and what are the characteristics of such a DomainModel
. On the latter point, I certainly feel that deep heterogeneous aggregation hierarchies are a key element. Clearly not every application has such features, and relational persistence is just fine for those types of applications. But I've experienced at least a half-dozen of the other kind over the years. My summary is that if you intend to implement a DomainModel
using an ObjectOrientedProgrammingLanguage
(because you believe in the advantages or whatever), then you need to think very hard about how to persist your objects.
I fully admit that runtime response time performance, and development effort required, are only two factors in a big trade-off. Historically, the two biggest issues with object databases are vendor viability and SQL-based access to the data. The first issue is a SelfFulfillingProphecy
- if the market doesn't adopt ObjectDatabase
s then their vendors don't get the revenue stream they need to compete more effectively (see CrossingTheChasm
). On the second point, GemStone
has solved this problem in both its Smalltalk and Java product lines.
This problem is not unlike domain object state distribution between EJB VMs and client VMs in a J2EE architecture (see DomainObjectStateHolder
). What's painful, both at development time and at runtime, is moving domain object state between address spaces. This is why I have become a fan of "undistributed" architectures over the years. GemStone
enables undistributed architecture, and the savings are huge. Another way to "undistribute" your architecture is to implement your DomainModel
's business logic in stored procedures in an RDBMS - but then you lose the advantages of using an ObjectOrientedProgrammingLanguage
to implement your DomainModel
Of course object/relational mapping costs time and money. More, the mapping tools have an arguably not so good reputation.
But the wrong premises is that you start with a well-rounded ObjectModel
which is lower level than a logical data model, and then you want to put that exactly as it is in a relational model. It's not going to work well, it will be painfully slow, costs time and money and so on. If you're going to do it this way, instead of Object/Relational mapping and a relational database you're way better off just using an object database. I can enumerate a lot of disadvantages of approaching an OO aplication against a relational database this way (generically called Object/Relational Mapping).
Of course a relational database is a poor representer of run-time computing structures, the relational model was not designed for that.
You do not want to use relational database to store object models, at least not at the flexibility level offered by a OO runtime, or you may do it because of political reasons. But as Allistair Cockburn just pointed out (to an older paper of his) a well-designed logical data model and the structural aspect
of a well-designed OO model should converge, therefore you shouldn't have problems.
What should be considered instead is to start with a logical data model, somewhere along the principles of DatabaseApplicationIndependence
. Then you use an object oriented programming language to implement use-cases (transformations or just simple reporting on data). Then all of a sudden it doesn't cost that much time and that much money, and you also get some other benefits along the way. -- CostinCozianu
the wrong premises is that you start with a well-rounded ObjectModel which is lower level than a logical data
This is not the premise I start with. As I said on ArgumentsThatTheObjectRelationalImpedanceMismatchDoesNotExist
, I believe that a conceptual domain model, expressed as a class diagram, and a logical data model, expressed as an E-R diagram, are essentially the same thing (from a structural perspective) - so I completely agree with Alistair. But that is where the divergence begins. OO developers will typically create an implementation-perspective domain model that is isomorphic to the conceptual domain model. Schema designers may or may not create a relational schema that is isomorphic to the logical data model. At the very least, for a one-to-many relationship, an OO developer will use a collection held by the object on the "one" side, whose members are the objects on the "many" side, while a schema designer will put a foreign key column in the table on the "many" side to refer to the entity on the "one" side. This divergence is one of the issues that ObjectRelationalMapping
tools have to deal with.
What should be considered instead is to start with a logical data model
I do. I start with a conceptual domain model which is structurally the same as a logical data model. And guess what? I still hit the ObjectRelationalImpedanceMismatch
, and I still see that ObjectRelationalMappingCostsTimeAndMoney
Then you use an object oriented programming language to implement use-cases (transformations or just simple reporting on data)
You also use an object-oriented programming language to implement DomainObject
s that are manipulated in the use cases, and those domain objects have to be mapped to the relational database if you're using one. In the abstract, I won't disagree that use cases can be viewed as resulting in transformations on data. But in the domains I've worked in, the transformations have been complex, and the reporting has not always been "simple" either, and I've found that the best way to manage the complexity in an understandable, maintainable, extensible way is to have an implementation-perspective domain model that is isomorphic to the conceptual-perspective domain model, and allocate meaningful behavioral responsibilities to the domain objects in the implementation-perspective domain model, and use inheritance, polymorphism, and delegation to achieve OnceAndOnlyOnce
Then all of a sudden it doesn't cost that much time and that much money
Oh bullshit. You can't just wave your hands, or some magic wand, and make the problems go away. My experience has been that starting from a logical data model (i.e. conceptual domain model) does not alleviate the ObjectRelationalImpedanceMismatch
or change the fact that ObjectRelationalMappingCostsTimeAndMoney
and you also get some other benefits along the way
That is true. Using relational databases for storage of data (object state) can have definite, significant benefits. So the costs and benefits have to be weighed in a big trade-off about how to store object state on a project that intends to use an object-oriented programming language to implement a domain model, and that has persistence requirements. There are a lot of factors, besides these costs and benefits, in the trade-off. One of them is the culture of the company sponsoring the project. One of them is the viability of vendors - and the lack of certain capabilities in OODBMSes. One of them is the requirements for the application. So different answers are right for different projects. But none of that changes the fact that ObjectRelationalMappingCostsTimeAndMoney
Let's start with: Then all of a sudden it doesn't cost that much time and that much money... Oh bullshit.
Ok, I didn't mean to say the costs will disapear, software development is inherently costly. I mean it doesn't cost that much time and that much money compared to
object/relational mapping. As a matter of fact I think that when technolgy and theory will become more mature, we'll be able to see in retrospective that object/relational mapping is what costed us a lot.
I, personally, never worked with some kind of object/relational mapping so maybe I'm not the most qualified person to talk about object/relational mapping. But I heard stories from my close friends about project disasters, I've seen lots of theoretical papers, pattern languages, idioms, broken technologies that all dealt with this issue. And I could also have chosen O/R mapping as the way to go a couple of times, but I didn't because it didn't make any sense to me. Once I almost did, but backup quickly.
There are several problem with object/relational mapping:
- Load object/modify/store object approach seems to be the only way taken into consideration with O/R mapping. What about if you don't want to load the whole object, what about if the modification can be simply and most efficiently resolved by an update
- Related to the first one, the only possible ways to deal with concurrency issues is either hand-written (or maybe framework written) optimistic locking, or serializable isolation level. This cuts your option very drastically and you may be not satisfied with some results
- Related to the first one we can't find an OO language that has proper support for operation on sets (other than iterators, visitors and the likes)
- Related to the first one, what-if the modification can be handled by a simple UPDATE, what if it is too complex and performance critical so that only a stored procedure (even a SP in CeeLanguage) will do it. I don't like when I don't have options.
So it is clear in my view, that instead of using an O/R mapping strategy you're way better off using an object database, because the eseential architecture of the application will be the same, but the relational model was not designed to support this approach, and you're paying tons of money for a RDNMS that you're not using , maybe you could use 10% of what an RDBMS has to offer. -CostinCozianu
I mean it doesn't cost that much time and that much money compared to object/relational mapping
What doesn't? What is the "it"?
So it is clear in my view, that instead of using an O/R mapping strategy you're way better off using an object database
Huh? Costin, is that you? You're kidding, right? Did I just go through a wormhole into a different universe or something? :-) -- RandyStafford
No, I'm allright and I say it twice to please you: instead of O/R Mapping you're way better off using an object database. At least that's the humble opinion of someone who spent lots of hours on database theory and relational model in general and enough experience in designing and programming database applications. And you have to be aware that I never used ObjectRelationalMapping nor have I ever used an object database. You might find dissenting opinions if you'd ask a O/R mapping vendor :), but for the reasons exposed above I would have a hard time to believe that.
But what if we try instead a RelationalObjectMapping? ?
What would RelationalObjectMapping?
look like / be about?
Now, having said everything I've said in this whole debate, I will now confess that my last client and my current company are using relational databases and doing O/R mapping, and for good reasons. Yes, there is pain involved, as usual - which is why I had to differ with the assertion that the object/relational impedance mismatch does not exist - but the RDBMSes won the trade-offs for these two projects.
is just a badly chosen name, but the idea is to do exactly what Microsoft does: accommodate in the best way possible the OO programmer with manipulating relations, constructing/modifying/executing queries, and only in the most stringent cases construct the usual objects out of data. Forget about object persistence
, you just have a logical data model, that you have the flexibility to manipulate in several distinct ways, and you can do that from inside an OO language. Not that big of a deal.
So you map (through a well thought out framework) the relational model to the object world instead of doing the other way around. -- CostinCozianu
Tackling a few of Costin's (very important) points...
Load object/modify/store object approach seems to be the only way taken into consideration with O/R mapping. What about if you don't want to load the whole object, what about if the modification can be simply and most efficiently resolved by an update
Toplink & WebLogic7 will allow you to group fields together so you can only partially load the object, or not load the object at all and simply perform the update on the fields in memory which will translate to a single UPDATE SQL statement.
Related to the first one, the only possible ways to deal with concurrency issues is either hand-written (or maybe framework written) optimistic locking, or serializable isolation level. This cuts your option very drastically and you may be not satisfied with some results
Toplink or Weblogic7 offers quite a few concurrency models... pessimistic, optimistic, read-only, none. For optimistic, you have further choice, you can perform conflict checks on all read fields, modified fields, a version or timestamp, or even write your own custom hand-written locking. Similarly, you can declaratively set your isolation levels on the method invocation (i.e. the first isolation level it sees at transaction begin time will be the one it uses for that database).
Related to the first one we can't find an OO language that has proper support for operation on sets (other than iterators, visitors and the likes)
Completely agreed. There's a paradigmatic chasm there.
Related to the first one, what-if the modification can be handled by a simple UPDATE, what if it is too complex and performance critical so that only a stored procedure (even a SP in CeeLanguage) will do it. I don't like when I don't have options.
Toplink allows you to map any operation down to a stored procedure, and will even generate the basic CRUD operations to get you started.
O/R mapping is about preserving the relational model & saving time and money. I agree with Randy that some domain models really don't map well to it. The Ascent Logic example would be the one that *needed* an ODBMS. The Synxis example seems plausible in an RDBMS, I've seen complex aggregations mapped to databases with relative success (assuming you have a very good DBA). I've seen Foodsmart, it was pretty complex for what it did, in my humble opinion - most web applications simple enough to be well suited to an RDBMS. I'm a fan of both object databases & O/R mapping, but O/R mapping has just been more useful to me lately on my projects. -- StuCharlton
I agree with everyone here (well why not, the sun is shining....). It is worth noting the forces which produced the relational model. One is language independence. That's the one Sun doesn't like talking about :). Another is safety in a shared environment etc etc.
It all links together, but at the end of the day, if you combine the forces which relational databases seek to resolve, you get the relational model by logical deduction. The object model weakens a number of these assumptions, and thus can use a less constrained model.
It is unfortunate that the reasons the relational model is the way it is seems poorly emphasized, and thus the horrible failures when people use an object persistence model which does not have the architectural attributes/constraints which they require e.g. sharing of a large repository. The problem is insidious because it is only at production scale that these forces come to the fore, a classic symptom of a poorly considered architecture.
It's definitely worth a page to explore these issues. I'm not sure what to call it. Something like an architectural analysis of persistence models. The idea would be to expose the forces at work, the constraints that are implied by the particular models, and how it all fits together. -- RichardHenderson
Issues with ORM's
- Partial- and Cross-Record Loads - In relational queries, it is common to only select those columns actually used in a given process. Relational will mix and match (join) entities to produce a task-specific viewpoint. This viewpoint is not tied to original entity "shapes". But OOP usually wants to keep the entity (domain noun) shape per encapsulation "rules", using links for object relationships instead.
- OO usually makes a big distinction between an individual object and a collection of objects, whereas in RDBMS there is no real difference: going from 1 to a million is seamless (outside of performance issues). In OO, the collection is usually a different object/class than the items in the collection. Cursor-like techniques can be applied, but those tend to make OO look like a (navigational) database because domain objects then must start following global collection-oriented rules, not just collection classes, losing the self-handling-noun feel common to OOP.
- Object Freshness - In a query-centric world one tends to make a snapshot (result set) and then flush it away when a particular task is completed, or use all-or-nothing AcId transactions. In OOP the duration of objects may be longer or less certain. Techniques such as "lazy loading" may be used, but the implications of these with regard to freshness, concurrency integrity, and mapping to the RDBMS world are either not well-understood, or have no consensus rules.
Long discussion titled "Object-Relational Mapping is the Vietnam of Computer Science":
It is perhaps influenced by or named after this: