Does anyone have data or even a gut feeling for the relative performance (speed, memory etc), of the following styles of system development:
- Traditional SQL/RDBMS design
- OO design with OO database
- OO design with RDBMS: OO-Relational mapping layer
See also ArchitectureAndObjectDatabases
I used to work at UBS in a TimeSeries?
database project. The database was built on top of ObjectStore
. In one application we got response times from several minutes down to < 5 seconds. Not because we were such great programmers, but because relational DB's just have problems with that kind of data. So my point is that the above question can't be answered once and for all; it depends on the kind of data and queries you have.
OK, I'll spill my guts: ;->
RDBMS Embedded SQL works well with traditional procedural development.
There is a small "impedance mismatch" because most host languages don't handle NULL values well.
But low levels of reuse in traditional procedural programs is a good thing for RDBMS performance, as the SQL queries can easily be optimized for the specific needs of the given program.
Performance suffers as reuse increases, as the four CRUD (Create, Read, Update, Delete) functions are written for each table, treating the relational database as a crude file system.
OO design with a mapping layer to a relational database suffers a higher impedance mismatch, and generally reduced performance.
The stronger abstractions and higher reuse levels give systems the performance problems of the high-reuse procedural systems.
Classes with significant inheritance hierarchies and/or flexible relationships with other classes won't map well to a relational database.
Substantial denormalization may be needed, complex mapping between OO and RDBMS may be needed, and you may still have to do many complex (and expensive) RDBMS joins.
Also, one has to consider the ticklish question of object identity: If data from the "Customer" table is read using two different queries, do they both refer to the same Customer object in memory?
The more you use powerful high-performance SQL data manipulation techniques, like "group by," the more serious the object identity problems will become.
Naive solutions can have serious performance problems.
On the other hand, OO techniques can give you great architectural flexibility:
By building database-centric OO abstractions, instead of business-centric ones, and writing in a more procedural style, an OO application can achieve the same or better performance as an equivalent procedural application, and still be more maintainable.
But the system will be substantially less maintainable than if it had a business-focused architecture.
The main strength of an OO database is that it gives substantially better performance than a RDBMS, with the performance difference increasing, as the application becomes more complex.
A historic weakness of OO databases has been weak ad-hoc query support.
OO databases are most successful in applications with significant hierarchies of classes involved in complex relationships, such as drawing databases for engineering.
RDBMS's are most successful in business systems, where the data models and processing are relatively simple and straightforward and the better support for 3rd party tools and ad-hoc query capabilities are more important.
My opinion is based on significant and varied experience with relational databases (RDBMS), in both procedural and OO development environments.
My opinions on OO databases are based primarily on research (reading and seminars), rather than direct experience.
I've tried a number of different approaches to RDBMS access, from procedural and OO programs, but have not had the opportunity to use many 3rd party tools.
I was involved in an evaluation of OODBs, RDBs, and ORDBs for AC Nielsen a few years ago. We drew performance curves for every beast available at the time - ObjectStore
, Versant, Persistence, GemStone
, Informix/Illustra, Oracle, Sybase, Kala and about a dozen smaller guys (the ones that used to promise the moon in the trades back when OODBs were the NextBigThing
The upshots were like this:
- It's VERY important to characterize your performance and scalability requirements for the DB. Just picking one and saying "Go!" may buy you big trouble when you least expect it.
- Insert speed, Delete speed and Query speed are not the same.
- Most OODBs don't actually have any DBMS functionality - they just supply persistence and related basic services. This might not suit you.
- So long as all your objects fit in memory, OODBs aren't slower than RDBs. Once paging kicks in, however, they take a severe performance dive. This issue is a moving target - RAM is becoming cheap faster than disk. On present projections, after 2003 RAM will be cheaper than disk.
- Transaction throughput might be a more important problem for you than sheer database grunt. If so you can use a ThreeTierArchitecture to solve your problems instead of buying the gruntiest server in the world.
- Some databases add bells and whistles that may be fundamental to your problem domain ... or not. And license fees may be more or less variable ...
As for Jeff's impedance issues, I feel that if you use CrossingChasms
, RDBs need not be significantly more difficult to work with from an OO app than OODBs. Anyway when I have my druthers I go with ORDBMSes myself. --PeterMerel
I guess I'm getting the answer that I expected - that it is a complex question and there's no easy 'one size fits all' solution. Many thanks for your comments.
I'm specifically thinking about apps which are in the middle of the complexity scale. Not really the simple CRUD type - almost looking directly at tables in a DB. Nor really so complex they must be done Object Oriented. If these systems have a relatively high transaction throughput (they may be hosted on small servers for reasons of cost), then which is the best solution architecture -- PeterForeman
Okay, I'll stick my oar in here too ;). If you are primarily navigating hierarchies of objects, then an OO solution works great. If you are running complex queries against large groups of tuples e.g. a batch update process, then object databases suck. If you think about the work done in querying and where that work is done the reasons for this become obvious.
Some vendors use hybrid object mappings which allow for some work to be done in the database domain but they are ugly and require yet more meta-information to confuse people. Part of the JobSecurity
You might be interested in knowing about the world's largest database at 648TB is an OODBMS http://www.slac.stanford.edu/BFROOT/www/Public/Computing/Databases/
by a company called Obectivity.
See also: AreRdbmsSlow