Are Rdbms Slow

Depends what you're using them for. They're very slow at boiling eggs, for example.

Not if using a Pentium with 99.9% CPU usage :-)

Some have claimed that RDBMS are slow for large applications/systems. The impression I have obtained over the years is that if the requirements don't change much, then a dedicated or limited custom-built structure is probably faster. However, if you want a large AND flexible system with pretty good performance, then it is hard to beat a well-tuned RDBMS.

There are also certain special tasks that existing RDBMS have proven too slow for. This includes text indexing and some CAD systems. However, if one wants to tie the CAD system in with inventory control, accounting, etc., then a RDBMS will be helpful. It just may take a bigger machine than when doing dedicated CAD.

The key is the likelihood of any given piece of data participating in multiple viewpoints/applications/user-disciplines. If you know a given piece of data will only participate in a small and fixed number of operations, then something besides a RDBMS may be more appropriate.

Databases are too slow for most soft real-time reactive systems.

Why is that?

How long does it take you to create your graph of objects from the database from a stimulus? For example, I have 500 msecs to respond to an event from a device. The database can't be in that path. Now let's say I have 1000s of these things. Now let's say they can emit stimuli at 1msec intervals. Not everything is a web page.

That's not something I would ever be likely to do, but nor would I ever connect a web page to a database.

I think sometimes we forget that people are doing different things out there under really different environments. The rules when building a payroll app may not apply to building a cruise missile.

One would hope not. But is either a "soft real-time reactive system"?

There aren't that many hard real-time systems. Most systems that use real-time OSs aren't hard real-time or very small parts of them are. They need latency and priority, but the world won't end if they miss a dead-line.

Note that there is a distinction between "slow" and "unpredictable". Automatic garbage collection and on-the-fly optimization may introduce unpredictability into response times. Thus using a RDBMS for a missile guidance system may not be appropriate not because of the speed, but because the timing variance range introduces too many unknowns into the system. It may make 90 percent of the calculation tasks faster, but 10 percent slower, for example. But that 10 percent can be a killer. We don't want 10 percent of all nuclear missiles to burp. Generally the higher the levels of abstraction in the tools used, the more this is an issue. This issue alone will probably keep C alive for a long time. See RealTime. Whether C is the fastest or not is only a secondary issue. One has more control of the details, and that is what is important. However, it may also result in a loss of abstraction because more has to be micro-managed.

Nor would I ever connect a web page to a database.

Why is that?

Simple: I don't create web pages :) But if I did, I would want to maintain separation between it and the database in the same way as I would for any other UI.

It's a thin separation when you have to do all the OR mapping crap.

You think? If I were into the O/R game, I'd want an even thicker separation, with the O/R mapping server-side.

There's now so much separation that there's no real code anymore :-)

But less is so much more. Does anyone semi-persist their object graph in xml messages client-side and persist their xml server-side? Twice the fun with half the anguish.

If it takes 100 msecs to serialize an XML message then you are very limited in your throughput. In some apps this works. In another I had to rip it out.

Hmmm. This seems to be a consequence of the "fact" that all RDBMS are large, highly abstract, resource hogging hash-happy engines.

Of course this means that if someone designs and implements a language environment that is streamlined, high-performance, bit-skinning code that can also manage data, it must not be an RDBMS.

Then, naturally, one can be comforted that an RDBMS will always be a plodding beast, since that's how it's defined.

I used the DOS version of FoxPro for all kinds of high-performance stuff that had little to do with managing large quantities of data. It helped revise my thinking on what to expect from database engines and languages.

No, it didn't do RealTime without low-level glue-on code in CeeLanguage or AssemblyLanguage, but we wrote a graphical "flyover" of a casino gaming floor using FoxPro and a graphics package, and the result out-performed two competing products written in PascalLanguage and CeeLanguage.

We used the same language/engine to manage millions of records of machine, customer, and event data and produce analyses and reports from live systems to management.

But you have to remember that FoxPro is not a real RDBMS. -- GarryHamilton

Depends how you define real. Obviously the smaller or simpler the DB engine the less features it will probably have. It is not an all-or-nothing thing.

Related: NimbleDatabase

Quoted from above I have 500 msecs to respond to an event from a device. The database can't be in that path

Why is that? Where I work, we have built a real-time application with performance requirements stating that one event should take no more than 10 CPU milliseconds. This app includes an RDBMS called TimesTen?, and during the processing of this event it makes between 5 and 7 round trips to the RDBMS. On HP PA-RISC hardware, the whole event takes about 7 CPU milliseconds, including about 3 milliseconds spent talking to the database.

RDBMS will be slow if commit involves physical writing to disk - like any other persistence mechanism will be, on this condition. Remove it, and RDBMS can be quite fast.

There is nothing in relational theory that says every transaction has to do an immediate disk write. RAM cache is for DB's also.

Exactly. My point is that relational database that caches whole dataset on RAM and writes changes to disk asynchronously to transactions can be very fast.

This database is called KDB, part of KayLanguage.

-- AlexeyVerkhovsky

I don't want to be locked into a particular database. Our data set is much larger than can fit into RAM. You've already spent a large amount of your time budget. I hope you don't have to do much else. 10 milliseconds is a huge amount of time.

Well if 10 milliseconds CPU time is an issue there than somebody screwed up by not writing a very little prototype that could have measured exactly the what the DBMS can or cannot do, at the beginning of the project.

10 milliseconds is not an issue; although it would be with, say, Postgre or Oracle.

And no, we don't have to do much else - just read a set of name/value pairs from a socket, deserialize, parse, retrieve corresponding data from a few DB tables, do some messy business logic and store results back to DB; send another set of name/value pairs to the client. Our dataset is too big for RAM, too. So we have it on Oracle and only the subset needed for this part is replicated to in-memory.

If you tell me that this wasn't the best choice of persistence layer, I may even agree. However it works. -- Alex

Well, I've seen such issues happening. And I also proved that you can make it work with a very much unoptimized database like PostgreSql (it doesn't have real prepared statements, cached execution plans, etc) and on a quite restricted processor (Celeron 500).

Here's how it happens you have a use case to which you want to respond in bounded time, say 1 second. Disregard the O/R mapping issues and get together with the developers, understand the use case and define the data that the use case needs, and the updates that need to be happening - if any. Then go write a simple demo program that doesn't go through O/R mapping at all, but execute whatever needs to be executed to get the data out and the data in. Voila, the client driver process takes 1ms CPU time and postgresql even unoptimized as it is and having to really commit transaction will take around 27 milliseconds. That leaves at least 950 ms for all the other processing to be done, and it can be done.

Well, the problem turned out to be that because of the particulars of how objects were loaded and saved in the O/R mapping layer, postgresql was taking more like 300-500 ms, so the problem was not AreRdbmsSlow but that O/R mapping can screw up big time. Not that I'd recommend RDBMS myself for soft-realtime systems but it so happened that the database was already chosen and the project was quite advanced, so I proved that it can be done. So DBMSes can be quite fast. -- Costin

By "soft" real-time, I assume you mean cases where occasional second-or-more response times (due to garbage collection kicking in, for example) won't kill anybody or lose a million dollars. Generally I would define soft real-time as something like only one in 100 transactions taking longer than the specified upper limit. In other words, a system that can live with occasional pauses as long as the vast majority are zippy.
Parallel Processing

RDBMS have the potential to benefit greatly from parallel/concurrent processing without having to recode the query text. For example, if the query involves multiple indexed columns, then each column index can be potentially scanned in parallel. (An equality comparison won't see much difference, but a "LIKE" scan may.) Similarly, if it involves scanning multiple tables, each table can potentially be scanned in parallel. Even the same table can be virtually split and scanned in parallel. For example, assuming you have a million-record table and a pair of RAID disks. Disk 1 may be scanned for records 1 through 499,999; and disk 2 scanned for records 500,000 to 1,000,000 in parallel. Of course there are potential interaction issues that may complicate things, as usual.
I have come to the perfectly reasonable conclusion that RDBMS are slow by about an order of magnitude over optimal, and that slowness is inherent to the problem domain. I know how to beat RDBMS by about 10x; however this comes at 60x development costs and the caveat that if you come back with some new report you want later or some bulk change that was not forseen at design time your development cost is 100x or worse and your delivery date is six months out. I'm serious, the 10x cost buys a lot in flexibility. --JoshuaHudson
See also: WhenAreDatabasesNotAppropriate, MinimalTable, PredictabilityVsPerformance
CategoryPerformance, CategoryDatabase

View edit of July 21, 2014 or FindPage with title or text search