Prevalence Layer

Provides TransparentPersistence, Replication and FaultTolerance for business objects (PlainOldJavaObjects) through a combination of transaction journals and state snapshots. -- KlausWuestefeld.

Prevalence layers are typically three orders of magnitude faster than traditional databases for queries and just as fast for transaction processing.

ThePrevayler is a Java implementation.

How is a PrevalenceLayer different from GemStone/GemStonej or an ObjectOrientedDatabase?

For all the details, read the papers and articles. The basic ideas behind prevalence are as follows: If you don't want the restriction of using command objects, you could also implement this transparently using AspectOrientedProgramming. Intercept every method call on Prevalenced(?) objects, serialize the calls, and log them. In more dynamic languages, it can be implemented this way without AOP. Ruby's Mnemonic prevalence layer does something like that.
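
A rough sketch of the interception idea in Java, using a plain dynamic proxy as a stand-in for an AOP framework. The Account interface, SimpleAccount, LoggingHandler and InterceptDemo are all made up for illustration; none of them come from ThePrevayler or any other prevalence library, and the in-memory list stands in for a durable log.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical business interface; not part of any prevalence library.
interface Account {
    void deposit(long cents);
    long balance();
}

class SimpleAccount implements Account {
    private long cents;
    public void deposit(long amount) { cents += amount; }
    public long balance() { return cents; }
}

// Records every call made through the proxy, then forwards it to the target.
class LoggingHandler implements InvocationHandler {
    private final Object target;
    final List<String> log = new ArrayList<>();

    LoggingHandler(Object target) { this.target = target; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // A real layer would serialize the call to durable storage here,
        // and would probably skip read-only methods such as balance().
        Object[] shown = (args == null) ? new Object[0] : args;
        log.add(method.getName() + Arrays.toString(shown));
        return method.invoke(target, args);
    }
}

class InterceptDemo {
    // Wraps the target so that callers only ever see the logging proxy.
    static Account wrap(Account target, LoggingHandler handler) {
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(),
                new Class<?>[] { Account.class },
                handler);
    }
}
```

Callers hold the proxy returned by InterceptDemo.wrap and never the SimpleAccount itself, so every call lands in the log before it reaches the object.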

Couldn't you declare transaction boundaries declaratively, like in EJB?

For another example, see PrevalenceInMusty.

It should be noted that a PrevalenceLayer is usually language-specific. This is part of the reason for its speed, but may also be limiting in a multi-language shop and perhaps risk tying data access to one particular language. It is said by some that historically, data outlives languages.

Does VisualBasic have an equivalent PrevalenceLayer, or is something similar needed for that language?

That's the problem with previes, you gotta make a new one for each language.

I'm very impressed by object prevalence. I think that's the right way for object persistence. Their website is -- ShiYiYing?

KentBeck, RobMee, HumbertoSoares? and KlausWuestefeld paired up for a weekend in December 2002, on the island of Florianopolis, Brazil, and implemented Florypa, a minimal prevalence layer for Smalltalk (VisualWorks) based on Prevayler:

Exactly how or where does a PrevalenceLayer differ from a DataBase?

First of all, a prevalence layer keeps all objects' state in the client's memory space.

But some databases do that too?

Yes. So, second of all, a prevalence layer doesn't know about the internals of a system (classes, objects, tables, whatever). It treats the system as a black box, only logging incoming changes, plus the occasional snapshot of the entire system.

A PrevalenceLayer doesn't impose table structure on the data and it doesn't restrict access to a query language. -- EricHodges

I think you might have touched on a significant difference between a PrevalenceLayer and a relational database. The PrevalenceLayer (at least in terms of Prevayler, although I think it would apply in general) doesn't have any application-independent query language. Yes, you can use Java reflection, but without application knowledge, you don't know what anything means. With a reasonable relational data model, other applications which have no knowledge of the base application can access that data in a reasonable way. A classic example is a data warehousing application that can pull data from a relational transaction system. With very little effort and just a user's knowledge of the relevant portions of the data model, one can build all sorts of data warehouse information repositories. It is this standard data representation model and the relational algebra that lets you manipulate this data that is the real power of a relational data model. I did exactly this with a multi-dimensional database data warehousing project I maintained in my previous job. -- JohnVriezen

Agreed. Anything that wants to see this data will have to know the classes that define it or ask the JVM it lives in. It isn't a data sharing mechanism. YouArentGonnaNeedThisData?. -- EricHodges

Okay, again, that may differentiate it from a relational database, but what about non-relational databases?

What's the difference whether it needs to know about java classes or not? I could say that relational databases suck because I have to know SQL and why not just have everything in XML or Excel or something else absurd. If you want your data in a relational database then simply do a nightly extract to wherever you want it to go.

Whoa there, pardner. I'm not saying it sucks because clients have to know about the Java classes, just that it isn't a robust data sharing strategy. Lots of stuff can understand SQL results. Only JVMs understand Java classes. -- EricHodges

You could just forget about normal Java serialization and serialize to XML using something like Skaringa. Or write a little tool that uses OGNL or JXPath and then some reflection to put results in table form(?)

You could, but what problem would that solve? I'd much rather have the data in a real relational database than an XML serialization of Java objects. -- EricHodges

If your personal preference is to use a RDB, then by all means use one. The Prevayler project certainly isn't saying Eric Hodges's personal preferences should change.

Does anyone prefer XML serialized Java objects over relational data for sharing data outside JVMs? What problem do they solve? -- EricHodges

Here's where I'm coming from. What I see is lots of applications that use RDBs because of some supposed need for flexibility, etc that you describe. I think we can all agree that if your requirements don't require this kind of complexity then you shouldn't use it. Prevalence might be a good option for moment-to-moment persistence. And if your requirements say you need to run a complex reporting app that requires your data be in an RDB, then perhaps a simple periodic extract is all you need.

Ok - need to hold on here a bit - you can use JXPath inside your code to query objects, and you can use webservices outside to get your data out. Also, I hear from a MS developer friend that most of his database interactions are done using XML Queries.

Applications which make use of a RDBMS implicitly and sometimes explicitly (via a published db schema) provide a very robust API for other applications and end users to access (and possibly modify) data relevant to the system in a multitude of ways using relational queries.

Applications which use a proprietary storage model (this includes serialized Java objects) realistically only provide the explicit interfaces directly available at the application's formal interfaces (UI, API, etc.)

The PerforceVersionControl system is a good example of this. They have dozens of command line, Perl, Python, etc. APIs to extract data from their proprietary internal database (which is a fast RDBMS specifically suited to their needs). With their latest release, however, they provided an ODBC (and hence JDBC) interface to this internal RDBMS. Now there are a huge number of things you can do to find out about the system that might have been very difficult in the past. Before, if you couldn't do it with the interfaces provided, you'd have to ask for an enhancement and wait.

Don't get me wrong, I'm not saying everything should be in an RDBMS, it's just that there are some distinct advantages when you use a well-known, accessible data store technology. -- JohnVriezen

YouArentGonnaNeedIt YagniAndDatabases

Most people make their applications brutally break each other's encapsulation by "accessing (and possibly modifying)" each other's private parts via a database and still wonder why the promises of OO are not fulfilled. -- KlausWuestefeld

{"Brutally"? HolyWar talk. Some of us think encapsulation is a flawed concept. DatabaseNotMoreGlobalThanClasses}

So, for example, which application should own objects of class Customer: call center app., billing, CRM or accounts receivable? What about Employee or Order? If all of them, be prepared for a 1-2 year integration project...

I wonder about the following three stories:

In each of these three stories, response time is crucial. In the call-center case, extensive industry experience has shown that if the retrieval takes more than 3 seconds, the customer hangs up.

How does the performance of PrevalenceLayer compare to relational technologies for these kinds of problems?

If the data fits into virtual memory, then the question is how fast you can traverse a 2,000,000-element data structure. If it doesn't fit into virtual memory, then you can't use a PrevalenceLayer.

How many situations in which all of the data - not active data, but ALL data - fitted into virtual memory will also require "TransparentPersistence, FaultTolerance and LoadBalancing? of business objects"?

Conversely, in how many cases in which "TransparentPersistence, FaultTolerance and LoadBalancing? of business objects" is needed will the data all fit into virtual memory?

Don't know about LoadBalancing?, but all the applications I've built the last five years have needed a database of some sort. But I don't recall any of those databases being more than 500 MB on disk, even with a fair amount of historical data in them. I think a lot of us have experienced the same, hence the interest for object prevalence. -- AndersBengtsson

I wouldn't use a prevalence layer on virtual memory. You need physical RAM. -- KlausWuestefeld

If I can fit all the data into virtual memory, I don't need a PrevalenceLayer OR a relational DB.

Only if you don't mind the data being wiped out when the machine boots.

I don't suggest that long-term storage isn't required. I suggest only that the form of the in-memory representation isn't nearly so important as this approach might suggest. As I see it, a relational DB is needed when I have to save and query far more information than I can keep in memory at any one time. It would seem that we all concur that a PrevalenceLayer does not address this need.

Qualify "memory" as "virtual memory" and we might all concur. ?

In a Prevalence system, the data doesn't really need to be "in memory"; it only needs to appear as if it is in memory. Obviously, with a sufficiently wide address space (say 64 bits) and a virtual paging system to manage moving storage between disk and RAM, Prevalence will work for larger data sets.

A key point however is that as the volume of data scales, standard paging algorithms won't necessarily scale as well as a well-designed system that admits to having two types of storage - high volume slower storage (disk) and lower volume fast storage (RAM). Certainly, all the research that has gone into sorting, merging and searching large data sets (see Knuth) is designed to optimize algorithms for just such a storage hierarchy. Similar efforts have gone into optimizing RDBMS's given the storage hierarchies.

While a virtual paging system will work admirably for a large set of applications that can be blind to the speed of the storage underlying it, for some applications it can not. I suspect some of the applications mentioned above are in this category. -- JohnVriezen

How does a PrevalenceLayer differ from a non-relational database?

A prevalence layer works by logging all changes made to a system in a replayable format, so that the system's state can be exactly restored.
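
To make "replayable format" concrete, here is a minimal in-memory sketch. The names Change, AddItem, RemoveItem and ReplayJournal are invented for this example and are not the Prevayler API; a real layer would write each entry to disk rather than keep it in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the Prevayler API.
interface Change {
    void applyTo(List<String> system);
}

class AddItem implements Change {
    private final String item;
    AddItem(String item) { this.item = item; }
    public void applyTo(List<String> system) { system.add(item); }
}

class RemoveItem implements Change {
    private final String item;
    RemoveItem(String item) { this.item = item; }
    public void applyTo(List<String> system) { system.remove(item); }
}

class ReplayJournal {
    private final List<Change> entries = new ArrayList<>();

    // Record the change first, then apply it to the live system.
    void execute(Change change, List<String> system) {
        entries.add(change);
        change.applyTo(system);
    }

    // Rebuild the state from scratch by replaying every change in order.
    List<String> restore() {
        List<String> fresh = new ArrayList<>();
        for (Change change : entries) {
            change.applyTo(fresh);
        }
        return fresh;
    }
}
```

Because the journal holds the changes themselves rather than the resulting rows or fields, replaying it rebuilds whatever structure the application happens to use.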

But databases can, and do, include ChangeLogs.

An RDBMS can use such features. But if it does, it does so to recreate the state of the database, while a prevalence layer uses it to recreate the state of the application itself.

Here's an example of a prevalent system implementation:

I think I got it this time. Perhaps it is a *sub-set* of databases. Perhaps it can be defined as a "database based in RAM that does not have to be relational". But still, it is a "database" then. But that is close to an OODBMS still.

Try again. It isn't a database, so it isn't a sub-set of database. It is what it says it is.

A superset then? Okay, let's try to break this down to a more atomic level. Here is a list of "typical database features" from DatabaseDefinition:

  1. Persistence
  2. Query languages or query ability
  3. Repository for NounModels?
  4. State management
  5. Multi-user contention management (locks, transactions, rollbacks, etc.)
  6. Backup and replication of data
  7. Access security
  8. Data rule enforcement or validation
  9. Multi-language data access

A PrevalenceLayer seems to offer 1, 2, 4, 5, and perhaps 9 at least. Now, what does it offer that a "database" does not? #9 is questionable. See below.

A PrevalenceLayer can be used by software that provides all of those features. The main thing it offers that a database doesn't is you don't have to use a database.

Isn't this CircularReasoning?

Not at all. One of the disadvantages of databases is that you have to structure your application to fit them. A PrevalenceLayer removes that requirement (and replaces it with the requirement that you structure your application to use CommandPattern.)

{How about an example between a DB version and a prev layer version. Note: you do not necessarily have to limit it to relational DB's.}

RDB version: Application must translate all persistent data structures to records composed of fields. Access to them is limited to vocabulary of RDBMS API (usually SQL). Either the "shape" of the RDB imposes itself on the rest of the application or an abstraction layer is built to insulate the app from the RDB.

PL version: Application can use any data structures. Access to principal behaviors must take the form of CommandPatterns.

[Claims that JavaSerializationIsBroken moved to that page]

Okay, the table "shape" is a defining characteristic of a RelationalDatabase, but not of databases in general. I am not asking how it differs from relational at this point.

If the database doesn't impose a preferred shape on the data structures, the biggest difference is that the application isn't restricted to the usage patterns defined by the database's API, I suppose.

I think you'd understand PrevalenceLayer better if you used one. It's a trivial concept.

To Klaus, ThePrevayler and PrevalenceLayer are excellent! -- JeffPanici

hear hear

OK, I just figured out how to explain a PrevalenceLayer to a database fan:

A database is a PrevalenceLayer with pre-defined CommandPatterns. In the case of RDBs accessed via SQL, those commands are select, insert, update, delete, etc. Most databases define CRUD commands.

The difference is that a PrevalenceLayer lets you write your own commands and define your own data structures.

How's that?

What if RDBMS's improved upon StoredProcedures a bit? (such as fixing SqlFlaws)

Are you trying to understand the difference or trying to confuse yourself?

Also, what exactly is "defining your own data structure"? If you mean you are not limited to relational's table view, then a NavigationalDatabase or OODBMS may also apply, since you can shape your "records" any way you want, IOW, ignoring the "table" shape constraints of relational. How about an example of a data structure you can make in a PrevalenceLayer that you cannot make in an OODBMS nor NavigationalDatabase?

Every OODBMS I've worked with resolved data structures to tables, although I've heard of some that don't. Perhaps they are PrevalenceLayers?. Every NavigationalDatabase I've worked with structured all the data as b-trees.

{I don't think the earlier NDB's used B-trees, but I cannot get much information on non-IBM ones.}

So a PrevalenceLayer is a NavigationalDatabase that allows you to define your own navigation commands?

No, the commands are not limited to navigation. They can be any deterministic commands.
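
"Deterministic" matters because the commands get replayed later. A small sketch of what that means in practice, with made-up classes (OrderBook and PlaceOrder are not from any library): anything that could differ on replay, such as the current time, is captured when the command is created and stored inside it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical system state for the example.
class OrderBook {
    final List<String> lines = new ArrayList<>();
}

// Deterministic: the time is passed in once, when the command is created,
// and stored in the command, so replaying it yields the same state.
class PlaceOrder {
    private final String item;
    private final long placedAtMillis;

    PlaceOrder(String item, long placedAtMillis) {
        this.item = item;
        this.placedAtMillis = placedAtMillis;
    }

    void executeOn(OrderBook book) {
        // No clock reads, random numbers, or I/O in here.
        book.lines.add(item + "@" + placedAtMillis);
    }
}
```

Had executeOn read the clock itself, a replay after a restart would record a different time than the original run did.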

Perhaps navigational primitives can also be used to make open-ended deterministic commands (TuringComplete). I have never tried it. Perhaps an example of a command that you found practical to create on your own is in order. Besides, I don't think the ability to add your own commands is sufficient to exclude it from being called a "database" or DBMS of sorts. With enough effort, stored procedures and/or views can be used to add new nearly open-ended functionality to out-of-the-box RDBMS. (BTW, I would do it differently than the current crop of Oracle-clones do it if given a choice.)

Perhaps a sow's ear could be used to make a silk purse.

It is a definition issue at this spot, not a paradigm pissing contest. You have yet to justify the need for roll-your-own-database-API's anyhow.

If you can't see any use for it, don't use it. You asked for the difference and I tried to describe it. The difference is obvious to me and I suspect to you as well. You're just pulling my chain.

I swear to God and Buddha that I don't see any significant difference; nothing near enough to consider a PrevalenceLayer anything other than an object database of sorts. It does not have a single unique feature (in a general sense) that no other database has.

Here are some features that differ: it takes about 1 day to start using it, and it doesn't require actual full-time positions to babysit it. And therein lies the real reason for resistance among RDB people.

Feel free to consider a PrevalenceLayer an object database of sorts. I've used both and see many significant differences, most of them enumerated above. Use them both and you'll understand.

It seems what we have is commercial OODBMS having say features A, B, C, and some others, and RDBMS having features Q, R, S, and some others. These PrevalenceLayer things (allegedly) have features A,B,C,Q,R, and S. Although maybe this set of features is not found elsewhere in one system, I don't see the combo significant enough to give it a special name. It is simply a "database system with such and such features", features that may come and go in other DB's without having new names coined for them.

You're purposefully ignoring the big difference. A PrevalenceLayer can be more transparent than any database. The app doesn't connect to a database. It doesn't read and write records. The developer moves his primary behaviors to CommandPatterns, adds a few periodic calls to takeSnapshot() and writes the rest of the app as if memory lived forever. The behavior and data are free to take whatever form best solves the problems at hand.

[someone else] Apologies for my harsh tone. Rather than respond to the RDB baiting I'll shut up.

Some discussion moved to DbasGoneBad.

An interesting quote from the Prevayler's FAQ:

I think one of the reasons that Prevayler can be so radically simple is that it doesn't deal with the issue of making data remotely accessible. A database server, apart from providing a reliable storage mechanism for data, necessarily has to expose this to remote clients. Prevayler's striking simplicity is possible because it assumes that the data will be consumed only by the one, local application. It's likely that at some point during the lifetime of a Prevayler-based application, there will be another application that needs to use its data. -- Michael Prescott

Maybe it is not a database after all. I generally considered DB's to be multi-application accessible. (Maybe I shall update the DB list to clarify this.) If you want other apps to use the data, then you have to provide an explicit interface, which means you are turning your application into a roll-your-own DBMS of sorts in order to serve the data needs of other apps or languages.

This sounds like the old "bubble memory", which kept its state when power was removed. Is a PrevalenceLayer an attempt to emulate this?

Not intentionally, but they both minimize the hassles of disk I/O. Memory like that would be nice for implementing both OO applications and RDBMSes.

Okay, it's a journaling system for object orientated languages. It isn't a database: it doesn't deal with queries, it doesn't define distribution, doesn't define security/integrity policies of the data itself. Not that this is a bad thing, but it is a different thing.

Personally, I like the possibility of creating my database from custom components: these objects will persist using this framework, query with this framework, and distribute with this third one, except for these special ones which need to be dealt with specially for persistence, etc...


Performance comparisons between OODBMSs and RDBMSs are always questionable. A prevalence layer such as ThePrevayler, though, is typically 3 to 4 orders of magnitude faster than either, even when the DBMSs have all the data cached in RAM. -- KlausWuestefeld


Most importantly, see

I have not yet used ThePrevayler, but I have been doing some research on it. It took me a while to wrap my head around this concept, too. It's so simple, you miss it. It helps if you replace "database" with "data structure" in your thinking. That is, instead of stuffing all your application data into a "database," you instead stuff it into a data structure. That data structure can be anything it needs to be for your needs. If a hash table works best for you, fine. If a tree works best, fine. Just create (in my case, Java) objects and be on your way. Now, a typical application has multiple data structures: lists of this, trees of that, etc. Create a master object that just holds a reference to those other data structures and may provide some logic for either retrieval or mutation operations. This object is the root of your data structure object graph. In effect, it is your "database" and holds the application state that we're interested in persisting across application runs. Now, control mutation access to that data structure with a command pattern. Basically, like a journaling file system, before we apply a change to the data structure, we write a copy of the change we want to perform to persistent storage, then we perform the action. Every now and then, we serialize the whole data structure, starting at the root, to persistent storage and clear out the journal (we snapshot the whole thing). Every time the application starts up, we first read the snapshot, then apply all the journal entries that occurred after the last snapshot (because the journal is basically just serialized command objects). That brings the data structure up to date.
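
The steps just described can be sketched in well under a hundred lines of Java. This is only an illustration under loud assumptions: PrevalentRoot, JournaledCommand, AddName and TinyPrevalence are invented names, not ThePrevayler's classes, and the sketch skips things a real layer has to care about (forcing writes to disk, making the snapshot-then-clear-journal step atomic, locking out queries while a command runs).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Root of the object graph: in effect, the "database". Hypothetical class.
class PrevalentRoot implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> names = new ArrayList<>();
}

// A journaled change. Commands must be serializable and deterministic.
interface JournaledCommand extends Serializable {
    void executeOn(PrevalentRoot root);
}

class AddName implements JournaledCommand {
    private static final long serialVersionUID = 1L;
    private final String name;
    AddName(String name) { this.name = name; }
    public void executeOn(PrevalentRoot root) { root.names.add(name); }
}

class TinyPrevalence {
    private final Path snapshotFile;
    private final Path journalFile;
    private final PrevalentRoot root;

    // Startup: read the last snapshot, then replay the journal written after it.
    TinyPrevalence(Path dir) throws IOException, ClassNotFoundException {
        snapshotFile = dir.resolve("snapshot.ser");
        journalFile = dir.resolve("journal.ser");
        root = Files.exists(snapshotFile)
                ? (PrevalentRoot) fromBytes(Files.readAllBytes(snapshotFile))
                : new PrevalentRoot();
        if (Files.exists(journalFile)) {
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(Files.readAllBytes(journalFile)));
            while (in.available() > 0) {
                byte[] entry = new byte[in.readInt()];
                in.readFully(entry);
                ((JournaledCommand) fromBytes(entry)).executeOn(root);
            }
        }
    }

    PrevalentRoot root() { return root; }

    // Write the command to the journal first, then apply it in memory.
    synchronized void execute(JournaledCommand command) throws IOException {
        byte[] bytes = toBytes(command);
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                journalFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeInt(bytes.length); // length prefix so entries can be split on replay
            out.write(bytes);
        }
        command.executeOn(root);
    }

    // Serialize the whole graph from the root and clear out the journal.
    synchronized void takeSnapshot() throws IOException {
        Files.write(snapshotFile, toBytes(root));
        Files.deleteIfExists(journalFile);
    }

    private static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(o);
        }
        return buffer.toByteArray();
    }

    private static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Usage is just: construct a TinyPrevalence on a directory, call execute for every change, call takeSnapshot now and then, and construct it again on the same directory after a restart to get the state back.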

A PrevalenceLayer just helps with all the snapshotting and journaling machinery, relieving you of the burden of writing it. ThePrevayler is trivially small (few hundred lines). That's good from a simplicity point of view.

Advantages:

  1. Data are just objects in memory. You control the format. You control the structure. You control everything.
  2. As a result of #1, it's fast. No layers of JDBC code, SQL code, blah, blah. Just navigate by object references. It doesn't get any faster than that. Note that a bad programmer could probably pick a bad data structure and blow this advantage, but it's pretty easy to use standard collections libraries that are fairly fast.
  3. If you crash, your data is still recoverable. You use the snapshot and the log to figure out where you were and continue from there. So you're better off than "just" an in-memory data structure.
  4. There is no need to write any sort of mapping code or data access layer. Your data is your data. That's it.

Disadvantages:

  1. It is inherently unsafe, because of JavaSerializationIsBroken and also because of incorrect handling of errors. Ship your software with a PrevalenceLayer and when it crashes, users won't be able to bring the system back to a consistent point in time without the help of a programmer, and even a programmer might have to throw away lots of valid data. 99% of the time a real database will be able to recover without human intervention, and without losing any committed transaction. This point should be enough to make any comparison with databases naive at best, if not downright disingenuous.
  2. It does not execute transactions concurrently. [Because all transactions can be InstantaneousTransactions]. I can understand the because part, but the fact is they don't execute concurrently when you need that [If you understand all transactions can be InstantaneousTransactions, when would you need concurrent execution?].
  3. Also because JavaSerializationIsBroken, you may have to think two or three times about how you organize your ObjectModel, and about whether you write all your custom serialization correctly and keep it in sync with ongoing modifications.
  4. Doesn't work with data structures that don't fit in (virtual) memory. If you have 1 TB of data, this isn't for you.
  5. Complex object schema changes are burdensome using default Java serialization.
  6. Because it doesn't have transactions, it offers no error recovery. [ThePrevayler has transactions and, as of version 2, does automatic rollback in the case of a transaction throwing a RuntimeException]. The rollback code in the 2 alpha version is just unusable.
  7. If users demand just in time reports on their data, forget about Excel or Crystal Reports, you'll have to spend 10x at least to hack them in Java. Crystal doesn't let you access java objects? What sort of useless thing is that? Get a real reporting app.
  8. Requires some work on the programmer's part to choose good data structures and access methods into those. If you're a lunk-head, you may mess this up and run slowly. You'd have to be a pretty bad programmer to blow the advantage you have over going through xDBC and SQL access layers, however. Given good collections libraries, you'll be okay.
  9. You may have to do some more work on writing all the various mutation and retrieval routines. If you want data sorted, for example, you'll have to sort it yourself. Again, standard libraries make this trivial, but things like a SQL "LIKE" query would be more of a pain.
  10. It's difficult to have a bunch of different applications access the data. It works best when the client of the data resides in the same VM and just uses the data structure. If you have 100 different applications that can run at various times, then it becomes less of a win. You could still use PrevalenceLayer and write your own multi-app access code, but now you're recreating functionality that databases give you.
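
For a sense of how much work point 9 in the list above involves, here is roughly what a sorted, LIKE-style lookup looks like with the standard collections and streams libraries. NameQueries is a made-up helper, and matching case-insensitively is an assumption of this sketch (what SQL LIKE does depends on the database's collation).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class NameQueries {
    // Rough equivalent of:
    //   SELECT name FROM t WHERE name LIKE '%fragment%' ORDER BY name
    static List<String> containing(List<String> names, String fragment) {
        String needle = fragment.toLowerCase();
        return names.stream()
                .filter(name -> name.toLowerCase().contains(needle))
                .sorted(Comparator.naturalOrder())
                .collect(Collectors.toList());
    }
}
```

This is a linear scan. It is fine for modest in-memory collections, but anything index-like (prefix lookups over millions of entries, say) is up to the programmer to build.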

In summary: if your needs are simple, a PrevalenceLayer is simple too. If you really do have terabytes of data and exercise all sorts of SQL backflips, then you should probably use a DBMS of some sort. Many applications fall into the first category, however.

You don't have to implement it all at once - you can migrate. If it will be better for your business in the long run, start by making your [java] app the centre of information, and THEN make it the data storage mechanism.

Re: "1. It is inherently unsafe. [...]"

What is the big deal? "Prevalence Layer" has been well-known by everyone else as PersistentObjectStore, and the pros and cons have been well-documented in the literature (as well as in the previous comments). It's just that ThePrevayler has found a way to implement it in Java in a fairly straightforward way.

By the way, an OODB is more or less a PersistentObjectStore with a standard query and/or traversal mechanism.

It seems to me that if "Prevalence Layer" proponents want to be understood by the programming community at large, they should at least use well-known terminology.

A prevalence layer isn't a POS. Those are typically concerned with storing individual objects, while a prevalence layer stores the entire system as a whole.

Laughing Out Loud here, because of my own Acronym Confusion. I first read the line "A prevalence layer isn't a POS." as "A prevalence layer isn't a Pile of Shit", with which, of course, I agree, and in fact this made sense in the context of the discussion above. But then I realized POS probably meant "PersistentObjectStore". :)

That aside, I like the Prevalence layer concept, and cheer it on. Being involved with business systems development over the past 5 years, I'm always having to deal with RDBs, and never liking it. I've managed to implement a few prevalence-like applications, though with the RDB as the target of all serialization, just to maintain a corporate blessing. Sure wish I had run across Prevayler earlier.

I'm reminded of a (now-dead?) persistence framework/application in Smalltalk that was able to make use of the block structure to simply say [ some code ] persist., and achieve automatic persistence of any objects within the block. It also provided language for marking atomic transactions (whereby either all the objects within the block were saved together, or none were). Does anyone recall this application? -- KevinLacobie

How is an upgrade between development snapshots handled, and how is an upgrade between software releases handled? I have all this data, I change the object, now what happens? Do I lose all the data? Do I have to version classes? Do I have to change the serialized file?

As I understand it, you need to take a snapshot just before the classes are updated, then write a migration script in Java that converts the snapshot.

Yes, you'll want to take a snapshot. And yes, you'll want to version your classes. Doing this will enable you to recognize and convert the data when it is deserialized back into objects when the new version of the system starts up.

This might well be the seed of a good argument for explicit serialization and deserialization. If you only have one site to worry about, then a one-shot migration script is a reasonable solution. If you have hundreds or thousands of sites (or more), then you need to consider alternatives.
  1. Assume that your application is going to be long-lived.
  2. Assume that the data format will change at least several times (mostly in minor ways).
  3. Assume that customers will sometimes want to do skip-level upgrades.
  4. Assume that - at least once - a customer will want to read newer data into older software.
Better to have one path through the code for reading all your data. Implicit serialization and a migration script counts as (at least) two paths.

-- PrestonBannister
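
One way to read the "one path" argument, sketched in Java with an explicit version tag in front of each record. CustomerRecord and its fields are hypothetical. Note that this sketch handles skip-level upgrades (old data, new code) but simply rejects data newer than the code, so assumption 4 would need a more forgiving format than this one.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class CustomerRecord {
    static final int CURRENT_VERSION = 2;

    final String name;
    final String email; // hypothetical field added in format version 2

    CustomerRecord(String name, String email) {
        this.name = name;
        this.email = email;
    }

    // Always writes the newest format, tagged with its version number.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(CURRENT_VERSION);
        out.writeUTF(name);
        out.writeUTF(email);
    }

    // The single read path: every known version of the format comes through here.
    static CustomerRecord readFrom(DataInputStream in) throws IOException {
        int version = in.readInt();
        if (version < 1 || version > CURRENT_VERSION) {
            throw new IOException("Unsupported format version " + version);
        }
        String name = in.readUTF();
        // Version 1 records have no email; fall back to an empty string.
        String email = (version >= 2) ? in.readUTF() : "";
        return new CustomerRecord(name, email);
    }
}
```

The price is that every class carries hand-written read and write code, which is exactly the chore implicit serialization was meant to remove.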

Re: "3. Assume that customers will sometimes want to do skip-level upgrades." Re: "4. Assume that - at least once - a customer will want to read newer data into older software." Re: "Implicit serialization and a migration script counts as (at least) two paths."

Is prevalence similar to an object cache that only writes to the DB? I have seen such approaches. It also seems that modern ORM frameworks make extensive use of an object cache and update the DB behind it. One can extrapolate from this and say that, with a big enough cache, the DB becomes write-only.

I like the Prevayler concept a lot, but one thing that keeps bothering me is the fact that there seems to be no obvious mechanism to make a prevalent datastore concurrently accessible. The only way I see is to update the system state from the command log on every retrieve for each concurrent client. Wouldn't this negate the performance gains? Also, as mentioned before, I'm worried that refactoring the app in any non-trivial way would easily cause the serialization to break, forcing you to write a lot of porting code from the old format to the new, but this last point might be solvable by some new tools.

Read-only queries can execute concurrently; only commands that change the state must be serialized. Bamboo, the .Net library, supports this. But the main answer to this question is that transactions typically run so quickly that the complexity (and corresponding performance costs) of concurrency control are not needed.
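
A sketch of that arrangement in Java, using a standard read/write lock. GuardedSystem is an invented name; this is neither how Bamboo nor ThePrevayler implements it, and journaling is left out to keep the locking visible.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

class GuardedSystem<S> {
    private final S state;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedSystem(S state) { this.state = state; }

    // Any number of queries may hold the read lock at the same time.
    <R> R query(Function<S, R> query) {
        lock.readLock().lock();
        try {
            return query.apply(state);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Commands take the write lock: they run one at a time and never
    // overlap with a running query.
    void execute(Consumer<S> command) {
        lock.writeLock().lock();
        try {
            command.accept(state);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Queries must not hand out references to mutable parts of the state, or callers could read them after the lock is released.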

(One could make an object prevalence system fully concurrent. But one would have to deal with the multi-threading and concurrency issues in application code, and ensure serializability. That would be a challenge, and it doesn't appear that anyone has needed it, yet.)

Refactoring can break serialization, but there are reasonable ways to recover -- by overriding the default deserialization logic with appropriate conversion logic. And yes, people often get worried about the cost of converting their data to a new schema -- until they compare it to the cost of changing their application code to support the new schema, and then they realize that the conversion costs are a relatively small contributor to the overall costs.
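
For the record, overriding the default deserialization logic in Java can look something like the following. Member is a hypothetical class and the "unknown" default is only a placeholder; the point is that a private readObject method can supply values for fields that older serialized forms never wrote.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

class Member implements Serializable {
    // Kept fixed across releases so that old streams are still accepted.
    private static final long serialVersionUID = 1L;

    private String name;
    private String email; // hypothetical field added in a later release

    Member(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String name() { return name; }
    String email() { return email; }

    // Replaces default deserialization. Fields missing from an old stream
    // take the default passed as the second argument to get().
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        name = (String) fields.get("name", null);
        email = (String) fields.get("email", "unknown");
    }
}
```

Renamed or restructured fields need more than this (the old name has to be read and mapped by hand), which is where the conversion cost mentioned above comes from.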


Last edited July 28, 2008.