Relational Breaks Encapsulation

See ObjectRelationalImpedanceMismatch.

JeffGrigg asserted the problem in the title. Here are the original arguments:

Here are the main strengths of Object Oriented Programming and Relational Databases:

The two are incompatible: If you wish to leverage the full declarative non-procedural power of SQL queries, you have to give up the encapsulation and maintainability of business objects. If you want to enjoy the benefits of maintainable business objects with properly encapsulated state, you have to restrict your use of SQL queries.

Consider a simple case of an OO program that uses a relational database to persist object state. Let's assume the simplest possible mapping: Each business class is one relational table and each instance is one row. Now, write a SELECT statement that joins two tables and fetches half the columns from each: How can you represent the results as business objects? Really, you can't: Not only does this one SELECT statement violate the encapsulation of two business classes, it fails to fetch sufficient data to create a single instance of either class. You can't make the transition from this database result set to any sensible collection of business objects without reading more data from the database, or doing something really tricky.

Generally speaking? Arbitrary SQL queries violate object identity and encapsulation.
More precisely, mixing object access with direct SQL access results in chaos. Don't do it. All access must be through the objects. The collection classes accessing the database can manage integrity, as long as nobody is coming in through the side door and munging things.

If, as in the above example, you have a collection whose members need half of one object and half of another, there is something dreadfully wrong with the object design.

--MarcThibault

But what if a collection's members need half of a tuple, or the previously unknown result of an outer join? -- EricHodges

"Half of a tuple," "outer join," or other categories of query are not relevant. What is relevant is whether or not the query result type -- i.e., the set of returned attributes -- can be known compile time, or whether it can only be known at run time. If the query result type is known at compile time, then the result set can be wrapped in an object manufactured by a static class definition or equivalent. If the query result type cannot be known at compile time, as with ad-hoc queries, then one must either employ a client-side language that supports run-time object definitions, or use collections to contain both the result set type and the result set body. -- DaveVoorhis

So you make a class for every possible result set known at compile time? That seems like overkill to me. Sometimes all I want are one or two values. The process of moving them from a result set to an object just adds code without adding benefit. I think most of the OO-relational "impedance mismatch" comes from the effort to single mindedly force all relational data into objects. Once I let go of that desire using RDBMSs became much simpler. -- EH

If all you want is one or two values, then create one class or set of static methods to handle the "find a description (and maybe some other stuff) given a key" idiom. This is trivial. You can easily create classes or methods to handle dynamic result sets, or static result sets, as needed. The manifestation of "impedance mismatch," to the extent that it exists at all, is directly proportional to the developer's ability (or inability) to create appropriate wrapper classes, and his or her ability to appropriately employ dynamic containers vs. static class definitions. -- DaveVoorhis

But you risk a kind of combinatorial explosion, and classes/methods such findAllGreenSweatersNotDiscounted, findAllGreenSweatersNotDiscountedBeforeHoliday, etc... If you use them multiple times, it may be worth it. But if their use is one-off (one spot in entire app), then you are wasting time and code.

I was unclear. By creating one class or methods, I didn't mean one class or method per query. I meant one class or method to handle the general category of ad-hoc (or ad-hoc-ish, which the sweaters example seems to be) queries. An obvious way is executeSQL(String SQL) method that returns a dynamic ResultSet (or whatever) collection instance, but this approach will ScatterSqlEverywhere throughout your application code. About this, someone above wrote "don't do it," and they're probably right -- maintenance can be painful when the schema changes.

Another way is to keep the executeSQL(String SQL) method described above, but also create an executeQuery(String queryName) method that retrieves a named query string from a table in your database and invokes executeSQL() with it. For example, executeQuery("greenSweatersNotDiscountedBeforeHoliday"). This keeps the SQL in the database instead of your application, which with some luck and discipline might make it easier to maintain through schema changes and definitely makes it easier to find and test all the SELECT queries en masse. However, among other limitations, this involves an additional database hit for every query unless you cache the query strings. A workable compromise is to maintain a pool of query strings in a static collection within the application, or retrieve it at run-time from an external source. Although it puts the SQL back in (or at least near) the app source, at least it's in one place this time. It doesn't take much imagination to come up with more approaches, all with their own pros and cons.

(I should point out that I'm not disagreeing with you here, though it seems I started out that way. I'd intended to expand on your original point, not contradict it.)

The real solution to all of this, of course, is a general purpose object oriented (or functional, or <insert favorite language paradigm or mix of paradigms here>) programming language with first class support for relations, relation-valued variables and relational algebra; and perhaps relegate SQL to being an internal, low-level protocol for accessing DBMSes. In other words, a RelationalLanguage. -- DV


Given an arbitrary SQL SELECT statement, there is not likely to be any sensible way to map it to fully usable business object instances. -- JeffGrigg

I'm not sure this is a valid argument. RelationalDatabases support encapsulation as well as SmallTalk does, ie by convention. Anyone can use instVarAt: if they wanted, but convention prevents it. Likewise, we could develop conventions in the relational world that SQL SELECTS are required to select an entire row, be joined to parent tables, etc. - MarkAddleman


Response:

Let's consider an application that is using a good object-relational mapping library: We fetch a number of objects into memory and modify a number of attributes. Now we issue a SELECT statement directly against the database, selecting from the very same columns and tables we "updated" in the objects. The only way the SELECT statement can return the correct results is if all changes to the object instances are flushed to the database first -- something that is usually deferred until "save/commit" time, because issuing a database UPDATE statement every time an attribute is changed would cause serious performance problems in most applications.

Likewise, if we UPDATE rows directly in the database, this will invalidate any data cached in object instances, making them invalid -- unless we have some way to signal this switch between OO and SQL "modes." ...or issue a SELECT statement to the database for each and every call to fetch an attribute from an instance -- again with serious performance implications.

This is not a difference. In standard OO, if you create two objects with the same base data, there is no guarantee that they will remain synchronized. If you modify one and not the other, they will differ.

There's another concept to consider here too: If clients consistently fetch properties through get and set methods, as is common in C++ and COM, any object can hide its implementation and expose a different set of properties than it stores as instance variables. You can't do that with relational tables: The columns in SQL statements are physical; none can be derived values, unless you put the derivation formulas right into your SQL statements.

Well, first of all, it is true that some values can be stored physically, any system will do that, even Smalltalk. But if you want to present your users with different derived values, you just create a view. An end-user will hardly see any difference between a View and a Table as far as queries are concerned. More, you can ENFORCE encapsulations against updates, but release the encapsulation burden for queries, which is very important.

If someone updates an underlying data while you have some objects in application space, then the transactional isolation level enforce encapsulation, and you can use isolation levels as to enforce encapsulation at the strict minimum level that guarantees consistency while allowing a greater degree of concurrency.

Therefore, the relational model breaks no encapsulation at all. It offers you every mean necessary to enforce a better encapsulation (encapsulation against writes, but not necessarily against reads for example, this isn't offered by any OO language other than Object Pascal (Delphi) as far as I know). -CostinCozianu

You might want to phrase that better - most OO languages let you encapsulate writes by making the instance variable private and providing a method. The analogous thing in a database is to use stored procedures. -BrianSlesinsky

Not necessarily. The analogy is a little bad, Stored Procedures should be seen as a client program also, only that it has moved for efficiency reason inside the database. It can be seen as a extension of client run-time inside the database, because there it executes a bunch of operations more efficiently than a client server protocol will allow. More, for databases such as SQL Server and Sybase , stored procedures are essentially no different than batches. And the most important thing, stored procedure uses the same high level interface (SQL commands) to operate on data as any other client, they have no access to the inner workings of the database (the exception being very special vendor-supplied stored procedures used for DBA type of jobs).

What I meant to say is that encapsulation usually means that clients of an "entity" ( a object if you want) have essentially no access to the internal state. And this can be further split in two:

Read access. Encapsulation for read access is a RedHerring. More or less every object.getXXX() call is an unnecessary fancy way to say give me the value of the XXX, and should be inlined by a smart compiler. Most often it is not inlined (in Smalltalk because the klanguage is too dynamic and there's no type static information for the compiler to make the decision, and in Java because getXXX is usually defined through an interface and an interface pointer instead of a base class). A just in time compiler could do it at runtime, but last time that I checked (2 weeks ago) Hotspot doesn't do it. Delphi has a smart solution for that named property keyword which essentially leaves read access to be direct while writes can be forced to a special method.

Write access. Write access encapsulation means that we shouldn't let anything else other than an object's method to modify the internal state. This is obviously needed in order to maintain a consistent state. The problem with encapsulating write access are not quite as simple as we'd like them to be. I'll come back further down this page to treat this subject separately. This is the only case where someone can pretend that OO approach has an advantage.

Unauthorized access. Preventing unauthorized access is handle in OO languages only with great difficulty if possible at all, while in a relational database it is usually very simple for a DBA to enforce access policies.

If anybody sees any other problem related to encapsulation, please add here. CostinCozianu

Getting back to the original question, here's how I imagine you could have full SQL along with business objects: Imagine a database where instead of having rows you have objects that implement a Row interface. There are get() methods to access column values, some update methods, and an attributeChanged() callback that a Row uses to tell the database when indexes needs to be updated. SQL could be used to query a table of these row-like objects just like they were rows. (The result of a SQL query is not a set of objects, but rather a table of values extracted by calling get() methods on objects that match the query.) An insert statement calls the Row class's constructor, and an update statement calls the update methods of many objects in parallel.

(I don't know of any databases that work that way, but instead-of triggers in Oracle might be a good substitute.)

- BrianSlesinsky


It seems that CollectionOrientedProgramming in general breaks encapsulation, not just relational in particular. See OoConflictsWithCollectionOrientation.

'Yes. It's a matter of granularity. SQL tables and the SQL language are not explicitly bound, so there is no encapsulation as such (though stored procedures and views can establish a functional encapsulation). Trying to OO a relational database seems like a strange thing to do. It is minimalist in its approach to information linkage, therefore adding gothic complexity is going against the grain. Perhaps that is why those who started with OO have real problems dealing with it. I dunno. -- RIH'


The "queue" discussion in RelationalHasLimitedModelingCapability has some points which may be applicable to this discussion. OOP leans towards its roots and models queues as a "hard" physical thing with solid walls/shells, while relational tends to view a queue as one of multiple possible viewpoints of the same info. In my opinion, relational provides a more flexible "abstraction engine" by not tying things to one viewpoint. But, the price is broken encapsulation. If you want to add security ("walls") to a "queue", it becomes a custom-built or domain-specific endeavor.


The relational model is closer in some respects to that of a dynamically typed language -- conceptually, each query potentially defines a new [relation] type, automatically provided with methods to retrieve (and perhaps update) the various attributes. This is similar to defining an object in a ProtocolBasesProgramming? language and defining methods just for that object. Its still OO, just not class-based. These methods can be simple accessors, or more complex (e.g. where the query does some computation). We can still have encapsulation, via views and the DBMS security mechanism.

The key (no pun intended) is that it doesn't _matter_ that we don't have "all the data about the X", because most/many operations on X's don't need _all_ the data about X, just some of it (I don't need to retrieve your address to change your DoB). By retrieving just what we need, we avoid StampCoupling -- as long as I give the function a relation (query result) with the relevant columns in, it doesn't matter if it has a few more columns. (Actually, there are some nasty details when it comes to updateability -- general view updateability is still a research problem. But the read-only and simple-updateable-view cases cover many real-world cases.)

Of course, SQL does a terrible job of supporting the update operators (because views, and queries in general, have very limited updateability). This does break encapsulation/abstraction. Also, most O-R mapping frameworks (at least those for statically-typed languages) seem to balk at the idea of "a new [relation] type for each query" and force the mapping to be per-table/object instead.

The protocol-based OO languages (and perhaps the dynamic ones, in which it is possible to get some of the effect of prototypes by creating classes on the fly) should be a much better fit with relational, it seems, than the C-family statically typed ones.


Additive Versus Subtractive Encapsulation

Generally, relational creates "global" domain models based on attributes (declarative) and assumes behavior is usually local; while OOP does the opposite: assumes behavior is global but attributes are local (encapsulated). This is why they don't get along very well.

Further, in relational, every table automatically "inherits" all the relational operators by default. If you want to limit operations, one uses various security mechanisms such as an AccessControlList.

Thus, overall, OOP's operations tend to be added as needed to classes, while in relational one subtracts as needed, using various predicate filters or many-to-many group or role relationships. (I find the subtractive approach more powerful because you don't have to reinvent the wheel. It's easier to un-invent than re-invent. Most OOPL's don't make mass-inheritance very easy.)

--top

Huh?

With encapsulation, classes start out with only the accessors/operations/methods you give them. A relational table, on the other hand by default has all the behaviors and abilities that the DB provides (such as DatabaseVerbs) until you remove them via constraints, triggers, AccessControlLists, etc. In OOP you build gateways, while in relational you shut them. -t

When they invent set-based encapsulation instead of just nested encapsulation, let me know. (Related: AlternativesToHierarchicalEncapsulation) --top

Huh?

Exactly! We know what nested-based encapsulation looks like (at least a form of it: OO's), but a set-based version is a big "huh". Maybe the two ideas are just a plain contradiction because the very word "encapsulate" implies a kind of nesting that mirrors physical nesting: stuff that's "inside" versus "outside". Sets tend to not match the real world much. They are virtual classifications with few real-world counterparts. -t
See also: GateKeeper, DatabaseNotMoreGlobalThanClasses, CollectionsArentOo ObjectsAreFromMarsTablesAreFromVenus, ObjectVsModel

EditText of this page (last edited January 10, 2011) or FindPage with title or text search