The acronym ACID describes desirable properties for transactions
(changes to a database, typically):
- Atomic = all changes are made (commit), or none (rollback).
- Consistent = transaction won't violate declared system integrity constraints
- Isolated = results independent of concurrent transactions.
- Durable = committed changes survive various classes of hardware failure.
Can your application or favorite database pass the ACID Test? See "Transaction Processing: Concepts and Techniques" for a good look at what it takes. ISBN 1558601902
- "In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."
also known as "Acid Semantics", "Acid Properties"
see also BankAccountTransferProblem
Who implements it:
- Practically all RelationalDatabases.
- BullAnt have a fully persistent OO Virtual Machine and language which guarantees Acid properties for all transactions.
- So does the JiniTransactionManager?...
- SoftwareTransactionalMemory: Atomic, Isolated, optionally Durable (per cell), 'Consistent' (vacuously) - implemented by Clojure, Haskell (with many extra features), now macro-based implementations available for C/C++
- TransactionalActorModel: Atomic, Isolated, optionally Durable (per actor), 'Consistent' (vacuously) - new, not yet implemented
I'm not clear on the exact definition for atomicity. Does atomic mean all clients will see all or nothing at all times, and nothing in between? Or does it merely mean that *eventually* it will be all or nothing, but in the mean time it could be something in between? So is 2-phase commit really atomic? The existence of different SQL transaction isolation levels confuse me more.
- "Atomicity (database systems) is a property of database transactions which are guaranteed to either completely occur, or have no effects."
Atomic means all or nothing of the changes done by a transaction are recorded for posterity
. The property that clients will 'see' all or nothing is Isolation
Atomic also means that if your application or the database server crash in the middle of your transaction, none of your changes will "take." This is trivial if the database is volatile, but if Durable your transaction (IE: all of your changes) will be rolled back (IE: discarded). This may seem bad, but it's generally better than committing some unpredictable half-done work to the production database. ;->
- all transactions maintain data integrity constraints. A data set that meets integrity constraints can be called 'consistent'. A transaction maintains data integrity constraints if, when starting with a consistent data set, it will result in a consistent data set. A DBMS can maintain consistency by aborting transactions that would cause inconsistency. For clarity, consistency does not require correctness (where a DatabaseIsRepresenterOfFacts
), and consistency with regards to database systems does not refer to the sort of cache consistency issues seen in distributed computation (this issue is handled by 'isolation'). Consistency also does not apply to intermediate stages of a transaction.
- "In database systems, a consistent transaction is one that does not violate any integrity constraints during its execution. If a transaction leaves the database in an illegal state, it is aborted and an error is reported."
Databases and systems that do not allow expression of data integrity constraints might be considered vacuously
consistent. Examples of vacuously consistent systems include typical implementations of SoftwareTransactionalMemory
Unless isolation is violated, consistency does not need to be enforced at intermediate stages of a transaction. Further, if isolation is guaranteed, then consistency can be achieved if
one can guarantee that each completed transaction takes the system from one consistent state to another. But that's a pretty big 'if' given that one is essentially trusting that every client of the database follow the rules.
Rather than simply providing isolation then trusting the clients, many databases, especially relational databases, provide good support in maintaining consistency through the use of declarative constraints
, such as a statement that all accounts must have a positive balance, or that every foreign key in one table must have a matching primary key in another. A transaction that results in a violation of consistency will be aborted rather than committed.
I thought, perhaps wrongly, that consistency was not just about a DBMS recognising invalid states and rolling back the transaction.
In the BankAccountTransferProblem, defining a transaction to enclose both the deposit and withdrawal ensures the "Law of Conservation of Money" is enforced - attempting to transfer will never create or destroy money. So using transactions appropriately means your data is consistent with real world rules, even when the DBMS isn't aware of them.
In "Information Modeling and Relational Databases", Terry Halpin defines several interesting logical constraints you might incorporate into a data model. For example, if I report to my boss, my boss cannot report to me. It would be nice to have such constraints automatically enforced by a DBMS, but most can't do it automatically. If it is critical, you can write application code within a transaction to verify the constraint has been met, and rollback the transaction if not.
So isn't the important thing that transactions, used sensibly, can ensure ongoing consistency with any rule I think is important? Whether the consistency is with constraints defined using a feature of the DBMS or in application code is secondary.
If consistency over time is important to you, consider use of a temporal database. This would allow you to specify constraints on your data that traverse the temporal dimension.
I want to clarify what consistency means, rather than get into the capabilities of whatever technology.
Maybe where I'm heading is that there's no such thing as vacuous consistency, as defined elsewhere on this page. Transactions give the fundamental building block for consistency. The rest is how sophisticated or limited the DBMS is is expressing consistency rules. Even if it provides no automatic consistency checks, I can add the checks I want as part of the work of the transaction.
So is consistency a consequence of atomicity, isolation, durability, and correctly expressed consistency checks? A DBMS can make it much easier to specify consistency checks correctly, but it could be done in other ways.
- Aside: a temporal database isn't a special technology; it is, at essence, just a database (a managed collection of data) with explicit time information. A temporal DBMS is technology, and would provide extra features to efficiently index and mutate a temporal database.
Vacuous consistency simply means: "Does not allow you to specify any data integrity constraints, and therefore successfully enforces all the data integrity constraints you specified."
are properties of the
transactions. It is true that atomicity + isolation
is enough for you to roll-your-own consistency
. But we can also roll our own atomicity, roll our own isolation, and roll our own persistence (durability). When we roll our own, we must pay for features with our own blood, sweat, and parentheses. We don't say the properties were given to us by a transaction system.
- the reads and writes of successful transactions will (modulo performance) not be affected by reads and writes of any other transactions (whether or not those other transactions are successful). Isolated transactions are 'serializable', meaning that the final state of the system can be reached by placing system transactions in a global order where each transaction occurs one at a time without any concurrency. (There may be many such global orders, since there may be many transactions that do not affect one another.)
- "In database systems, isolation is a property that defines how/when the changes made by one operation become visible to other concurrent operations." "The isolation property is the most often relaxed ACID property in a DBMS (Database Management System)."
In implementations of transactional systems, you will often hear of optimistic
transactions and pessimistic
transactions, which describe how isolation is achieved: with optimistic transactions, the transaction management software generally assumes the other transactions will complete, also assumes they don't read or write to the same place twice, and allows reads and writes to data accessed by other transactions. If it turns out that the other transaction is aborted, or that ordering conflicts occurred (a transaction did hit the same place twice), then both transactions will be aborted and retried... potentially resulting in livelock
with no guarantee of progress
. With pessimistic transactions, resources are effectively locked until commit, guaranteeing that nobody else can interfere with them, but if one transaction needs resources in use by another, it will wait (often unnecessarily, hurting performance); further, if the other transaction needs the resources of the first, deadlock
is the result, though such a deadlock can be broken by detecting it then selectively aborting just one transaction (allowing progress
to be guaranteed). Many hybrid approaches exist (e.g. optimistic at first, less optimistic on retries), aiming to achieve the typically greater performance of optimistic transactions with the ability to guarantee progress from the pessimistic approach.
Relaxing the isolation property generally occurs for the same reason as those hybrid models between optimistic and pessimistic: to improve performance and guarantee progress. The ANSI/ISO SQL standard defines these 'isolation levels': SERIALIZABLE, REPEATABLE READ, READ COMMITTED, READ UNCOMMITTED. Examples of how the weaker levels of isolation fail are available on the wikipedia page.
Relaxing the isolation property can make it far more difficult to achieve the consistency
property, especially the notions of consistency that the data description language is not expressive enough to describe. With full serializable transactions, any notion of consistency (even informal notions of consistency) can be achieved by considering the success or failure of each transaction in isolation. Modulo issues of progress and performance, serializable isolation allows us to program as if there is no concurrency.
Even the lesser isolation levels reduce the degree to which programmers need to concern themselves with performing locks and tracking changes to data.
- "In database systems, durability is the ACID property that guarantees that transactions that are successfully committed will survive permanently and will not be undone by system failure."
is the easiest thing. What really means that committed changes make permanent changes to the data? The usual architecture is that you have a database client that initiates a transaction, performs some operations on data, and then issues a 'commit. Until that point the client should not assume
anything'' about the outcome of his operations, even if the client API allows him to see some kind of success of each operation. However, once the DBMS responded with OK to the commit, then the DBMS must make sure that anything that happens to the DBMS short of a nuclear attack the effects of client issued modifications will persist virtually forever. That doesn't necessarily mean the data itself, as another client may modify change it again in the future, but the logical effects of modified data on future transactions. In no scenario future transactions will be able to pretend that the modifications were not there.
Simply writing modified data to the disk is not enough to achieve durability, because among many other things, the disk may crash. What this really means in practice is that DBMS will write some kind of logs
about the changes it makes, first make sure that the logs are permanent, usually the logs themselves need to be redundant, and only after the logs are all right (including flushing the OS buffers) the DBMS can issue an OK to the commit operations, providing that the rest of the conditions are met. The data cache itself maybe written to its place on the disk at a later time, but it is immediately made available in its changed form to new transactions.
In case something bad happens (DBMS crash, OS crash, disk crash), the database or parts
of the database need to be restarted, and upon restart, the DBMS will first test if the data files are in sync with respect to the logs, because logs will always contain the effects of ALL committed transactions while the data files may not. If data files lost some information, the DBMS will replay the operations of committed transactions as found in the logs.
Apparently this strategy doesn't buy anything since we moved the problem from data files consistency to log files consistency, but under the covers, it is a lot easier for a DBMS to ensure log consistency, since log do not exhibit a high degree of concurrency, read-write and write-write conflicts the same way as data files and data buffers do. That is because logs are only appended, and they are not read unless in case of recovery from failure, so it is a lot easier for a DBMS to manage logs. -- CostinCozianu
In other words, any data stored by the database other than those logs in only a cache of the logs. They happen to be large caches which are dramatically effective, but they're still just caches. -- cwillu
They're only 'just caches' if the logs are never cleaned up, which I don't believe to be the common case...
Isolated, as in the transaction sequence is reorderable?
No, as in the intermediate results of a transaction that is in process are not visible to other transactions that are currently in process. Results only become visible when the transaction is committed. MichaelStonebraker? called a transaction that commutes freely with all other transactions sterile in the paper at http://www.mit.edu/~dna/vldb07hstore.pdf
A term commonly used is "serializable". Isolated semantics are only relevant when concurrent transactions are allowed. "Serializable" means that for any given set of concurrent transactions, there is a serial ordering of those transactions which produces the same result. As a trivial example, suppose you have an attribute whose initial value is 10; and you have two transactions - one which doubles the value, the other which adds seven. Two different clients initiate the two transactions. Under isolated semantics, the only acceptable values for this attribute are 27 (the result of doubling first then adding 7) and 34 (the result of adding 7 first and then doubling). Any other value (such as 20 or 17) would be an error.
s fail to fully implement the "Isolated" requirement:
My attempt to insert a record into a table with a unique index can fail due to another transaction's uncommitted
insert of a conflicting record. -- JeffGrigg
Well, I'll have to add the isolation definition, which is not what has been presented above. However Oracle does not fail in this regard. Just think about it, any database at all has to either isolate
you from the other, or isolate
the other from you, it just can't let you both win. Probably you want that a database should let you both continue and at the end whoever arrives first to issuing a commit has the winner takes it all
principle. This is a good concurrency control policy, but is not the only one, and can be good or bad, based on a lot of other arguments. Just think about the fact that if an insert
returns immediately with an error on the primary key, and the primary key is generated, then you might be able to retry with another generated PK, without sacrificing the whole transaction.
Remember, in a transactional environment you are not guaranteed to always succeed, EVEN IF each of the operations you make is correct from your point of view. What you are guaranteed instead is Atomicity, either all your operations succeed or no operations at all. -- CC
When my process gets the error, it rolls back. Suppose the other process rolls back for other reasons - like some business condition it encounters makes it roll back. Now you've lost the "serializability" property: There is no sequential execution of the two processes that would produce the result just encountered, because there's no way my process could fail based on a record that wasn't really inserted. Thus, Oracle fails to implement the full theoretical ACID model.
Not that I think this is entirely bad: I've used this property to improve performance of a multi-process application I was working on. When one process hit the unique index constraint, it "knew" that another process was already taking care of the business entity identified by that key, so it skipped to the next transaction.
It's not entirely bad in a practical sense, but it does violate ACID -- the processes aren't entirely Isolated from each other. - JeffGrigg
Jeff, first of all let me tell you that I really caught Oracle of not fully supporting the serializable isolation level
, but that's another story. The serializable isolation level as defined in SQL'92 is not properly defined anyway, according to more recent research. However I don't think what you describe is the case of a breaking in serialization, nor do I think that what I discovered can have any practical relevance, I think it was rather a design trade-off because they probably thought too expensive to guarantee fully serializable
isolation level as defined in the SQL 92 standard.
The principal argument is that no matter what happens from your point of view
, you are never guaranteed a success. So the fact that if executed separately both programs should have
succeeded is not an argument. Because the only guarantee you are given is that when a commit
returns with success, you have your transactions, otherwise the transactions can fail in between for no apparent good reasons
. That's how transactions theory and database standards work, I don't know if we can do better, you propose that a transactions should only fail for verifiable good reasons. That is desirable, but I don't know if it can be achieved. Maybe further research will open this territory and will simplify database development for us.
Having said that, I like to ask whether it is not a poor primary key choice, or a poor choice in choosing how you generate your primary key.
I hope you don't interpret me that I'm putting the blame on you, it would be Oracle's fault if Oracle made you such a promise in defining serializability, which by the way it doesn't.
And I just started my Oracle DB to verify what you said, because it contradicted all my knowledge about how Oracle works. It doesn't happen the way you say. Namely: (Step1) start a transaction T1, (Step2) insert a record (PK=X) ... do nothing in T1 (step3) start another transaction T2, (step4) in T2: insert a record (PK=X) -- here Oracle blocks the client - exactly the way I knew it worked (step5) in T1: if I issue rollback
, then transaction T2 goes ahead with success in the insert operation, else if I issue commit
in T1, then transaction T2 goes ahead
with error on last operation. Everything was done directly from the console(sqlplus) and I tested all the combinations between read-committed and serializable for both transaction. Please note that the default on all major DBMSes is that no individual statement failure will trigger a transaction rollback. That only happens in certain middleware
products which play it the safe way, and anything wrong they catch they rollback. My wildest guess at this point in time is that you might be using something of the nature of Bea Weblogic, or TopLink
or something like that.
Good point, that it may be valid for transactions to fail "for no obvious reason."
Maybe a different Oracle version; this was several years ago, and Oracle (esp. Oracle 8) has made significant improvements. -- JeffGrigg
I am practically sure that old 7.3 behaves exactly the same way. I haven't had any experience prior to 7.3.
Jeff, the reasons might not be obvious to you. They are to the DBMS, and I assure you that no DBMS will fail a transaction for no reasons at all. The thing specified in the ACID contract that you have with the database, that the database can fail your transaction at any time before it replies OK to your commit. For example, it might be a deadlock situation detected only 10 seconds after you issue a commit (everything was fine till then), it might be that the disk is full, or it has broken, you CAN'T KNOW from a client perspective, and it is not your business to know.
For example, in case of a deadlock, you can't say: but I was here a microsecond earlier, you shouldn't have given the lock to the other guy
, there's no such thing, the database will guarantee the order of your operations, the ACID properties, and that's it. Anything else is entirely its responsibility and this makes programming against a serious relational database fun. It's a perfect example of SeparationOfConcerns
The only thing you have to know is that some data access patterns are more problematic to be handled by the database (they generate deadlocks like there's no tomorrow, and you'll see many developers in EJB forums complaining that they don't know what happens), so you might get better performance (transactional throughput) using a relational database in some other ways then a place to store your objects
. The positive thing is that in any case, even with deadlocks, disk crashes and so on (minus really catastrophic events) you'll have your data safe and sound. -- CostinCozianu
Sometimes for efficiency/speed
purposes, one may want to selectively sacrifice
some aspects of ACID. See ParallelNeedScenario
See also: AtomicConsistentIsolated