I was recently sent a link to a blog post in which someone looked into how compliant databases that claim to be ACID really are. It is a very interesting post. I also comment below on MarkLogic’s ACID compliance…
That blog post describes ACID compliance and the different ANSI SQL isolation levels. The most isolated (and thus probably most desirable / widely applicable) is Serializable. It turns out that Oracle is not Serializable in its ACID support! This surprised me, as most people, myself included, tend to think of Oracle when asked for an ACID compliant mainstream relational database.
It should be pointed out that the choice of Isolation level is not a purely academic point. As mentioned in section 6.2 of this paper [PDF], if you are not Serializable but instead provide Repeatable Read isolation, then it’s possible to get phantom results. Consider a transaction that reads a set of information, then later on in the same transaction reads the ‘same’ information again. Typically this would be relationships between tables. If a separate transaction commits an insert between those two reads, it is possible under Repeatable Read isolation for the second read to include the newly inserted information.
This is because Repeatable Read only guarantees that the rows you have already read will not change underneath you. It does not stop newly committed rows that match your query from appearing. Because that insert was committed before the first transaction re-read the information, the re-read sees the new information. Rarely a problem? Possibly. I for one, though, would not like to write any complex procedures on a database that didn’t give me the truth as of the time I started the transaction.
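To make the phantom concrete, here is a minimal Python sketch of the scenario above. It is a toy model I made up for illustration, not how Oracle or any other real database implements Repeatable Read: `ToyTable`, `RepeatableReadTxn` and `demo_phantom` are all hypothetical names. The transaction pins every row it has already read, so those rows repeat exactly, but a row inserted by someone else still turns up in the second read.

```python
class ToyTable:
    """Committed rows, keyed by id. Purely illustrative."""

    def __init__(self, rows):
        self.rows = {row["id"]: dict(row) for row in rows}

    def insert(self, row):
        self.rows[row["id"]] = dict(row)

    def update(self, row_id, **changes):
        self.rows[row_id].update(changes)


class RepeatableReadTxn:
    """Toy Repeatable Read: rows already read are pinned, new rows are not hidden."""

    def __init__(self, table):
        self.table = table
        self.seen = {}  # id -> copy of the row as this transaction first read it

    def select(self, predicate):
        result = []
        for row_id, row in self.table.rows.items():
            # Re-use the pinned copy if we have read this row before.
            candidate = self.seen.get(row_id, dict(row))
            if predicate(candidate):
                self.seen[row_id] = candidate
                result.append(candidate)
        return result


def demo_phantom():
    table = ToyTable([
        {"id": 1, "dept": "sales", "name": "Ann"},
        {"id": 2, "dept": "sales", "name": "Bob"},
    ])
    txn = RepeatableReadTxn(table)

    def in_sales(row):
        return row["dept"] == "sales"

    first = list(txn.select(in_sales))

    # Another transaction commits an update and an insert in the meantime.
    table.update(1, name="Anne")
    table.insert({"id": 3, "dept": "sales", "name": "Cat"})

    second = txn.select(in_sales)
    return first, second


if __name__ == "__main__":
    first, second = demo_phantom()
    print(len(first), len(second))  # the second read has one extra (phantom) row
    print(second[0]["name"])        # the row read earlier is unchanged
```

The update to row 1 stays invisible because that row was already read, while row 3 appears out of nowhere: that extra row is the phantom.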
Thankfully some systems, like MarkLogic’s Enterprise NoSQL database, use a system called Multi-Version Concurrency Control, or MVCC. This means that my transaction effectively operates at a point in time, or more accurately at exactly one version identifier of the database. In this system, if a second transaction adds information, my original transaction will not see it, because the new information carries a higher version identifier than the one my transaction is reading at. This is true even for an update rather than an insert, as in MVCC you only ever append to the database; you don’t directly change the original information. This makes MarkLogic very scalable. Merges occur later to remove ‘old’ data versions. Thus my transaction can happily look at version 42 while a new transaction looks at the updated version 43.
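The idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not MarkLogic’s actual implementation or API: `VersionedStore`, `commit`, `read`, `keys` and `merge` are hypothetical names. Every commit appends new versions rather than overwriting, a reader picks a version identifier and only sees data at or below it, and a merge later drops versions that no reader can need any more.

```python
class VersionedStore:
    """Toy append-only store where every commit gets a new version identifier."""

    def __init__(self, start_version=0):
        self.version = start_version
        self.history = {}  # key -> list of (version, value), oldest first

    def commit(self, changes):
        """Append new versions for the given keys; never overwrite old ones."""
        self.version += 1
        for key, value in changes.items():
            self.history.setdefault(key, []).append((self.version, value))
        return self.version

    def read(self, key, at_version):
        """Return the newest value of key that is visible at at_version."""
        for version, value in reversed(self.history.get(key, [])):
            if version <= at_version:
                return value
        return None

    def keys(self, at_version):
        """Keys that already existed at at_version."""
        return [k for k, versions in self.history.items()
                if versions[0][0] <= at_version]

    def merge(self, oldest_needed):
        """Drop versions that no reader at oldest_needed or later can see."""
        for key, versions in self.history.items():
            keep_from = 0
            for i, (version, _) in enumerate(versions):
                if version <= oldest_needed:
                    keep_from = i
            self.history[key] = versions[keep_from:]


def demo_mvcc():
    store = VersionedStore(start_version=41)
    v42 = store.commit({"doc1": "original"})                 # becomes version 42
    v43 = store.commit({"doc1": "updated", "doc2": "new"})   # becomes version 43
    return store, v42, v43


if __name__ == "__main__":
    store, v42, v43 = demo_mvcc()
    print(store.read("doc1", v42), store.keys(v42))  # my transaction, pinned at 42
    print(store.read("doc1", v43), store.keys(v43))  # a newer transaction at 43
```

A transaction pinned at version 42 sees neither the update to `doc1` nor the newly inserted `doc2`, which is exactly why phantoms do not arise for it. Calling `store.merge(43)` once nobody is reading at 42 any more would discard the old version of `doc1`.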
Thus MarkLogic is Serializable. Incidentally, we’re Serializable by default, unlike some other NoSQL products. All our benchmarks are done with this enabled, again unlike most Open Source databases. Also, thanks to our ability to run multi-node clusters on commodity hardware, we are able to offer very high degrees of scalability. The BBC ran all their Olympics results through a MarkLogic system, to great effect, and this barely dented their CPU utilisation, even with millions of visitors a day to their Olympics website. We have some very large civilian and non-civilian customers, but unfortunately I can’t share numbers for confidentiality reasons.
We also have partners, such as SGI, who sell hardware for MarkLogic. SGI is unique in that they sell a DataRaptor appliance with MarkLogic pre-installed and tested. If you have a lot of data and want the fastest system possible, or the largest storage possible, without sourcing hardware and validating it yourself, then you should check out the SGI DataRaptor appliance. (Be sure to let the sales rep know it was me who told you about it!)
If you have a project with Terabytes or Petabytes of information and want to know how we could help, then please contact me at adam dot fowler at marklogic dot com.