Larry Ellison is announcing a major new feature this Wednesday at Open World. For the first time in a while, his keynote is dedicated to the “database” as opposed to the usual high level ERP/Apps/Fusion. Even the title of his keynote is catchy — “Extreme Performance”.
Oracle has been keeping the new feature a secret. Even the 11gR2 beta program had very few participants, to prevent information from leaking out. The message is: "Something's coming, but I'm not telling what."
Okay, it worked on me; I'm excited about it. Let's think about what it could be. What single database feature is so major that Larry himself will announce it during OpenWorld?
What do we know so far?
- Starting with the obvious, Larry’s keynote is “Extreme Performance”, so it’s related to performance.
- We know Kevin Closson has worked on it – he had a blog entry saying “I am working on something big” that got pulled off the web.
Given these two points, let’s further think about it. What do we know about Kevin?
- He worked for PolyServe — a company whose main product is a cluster file system.
- He worked for Sequent on NUMA systems, which in today’s world is pretty close to cluster software with a very fast, low latency interconnect.
- He is an expert in storage systems and disk performance.
- He joined Oracle recently, possibly to work on this secret project.
- He must be really excited about it, to post anything on his blog under radio silence.
I think it’s something related to storage, something new and revolutionary about storage. But what?
We already know, from leaks on certain websites, that ASM will become a cluster filesystem which will allow storing OCR files, as well as user files, on the ASM disks.
But is this big enough? It's definitely significant: now you get a "free", reliable cluster file system with Oracle. I don't think it's big enough, though. Oracle already has OCFS and OCFS2, so releasing a filesystem is nothing new. And even if ASM becomes a true filesystem, that would not provide a performance boost significant enough to warrant a keynote called "Extreme Performance". An ASM filesystem would be a major manageability feature, not so much a performance feature.
That being ruled out, what could it be?
Recently, when setting up a new 11g database on a server with 128 GB of RAM, I was configuring hugepages as usual and thinking about how big my cache would be. It struck me that the cache will be bigger than the database for quite a while. Why do we even need the SAN/datafiles?!
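For readers who haven't done that setup: the sizing arithmetic is trivial. Here's a minimal sketch, assuming the common 2 MB x86-64 hugepage size (the function name and numbers are illustrative, not any Oracle tooling):

```python
# Hypothetical sketch: sizing Linux hugepages to back an Oracle SGA.
# Assumes 2 MB hugepages (the usual x86-64 default at the time).

HUGEPAGE_SIZE_MB = 2

def nr_hugepages(sga_gb):
    """Return the vm.nr_hugepages value needed to back an SGA of sga_gb GB."""
    return (sga_gb * 1024) // HUGEPAGE_SIZE_MB

# A 100 GB SGA on that 128 GB box needs 51200 hugepages:
print(nr_hugepages(100))  # -> 51200
```

At that scale the SGA really does dwarf many production databases, which is what prompted the thought.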
Then it hit me.
We don’t! We don’t need them at all!
What are the main storage components of a production database?
- the redo logs — to guarantee crash recovery
- the datafiles — primary storage
- the backups — mandatory for a production system
- the SGA — why is this part of storage? Well, because you can’t have a database without some fast in-memory storage, right?
If you have sufficient SGA (RAM) to load your entire database (datafiles), why do you need the datafiles?
I am sure you are immediately thinking: what if the database crashes?
Remember Oracle's recent push: grid computing.
Picture a RAC database: 8 nodes with 128 GB of RAM each, totaling 1 TB of storage. Add 2- or 3-way mirroring and you still get 340–512 GB of highly redundant, extremely fast storage. A true, native "in-memory" cluster database. A true "shared nothing" cluster database.
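The capacity math is just raw RAM divided by the number of mirror copies. A back-of-the-envelope sketch (numbers are the hypothetical cluster above, nothing more):

```python
# Usable capacity of the hypothetical all-in-memory RAC:
# total RAM across the nodes, divided by the mirror count.

def usable_gb(nodes, ram_gb_per_node, mirror_copies):
    return nodes * ram_gb_per_node // mirror_copies

print(usable_gb(8, 128, 3))  # 3-way mirroring -> 341 GB usable
print(usable_gb(8, 128, 2))  # 2-way mirroring -> 512 GB usable
```

Even with triple redundancy, that's a respectable database size entirely in RAM.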
Even if you do not consider the performance increase, the redundancy level goes up. You no longer have a “central” SAN to rely on. Maybe you have two mirrored SANs in your enterprise to protect you against such failures. How about none?
Let’s keep moving with that idea. How can Oracle achieve it? What technologies would be needed?
I think Oracle already has all the required technologies to achieve this “extreme performance”. It’s just a matter of connecting them.
And the answer is Cache Fusion. But how? Imagine this scenario: during database startup, you would "restore" your database from your backups (compressed or not) directly into memory. Remember, that's 8 nodes doing the uncompressing/reading, so starting up won't really take much time.
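How long is "not much time"? A rough sketch, assuming a purely illustrative restore throughput of 200 MB/s per node (the real number would depend on backup media and compression):

```python
# Rough estimate of an in-memory "restore on startup":
# total data divided by the aggregate restore bandwidth of the cluster.

def restore_minutes(data_gb, nodes, mb_per_sec_per_node):
    total_mb = data_gb * 1024
    return total_mb / (nodes * mb_per_sec_per_node) / 60

# 300 GB restored by 8 nodes at 200 MB/s each:
print(round(restore_minutes(300, 8, 200), 1))  # -> 3.2 minutes
```

A few minutes of startup in exchange for never touching a datafile again seems like a fair trade.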
Once the database is up, cache fusion will take care of the rest: sending blocks over the interconnect, keeping past images, keeping and managing multiple copies. Oracle already does this, just not for redundancy reasons.
If a node (or two) goes down, who cares? All the data is already replicated 2- or 3-way. In the event all nodes go down, Oracle would still keep the online redo logs for archival purposes. Or maybe not? Replicated in-memory redo? Why not?
In fact, the only real changes are:
- backup will be restored into memory
- no dbwriter — no datafiles to write to
- cache fusion block replication for redundancy
The result? "Extreme Performance." Now that's definitely worthy of a keynote by Larry himself.
A major innovation indeed. For Oracle, at least. MySQL cluster databases are already all in memory. Actually, it’s the only way it can be, and this is seen as a limitation by the community, simply because it’s the only way.
Oracle doesn’t need to make the feature exclusive for the entire database. This may be a tablespace level feature, or even a table/partition level one. Then you would really be in control of which areas of your database get “extreme performance”. Think of the possibilities.
We were brainstorming with Paul Vallee about what the new feature could be. Paul's idea was slightly different from mine: he envisioned ASM as the driving technology behind an all-in-memory database. ASM already has 2- and 3-way mirroring. The change would be minor: instead of creating disks out of LUNs, they would be created out of RAM. ASM would take care of the inter-node replication.
If Oracle did an all-in-memory database with ASM, you would still have to "read" the data into the buffer cache, introducing double-buffering. This would actually be a step back. In the PC world, Windows NT/2000 revolutionized caching compared to DOS/Windows 95: merging the file system cache with execution memory was a significant step forward, precisely to avoid double buffering. This approach would also limit the granularity of what is "all-in-memory".
This is how Paul’s idea looks:
We have our bets. What's yours? Please throw in some wild guesses. The winner (the earliest correct guess) gets a Pythian Maestro shell shipped to him or her. (NOTE: I was going to write "does not apply to Oracle employees", but I decided to give them a chance too. As long as you don't know and you are guessing, you can try.)
Here’s Darrin Leboeuf, Pythian’s V.P. Client Services, modeling the Pythian Maestro shell.
Well, TimesTen has been languishing for several years under Oracle ownership, so maybe they can use that technology. It might avoid some embarrassing benchmark differences compared to MySQL's in-memory cluster implementation.
So, back to the speculation. Oracle's track record in the filesystem department isn't great (OCFS, anyone? Oracle iFS?). I don't think the new product will be as ambitious as an in-memory database, but it will:
– Have easier and more robust installation/management, at least compared to ASM
– Be cluster-oriented: clusters are not only the largest growth market for Oracle, but they are also not as well served by traditional filesystems
– Be integrated with Oracle's clusterware, to keep things in the family
– Have specific performance optimizations for solid-state drives, minimizing their limited write-erase cycles while taking advantage of their lightning-fast response times
I think it may be even deeper than that, based on peer discussions as well as some speculation on my part. Cluster, OS, and IO core drivers are all tied directly to a kernel, so perhaps "Extreme Performance" is a combination of what is conjectured by Christo and postulated by Paul… Perhaps Oracle and Unbreakable Linux merge! Oracle-Linux becomes bootable software that makes the database part of the kernel, with all the resident-style goodness to control every aspect of the SGA directly within the OS/IO/memory modules of the operating environment…
Anyone else got a guess?
Since everybody is quoting “This is how MySQL does it”, we might as well continue down that track but from an optimized storage point of view.
It would be nice to see something that deals with loading and quickly querying large data sets, where we don't have to worry about indexing or partitioning.
The Brighthouse Storage Engine claims to be such a beast boasting column based storage with high compression rates along with a Knowledge Grid that automatically stores aggregate data at many levels.
In other words, a new highly compressible table type that presents itself as a regular table to the DBA, with its own specialized mini-optimizer to return summaries from enormous amounts of data quickly.
“A major innovation indeed. For Oracle, at least. MySQL cluster databases are already all in memory. Actually, it’s the only way it can be, and this is seen as a limitation by the community, simply because it’s the only way.”
Not to split hairs, but with version 5.1 of MySQL the data can be stored in on-disk tables in a cluster; the indexes still have to be in memory. While 5.1.x still isn’t “production worthy”, many companies use this version for MySQL Cluster. Of course, many companies are already moving to 5.1 anyway… but I don’t want to turn this into a post unto itself!
Your comment actually further supports my posting: this *needs* to happen if Oracle is to compete with MySQL.
Willing to bet it’s an integration of ASM and one or more storage technologies – like the HP EVA – into a global clustered file system.
With the ability to store ANY data clustered, not just Oracle datafiles.
And with global cache performance, of course!
Some of the whispers I am hearing from the customers and prospects I have been chatting to are around pushing the Oracle compression algorithms through the roof. Compressing data > 40:1 makes IO wait time almost irrelevant and fits with the “extreme performance” messaging. Just a hunch :o)
I believe it to be:
1) Running Oracle certified on EC2 and S3 (for backups with RMAN); I have been thinking about that for the last year myself. Amazon just sent me a newsletter talking about this with a link to Oracle, but that link doesn’t work, and searching the Oracle site doesn’t turn up anything either.
2) Better compression and filtering at the disk level. Why not push a query’s filter down to the disk, so only the data that you need reaches the Oracle buffer cache? That would make the cache more effective. Now that would be a project that Kevin would love working on.
3) Something with RAC. The RAC train left the station a couple of years ago, has run out of track, and is going nowhere. Yes, many people seem to be talking about it, and some are even using it, but for most customers there are better alternatives to RAC. Still, what is up with RAC lately? Maybe a de-support notice, as running without RAC will extremely improve the performance of most applications :)
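The filter-pushdown idea in point 2 above is easy to illustrate. A toy sketch of the difference between filtering in the database cache versus at a hypothetical smart-storage layer (all names and numbers here are made up for illustration):

```python
# Toy comparison: conventional scan (every row shipped to the server,
# then filtered) vs. a hypothetical storage-side filter (only matching
# rows shipped, so the buffer cache holds far less).

rows_on_disk = [{"id": i, "region": "EU" if i % 4 == 0 else "US"}
                for i in range(1000)]

def naive_scan(rows, predicate):
    # Conventional path: ship every row, filter in the database.
    shipped = list(rows)
    return [r for r in shipped if predicate(r)], len(shipped)

def storage_side_scan(rows, predicate):
    # Filter evaluated at the storage layer: ship only the matches.
    shipped = [r for r in rows if predicate(r)]
    return shipped, len(shipped)

pred = lambda r: r["region"] == "EU"
_, naive_io = naive_scan(rows_on_disk, pred)
_, smart_io = storage_side_scan(rows_on_disk, pred)
print(naive_io, smart_io)  # -> 1000 250
```

Same query result, a quarter of the data movement: that's the kind of win that would make a cache "more effective".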
Anjo’s no. 1 was announced already. I think I got the newsletter with the link on Sunday.
Smart integrated storage gets my support as a winner (we actually circulated this in the emails recently).
Plug’n’play cluster: nothing new, but if deployment is finally done quickly, reliably, and with a click of the mouse, then great. But I bet it will be via OEM or OUI, so it won’t work reliably.
We can also expect something from virtualization area.
My bet is still on columnar storage. If Oracle adds that, they will own the data warehousing market for a long time. Add in super compression and it’s a home run.
I recently sat together with some colleagues and developed a solution similar to Paul’s. We did not test it in real life, but it should be possible with 11g ASM as well: 3-way mirrored, with one group of RAM disks designated via ASM_PREFERRED_READ_FAILURE_GROUPS. Only the procedures for shutdown (should these disks be dumped anywhere?) and startup (restore from a dump, or let ASM re-create the failure group?) would need to be worked out.
Nevertheless, I prefer the first way.
My personal guess is something like “ASM as full featured cluster-filesystem with integrated cache, which eliminates buffer cache”. Only the result cache will remain within the instances.
I know nothing, but Luke Lonergan’s comment here seems quite specific in naming hardware that might be part of this.
Best to wait until tomorrow, I reckon. I’m sure there’ll be more detail to this, but there are some pretty dramatic suggestions here that would have been quite complex to implement quickly.
@Anjo RAC isn’t going away; instead, they’ve been busy with 11gR2 stuff, and no one is talking about those new features yet because I don’t think they’re finalized. However, I think it’s safe to assume that while 11gR1 was a bit of a yawn from a RAC perspective, 11gR2 will give RAC followers something(s) to talk about for sure.
[…] everyone guessed so wrong, that they were covering their embarassment (and, really, as I posted on Christo’s theory blog, a very close version of this already existed here, in Luke Lonergan’s comment.)I suppose you could […]
[…] Some incorrect speculation shortly before the announcement focused on the possibility of OLTP without disk, which clearly would speed things up a lot. I interpret that in part as being wishful […]
The grid has a more complex structure, which will require higher administration costs.