Tonight at Oracle OpenWorld, Oracle CEO Larry Ellison announced the latest Exadata release: X3. It sports refreshed hardware, including a big expansion in RAM and flash storage capacity: 4TB and 22TB respectively on a full rack. That works out to 512GB of RAM on each database server and a bit over 1.5TB of flash on each storage server.
It’s being promoted as a hierarchical caching strategy, Exadata X3H2M2, where the most-frequently-used 4TB of data is in the database buffer cache, the next 22TB is in flash, and the least-used sits on disk.
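The tiering being described can be sketched as a simple cache hierarchy. This is an illustration only; the tier names, lookup order, and promotion policy below are simplifying assumptions, not Oracle's implementation:

```python
# A minimal sketch of hierarchical (tiered) caching, the idea behind
# "X3H2M2": the hottest data in the RAM buffer cache, warm data in the
# flash cache, cold data on disk. Illustration only -- not Oracle's code.

CACHE_TIERS = ["buffer_cache", "flash_cache"]  # fastest first; disk is the backing store

def read_block(block_id, caches):
    """Return the tier a read is served from; a disk read warms the caches."""
    for tier in CACHE_TIERS:
        if block_id in caches[tier]:
            return tier
    for tier in CACHE_TIERS:  # miss everywhere: promote into the caches
        caches[tier].add(block_id)
    return "disk"

caches = {tier: set() for tier in CACHE_TIERS}
first = read_block(42, caches)   # cold read, served from disk
second = read_block(42, caches)  # now cached, served from RAM
```

The point of the hierarchy is that repeated access to the same block migrates it to faster media, which is exactly the "intelligent adaptation" being claimed.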
Larry continued to explain how this approach was much better than “flash disk” storage because it could intelligently adapt to data usage patterns. This is where I disagree: There are pros and cons to both approaches.
A flash cache approach, as in Exadata X3H2M2 (Yes, it’s a mouthful. Will George Lucas sue for stealing his robot branding?) has the advantage of not requiring data redundancy: Since the master copy of the data is on a set of redundant disks, the caches don’t need the same level of redundancy, which makes for a more efficient use of flash capacity.
That having been said, flash caching implies that the entire dataset will not fit in cache and that requests for data not in cache must still go to disk, with the associated performance penalties. And depending on usage patterns (ad-hoc querying being one of the worst offenders), this uncached data access volume could be significant.
11 Comments
As you know, the Exadata flash cache has a KEEP option where you can pin entire objects in the flash cache if you so desire. I would argue, however, that an intelligent cache will be able to cache your entire working set; but since you may disagree, the KEEP option will suffice. The cache still allows you to virtually increase the effective size of your flash. Tell me a use case where the flash-disk option is better.
Hello G,
Thanks for stopping by. I’m not advocating CELL_FLASH_CACHE KEEP here; all I’m saying is that if the active dataset is larger than the cache, some data access will have to come off disk.
Some back-of-the-envelope numbers: the X2-2 datasheet says up to 224TB usable space for a high-capacity full rack, and the announcement today says 22TB flash/4TB RAM. Or on a quarter rack, 48TB usable and 4.8TB flash. Assuming the FRA is stored externally, that’s only 10% flash cache coverage and 2% buffer cache coverage.
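Those percentages are easy to check from the figures quoted above:

```python
# Back-of-the-envelope cache coverage from the datasheet and announcement
# figures quoted above (X2-2 high-capacity usable space, X3 flash/RAM).

full_rack_usable_tb = 224     # usable space, high-capacity full rack
full_rack_flash_tb = 22
full_rack_ram_tb = 4

quarter_rack_usable_tb = 48
quarter_rack_flash_tb = 4.8

flash_coverage = full_rack_flash_tb / full_rack_usable_tb    # ~10%
ram_coverage = full_rack_ram_tb / full_rack_usable_tb        # ~2%
quarter_flash_coverage = quarter_rack_flash_tb / quarter_rack_usable_tb  # ~10%

print(f"flash: {flash_coverage:.1%}, RAM: {ram_coverage:.1%}, "
      f"quarter-rack flash: {quarter_flash_coverage:.1%}")
```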
Marc
The alternative you propose, using everything as flash griddisks, is what I have an issue with. If you have an active working set that is larger than the flash cache, a flash griddisk-based solution does not solve the issue either.
Also, for a DW, that 22TB of flash could store around 220TB once you factor in compression. I am also not sure the use case for an OLTP database larger than 22TB exists much (and that is not accounting for OLTP compression). That’s why the Exadata material says you can fit the entire active working set for most databases, I think.
Hello G,
I think we’re talking about different systems entirely. By a flash disk-based system, I don’t mean an Exadata machine with flash griddisks. Rather, I’m referring to a non-Exadata platform with all-flash storage. There are quite a few of these platforms on the market, and I understood Larry Ellison’s comments about the superiority of flash caching to be directed against these vendors in particular.
So in the purely Exadata context, I do agree with your point: the most efficient use of Exadata’s flash memory is for caching.
Cheers!
Marc
Oracle Exadata X3: once again, rumours. I tried to understand the X3 architecture and what Oracle has tried to do. For a big data set in a data warehousing environment, keeping the active and cold data sets on different media means you are defeating the purpose of parallel processing.
I think it’s not wise to keep the active data set on flash and the cold data set on disk in data warehouse environments.
However, for OLTP purposes it’s okay. Once again, Oracle’s OLTP-minded professionals failed to understand its objective. The architecture did improve warehouse performance, but not by using flash cache. We are in the POV phase of a new X3, and my results prove that fact.
But it’s not what Marc Fielding or G says: keeping warehouse data on flash means you have to read disk and flash each time your parallel query runs, which does not make sense to me.
For a quarter rack, the disk size in one cell is 9.5TB of HP disks, with 4.8TB of flash (raw). It’s a completely wrong concept, as usually promoted here.
Hello Amir,
Very interesting comments… thanks!
“I think its not wise of keep active data set on flash and cold data set on disk for data warehouse environments.”
Why not? If the vast majority of the queries in your data warehouse touch a small proportion of your data (for example, recent rather than historical data), there is definitely a case for hierarchical storage. However, as flash gets less costly, the case becomes less compelling.
“you have to read disk and flash each time your parallel query runs”
I’m not understanding your point of comparison. If you’re comparing to RAM and the buffer cache, you do have in-memory parallel query. And even if you’re doing direct path reads, if your active data set lets them all come off flash, you can take advantage of both the flash performance and, hopefully, storage offload. Though it won’t be as fast as RAM, naturally.
Another point that hasn’t been made is that flash is an order of magnitude faster than magnetic storage for random access, but for sequential access (think smart scans) the margin is much narrower.
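To put rough numbers on that point (the figures below are assumed, order-of-magnitude values typical of hardware of that era, not measurements from any specific device):

```python
# Illustrative, order-of-magnitude figures only: flash's advantage over
# spinning disk is huge for random I/O but much narrower for sequential
# scans such as Exadata smart scans.

disk_random_iops = 200        # e.g. a 15k RPM drive, small random reads
flash_random_iops = 50_000    # an enterprise flash card
disk_seq_mb_s = 150           # sequential scan throughput per drive
flash_seq_mb_s = 500          # sequential throughput per flash card

random_advantage = flash_random_iops / disk_random_iops  # hundreds of times
seq_advantage = flash_seq_mb_s / disk_seq_mb_s           # only a few times
```

Under these assumptions flash wins by a couple of orders of magnitude on random access, but only by a small single-digit factor on sequential scans, which is why the workload type matters so much.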
And by the way, I’d be interested in hearing how your POV works out.
Cheers,
Marc
Please mind the typo:
For a quarter rack, the disk size in one Exadata Quarter Rack is 9.5TB of HP disks, with 4.8TB of flash (raw).
Marc,
Thanks for the reply.
I was looking for exactly these questions, and they are quite valid ones. But if you take huge data, as we are testing on our Quarter Rack, you would find that Exadata X3 is just an increase in flash cache plus Exadata software that enables frequently accessed queries to read from and write to the flash cache.
Now, when you say frequently accessed queries, you are not talking about data warehouse queries. Why?
Because data warehouse queries are ad hoc, since the reports themselves are ad hoc by nature. Users will not execute the same reports hitting the same data time and again. Users will use data as a source of knowledge and try to extract the analytical insight needed by executives. So they have a small canned-reporting system, and the rest of their workload is ad hoc, containing analytic and predictive analytic reporting. If it only has canned reports, it’s not business intelligence. In business intelligence, all of your data on disk is your active data, because ad hoc queries should be able to hit any data for analytic purposes.
So the concept described in this thread is not a warehouse concept, and Oracle is conveying the wrong information just for marketing purposes, or it may be conceived wrongly by audiences.
In a warehouse system, the vast majority of queries don’t hit the same data. (In OLTP systems, however, they do.) In warehouse analytics, each query is different and will hit different data.
Coming to the second point:
“you have to read disk and flash each time your parallel query runs”
Marc, it gets complicated, but I’ll try to explain. Only range scans and index reads will benefit from the flash cache. Parallel queries will yield a little gain, but not a huge one. How?
First, I come to the database cache, or in-memory parallel processing. It’s a concept that contradicts itself. How?
According to the Oracle documentation, if data has been accessed frequently and more than a certain ratio of it (two-thirds, if I remember correctly) is in memory, Oracle will do in-memory parallel processing. But Oracle 11g also introduced serial direct path reads: if a query fetches data above a certain threshold, it will do a direct read, bypassing the database cache. That leaves only one chance to populate the cache, namely small reads of the table through indexes. Tell me, under these conditions, what is the possibility that 70% of a table is in cache? It does not make sense; the features contradict each other’s functionality.
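The circularity being described can be sketched roughly as follows. This is a deliberate simplification: the function, the 0.5 ratio, and the threshold values here are illustrative assumptions, not Oracle's actual (largely undocumented) heuristics:

```python
# Simplified sketch of the 11g scan-path decision Amir describes: large
# segments that are not already well cached are read via direct path,
# bypassing the buffer cache -- so large scans never warm the cache.
# Thresholds and ratios below are illustrative, not Oracle's real values.

def scan_path(segment_blocks, cached_fraction, small_table_threshold):
    """Roughly decide how a serial scan reads a segment."""
    if segment_blocks <= small_table_threshold:
        return "buffered read"          # small scans go through the cache
    if cached_fraction >= 0.5:          # illustrative ratio only
        return "cached/in-memory read"
    return "direct path read"           # bypasses the buffer cache entirely

# A large, barely-cached segment is always read direct, so the scan itself
# never populates the cache -- the circularity the comment points out.
path = scan_path(1_000_000, 0.01, 10_000)
```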
Another technical problem, which would in reality be a huge problem if a data warehouse keeps active data on flash and cache and cold data on disk: imagine a table which is spread across cache, flash, and disk, and a full table scan is needed. Let’s say the degree of parallelism is 8 (computed by the database), so it spawns 8 slave processes: S1, S2, S3, S4, S5, S6, S7, and S8. Let’s say S1 to S4 read data from cache and flash, and S5 to S8 read data from disk. Since S1 to S4 read from memory and flash, they are 10x faster: if S1-S4 execute in 1 minute, S5-S8 will execute in 10 minutes (due to reading from disk), so the query will finish in 10 minutes.
In a data warehouse parallel query, the time to execute the query is:
Time to execute a parallel query = Max { S1, S2, S3, S4, S5, S6, S7, S8 }
That is why parallel query distributes the data evenly across the slave processes, so that they all finish in nearly equal time.
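The skew argument above works out numerically like this (the timings are the illustrative ones from the example, not measurements):

```python
# Worked version of the 8-slave example: the elapsed time of a parallel
# query is determined by its slowest slave.

flash_slave_minutes = [1, 1, 1, 1]     # S1-S4: reading from cache/flash
disk_slave_minutes = [10, 10, 10, 10]  # S5-S8: reading from disk

# Time to execute the query = max over all slaves.
elapsed = max(flash_slave_minutes + disk_slave_minutes)  # -> 10 minutes
```

Four slaves finishing 10x early buys nothing; the query still takes the disk slaves' 10 minutes, which is why even data distribution across media matters.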
What’s been described here by Ibrahim does not follow traditional parallel query processing. Maybe it’s something new, but the deeper you go, the more you have to say:
Exadata is not an in-memory warehouse system. In-memory systems like SAP HANA keep all the ad hoc data in memory, compressed. There is no “active” data subset in HANA.
Hi Amir,
I just realized that I never replied to your last comment. So if you don’t mind rolling the time machine back to October 11, I do have a quick comment to make.
My experience has been that while data warehouses do experience ad-hoc querying, these queries frequently tend to reference much of the same underlying data, particularly when it comes to time periods. If your data warehouse had 10 years of sales data, you might well see more recent data accessed much more often than older data.
Marc
Also, please check what an in-memory data warehouse like SAP HANA is. In SAP HANA there is no “active” data: all the data in memory is compressed by column and indexed by zones (storage indexes, in Exadata terms).
Since the read medium is the same and the data distribution is even, all the slaves doing a parallel query in HANA finish at the same time.
We have implemented SAP HANA in China for one of our clients. Oracle Exadata is nowhere near SAP HANA, and Oracle knows this, so they are using marketing tricks to buy some time.