- I am talking about PHYSICAL IO testing for the storage systems based on traditional hard disks that use spinning parts. If you would like to test anything else, please forget about using ORION. ORION is good for physical IO testing ONLY.
- My interest is in testing random physical reads, which are typically the major IO consumers in OLTP systems. The focus here is on the IOPS and latency characteristics of a storage subsystem. If you are interested in throughput or anything else, this blog post isn’t for you.
This post finalizes a few weeks of work comparing the SLOB and ORION IO testing tools. If someone asked me to test a STORAGE system’s performance (a.k.a. physical reads) from Oracle’s perspective, the first tool I would usually consider would be ORION. A few weeks ago, a related dialog on Twitter between Kevin Closson and Alex Gorbachev drew my attention. You can find the details of the discussion here. The conversation led me to ask myself whether ORION was really the right tool for the job. Over the last few weeks, I published a few blog posts as testing progressed. You can find references to all posts under the “index” blog post here. Today, I am ready to share my conclusions.
Before I conclude, I would like you to think about the following questions:
- Can you control or easily predict how much storage cache (on any level) an application is going to use at different points in time (load patterns)?
- Can you dedicate just a small part of physical disks to your application and make sure that no other applications are using the rest of the space?
- If you don’t have the luxury to reserve just a small part of the disks for your application, can you make sure that your application data is located just in certain areas of the disk?
I am guessing that for all 3 questions, the answer will be “NO” in 90% of cases. Therefore, to test storage performance, we should:
- Eliminate any caches and test the back-end HDDs’ response time. Later on, we can make an assumption about what percentage of Oracle physical IO requests is going to be served from storage cache under different application load patterns at different time frames, and from that forecast the likely storage response time. Because the cache hit rate depends on the application’s specifics and differs between processing time frames (e.g. reporting periods, batch processing, active consumer hours, etc.), testing physical IO that is served from cache has little value. We can assume that any cached IO is much faster than physical IO. In such cases, we should focus our attention on CPU testing.
- If we can’t control how our OLTP data is placed on the physical hard drive surface, we should assume that it’s located randomly across the available space. Note that in some cases we CAN control where our data is located (e.g. we can allocate a partition to be used by data and leave the rest of the disk empty). Either way, we should make sure that we test the whole space that the application could use, not just a small part of it.
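The forecasting step in the first point can be sketched as a simple weighted average. This is only an illustration in Python: the function name `expected_latency_ms` and the 0.5 ms cache service time are assumptions made for the example, while 11.35 ms is the ORION back-end latency reported later in this post.

```python
def expected_latency_ms(cache_hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Blend cache and back-end disk latency by the assumed cache hit ratio."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return cache_hit_ratio * cache_ms + (1.0 - cache_hit_ratio) * disk_ms


DISK_MS = 11.35   # measured back-end latency (ORION, full 12 disks)
CACHE_MS = 0.5    # ASSUMPTION: illustrative storage-cache service time

# Forecast the response time for several assumed load patterns.
for ratio in (0.0, 0.3, 0.6, 0.9):
    forecast = expected_latency_ms(ratio, CACHE_MS, DISK_MS)
    print(f"cache hit ratio {ratio:.0%}: ~{forecast:.2f} ms per physical read")
```

The point of the sketch is that the measured, cache-free disk latency is the stable input; the hit ratio is the part that changes with the application and the time of day.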
ORION vs SLOB – 2:0
Now it is time to talk about the ORION and SLOB testing results. If you haven’t read it yet, the figures, details, and explanations are available in the “Final results – ORION vs SLOB” blog post. The summary is as follows:
|Test #|Testing Tool|Data Placement|Latency|IOPS|IO per Spindle|
|---|---|---|---|---|---|
|1|ORION|Full 12 disks|11.35 ms|2114|176.2|
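The “IO per Spindle” column is just the total IOPS divided by the number of disks. The short Python sketch below reproduces the figure for test 1. As an extra sanity check, which is not part of the original results, it also applies Little’s law (outstanding IOs = IOPS × latency), assuming a steady-state workload. The function names are made up for this example.

```python
def iops_per_spindle(total_iops: float, spindles: int) -> float:
    """Average number of IOs per second served by each disk."""
    if spindles <= 0:
        raise ValueError("spindles must be positive")
    return total_iops / spindles


def outstanding_ios(total_iops: float, latency_ms: float) -> float:
    """Little's law: average number of in-flight IOs at steady state."""
    return total_iops * (latency_ms / 1000.0)


# Test 1: ORION, full 12 disks, 11.35 ms, 2114 IOPS
per_disk = iops_per_spindle(2114, 12)
in_flight = outstanding_ios(2114, 11.35)
print(f"IO per spindle: {per_disk:.1f}")      # 176.2
print(f"Outstanding IOs: {in_flight:.1f}")    # 24.0
```

Under that assumption, the test kept roughly two requests in flight per disk on average.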
Let me provide Kevin’s and Alex’s statements that triggered my testing activities once again:
- Kevin Closson said: “Orion may give It’s VERY easy to get huge Orion nums but reasonable SLOB”
- Alex Gorbachev said: “lots of the system IO bound below the CPU level so you should see similar number with Orion or SLOB.”
After my testing, I think they are both wrong. First of all, in my case it was SLOB that gave “HUGE” physical IO numbers (tests 2 & 3), and it wasn’t “VERY” easy to get SLOB to report numbers close to what ORION showed in the first run (tests 4 & 1). Second, if we run SLOB in its default configuration without verifying the results, we may get numbers that are too high and end up claiming that our system delivers record PIO processing power. In reality, those numbers are not the physical IO statistics that an application would actually benefit from. Based on my testing, there are two reasons:
- The first issue is Oracle kernel IO optimization. One example is “db file parallel read”; Kevin Closson is going to address the problem in the next drop of the SLOB kit. However, the workaround may limit SLOB’s usefulness, as you may not be able to implement it in your production system. If you used SLOB for physical IO testing, check your AWR reports. If you see “db file parallel read” under “Top 5 Timed Foreground Events,” you have wasted your time and should redo the testing. See this blog post for the explanation and fix.
- The second issue isn’t as easy to address. If your data is located in one area of a physical disk (tests 2 & 3), your physical IO response time is much better (by 100%) than if your data is spread randomly across the whole disk surface (test 4). Please note that it doesn’t matter whether your data is located on the INNER or OUTER part of the disks. The performance is more or less the same regardless of which part of the disk your data is on. The key is that the testing tool should read data randomly from all areas of the disks used by the application.
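One way to see why the second issue arises is to look at how far the disk head has to travel between random reads. The Python sketch below is a deliberately simplified model, not a reproduction of my tests: it treats the disk as a line from 0 to 1, places random reads uniformly inside a span of a given width, and measures the average distance between consecutive reads. It ignores rotational latency, the non-linear relation between seek distance and seek time, and striping across spindles. The function name and parameters are hypothetical.

```python
import random


def mean_seek_distance(span_fraction: float, reads: int = 100_000, seed: int = 42) -> float:
    """Average head travel between consecutive uniformly random reads.

    span_fraction is the share of the disk surface (0..1] that holds the data.
    """
    rng = random.Random(seed)
    position = rng.uniform(0.0, span_fraction)
    total = 0.0
    for _ in range(reads):
        target = rng.uniform(0.0, span_fraction)
        total += abs(target - position)  # distance the head must move
        position = target
    return total / reads


# Data confined to 10% of the disk vs. spread across the whole surface.
for span in (0.1, 0.5, 1.0):
    print(f"data spans {span:.0%} of disk: mean seek distance {mean_seek_distance(span):.4f}")
```

In this model, the average travel is about one third of the span, so only the width of the occupied area matters and not its position on the disk. That is consistent with the INNER vs. OUTER observation above, and it shows why a test confined to a small region reports better latency than the application would see on a fully used disk.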
ORION is free from the SLOB problems mentioned above and doesn’t require any setup, tuning, or extensive result verification. Next time someone asks me “Yury, can you please test our new storage from Oracle’s perspective for us?” or “Yury, we think that based on the specification (number of spindles), our storage under-performs, but we can’t prove it; can you help us?” I will consider ORION as the first option.
At this stage, if you are ready to argue that SLOB is better than ORION for physical IO testing, please be my guest and post your arguments in the comments field below. However, before you do that, please reread the first part of the post (especially the questions that I asked).
For the record – There is no silver bullet
The following is my comment on Kevin Closson‘s blog post SLOB Is Not An Unrealistic Platform Performance Measurement Tool – Part I:
“IMHO: There is no silver bullet here. One tool is preferable in one case, another in a different case. It is almost like performance tuning. Cary will say 10046 is the way to go in 99% of cases. Some will find that they can use STATSPACK to resolve the same business problem with less time invested. Yes, 10046 will be more precise from the results point of view; however, if another method is a bit more economically efficient and delivers reasonable results, it could be used. We definitely shouldn’t blindly rely on the results we receive, and we should know each method’s limitations and keep them in mind.”
I am sure that SLOB is a brilliant idea and is very good for all sorts of testing (especially for LIO and physical IO stack CPU testing). However, for real HDD physical IO testing, my preference goes to ORION today.