References to my SLOB testing related blog posts:
- ALL Pythian blog posts with Post Tag #SLOB
- My First Experience Running SLOB – Status Update 1 – or why I started the testing
- Status Update 2 (first results) – first conclusions Orion vs SLOB – 1:0
- Don’t repeat my errors (AWR) – explains the AWR issue that can screw up results
- Final results – ORION vs SLOB – results
- ORION vs SLOB – 2:0 – final post and conclusions
- SLOB on steroids – Part 1
- SLOB LinkedIn Group
If you are seriously interested in getting a good understanding of your IO subsystem’s performance characteristics, then one of the tools that you may consider for putting your system’s IO under stress is SLOB. See Introducing SLOB – The Silly Little Oracle Benchmark from Kevin Closson. I did and learned a lot. A bit of warning from my side: You should be ready to invest a bit of time (days in my case) to make sure that the results you are getting are reliable.
Because SLOB tests IO at the Oracle instance level, you need to watch for several things before you declare official testing results. My suggestion is: Don’t take any result for granted. You need to assess each result before stating that you are done. Here is my list.
NOTE: This list is valid for Physical IO testing only, as opposed to LIO or Oracle Physical IO stack testing (cached IO), and it is based on my personal experience. Your case/configuration may be different.
- Make the data (buffer) cache as small as possible. Don’t worry about any concurrency issues before you hit them. I set db_cache_size=12M, and it didn’t give me any concurrency problems even in a test with 128 readers. Details are here.
- Make sure that the SQL used by SLOB does what it’s intended to do – make a lot of “db file sequential read” based IOs. Newer Oracle versions bring some optimizations in this area, so you should carefully assess results before making them available to others. You will find some recommendations on how to avoid some of the newer versions’ optimizations in this blog post (see hidden init.ora parameters).
- Run SLOB several times and make sure that the results you are getting are consistent. If they are not, question them. One thing that I found during my tests was that earlier versions of SLOB had a “bug” in generating awr.txt reports: the AWR report didn’t cover the whole duration of my tests. For details and solutions see “My First Experience Running SLOB – Don’t repeat my errors (AWR)”.
- Keep in mind that Inner and Outer areas of HDDs have different performance characteristics. If your goal is to test your real application IO performance, then think about in which disk areas your application data will be located and try your best to locate SLOB data the same way (no blog post here as of now).
- Don’t put all your test data close together. If the HDDs in your system are big, SLOB puts its test data in the same or a nearby area, so Oracle doesn’t need to move the HDDs’ heads much to read the data. That rarely happens in real-world configurations. If you use the default SLOB data load, the results may not be close to your application’s IO load (no blog post here as of now).
- Switch off ALL other load on your system if possible. If you seriously want to get reliable and consistent results, you should switch off everything else that may impact testing results. Even if you created an empty database (as I did), you should put some effort into switching off IO-impacting processes. The following are on my list as of now: Backups, Oracle Automatic Jobs, all other Scheduler and DBA_JOBS tasks, and Resource Manager (no blog post here as of now).
- Avoid any caches. If you want to test real Physical IO performance, you should make sure that no data is returned from any kind of cache. Some would say: “Hey, we do have caches in day to day operations, why should we switch those off?” Well, sit back for a moment, ask yourself what you are trying to achieve with these tests, and read the note at the beginning of this list once again. As a rule of thumb, any IO served in less than 5ms has a good chance of having been returned from some sort of cache (unless you are using SSDs). Try your best to verify it and adjust your test accordingly.
- Switch on 10046 trace for some of the SLOB sessions to verify the results. The awr.txt report gives you average numbers only. This means that it cannot tell you what percentage of your IO requests are returned from caches. In my case, AWR reported a 5ms average IO response time for “db file sequential read” events. This seemed to be a reasonable figure. However, luckily enough, I switched on 10046 trace for some of the SLOB sessions. Using Method-R MrTools, I easily found that 51% of all IOs in the test were served in 0.1ms. Most of you know what that means: those IOs had been served from cache. My 5ms response time results weren’t reliable. I had to redo the test.
- Added on 2012.05.21: Make sure that “db file sequential read” is the only IO-related event in the AWR TOP 5 events. If there are other IO-related events, the SLOB test probably didn’t work well. As an example of other IO-related events, I would mention “db file parallel read”. If you see it in your SLOB AWR report, you should investigate why and redo the test.
I think I will stop here for now. I seem to have given you enough material to think about and work on. I hope that my work will let some of you get back to your SLOB testing. I will try my best to add some more details to this post later on as time permits.
If you like my work, please do me a favor: Let me know about any other SLOB PHYSICAL IO testing-related blog posts in the comments! :)
Good luck with your testing,
P.S. My current impression after 3+ days in SLOB testing is that it takes a lot of time and fine tuning to get reliable PIO testing results. Orion, however, just tests Physical IO, providing pessimistic results straight away.