As some of you probably already noticed, there was a thread on AskTom discussing the scalability tests I did back in 2007. You are welcome to read the entire thread, but in a nutshell, Tom Kyte claimed that my tests did not reflect how one would use the result cache in the real world.
What is “real world?”
Of course, the important question is whether I tested a feature in a way it was never designed to be used, or whether someone is just trying to make an excuse for poor scalability results by defining “real world” in a way that makes my tests inappropriate.
A new feature
What do you do, then, you first see a new feature? You read about it in the documentation, and then you test it in order to compare what you have read with what you have in reality.
What the documentation tells us
Open the Performance Tuning Guide and go to 184.108.40.206 Result Cache Concepts:
When these queries and functions are executed repeatedly, the results are retrieved directly from the cache memory. This results in a faster response time. The cached results stored become invalid when data in the dependent database objects is modified. The use of the result cache is a database-wide decision.
All it says is that you have to have repeatedly-executed functions and queries to get faster response time. It says nothing about what kind of queries or functions. It also suggests that the result cache should be used database-wide or shouldn’t be used at all (which is perfectly sound according to Jonathan Lewis’s Rules for Hinting).
Now skip up to 220.127.116.11 Use of Result Cache:
OLTP applications can benefit significantly from the use of the result cache. The benefits highly depend on the application. Consider the use of the PL/SQL function result cache and the SQL query result cache when evaluating whether your application can benefit from the result cache.
It clearly says that result cache is perfectly appropriate for OLTP applications. They leave a backdoor with the words, “depend on the application” but, yet again, they say nothing about what kind of OLTP applications.
Am I reading something wrong there?
Now, I may well be misunderstanding something, but I got the following impressions about the result cache after reading the documentation:
- Its use is recommended for frequently-executed functions and queries.
- It is to be used database-wide or not used at all.
- it is highly recommended for OLTP applications.
The above doesn’t leave the impression that the result cache is not appropriate for caching small queries and functions being executed at a very high rate with high degree of concurrency (yes, I am spelling out the basic definition of an OLTP system right now). Moreover, the documentation actually suggests that it is recommended to use result cache in such circumstances.
What others are doing?
Let’s say there is still a chance that I misread documentation. Others must be doing something else, then. So I started to search around to see other examples.
I saw that Steven Feuerstein wrote On the PL/SQL Function Result Cache. You can spot the words “best practices” and “the real world” there. He even explains which kind of application he is simulating. This article was published in Oracle Magazine which means someone actually reviewed it before publishing and it shows good result cache usage.
What is the test about? It caches a function that does a single-row lookups into a table. I was doing exactly the same in my scalability tests with the exception that I was caching the query itself rather than a PL/SQL function that calls it. The results, though, are quite different. Of course, you can shave off more by caching a function call, and this article is able to demonstrate quite a bit of improvement, which is probably why no one has complained about this test being less than “real world” — despite the fact that it is doing essentially the same thing as mine did.
What about scalability?
The tests published in Oracle Magazine article lacks an important part: they didn’t test concurrency. If you read my blog posts regarding result cache, you already know that it is not the pure speed of getting something out which is a problem — it is concurrency that kills the result cache.
If I am right about the result cache not being scalable, I should see the same results. So I took the same test case and tested it in a concurrent environment (we’re after “real world” here). Here is how I created an employees table:
SQL> create table employees as 2 select level employee_id, rpad('x', 100, 'x') v 3 from dual 4 connect by level <= 10000; Table created SQL> alter table employees add constraint pk_employees primary key (employee_id); Table altered
Note that I am already giving the result cache an artificial advantage by not placing this table into a hash cluster (and a hash cluster is perfectly acceptable, considering test case description).
After the employees table, I created two functions in exactly as in the Oracle Magazine article — one that does result caching, and another that does not. I ran my tests in exactly the same way as I did before — parallel job processes executed the function using a random
employee_id. I measured how many executions per second we were able to archive across 1 million executions. I repeated every test twice, and picked up the best result. I also made sure both runs provided consistent results. As before, all tests were performed on an 8-way Itanium server. Here is what I got:
|# of prcoesses||Not Cached||% linear||Cached||% linear|
And a nice graph:
I hope you guessed which line is red.
And if we draw a graph out of “% linear”:
I wonder if anyone still says that the result cache scales. Because it simply doesn’t.
Of course, you may invalidate my tests by saying that, because everyone lives in a single-user environment, concurrent tests are not “real world”.
So what now?
I understand Tom’s point about my using result cache in a way he doesn’t envision it. Let’s not forget, however, that I wrote my blog post back in November 2007, and back then, all we had was a standard Oracle Documentation. I see my tests as perfectly in accord with what the documentation says (and you can actually see others using result cache as I did). And instead of discarding all these tests by simply “adjusting” what “real world” means, I thought I could better spent my time by clarifying these issues in the documentation.