In my previous blog entry, I explained why I would expect Result Cache not to scale well. Unfortunately, at the time that blog entry was written, I had no access to hardware with more than two cores. That left me in an everything-but-the-proof state. “Theory without practice is sterile.” ©Albert Einstein.
Since then, I got a chance to re-run my test cases on a quad-core CPU, moving one step forward.
I re-executed my test cases with one to four processes against the Buffer Cache and the Result Cache in order to capture the number of lookups per second. This time I raised the number of iterations to 1M to make the results more stable.
Here is what I got:
# of processes | Buffer Cache | % linear | Result Cache | % linear
Both approaches demonstrate almost linear scalability, with the Result Cache being slightly faster in all cases. The single-latch problem is either non-existent, or four processes are not enough to saturate the latch. In order to clarify this, I collected a table of latch wait times as well:
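A rough way to see why four processes might not be enough: suppose a fraction s of each lookup is serialized under the single latch (s here is an assumed model parameter, not a measured value). A standard serialization model gives

```latex
% Throughput with n processes and serialized fraction s:
T(n) = \frac{n \cdot T(1)}{1 + s\,(n-1)}, \qquad
\%\,\mathrm{linear} = \frac{T(n)}{n \cdot T(1)} = \frac{1}{1 + s\,(n-1)}
```

With, say, s = 0.01, four processes would still show about 97% of linear scalability, and it would take on the order of a hundred processes to drop to 50%. So near-linear numbers at n = 4 do not, by themselves, rule out a single-latch bottleneck.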
# of processes | Buffer Cache: CBC latches (ms) | Result Cache: Latch (ms) | % per process
You can spot the important data there. Although the
Result Cache: Latch waits are still insignificant, they were growing very rapidly, at a rate greater than factorial. The reason I didn't notice them before is that, on a quad-core box with four concurrent processes, those waits are still too small to have any major effect on the results.
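For reference, the wait times above can be pulled from v$latch before and after each run. A sketch of the query follows; note that wait_time is reported in microseconds in 10g and later, and the exact Result Cache latch name varies slightly between 11g releases, so check your version:

```sql
SELECT name, gets, misses, sleeps,
       ROUND(wait_time / 1000) AS wait_time_ms  -- wait_time is in microseconds
  FROM v$latch
 WHERE name IN ('cache buffers chains', 'Result Cache: Latch');
```

Taking the delta of wait_time_ms across a run, per process count, yields the table above.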