Have you ever wondered how fast you can commit ?
I’ve been introduced to an application which was doing commits in every step of a higher level transaction. The result was a significant portion of the time was spent in “log file sync” and there was nothing you could do. A commit can only be so fast. Your overall high level transaction speed will be limited by how fast you can commit.
No matter what you do, a true commit will take time. The time will be very small, but it will still be more then doing an in memory query of a lookup table. And worst of all, you will serialize at the entire instance on it. If a commit has caused a “log file sync”, another session will have to wait for the already started flush, and then wait for it’s data to be flushed in effect waiting twice the time.
So I did a test on how fast you can commit with 1 session. This will be the absolute maximum you can achieve in commit performance.
Starting in Oracle 10g testing commit time is a bit easier. With the introduction of the new commit parameters, “batch” and “nowait” and more importantly, the “wait” counterpart.
The default commit does this:
1. writes ‘end of transaction’ record into the redo buffer
2. posts (sends message) LGWR, telling it to flush the redo buffer to disk
3. waits for the flush to finish into the well known ‘log file sync’ event.
The “nowait” and “batch” parameters control how far in those steps you want to go. Basically:
– commit write nowait – will avoid step 3 (it will still tell LGWR to flush the buffer)
– commit write batch,nowait – will avoid step 2 and 3, allowing the data to remain buffered into the LOG_BUFFER until either someone else commits, causing a flush, or some of the background events cause the buffer to be flushed asynchronously. The background events are:
– the buffer is 1/3 full
– the buffer is 1 mb full
– 3 seconds have elapsed
A lesser known fact is that PL/SQL will silently optimize your commits to “commit write nowait”. Only the last commit in a given call will be a real “commit”.
The reason for this optimisation is the nature of client/server relationships. In theory, if you call a PL/SQL procedure, the client gets no information about its progress until the procedure has returned. So why wait for disk flushes?
You can easily verify this by running the following example (10g only):
We’ll create this table in and use it in all of our tests:
create table c_test (a number);
Then run this code :
set timing on declare i number; begin for i in 1..10000 loop insert into c_test values (55); commit; end loop; end; /
And notice how much time and how many “log file sync” waits (the count, not time) you had.
Then run this code:
declare i number; begin for i in 1..10000 loop insert into c_test values (55); commit write wait; end loop; end; /
Now you will get “the real” number. The actual amount of time it took to wait for 10 000 disk syncs.
I specifically choose 10 000 for the loop number, so that I can easily measure the average time to commit. This number however, will not be pure as there’s some extra time for the call to return back to sqlplus, but it’s close enough.
I got 4.6 seconds. Doing the math, this means a single disk sync (or SAN sync should I say) took 0.46 ms. How much did you get? Please post your results along with a quick overview of the system you ran this on in the comments to this post!
For the record, this is LINUX Red Hat 4, with a 3.8 Ghz XEON with 2mb cpu cache with 64 bit Oracle. The redo is on a dedicated ASM diskgroup, using a single ASMLIB disk, which is a raid 1 (mirror) array of 2×36 gb hard drives, 15k RPM inside an IBM SAN, with 1gb of mirrored cache (512mb effective) with 2gbit fibre channel adapters. The SAN setting for “write-back cache” is on.
Now here’s the strange thing. I found that any other disk IO activity on the machine will dramatically increase your commit time. It doesn’t matter if it’s on a separate spindle/device/LUN. The time to complete the operation is so short that it is affected by the smallest change in the working conditions. In fact it is so small that I’m pretty sure just taking a deep breath will probably affect it. This is with the default IO scheduler. I havent tried any of the fancy ones.
I noticed this behaviour during archiving. I actually have 2 pairs (4 drives) with 4 redo groups on alternating pairs. So when I write to side A, side B can be archived and vice versa. So no concurrent IO on the redo spindles whatsoever.
If you’re wondering the time difference is about 30% in my case.
Running multiple sessions at the same time, and monitoring “service time” and “average wait time” with iostat deserve a separate post.