Disclaimer: Much of what follows is pure speculation on my part. It could be completely wrong, and I’m putting it out there in the hopes that it’ll eventually be proven one way or the other.
- Underscore parameter _use_adaptive_log_file_sync
- Default value changed in 11.2.0.3 from FALSE to TRUE
- Dynamic parameter
- Enables a new method of communication for LGWR to notify foreground processes of commit
- Old method used semaphores, LGWR had to explicitly “post” every waiting process
- New method has the FG processes sleep and “poll” to see if commit is complete
- Advantage is to free LGWR from CPU work required to inform lots of processes about commits
- LGWR dynamically switches between old and new method based on load and responsiveness
- Method can switch frequently at runtime, max frequency is 3 switches per minute (configurable)
- Switch is logged in LGWR tracefile, we have seen several switches per day
- Few problems in general, possible issues seem to be in RAC and/or the switching process itself
We’re working with a customer who had a very successful Exadata go-live last weekend. They moved a large application mostly unchanged from 9i to Exadata – and we’ve generally had very positive feedback. (!!) It might be that just getting on modern hardware and software accounts for this, but it still gives everyone a good feeling.
Nevertheless, there are always a few little adventures. One of these was a mysterious orange glob on the Cloud Control radar Monday at 1am. Orange is the “commit” class – so this was a little surprising! (And reminiscent of Karl Arao’s famous Halloween monster!) A few drill-down clicks and we’re looking at “log file sync” events. To make a long story short, it wasn’t any of the usual suspects, and the necessary diagnostic info wasn’t available after the fact to say with certainty what happened. We’ve done a fairly comprehensive incident report and it hasn’t recurred… so I’m happy for now.
Anyway, I haven’t gotten to the really fun part yet. The really fun part is the #everydaylearning as Yury would say. In the process of analyzing the orange glob, I ended up investigating whether it might be related to a new feature called Adaptive Log File Sync. I’d never heard of this before so I had quite a bit of fun learning, and it seemed worthwhile to share a few things I found.
Before I dive in, one other important thing: I have to mention Christo Kutrovsky. We were both on-site and digging into this feature at the same time. There was a constant exchange of competing theories and different ideas about how things might work. Together we assembled a strong, in-depth, and working theory very, very quickly – I love working on a great team.
Log File Sync
First off, there are a few basic starting points for anyone troubleshooting time spent in the log file sync event.
- For past problems, use AWR reports and ASH data.
- For live problems, use SQL trace and Tanel’s Snapper.
- Read Riyaj Shamsudeen’s excellent article about Tuning ‘log file sync’ Event Waits.
- Read Kevin Closson’s excellent article about LGWR processing.
- Note 34592.1 is a generic reference note explaining the wait event.
- Note 137696.1 is a useful troubleshooting guide.
- Note 1064487.1 is a useful script to collect some diagnostic info for LFS troubleshooting.
In my specific case, I discovered quickly from the ASH data that this was a very unusual situation. This led to some more creative searching in the Oracle Support KnowledgeBase – at which point I discovered something very interesting:
Adaptive Log File Sync
Note 1462942.1 describes a feature whereby LGWR can switch between log write methods. This feature is enabled through an underscore parameter called _use_adaptive_log_file_sync and the description of this parameter is: adaptively switch between post/wait and polling.
Now I’ll be honest: searching on the internet and searching in Oracle Support’s KnowledgeBase for “adaptive log file sync” yields almost nothing. I do believe in the principle of BAAG – but this is a case where some guesswork might be useful, especially to guide experimentation that could nail down more concrete answers. Hence the disclaimer at the top of this article.
The words “post” and “wait” indicate that we’re talking about semaphores. For some general background on semaphores, check out the Wikipedia article, and for more detail about the post and wait calls (sem_post() and sem_wait() in POSIX terms), a good resource is chapter 30 from Remzi and Andrea Arpaci-Dusseau’s (UW Madison) textbook on Operating Systems.
Oracle uses semaphores extensively. In fact, LGWR and commits are specifically mentioned as an example in the Performance Tuning Guide – and Oracle even has an API to replace semaphore use with a third-party driver for lightweight post-wait implementations. (Like an ODM for your OS kernel.)
If you look for references to “commit” in the Oracle docs, you’ll find the word “post” everywhere when they talk about communication between the foreground processes and LGWR. Now, remember that a COMMIT has two options: first, IMMEDIATE or BATCH and second, WAIT or NOWAIT. It looks like this to me:
- Immediate: FG process will post to LGWR, triggering I/O (default)
- Batch: FG process will not post LGWR
- Wait: LGWR will post FG process when I/O is complete (default)
- NoWait: LGWR will not post FG process when I/O is complete
Why change a good thing? My theory is that it ties back to a rare problem Riyaj mentioned in that blog post over 4 years ago. I’ll quote him:
“LGWR is unable to post the processes fast enough, due to excessive commits. It is quite possible that there is no starvation for CPU or memory, and that I/O performance is decent enough. Still, if there are excessive commits, LGWR has to perform many writes/semctl calls, and this can increase ‘log file sync’ waits. This can also result in sharp increase in redo wastage statistics.”
Maybe Oracle is reading Riyaj’s blog? It appears that they came up with a new algorithm where LGWR doesn’t post. My guess: foreground processes can probably still post LGWR – but LGWR never posts back. Instead, foreground processes in WAIT mode “poll” either a memory structure or LGWR itself. It could be an in-house implementation by Oracle, it could still use semaphores, or it could use message queues somehow [seems like a long shot but the unix poll() call is found there].
There’s one interesting challenge that I can think of in implementing this. With the semaphore approach, all the active commits are *sleeping* (off the CPU) while LGWR flushes the log buffer. There could be dozens or hundreds of foreground processes simultaneously committing on a very busy system. If we switch to a polling method, how do we ensure that these hundreds of processes don’t start spinning and steal CPU from LGWR, making the whole system even worse than it was in the beginning?
The answer might lie in a quick search for “adaptive_log_file_sync” from the underscore parameters. There are five more hidden parameters with that string:
select a.ksppinm name, b.ksppstvl value, a.ksppdesc description
  from sys.x$ksppi a, sys.x$ksppcv b
 where a.indx = b.indx
   and a.ksppinm like '\_%adaptive\_log%' escape '\'
 order by name;
NAME                                                VALUE  DESCRIPTION
_adaptive_log_file_sync_high_switch_freq_threshold  3      Threshold for frequent log file sync mode switches (per minute)
_adaptive_log_file_sync_poll_aggressiveness         0      Polling interval selection bias (conservative=0, aggressive=100)
_adaptive_log_file_sync_sched_delay_window          60     Window (in seconds) for measuring average scheduling delay
_adaptive_log_file_sync_use_polling_threshold       200    Ratio of redo synch time to expected poll time as a percentage
_adaptive_log_file_sync_use_postwait_threshold      50     Percentage of foreground load from when post/wait was last used
It appears that there are a number of knobs to turn with this new algorithm – and it looks like Oracle is somehow dynamically calculating a “polling interval”. Furthermore, it seems to be taking “scheduling delay” into account.
It also seems that by default, we’re limited to switching modes every 20 seconds (3 per minute). This switching seems to be controlled by “thresholds” – and the threshold for enabling the new polling mode seems to be based on time. From the default percentage, it looks to me like Oracle won’t switch until it thinks the poll time will be less than half of the current post time. For switching back it also seems to be another “half” percentage (50), though I’m not sure what “foreground load” might mean.
Now there’s one other underscore parameter which I noticed while looking for things related to log file sync:
NAME                  VALUE  DESCRIPTION
_fg_sync_sleep_usecs  0      Log file sync via usleep
This parameter is interesting for one reason: The adaptive sync parameters were introduced in 11g, but this parameter was introduced in 10g. And if you think about the name, it actually sounds very similar to the “polling” strategy and not like a semaphore strategy at all! Foreground processes call usleep() during a log sync – if you’re calling wait() then you don’t need to sleep. But if you’re polling, you definitely need to sleep. Maybe Oracle has been working on this idea since 10g? I wouldn’t put it past them. :)
So how do you know if you’re using this feature? The most obvious sign will be in the LGWR trace file.
There will be messages each time that a switch happens, looking something like this:
*** 2012-10-16 01:47:50.289
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=8 switch_sync_count_delta=59
*** 2012-10-16 01:47:50.289
Log file sync switching to post/wait
Current approximate redo synch write rate is 2 per sec
*** 2012-10-16 02:51:19.285
kcrfw_update_adaptive_sync_mode: post->poll long#=51 sync#=352 sync=4600 poll=1061 rw=500 rw+=500 ack=7 min_sleep=1061
*** 2012-10-16 02:51:19.285
Log file sync switching to polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 117 per sec
I mentioned that the underscore parameters were introduced in 11g; however, my current understanding is that only in 11.2.0.3 did the default value of _use_adaptive_log_file_sync change to TRUE!! Is it comforting that they waited so long, or is it scary that they’d make this change with no available documentation or troubleshooting information? I guess it depends on your perspective. Here’s mine:
- This is a change of critical, core code that impacted every single database installation and upgrade.
- I think it is a strange departure from Oracle’s normal practices to change such important code with absolutely no announcement, documentation, or technical troubleshooting information even in their customer-only support database.
- In my opinion this requires explanation; they can do (and have done) much better.
Regardless, it’s now been 12 months since 11.2.0.3 was released. At present, this feature is only mentioned in a single support article. This offers at least some evidence that adaptive log file sync hasn’t caused widespread panic. For the past year, many of you have been using this feature without even knowing – and there weren’t enough problems to merit even two Oracle Support notes.
I sent a few emails out to friends who also tend to work on large, loaded systems. I wasn’t the first person to stumble across this – James Morle mentioned it in a tweet a few months ago, and (not surprisingly) Riyaj has stumbled across the feature as well. It seems to me that the issues that have been encountered intersect with RAC and/or the switching process itself. But my informal survey was hardly scientific.
Overall, I think Oracle dodged the bullet this time. But they still made an important change without supporting documentation. To my friends inside the big red mothership: Please don’t do that again! We like it better the other way.
Have you ever heard of this new 11.2.0.3 feature? Have you heard of the underscore parameter? I’d love to hear your stories!
But I doubt Oracle will stop doing this – remember the “famous” _complex_view_merging? They changed it back and forth a few times without too much announcement and documentation.
They have done it before, they are doing it now, and they will probably do it again.
Thanks for this information!
I have a situation where, after upgrading from 126.96.36.199 to 188.8.131.52 on an extended RAC cluster (two nodes), one node sometimes faces >100s (!!) log file sync waits. The other node is working fine. There are also other databases in the same environment and they are working fine. There are no log writer delays at the same time. Nobody has found anything wrong with the environment, so I have started to wonder about adaptive log file sync. The node that does not face any problems does not currently have polling on; the problem node does (according to the LGWR trace file). Have you been facing the same kind of problems in RAC? I am thinking of turning it off, but I’m still waiting for an answer from Oracle Support. This happens maybe once a day or once a week. Really bad feature, if it can cause this problem.
@Elina – If adaptive log file sync was the cause of your problem, then you would see a message in the LGWR trace file indicating that the sync method changed at the exact time when you began seeing >100s log file sync events. You could verify exactly when the log file sync events started by looking in the ASH. In our case, the time in the trace file did match exactly with the start time of our hung sessions.
Thank you, this article has helped to solve our problems.
Verified, SR-backed issue of this in two databases upgraded from 184.108.40.206 to 220.127.116.11 approximately two months ago. It can start “suddenly”: as in, one night during/after a large data load, log file sync jumped from 3ms to 11ms and stayed there, consistently, until we restarted the database days later. A perfect example: high log file sync, low log file parallel write, pwrites() in dtruss lined up perfectly with log file parallel writes, and no change to log file parallel write time when log file sync spiked up to its ‘new normal’. A database restart fixes it, but as the Oracle doc you quoted (and the Oracle engineer gave us – 1462942.1) says, it comes back. The bullet points at the beginning of that doc were a 100% match for our experience. We had the Solaris kernel folks engaged as well; they were convinced it was memory fragmentation, then Oracle finally suggested this after days of going round and round about non-matching ‘fixes’. Our tech said it is fixed in 18.104.22.168. The doc says 22.214.171.124.1; we’re on 126.96.36.199.7, Solaris 10.
Thanks for putting this out there. I agree, some warning would have been nice.
Finally hit this issue on 188.8.131.52, lgwr trace file clearly shows:
*** 2014-05-08 08:17:18.377
Log file sync switching to polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 158 per sec
Did you guys switch “_use_adaptive_log_file_sync” = FALSE; as described in Doc ID 1462942.1?
Did this orange glob occur at the same time each day or just on Mondays?
We just experienced this with one of our customers… found the article on Oracle’s site before I ran into this… thank you very much for the insightful article. This should be on everyone’s checklist when going to 11gR2 (.04).
We hit this issue on my client’s core system. Oracle Support says we should either download the patch or apply the workaround by setting the parameter to FALSE.
HOWEVER, I didn’t notice the polling switch keywords in my LGWR trace file – any hint on that?
Thanks for this post and for your research. Experiencing intermittent changes in LFS average time (good periods around 2 ms, bad periods 9-12 ms, while LFPW stayed constantly low), I had a hard time taking care of that database. The LFS was critical since the application is insert-intensive, performing some 40 technical transactions per application request, with the response time under SLA… The bad periods typically ended with some change in the redo log mechanism (redo log resize, instance restart, etc.), but after some days the problem was back. After reading your article and some browsing in the LGWR trace, I decided to switch adaptive LFS off. Now it seems OK. Many thanks for sharing your knowledge.
We are experiencing this in our environment too: 184.108.40.206/Solaris 10 RAC. After going back and forth with support, they finally said they don’t have patches for this version and asked us to set the parameter. It was set last night (15th Feb 2018, 10pm), but the issue recurred at 12:39am on 16th Feb, as shown by the LGWR trace.
The problem is that JDBC sessions keep piling up for a minute (up to 120 concurrent connections) and then drop once the commit, which is taking LGWR a minute, completes.
According to Oracle this is a dynamic parameter, so we did not restart. Support also did not insist on that.
Waiting for their answer.