Handling the Leap Second - Linux

Maris Elsins

June 2, 2015

Tags: Dba Lounge, Big Data, Technical Track, Hadoop, Group Blog Posts

Last week I published a blog post titled "Are You Ready For the Leap Second?", and the blog statistics tell me that many of you read it. That's good, because you became aware of the risks that the leap second on June 30th, 2015 introduces. On the other hand, I must admit I didn't provide clear instructions that you could use to avoid all possible scenarios. I've been looking into this for a good while, and I think the official Red Hat announcements and My Oracle Support notes are confusing. This blog post is my attempt to explain how to avoid the possible issues.

Update (June 9th, 2015): Made it clear in the text below that ntpd's slewing mode (ntpd -x) is mandatory for Oracle Grid Infrastructure and therefore for RAC too.

The complexity of solving these problems comes from the fact that there are multiple contributing factors, and the behavior of the system depends on the combination of them. In the coming sections I'll try to explain what exactly you should pay attention to and what you should do to avoid problems.

The content of this post is fully theoretical and based on the documentation I've read. I have NOT tested it, so your systems may behave differently. Please, if you notice any nonsense in what I'm writing, let me know by leaving a comment!

1. Collect the data

The following information will be required for you to understand what you're dealing with:

OS version and kernel version:
[code light="true"]
$ cat /etc/issue
Oracle Linux Server release 6.4
Kernel \r on an \m
$ uname -r
2.6.39-400.17.1.el6uek.x86_64
[/code]
Is NTP used and which version of NTP is installed:
[code light="true"]
$ ps -ef | grep ntp
oracle    1627  1598  0 02:06 pts/0    00:00:00 grep ntp
ntp       7419     1  0 May17 ?        00:00:17 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
$ rpm -qa | grep ntp-
ntp-4.2.4p8-3.el6.x86_64
[/code]
Version of tzdata and the configuration of /etc/localtime:
[code light="true"]
$ rpm -qa | grep tzdata-
tzdata-2012j-1.el6.noarc
$ file /etc/localtime
/etc/localtime: timezone data, version 2, 5 gmt time flags, 5 std time flags, no leap seconds, 235 transition times, 5 abbreviation chars
[/code]
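
The three checks above can also be gathered in one pass. What follows is only a sketch: collect_leap_second_facts is a hypothetical helper name, it assumes an RPM-based system like the one shown above, and each probe falls back to a placeholder when a tool or file is missing.

```shell
#!/bin/sh
# Sketch only: print the facts from step 1 in a single report.
# collect_leap_second_facts is a hypothetical helper name, not a standard tool.
collect_leap_second_facts() (
    echo "kernel: $(uname -r)"
    echo "os release: $(head -n 1 /etc/issue 2>/dev/null || echo unknown)"

    # [n]tpd keeps grep from matching its own command line
    ntpd_cmd=$(ps -e -o args= 2>/dev/null | grep '[n]tpd' || true)
    echo "ntpd: ${ntpd_cmd:-not running}"

    # Match ntp-<version> and tzdata-<version>, but not ntpdate or tzdata-java
    pkgs=$(rpm -qa 2>/dev/null | grep -E '^(ntp|tzdata)-[0-9]' || true)
    echo "packages: ${pkgs:-none found}"

    # -L follows the symlink that newer releases use for /etc/localtime
    echo "localtime: $(file -L /etc/localtime 2>/dev/null || echo unknown)"
)

collect_leap_second_facts
```

The function body runs in a subshell, so the temporary variables do not leak into the calling shell.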

2. Check the kernel

Here are a number of bugs that are related to leap second handling on Linux:

System hangs on printing the leap second insertion message - This bug will hang your server at the moment NTP notifies the kernel about the leap second, and that can happen at any time on the day before the leap second (in our case any time on June 30th, 2015). It's fixed in kernel-2.6.9-89.EL (RHEL4) and kernel-2.6.18-164.el5 (RHEL5).
Systems hang due to leap-second livelock - Because of this bug, systems repeatedly crash when the NMI Watchdog detects a hang. It takes effect when the leap second is added. The note doesn't specify exactly which versions fix the bug.
Why is there high CPU usage after inserting the leap second? - This bug causes futex-active applications (e.g. Java) to start consuming 100% CPU. Based on what's discussed in this email in the Linux Kernel Mailing List Archive, it's triggered by a mismatch between the timekeeping and hrtimer structures, which the leap second introduces. Again, the document does not clearly specify which versions fix the problem; however, this "Kernel Bug Fix Update" mentions these symptoms as fixed in 2.6.32-279.5.2.el6.

MOS Note: "How Leap Second Affects the OS Clock on Linux and Oracle VM (Doc ID 1453523.1)" mentions that kernels 2.4 to 2.6.39 are affected, but I'd like to know the exact versions. I've searched a lot, but I haven't found much, so here are the ones that I did find:

"System hangs on printing the leap second insertion message" for RHEL4 is fixed in kernel-2.6.9-89.EL
All mentioned bugs on OL5 UEK are fixed in 2.6.39-200.29.3.el5uek and this is also back-ported to an earlier code-path in 2.6.32-300.32.2.el5uek
All mentioned bugs on OL6 UEK are fixed in 2.6.39-200.29.3.el6uek and also back-ported to 2.6.32-300.32.2.el6uek
Bugs 2 and 3 (the leap-second livelock and the high CPU usage) are reported to be fixed in 2.6.32-279

I'm quite sure that by reading this you're thinking: "What a mess!". And that's true. I believe the safest approach is to be on kernel 2.6.39-200.29.3 or higher.
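
Checking a kernel against that threshold can be sketched as a field-by-field numeric comparison. This is only an illustration: version_at_least is a hypothetical helper name, and the comparison ignores vendor suffixes such as .el6uek, so it is a rough heuristic rather than proof that a particular fix is present (a distribution kernel with back-ported fixes may still compare as lower).

```shell
#!/bin/sh
# Sketch only: succeed if kernel version $1 is at or above version $2.
# version_at_least is a hypothetical helper name.
version_at_least() {
    awk -v have="$1" -v want="$2" '
        # Keep only the leading numeric part, e.g. 2.6.39-400.17.1
        function numeric(v) {
            if (match(v, /^[0-9]+(\.[0-9]+)*(-[0-9]+(\.[0-9]+)*)?/)) { return substr(v, 1, RLENGTH) }
            return "0"
        }
        BEGIN {
            nh = split(numeric(have), h, /[.-]/)
            nw = split(numeric(want), w, /[.-]/)
            n = (nh > nw) ? nh : nw
            # Compare field by field; missing fields count as 0
            for (i = 1; i <= n; i++) {
                if (h[i] + 0 > w[i] + 0) exit 0
                if (h[i] + 0 < w[i] + 0) exit 1
            }
            exit 0
        }'
}

if version_at_least "$(uname -r)" 2.6.39-200.29.3; then
    echo "kernel is at or above 2.6.39-200.29.3"
else
    echo "kernel is below 2.6.39-200.29.3"
fi
```

With the kernel from the example output, 2.6.39-400.17.1.el6uek.x86_64, the check succeeds because the fourth field (400) is greater than 200.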

3. NTP is used

You're using NTP if the ntpd process is running. In the output displayed above it's running with the following arguments: ntpd -u ntp:ntp -p /var/run/ntpd.pid -g. The behavior of the system during the leap second depends on which version of NTP you use and on how it is configured.

ntp-4.2.2p1-9 or higher (but not ntp-4.2.6p5-19.el7, ntp-4.2.6p5-1.el6 and ntp-4.2.6p5-2.el6_6) configured in slew mode (with option "-x") - The leap second is not added by the kernel. Instead, the extra time is added by increasing the length of each second over a period of ~2000 seconds, based on the difference between the server's time and the time from NTP after the leap second. The clock is never turned backward. This is the configuration you want because:
- Time never goes back, so there will be no impact on the application logic.
- Strange time values like 23:59:60 are not used, so you won't hit any DATE and TIMESTAMP datatype limitation issues.
- As the leap second is not actually added, it should be possible to avoid all 3 kernel bugs that I mentioned by using this configuration. In many cases updating NTP is much simpler than a kernel upgrade, so if you're still on an affected kernel, use this option to bypass the bugs.
The drawbacks of this configuration are related to the fact that the leap second is smeared out over a longer period of time:
- This is probably not usable for applications requiring very accurate time.
- This may not be usable for some clusters where all nodes must have exactly the same clock time. NTP updates are usually received every 1 to 18 minutes, and with the ~2000 seconds (about 33 minutes) of time adjustment in slew mode added on top, the clocks could be off for as long as ~50 minutes. Please note that slewing mode (ntpd -x) is mandatory for Oracle Grid Infrastructure, as documented in the Oracle® Grid Infrastructure Installation Guides for 11g Release 2 and 12c Release 1.
ntp-4.2.2p1-9 or higher configured without slew mode (no "-x" option) - NTP will notify the kernel about the upcoming leap second some time during June 30th, and the leap second will be added as an extra "23:59:59" second (time goes backward by one second). You will want to be on a kernel with all fixes present.
Below ntp-4.2.2p1-9 - NTP will notify the kernel about the upcoming leap second some time during June 30th, and depending on the environment, the leap second will be added as an extra "23:59:59" second (time goes backward by one second), or the time will freeze for one second at midnight.
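
Whether a given ntpd command line runs in slew mode can be checked with a small helper. This is a sketch under stated assumptions: ntpd_slew_enabled is a hypothetical name, and it only recognizes the option when it appears as its own word ("-x", or the long form "--slew"), so bundled flags such as "-gx" would be missed.

```shell
#!/bin/sh
# Sketch only: succeed if an ntpd command line contains the slew option.
# ntpd_slew_enabled is a hypothetical helper name.
ntpd_slew_enabled() {
    # Unquoted on purpose: split the command line into words
    for word in $1; do
        case "$word" in
            -x|--slew) return 0 ;;
        esac
    done
    return 1
}

# The command line from the ps output shown in step 1
if ntpd_slew_enabled "ntpd -u ntp:ntp -p /var/run/ntpd.pid -g"; then
    echo "slew mode is enabled"
else
    echo "slew mode is NOT enabled"
fi
```

On a live host the argument could come from something like "$(ps -o args= -C ntpd)". On RHEL and Oracle Linux 5 and 6 the ntpd arguments are typically taken from the OPTIONS line in /etc/sysconfig/ntpd, so that is most likely where "-x" would be added, followed by a restart of ntpd; verify this on your own distribution.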

Extra precaution: if you're running NTP make sure your /etc/localtime does not include leap seconds by running "file /etc/localtime" and confirming it lists message "no leap seconds".

4. NTP is NOT used

If NTP is not used, the time is managed locally by the server. The time is most likely off already, so I really do recommend enabling NTP in slew mode as described above; this is the right moment to do so. If you have tzdata-2015a or higher installed, the information about the leap second on June 30th, 2015 is also available locally on the server, but that alone doesn't mean it's going to be added. Also, if NTP is not used and the leap second is added locally, it will appear as "23:59:60", which is an unsupported value for DATE and TIMESTAMP columns, so this is the configuration you don't want to use. Here are the different conditions:

You're below tzdata-2015a - the leap second will not be added.
You're on tzdata-2015a or higher and "file /etc/localtime" includes message "X leap seconds", where X is a number - the leap second will be added as "23:59:60" and will cause problems for your DATE/TIMESTAMP datatypes. You don't want this configuration. Disable the leap second by copying the appropriate timezone file from /usr/share/zoneinfo over /etc/localtime. It's a dynamic change, no reboots needed. (Timezone files including the leap seconds are located in /usr/share/zoneinfo/right)
"file /etc/localtime" includes message "no leap seconds" - the leap second will not be added.

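The three conditions above can be written down as a small decision helper. This is a sketch that only encodes the rules as stated in this post: local_leap_second_outcome is a hypothetical name, its first argument is the tzdata version (for example as printed by rpm -q --qf '%{VERSION}\n' tzdata) and its second is the output of "file /etc/localtime".

```shell
#!/bin/sh
# Sketch only: report whether the leap second would be added locally
# when NTP is not used. local_leap_second_outcome is a hypothetical name.
# $1: tzdata version such as 2012j or 2015a
# $2: output of "file /etc/localtime"
local_leap_second_outcome() {
    case "$2" in
        *"no leap seconds"*) echo "not added"; return 0 ;;
        *"leap seconds"*) ;;    # leap seconds configured: check tzdata below
        *) echo "unknown"; return 0 ;;
    esac
    # tzdata versions are a year plus a letter, and 2015a is the first
    # release of 2015, so "2015a or higher" means year >= 2015.
    year=${1%%[!0-9]*}
    if [ "${year:-0}" -ge 2015 ]; then
        echo "added"
    else
        echo "not added"
    fi
}

# Values from the example output in step 1
local_leap_second_outcome 2012j "/etc/localtime: timezone data, version 2, 5 gmt time flags, 5 std time flags, no leap seconds, 235 transition times, 5 abbreviation chars"
```

For the example server this prints "not added", since its /etc/localtime reports "no leap seconds".
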
The recommendations

Again I must say this is a theoretical summary on how to avoid leap second issues on Linux, based on what's written above. Make sure you think about it before implementing as you're the one who knows your own systems:

Single node servers, or clusters where time between nodes can differ - Upgrade to ntp-4.2.2p1-9 or higher and configure it in slew mode (option "-x"). This should avoid the kernel bugs too, but due to the lack of accurate documentation it's still safer to be on kernel 2.6.39-200.29.3 or higher.
Clusters or applications with very accurate time requirements - NTP with slew mode is not suitable, as it's unpredictable when it will start adjusting the time on each server. You want to be on kernel 2.6.39-200.29.3 or higher. NTP should be enabled. The leap second will be added as an extra "23:59:59" second (the time will go backward by one second). Oracle Database/Clusterware should detect the time drift and should deal with it. Check MOS for any bugs related to time drifting for the versions you're running.
I don't care about the time accuracy, I can't update any packages, but need my systems up at any cost - The simplest solution is to stop NTP on June 29th and start it again on July 1st, so that the server is left unaware of the leap second. Also, you need to make sure /etc/localtime does not contain the leap second for June 30th, 2015, as explained above.
[code light="true"]
-- on June 29th (UTC)
# /etc/init.d/ntpd stop
# date -s "`date`"    # reset the system clock
-- on July 1st (UTC)
# /etc/init.d/ntpd start
[/code]
Very accurate time requirements + time reduction is not allowed - I don't know. I can't see how this can be implemented. Does anyone have any ideas?

Post Scriptum

Initially I couldn't understand why this extra second caused so much trouble. Don't we change the time by a round hour twice a year without any issues? I found the answers during the research, and it's obvious. Servers work in UTC time, which does not have daylight saving time changes. The timezone information is added just for representation purposes later on. UTC Time is continuous and predictable, but the leap second is something which breaks this normal continuity and that's why it is so difficult to handle it. It's also a known fact that Oracle Databases rely heavily on gettimeofday() system calls and these work in UTC too.
