Are you ready for the Leap Second?

If you’re not aware of what the leap second is, look into it. This year the last minute of June 30th will be one second longer, and “June 30, 2015 23:59:60” will be a valid and correct time. The leap second can cause a few issues, so I’ve reviewed a number of MOS notes, and this blog post summarizes the findings.

Update (June 4th, 2015): I’ve put together another blog post about handling the leap second on Linux here.

There are 2 potential issues, which are described below.

1. NTPD’s leap second update causes a server hang or excessive CPU usage

Any Linux distribution using a kernel version from 2.4 through and including 2.6.39 may be affected (this covers both UEK and Red Hat compatible kernels). The range is very wide: it includes every RHEL and OEL release except version 7, unless the kernel has been kept up to date on the older releases.
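As a rough first pass, the version range above can be checked programmatically. The sketch below is only an illustration: the function names are made up for this example, and it compares the upstream version number only. Vendors backport fixes into older kernels, so a result of "inside the range" means "check the vendor notes", not "definitely affected".

```python
import platform
import re

# Range quoted in the post: 2.4 through and including 2.6.39.
AFFECTED_MIN = (2, 4, 0)
AFFECTED_MAX = (2, 6, 39)


def parse_kernel_release(release):
    """Return (major, minor, patch) from a string such as '2.6.32-431.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_affected_range(release):
    """True if the upstream version number falls inside the quoted range."""
    return AFFECTED_MIN <= parse_kernel_release(release) <= AFFECTED_MAX


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints
    verdict = "inside" if in_affected_range(release) else "outside"
    print("%s is %s the 2.4 to 2.6.39 range" % (release, verdict))
```

Running it on each server gives a quick list of hosts whose kernels deserve a closer look against the MOS notes.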

Problems may be observed even a day before the leap second happens, so this year the symptoms could appear at any time on June 30. This is because the NTP server notifies the host about the upcoming leap second up to a day ahead of time, and that update from NTP is what triggers the issues.
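One way to see whether the host has already been told about the leap second is to look at the leap indicator that ntpd reports, for example in the output of `ntpq -c rv`. The sketch below is not from the MOS notes: it assumes the output contains a `leap=NN` variable, which may differ between ntpd versions, and the sample string is invented for illustration.

```python
import re

# Meanings of the two-bit NTP leap indicator.
LEAP_MEANINGS = {
    "00": "no leap second pending",
    "01": "leap second insertion pending (last minute of the day has 61 seconds)",
    "10": "leap second deletion pending (last minute of the day has 59 seconds)",
    "11": "clock not synchronised",
}


def leap_status(rv_output):
    """Extract the leap indicator from text shaped like `ntpq -c rv` output.

    Returns None when no `leap=NN` variable is present.
    """
    match = re.search(r"\bleap=([01]{2})\b", rv_output)
    if match is None:
        return None
    return LEAP_MEANINGS[match.group(1)]


# Illustrative sample, not captured from a real server.
sample = "stratum=3, precision=-23, leap=01, rootdelay=12.345"
print(leap_status(sample))
```

A value of 01 on June 30 would mean the host has received the announcement and is inside the window where the symptoms below can appear.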

There are 2 possible symptoms:

  1. Servers will become unresponsive and the following can be seen in system logs, console, netconsole or vmcore dump analysis outputs:
    INFO: task kjournald:1119 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kjournald     D ffff880028087f00     0  1119      2 0x00000000
    ffff8807ac15dc40 0000000000000246 ffffffff8100e6a1 ffffffffb053069f
    ffff8807ac22e140 ffff8807ada96080 ffff8807ac22e510 ffff880028073000
    ffff8807ac15dcd0 ffff88002802ea60 ffff8807ac15dc20 ffff8807ac22e140
  2. Java applications suddenly start to use 100% CPU (the leap second insertion causes futex to time out repeatedly).
    $top - 09:38:24 up 354 days,  5:48,  4 users,  load average: 6.49, 6.34, 6.44
    Tasks: 296 total,   4 running, 292 sleeping,   0 stopped,   0 zombie
    Cpu(s): 97.2%us,  1.8%sy,  0.0%ni,  0.7%id,  0.1%wa,  0.1%hi,  0.2%si,  0.0%st
    Mem:     15991M total,    15937M used,       53M free,      107M buffers
    Swap:     8110M total,       72M used,     8038M free,    13614M cached
    PID USER      PR    NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    22564 oracle    16   0 1400m 421m 109m S  353  2.6   2225:11 java
    7294 oracle     17   0 3096m 108m 104m S   22  0.7   0:02.61 oracle
    

The only workaround mentioned in the notes is to run these commands as root after the problem has occurred. This obviously applies to symptom 2 only, as symptom 1 would require a reboot:

# /etc/init.d/ntpd stop
#  date -s "`date`"    (reset the system clock)
# /etc/init.d/ntpd start

I think that, as the problem is triggered by the update coming from NTP on June 30, it should also be possible to stop the NTPD service on June 29th and re-enable it on July 1st instead. This would bypass the problem conditions.

Because any Java application can be affected, we need to think about where Java is used. For Oracle DBAs the typical ones to worry about are the Enterprise Manager agents and any Fusion Middleware products. So if you’re using Grid Control or Cloud Control to monitor your Oracle infrastructure, it’s very likely that most of your servers are at risk if the kernels are not up to date.

2. Inserts to DATE and TIMESTAMP columns fail with “ORA-01852: seconds must be between 0 and 59”

Any OS could be affected. Based on MOS note “Insert leap seconds into a timestamp column fails with ORA-01852 (Doc ID 1553906.1)”, any inserts of time values having “60” seconds into DATE or TIMESTAMP columns will result in ORA-01852.
This can’t be reliably mitigated by stopping NTPD, as up-to-date TZ information on the server may already contain the information about the extra second. The note also provides a “very efficient workaround”: the leap second record can be stored in a varchar2 datatype instead. You might be thinking, “What? Are you really suggesting that to me?” According to MOS note 1453523.1, the time representation during the leap second can differ depending on the OS, kernel and ntpd versions. For example, the clock could show “23:59:60”, or it could show “23:59:59” for 2 consecutive seconds, which would avoid the ORA-01852. Check with your OS admins and make sure that the clock never shows “23:59:60” to avoid this issue completely.
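If the time strings come from an application rather than from the server clock, another option is to clamp the leap second before the value reaches the database. This is my own sketch and not something the MOS notes recommend: the function name and the “YYYY-MM-DD HH:MI:SS” input format are assumptions. It mimics the “23:59:59 shown twice” behaviour, so two distinct instants end up with the same stored second.

```python
import re

# Matches 'YYYY-MM-DD HH:MI:60' with optional fractional seconds.
LEAP_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):60(\.\d+)?$")


def clamp_leap_second(ts):
    """Rewrite a ':60' seconds field as ':59'; return other values unchanged."""
    match = LEAP_TS.match(ts)
    if match is None:
        return ts
    # Keep any fractional part so sub-second detail is not lost.
    return "%s:59%s" % (match.group(1), match.group(2) or "")


for value in ("2015-06-30 23:59:59", "2015-06-30 23:59:60", "2015-06-30 23:59:60.250"):
    print(value, "->", clamp_leap_second(value))
```

The trade-off is that ordering and uniqueness within that second are no longer guaranteed, which matters if the column is part of a unique key.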

Consider your infrastructure

By no means are the issues described above an exhaustive list. There’s too much information to cover everything, but based on what I’ve read, the issues caused by the leap second can be quite severe. Please consider your infrastructure and look for information about issues and fixes to address the upcoming leap second. Search MOS for the products you use and add the “leap second” keyword too. If you’re using software or an OS from another vendor, check their support notes regarding leap seconds. Here are additional MOS notes to read if you’re on some of Oracle’s engineered systems, but again, you’ll find more information if you search:

  • Leap Second Time Adjustment (e.g. on June 30, 2015 at 23:59:59 UTC) and Its Impact on Exadata Database Machine (Doc ID 1986986.1)
  • Exalogic: Affected EECS Releases and Patch Availability for Leap Second (Doc ID 2008413.1)
  • Leap Second on Oracle SuperCluster (Doc ID 1991954.1)
  • Leap Second Handling in Solaris – NTPv3 and NTPv4 (Doc ID 1019692.1)

References

  • Oracle support note for Leap Second Hang problem that may result into 100% CPU utilization in Linux environment
  • How Leap Second Affects The OS Clock on Linux and Oracle VM (Doc ID 1453523.1)
  • Leap Second Hang – CPU Can Be Seen at 100% (Doc ID 1472421.1)
  • What Impact Will the Upcoming Leap Second Have on Java (Doc ID 1987418.1)
  • Insert leap seconds into a timestamp column fails with ORA-01852 (Doc ID 1553906.1)
  • Leap seconds (extra second in a year) and impact on the Oracle database. (Doc ID 730795.1)

About the Author

Maris Elsins is an experienced Oracle Applications DBA currently working as Lead Database Consultant at The Pythian Group. His main areas of expertise are troubleshooting and performance tuning of Oracle Database and e-Business Suite systems. He is a blogger and a frequent speaker at Oracle related conferences such as UKOUG, Collaborate, Oracle OpenWorld, HotSos, and others. Maris is an Oracle ACE, an Oracle Certified Master, and a co-author of “Practical Oracle Database Appliance” (Apress, 2014). He's also a member of the board at Latvian Oracle User Group.

4 Comments

Hey Maris, as far as I know Windows will do the “prolonging” one second to two seconds to catch up to the leap second. The only real impact I see this would have on SQL Server (and I expect the same in Oracle on Windows) is if your db schema has a constraint enforcing uniqueness of a datetime column.

Uwe M. Küchler
June 10, 2015 6:01 am

Warner,
the constraint you describe is a bug waiting to happen. Unless you never, ever insert two records within the same second.

Maris,
thank you for sharing your research on handling this issue!

Cheers,
Uwe

“If we sync with an NTP server, what would the timestamp be during the leap second insertion?

23:59:59
23:59:59

OR

23:59:59
23:59:60”

Maris Elsins
June 30, 2015 1:34 am

If you’re using ntp without slew mode, it will be:
23:59:59
23:59:59
