Beware the /var/tmp/.oracle Hidden Directory!

A few months ago, we had a test instance complaining that it couldn’t write to ASM. This was an 11.1.0.7 single (non-RAC) instance on Oracle Enterprise Linux 5, using ASM for the storage. We first saw these errors in the alert log:

ORA-15032: not all alterations performed
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
ERROR: error ORA-15032 caught in ASM I/O path

Uh-oh, that doesn’t look good. So I logged into the ASM instance and tried to see if the disks were OK:

SQL> select path, mount_status from v$asm_disk;
select path, mount_status from v$asm_disk
                               *
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation


I couldn’t even query that. As Ted would say, “strange things are afoot at the Circle K.” To be safe, I thought I’d try to shut down the DBMS instance, which also failed until I resorted to an abort:

SQL> shutdown immediate
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '+FOOTEST_DATA/footest1_footest_db/control01.ctl'
ORA-15081: failed to submit an I/O operation to a disk
SQL> shutdown abort
ORACLE instance shut down.

We decided to restart the whole DBMS/ASM/CSS stack, but CSS wouldn’t stop either:

-bash-3.2# /etc/init.d/init.cssd stop
Stopping Cluster Synchronization Services.
Unable to communicate with the Cluster Synchronization Services daemon.
Shutdown has begun. The daemons should exit soon.

We ended up rebooting the server altogether, after which everything came up nicely. We filed an SR with Oracle Support, who directed us to Note 391790.1 (Unable To Connect To Cluster Manager Ora-29701). This note lists the cause, quite simply, as:

The hidden directory ‘/var/tmp/.oracle’ was removed while instances & the CRS stack were up and running. Typically this directory contains a number of “special” socket files that are used by local clients to connect via the IPC protocol (sqlnet) to various Oracle processes including the TNS listener, the CSS, CRS & EVM daemons or even database or ASM instances. These files are created when the “listening” process starts.

The solution is to restart CRS or reboot the machine. Our /var/tmp/.oracle directory looked like this:

$ ls -la /var/tmp/.oracle
total 12
drwxrwxrwt 2 root   root 4096 May  8 15:03 .
drwxrwxrwt 3 root   root 4096 May 10 07:02 ..
srwxrwxrwx 1 oracle dba     0 May  8 15:03 s#18854.1
srwxrwxrwx 1 oracle dba     0 May  8 15:03 s#18854.2
srwxrwxrwx 1 oracle dba     0 May  8 15:03 sEXTPROC
srwxrwxrwx 1 oracle dba     0 May  8 14:44 sfootestDBG_CSSD
srwxrwxrwx 1 oracle dba     0 May  8 14:44 sOCSSD_LL_footest_
srwxrwxrwx 1 oracle dba     0 May  8 14:44 sOCSSD_LL_footest_localhost
srwxrwxrwx 1 oracle dba     0 May  8 14:44 sOracle_CSS_LclLstnr_localhost_0
srwxrwxrwx 1 oracle dba     0 May  8 15:03 sPNPKEY
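
A quick way to confirm that the directory and its socket files are still present might look like the sketch below. The function name `check_oracle_sockets` is made up for illustration, and it assumes a `find` that supports `-maxdepth` (GNU find on Linux does) and a shell whose `test` supports `-k` (bash and dash both do). It only reports; it does not repair anything.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): report whether an Oracle socket
# directory exists, how many socket files it holds, and whether the sticky
# bit is set on it. Defaults to /var/tmp/.oracle.
check_oracle_sockets() {
    dir="${1:-/var/tmp/.oracle}"

    if [ ! -d "$dir" ]; then
        echo "MISSING $dir"
        return 1
    fi

    # Count only socket-type entries directly inside the directory.
    count=$(find "$dir" -maxdepth 1 -type s | wc -l | tr -d ' ')

    # -k is true when the sticky bit (the "t" in drwxrwxrwt) is set.
    if [ -k "$dir" ]; then
        sticky=yes
    else
        sticky=no
    fi

    echo "OK $dir sockets=$count sticky=$sticky"
}

# Example call (assumed paths):
#   check_oracle_sockets /var/tmp/.oracle
#   check_oracle_sockets /tmp/.oracle
```

A "MISSING" result while the instances are up would match the failure described in the note, since the sockets are only created when the listening processes start.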

I did some sandbox testing and found that only the oracle and root OS users could delete that directory. I was able to reproduce the error every time I did so.

However, I really was dumbstruck that Oracle would have so critical a directory in /var/tmp! I politely noted this to Oracle Support, who justified this location with a few solid reasons:

  1. It has always been in this location (and still is in 11gR2).
  2. /var/tmp/.oracle is a hidden directory, so it probably won’t be noticed by any miscreants looking to cause trouble.

OK, I was being sarcastic; these reasons are awful. The only safeguard they gave was “make sure no one deletes it.” We scoured the server for cron jobs that would automatically clean out /var/tmp but didn’t find any, nor did we find any bash history suggesting malice. The only other factor we could think of was that this test server was a VM (Citrix Xen), although one would hope this wouldn’t happen at all, regardless. We never did find an explanation, but now we know not to delete /var/tmp/.oracle while the instances are running (even though we never did before).
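
The cron search described above can be sketched roughly as follows. `find_var_tmp_jobs` is a hypothetical helper name, and the cron locations in the example call are only the usual Linux ones, which may differ on other systems. It assumes a `grep` with recursive `-r` support (GNU and BSD grep both have it).

```shell
#!/bin/sh
# Hypothetical helper: print every file under the given paths whose contents
# mention /var/tmp, as a starting point for spotting cleanup jobs.
find_var_tmp_jobs() {
    for path in "$@"; do
        # Skip locations that do not exist on this system.
        [ -e "$path" ] || continue
        # -r: recurse into directories, -l: print matching file names only.
        grep -rl -- '/var/tmp' "$path" 2>/dev/null || true
    done
}

# Example call (assumed locations; run as root to read every crontab):
#   find_var_tmp_jobs /etc/crontab /etc/cron.d /etc/cron.daily \
#       /etc/cron.weekly /etc/cron.monthly /var/spool/cron
```

Any file it prints still has to be read by hand, because a mention of /var/tmp is not proof that the job deletes anything there.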

Surachart Opun has also blogged on this topic.

About the Author

Oracle database administrator for The Pythian Group, headquartered in Ottawa, Ontario, Canada. I am located in Manitowoc, Wisconsin, USA. OCP 10gR2 DBA

13 Comments

Thanks for sharing. To secure it, maybe it’s possible to set the sticky bit on this hidden directory?

Thanks for sharing this info. Very helpful Info…

We hit this issue with exactly the same events. Thanks.

I hit a similar problem on one of my nodes (in a 2-node RAC setup). Ours stops working, and the solution has been to delete everything under /var/tmp/.oracle and reboot the server. That is indicative of the problem; however, we haven’t put our finger on the cause yet.

Allen Shatzer
January 10, 2012 1:45 pm

We ran into this as well. We found a script /etc/cron.daily/tmpwatch that was removing files over 30 days old under /var/tmp. We had to add a “-x /var/tmp/.oracle” to the command to prevent this directory from being cleaned up by this cron job. Oracle is ASSUMING that making the directory “hidden” by prefixing it with a . will prevent BAD things from happening. And we all know what happens when we assume! Jeeze! Can we get some pressure from the Oak Table members to get this fixed by NOT using /var/tmp or /tmp for this directory? It’s a problem just waiting to happen.

I assume (oops!) that there isn’t a way to specify a different directory for Oracle to use than /var/tmp/.oracle, correct?

Allen, thanks for the note. I’ll research tmpwatch and follow-up on this post!

The interesting thing to note was that it was only a problem on a VM. If we ran directly on the server hardware we never saw this.

Looks like you’re right. I wonder why this would only affect VMs though, or at least why it seemed to do so.

/usr/sbin/tmpwatch "$flags" 720 /var/tmp
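
For anyone wanting to verify a server, here is a minimal sketch that checks whether a tmpwatch cron script already carries the `-x` exclusion suggested earlier in the comments. The function name `excludes_oracle_dir` is invented for this example, and the script path in the example call is the one mentioned in this thread, so adjust it for your system. The edited command line in the comment is an illustration of the suggested fix, not a copy of anyone's actual script.

```shell
#!/bin/sh
# Line as shipped in the cron script, per this thread:
#   /usr/sbin/tmpwatch "$flags" 720 /var/tmp
# With the suggested exclusion added (illustrative):
#   /usr/sbin/tmpwatch "$flags" -x /var/tmp/.oracle 720 /var/tmp

# Hypothetical check: succeed (exit 0) if the given script passes
# -x or --exclude for /var/tmp/.oracle, fail otherwise.
excludes_oracle_dir() {
    script="$1"
    grep -Eq -- '(-x|--exclude)[ =]/var/tmp/\.oracle([[:space:]]|$)' "$script"
}

# Example call (assumed path):
#   if excludes_oracle_dir /etc/cron.daily/tmpwatch; then
#       echo "tmpwatch skips /var/tmp/.oracle"
#   else
#       echo "tmpwatch may clean /var/tmp/.oracle"
#   fi
```

The check is textual only, so a script that builds its exclusions from variables would need a closer look.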

This problem varies by sysadmin, depending on whoever writes the system cleanup scripts. Our Linux boxes never had this problem, because the sysadmin wasn’t cleaning out /var/tmp. Every one of our HPUX boxes had this problem (or the potential for it). I’d really like Oracle to move these things into dbs or into the temporary directory in the oracle account profile.

Thanks for the great note. I think Oracle has to solve this issue in a later release by creating its own directory.

To those suggesting this only happens on VMs: this isn’t true. We had this on a physical database server running 11.2. A good article!

I noticed that there is also the /tmp/.oracle directory, which has similar files with similar creation dates. I wonder if these also need to be treated as do-not-delete?

We ran into an issue where /var ran out of inodes. I had to delete old sock files then have the application restarted.
