If your configuration matches the following setup, then this blog could be helpful to you.
OS: Red Hat Enterprise Linux 3 or 4
JDK: Sun Java 1.3 or 1.4
Apps: 11.5.9 or 11.5.10
Users: many Oracle Self-Service Web Applications (SSWA) users, e.g., iProc, iRec, Timecard, and HR self-service
With this setup, you might have already faced issues like the Apps login page not responding, or browsers timing out while loading SSWA pages. You might have raised numerous long-running TARs with Oracle Support on this and ended up uploading a lot of Apache and Jserv debug logs, and you always end up recycling or bouncing the Apache service to fix the issue. Don’t worry: you are not alone here.
With production outages like this, we don’t get a lot of time to investigate the processes that are responsible for the issue. Users will be after the Support team for a resolution, and we often end up bouncing Apache before we can pin down the actual process causing the issue.
After a good number of occurrences of this issue, I discovered that the Java JVM processes that power the oacore group in Jserv were causing it. We didn’t see any errors in the Jserv logs or the Apache logs. The only symptom we could see, in the strace output, was that these Java processes kept hanging on the futex system call.
$ cd $IAS_ORACLE_HOME/Apache/Jserv/logs/jvm
$ /sbin/fuser OACoreGroup.*.stderr
OACoreGroup.0.stderr: 18912
$ strace -fp 18912
Process 18912 attached - interrupt to quit
futex(0x80881fc, FUTEX_WAIT, 2, NULL
Process 18912 detached
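When there are several JVMs to check during an outage, it can help to save the strace output to a file and test it mechanically. The following is only a sketch: the function name `last_call_is_futex_wait` is made up for this example, and it assumes the trace looks like the output above, where the last system call shown is a `futex(... FUTEX_WAIT ...` that never returns.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original post): reads captured strace
# output on stdin and succeeds only if the last system call line is a futex
# FUTEX_WAIT that never returned, i.e. it has no ") = <result>" part.
last_call_is_futex_wait() {
    awk '
        # Remember the most recent line that starts or resumes a system call.
        /\(/ || /resumed>/ { last = $0 }
        END {
            if (last ~ /futex\(/ && last ~ /FUTEX_WAIT/ && last !~ /\) *= /)
                exit 0
            exit 1
        }
    '
}
```

For example, run `strace -f -p 18912 -o /tmp/oacore.strace`, interrupt it after a few seconds, and then run `last_call_is_futex_wait < /tmp/oacore.strace && echo "stuck in futex"`. The file name is arbitrary.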
On searching into this, I found a couple of blog and forum posts that talked about this issue: Java 1.5 on Linux, and FUTEX_WAIT hangs on Ubuntu forums. The fix was to set the LD_ASSUME_KERNEL environment variable. But this variable is familiar to almost every Oracle Applications DBA, and we had already set it in the applmgr user’s .profile or .bash_profile as part of the Apps installation. Why, then, was this fix not working for me?
After doing some more research, I nailed down the root cause. Apache Jserv in the Oracle Apps environment uses java.sh under the Apache bin directory to start the JVMs for oacore and the other groups in the jserv.conf file. This java.sh does not source any of the environment files such as .profile, adovars.env or APPSORA.env, so the JVMs started with java.sh never see the LD_ASSUME_KERNEL variable. The fix is to add the following lines to the java.sh file before the line exec $JSERVJAVA $JAVA_ADDITIONAL_ARGS $ARGV 1>> $STDOUTLOG 2>> $STDERRLOG:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
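If you have to make the same edit on several middle tiers, it could be scripted along the following lines. This is a sketch rather than a tested procedure for any particular Apps release: the function name `add_ld_assume_kernel` and the `.orig` backup suffix are my own choices, and it assumes the exec line starts with `exec $JSERVJAVA` as shown above.

```shell
#!/bin/sh
# Sketch only: insert the LD_ASSUME_KERNEL lines before the exec line of a
# java.sh-style script. Function name and ".orig" suffix are illustrative.
add_ld_assume_kernel() {
    script=$1
    # Already patched: leave the file alone so the edit is applied only once.
    if grep -q '^LD_ASSUME_KERNEL=' "$script"; then
        return 0
    fi
    # Keep a copy of the original file next to it.
    cp -p "$script" "$script.orig" || return 1
    awk '
        # Insert the two lines once, just before the first exec of $JSERVJAVA.
        !done && /^[ \t]*exec[ \t]+\$JSERVJAVA/ {
            print "LD_ASSUME_KERNEL=2.4.19"
            print "export LD_ASSUME_KERNEL"
            done = 1
        }
        { print }
    ' "$script.orig" > "$script"
    # If no exec line was found, restore the original and report failure.
    grep -q '^LD_ASSUME_KERNEL=' "$script" || { mv "$script.orig" "$script"; return 1; }
}
```

Call it with the full path to your java.sh, then bounce Apache so the JVMs are restarted with the new environment.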
How do you confirm that the issue is fixed? If you run fuser on the OACore stdout log file, you will see that, unlike earlier, there is more than one process ID. This confirms that the issue is fixed.
$ /sbin/fuser OACore*stdout
OACoreGroup.0.stdout: 12827 13384 13385 13386 13387 13401 13402 13403 13422 13425 13458 13473 13474 13477 13478 13479 13634 13667
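A more direct check, which is not what I used at the time, is to look at the environment of a running JVM. On Linux, /proc/<pid>/environ holds the environment a process was started with, as NUL-separated NAME=value entries. The sketch below reads such a file; the function name `env_var_from_environ` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the value of one variable from a /proc/<pid>/environ-style
# file (NUL-separated NAME=value entries). Returns non-zero if the variable
# is absent. The function name is made up for this example.
env_var_from_environ() {
    environ_file=$1
    var_name=$2
    tr '\0' '\n' < "$environ_file" | awk -v name="$var_name" '
        # Match only entries that begin with NAME= and print what follows.
        index($0, name "=") == 1 {
            print substr($0, length(name) + 2)
            found = 1
            exit
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For one of the PIDs listed above, `env_var_from_environ /proc/12827/environ LD_ASSUME_KERNEL` should print 2.4.19 once the java.sh change is in effect. You need to run it as the user that owns the JVM, or as root.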
What is the logic behind LD_ASSUME_KERNEL? This IBM article on Linux threading models explains it very clearly.
When this variable is not set, Linux (RHEL 3 and above) by default uses NPTL (Native POSIX Thread Library) when creating threads. Setting LD_ASSUME_KERNEL=2.4.19 makes the dynamic loader pick the older LinuxThreads libraries instead. Though NPTL is more efficient than LinuxThreads, it is, unfortunately, not certified with JDK 1.3.1.
Google search results show that other recent Java versions also have issues with the NPTL libraries. The only problem with LinuxThreads is that it creates one process at the OS level for each thread. This is why you see more process IDs in the fuser output after setting LD_ASSUME_KERNEL. This Sun article has a good explanation of why multiple process IDs appear for Java under Linux.
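To see which threading library a shell environment resolves to, glibc's `getconf GNU_LIBPTHREAD_VERSION` prints a string such as "NPTL 2.3.4" or "linuxthreads-0.10". The small sketch below just classifies that string; the function name and the labels it prints are illustrative, and the version strings in the example are typical values rather than output captured from our servers.

```shell
#!/bin/sh
# Sketch: classify the string printed by `getconf GNU_LIBPTHREAD_VERSION`.
# The function name and the returned labels are illustrative.
threading_model() {
    case $1 in
        NPTL*)         echo nptl ;;
        linuxthreads*) echo linuxthreads ;;
        *)             echo unknown ;;
    esac
}
```

On a RHEL 3 or 4 box, compare `threading_model "$(getconf GNU_LIBPTHREAD_VERSION)"` with `threading_model "$(LD_ASSUME_KERNEL=2.4.19 getconf GNU_LIBPTHREAD_VERSION)"`. I would expect the first to report nptl and the second linuxthreads, assuming the LinuxThreads libraries are installed.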
The strange thing is, I could not find any information related to this on MetaLink. I’m not sure whether Oracle is going to certify this or not, but I am tired of opening TARs. The good thing is, we stopped seeing login page issues after setting this parameter. This fix worked for two of our customers who have many SSWA users.
If this fix works for you and you want to make these changes permanent, you need to follow note 270519.1 - Customizing an AutoConfig Environment, to customize afjava.sh under $FND_TOP/admin/template.
I hope this blog post helps you! Adiós!
3 Comments
Vasu,
Great article regarding the Jserv JVM issue. I would like to see an article on memory leaks.
Regards,
Phani.K
Very Good analysis Vasu, Thank You
Vinodh
Hi, I found your article interesting, but we have a similar and more complex issue: slow performance, No Page Found errors, HTTP Server Error, or Service Temporarily Not Available errors (404, 500 errors).
We are running 11.5.10 iStore, iSupport, and iSupplier on Solaris 5.9.
The issue happens intermittently without any pattern. Sometimes it lasts a few days; after bouncing the server it sometimes works, but many times it does not.
We would like to see if you can help us analyze the issue. We are running a reverse proxy and DMZ environment for secure access from the internet.
Your reply would really help us.