Jserv JVM Issues with Many SSWA Users

Posted in: Technical Track

If your configuration matches the following setup, this post may be helpful to you.

OS: Red Hat Enterprise Linux 3 or 4
JDK: Sun Java 1.3 or 1.4
Apps: 11.5.9 or 11.5.10
Users: many Oracle Self-Service Web Applications (SSWA) users, e.g., iProc, iRec, Timecard, and HR self-service

With this setup, you might have already faced issues like the Apps login page not responding, or browsers timing out while loading SSWA pages. You might have raised numerous long-running TARs with Oracle Support and uploaded a lot of Apache and Jserv debug logs, only to end up recycling or bouncing the Apache service every time to fix the issue. Don’t worry. You are not alone here.

With production outages like this, we don’t get a lot of time to investigate which processes are responsible. Users chase the Support team for a resolution, and we often end up bouncing Apache before we can pin down the actual process causing the issue.

After a good number of occurrences, I discovered that the Java JVM processes that power the oacore group in Jserv were causing the issue. We didn’t see any errors in the Jserv logs or the Apache logs. The only clue was in the strace output: these Java processes kept hanging on the futex system call.

$ cd $IAS_ORACLE_HOME/Apache/Jserv/logs/jvm
$ /sbin/fuser OACoreGroup.*.stderr
OACoreGroup.0.stderr: 18912
$ strace -fp 18912
Process 18912 attached - interrupt to quit
futex(0x80881fc, FUTEX_WAIT, 2, NULL
Process 18912 detached

Searching on this, I found a couple of blogs and forum posts that talked about the issue: Java 1.5 on Linux, and FUTEX_WAIT hangs on Ubuntu forums. The fix was to set the LD_ASSUME_KERNEL parameter. But this parameter is familiar to nearly every Oracle Applications DBA, and we had already set it in the applmgr user’s .profile or .bash_profile as part of the Apps installation. Why, then, was this fix not working for me?

After doing some more research, I pinned down the root cause. Apache Jserv in the Oracle Apps environment uses the java.sh script under the Apache bin directory to start the JVMs for oacore and the other groups defined in jserv.conf. This java.sh does not source any of the environment files such as .profile, adovars.env, or APPSORA.env, so the JVMs it starts never see the LD_ASSUME_KERNEL variable. The fix is to add the following lines to java.sh before the line exec $JSERVJAVA $JAVA_ADDITIONAL_ARGS $ARGV 1>> $STDOUTLOG 2>> $STDERRLOG:

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
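
One way to check whether a running JVM actually inherited the variable is to read its environment from /proc. The following is a minimal sketch and not part of the original fix: the function name environ_value is made up for illustration, and it assumes a Linux /proc filesystem and that you run it as the user owning the JVM process (or as root).

```shell
# Hypothetical helper (not part of java.sh or any Oracle script).
# $1: path to a NUL-separated environment file, e.g. /proc/<pid>/environ
# $2: name of the variable to look up
# Prints the variable's value, or nothing if the variable is absent.
environ_value() {
    tr '\000' '\n' < "$1" | sed -n "s/^$2=//p"
}

# Usage against the JVM PID reported by fuser (PID is an example):
#   environ_value /proc/18912/environ LD_ASSUME_KERNEL
```

Because /proc/<pid>/environ reflects the environment the process was started with, the check is only meaningful after the JVMs have been restarted through the modified java.sh. No output would suggest the variable is still missing.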

How do you confirm that the issue is fixed? If you run fuser on the OACore stdout log file, you will see that, unlike earlier, there is more than one process ID. This confirms that the fix is in effect.

$ /sbin/fuser OACore*stdout
OACoreGroup.0.stdout: 12827 13384 13385 13386 13387 13401 
13402 13403 13422 13425 13458 13473 13474 13477 13478 13479 13634 13667
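
If you want to script this check, the process IDs in the fuser output can be counted. This is a small sketch that assumes fuser output in the format shown above; count_pids is an illustrative name, not an existing utility.

```shell
# Hypothetical helper: reads fuser-style output on stdin and prints how
# many whitespace-separated tokens consist only of digits (the PIDs).
# Tokens such as "OACoreGroup.0.stdout:" are ignored.
count_pids() {
    tr -s ' \t' '\n\n' | grep -c '^[0-9][0-9]*$'
}

# Usage on the Apps tier:
#   /sbin/fuser OACoreGroup.0.stdout 2>&1 | count_pids
```

A count of 1 would match the behaviour seen before the change, while a larger count is consistent with the variable being picked up by the JVM.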

What is the logic behind this LD_ASSUME_KERNEL? This IBM article on Linux Threading models explains this very clearly.

When this variable is not set, Linux (RHEL 3 and above) uses NPTL (Native POSIX Thread Library) by default when creating threads. Setting LD_ASSUME_KERNEL=2.4.19 tells the dynamic loader to fall back to the older LinuxThreads libraries instead. Though NPTL is more efficient than LinuxThreads, it is, unfortunately, not certified with JDK 1.3.1.

Google search results show that other recent Java versions also have issues with the NPTL libraries. The only drawback of LinuxThreads is that each thread shows up as a separate process at the OS level. This is why you see more process IDs in the fuser output after setting LD_ASSUME_KERNEL. This Sun article has a good explanation of why multiple process IDs appear for Java under Linux.
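
On glibc-based systems, getconf can report which threading library the loader selects, which gives a quick way to see the effect of LD_ASSUME_KERNEL before restarting anything. This is a hedged sketch: classify_threadlib is a made-up helper, and the version strings it matches ("NPTL ..." and "linuxthreads-...") are the forms glibc of that era is expected to print, not output captured from the systems described here.

```shell
# Hypothetical helper: maps a GNU_LIBPTHREAD_VERSION string to a short label.
classify_threadlib() {
    case "$1" in
        NPTL*)         echo nptl ;;
        linuxthreads*) echo linuxthreads ;;
        *)             echo unknown ;;
    esac
}

# Usage (assumes glibc's getconf supports GNU_LIBPTHREAD_VERSION):
#   classify_threadlib "$(getconf GNU_LIBPTHREAD_VERSION)"
#   classify_threadlib "$(LD_ASSUME_KERNEL=2.4.19 getconf GNU_LIBPTHREAD_VERSION)"
```

The second command only makes sense on releases that still ship the LinuxThreads libraries, such as the RHEL 3 and 4 systems covered here.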

The strange thing is, I could not find any information related to this on MetaLink. I’m not sure whether Oracle is going to certify this or not, but I am tired of opening TARs. The good thing is, we stopped seeing login page issues after setting this parameter. This fix worked for two of our customers who have many SSWA users.

If this fix works for you and you want to make these changes permanent, you need to follow note 270519.1 - Customizing an AutoConfig Environment, to customize afjava.sh under $FND_TOP/admin/template.

I hope this blog post helps you! Adiós!

About the Author

Vasu Balla’s colleagues call him “Eagle Eye” for a reason – his diverse technical background enables him to view his clients’ systems from a 360-degree angle, giving him a higher level of understanding. Vasu is well known for being approachable, and he truly enjoys helping people. Even former colleagues reach out to Vasu when they are really stuck on an issue. When he isn’t working, Vasu can be found in the kitchen trying new recipes.

3 Comments

Vasu,

Great article regarding the Jserv JVM issue. I would like to see an article on memory leaks.

Regards,
Phani.K

Vinodh Kolluri
May 15, 2009 4:55 pm

Very Good analysis Vasu, Thank You

Vinodh

Hi, I found your article interesting, but we have a similar and more complex issue: slow performance, "No page found" errors, HTTP Server errors, and "Service Temporarily Unavailable" errors (404 and 500 errors).

We are running 11.5.10 iStore, iSupport, and iSupplier on Solaris 5.9.

The issue happens intermittently without any pattern. Sometimes it lasts a few days. Bouncing the server sometimes helps, but many times it does not.

We would like to see if you can help us analyze the issue. We are running a reverse proxy and DMZ environment for secure access from the internet.

Your reply would really help us.
