Post-Mortem Analysis Tools: Hanganalyze

Posted in: Oracle, Series, Technical Track

Editor’s note: This “post-mortem analysis” series consists of four posts in total. Make sure you also check out:

Now, onto today’s post:

I wrote these posts to help DBAs, specifically in a situation where a database is hanging, you restart it and after restart everything seems OK, with the problem solved. However, in terms of RCA (root cause analysis), you’re at a loss.

As mentioned in previous posts, there are a few tools I’ll be covering in this series, and the third one—our subject for today—is hanganalyze.

Let me refer to the clearest Oracle words I could find:

Hanganalyze tries to work out who is waiting for who by building wait chains, and then depending on the level will request various processes to dump their errorstack.

This is very similar to what we can do manually through v$wait_chains. But it’s quicker and “official,” so let’s use it!

Before I show you how to do it, it’s important to mention that Oracle does not recommend you use “numeric events” without an SR (MOS), according to Note: 75713.1.

So, how can you do this? There are basically two ways:

  1. ALTER SESSION SET EVENTS ‘immediate trace name HANGANALYZE level LL’; OR EVENT=”60 trace name HANGANALYZE level 5″
  2. ORADEBUG hanganalyze LL

I prefer to use ORADEBUG on the database server if possible, if you’re already experiencing some hanging:

sqlplus / as sysdba
oradebug setmypid;
oradebug unlimit;
oradebug hanganalyze LL

For example, connected with SQL*Plus as SYSDBA:

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /db/oracle/diag/rdbms/pythiandb/pythiandb/trace/pythiandb_ora_2096.trc

Here are the meanings of the various levels:

Level Description Comment
1 Very minimal output Could be useful
2 Minimal output Useful in some cases
3 Dump only processes thought to be in a hang Most common level
4 Dump leaf nodes in wait chains Do you really need this info?
5 Dump all processes involved in wait chains This can be a lot!
6 Dump errorstacks of processes involved in wait chains This can be high overhead
10 Dump all processes Not a good idea

Be mindful! Using higher levels will cause many processes to be asked to dump their stack. This can be very expensive.

In summary, remember the Note: 75713.1.

What about if you have an RAC?

oradebug setmypid
oradebug unlimit
oradebug setinst all 
oradebug -g def hanganalyze LL

Or

oradebug setmypid
oradebug unlimit 
oradebug -g all hanganalyze LL

What does hanganalize look like? Here’s an Oracle example of the output:

 ==============
 HANG ANALYSIS:
 ==============
 Open chains found:
>>   This process (below) is running
 Chain 1 :  :
     
>>   Below is a wait chain. Sid 16 waits for Sid 17
 Chain 2 :  :
     
  -- 
 Other chains found:
 Chain 3 :  :
     
 Extra information that will be dumped at higher levels:
>> This just shows which nodes would be dumped at each level
 [level  4] :   2 node dumps -- [LEAF] [LEAF_NW] [IGN_DMP]
 [level  5] :   2 node dumps -- [NLEAF]
 [level 10] :  10 node dumps -- [IGN]
 
 State of nodes
>> All nodes are listed below. The "state" column shows the state
>> that the session is in
 ([nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor):
>> The first nodes are IGN (ignore)
 [0]/1/1/0x826f94c0/IGN/1/2//none
 [1]/2/1/0x826f9d2c/IGN/3/4//none
 [2]/3/1/0x826fa598/IGN/5/6//none
 [3]/4/1/0x826fae04/IGN/7/8//none
 [4]/5/1/0x826fb670/IGN/9/10//none
 [5]/6/1/0x826fbedc/IGN/11/12//none
 [6]/7/1049/0x826fc748/IGN/13/14//none
 [7]/8/1049/0x826fcfb4/IGN/15/16//none
 [8]/9/1049/0x826fd820/IGN/17/18//none
 [9]/10/1049/0x826fe08c/IGN/19/20//none
>> Below are LEAF nodes in various states
 [12]/13/158/0x826ff9d0/LEAF_NW/21/22//none
 [15]/16/416/0x82701314/NLEAF/23/26/[16]/none
 [16]/17/941/0x82701b80/LEAF/24/25//15
 [17]/18/344/0x827023ec/NLEAF/27/28/[16]/none
>> You are told which processes are being dumped
>> They will dump errorstacks to their own trace files
 Dumping System_State and Fixed_SGA in process with ospid 18668
 Dumping Process information for process with ospid 18656
 Dumping Process information for process with ospid 18658
 ...
 ================================
 PROCESS DUMP FROM HANG ANALYZER:
 ================================
>> This process dumps its errorstack and processstate. 
>> See  for details of this informaiton
 ----- Call Stack Trace -----
 calling              call     entry             
  ...
 ======================================
 END OF PROCESS DUMP FROM HANG ANALYZER
 ======================================
 ====================
 END OF HANG ANALYSIS
 ====================

And what about the node states?

State Meaning
IGN Ignore
LEAF A waiting leaf node
LEAF_NW A running (using CPU?) leaf node
NLEAF An element in a chain but not at the end (not a leaf)

Interesting, right?

There’s an observation in MOS about the “dump” word, let me reproduce it:

“Note that in 11g+ the “ORADEBUG HANGANALYZE NN” form will also try to include SHORT_STACK dumps in the hanganalyze chains for level 3 and higher. Short stacks will NOT be included in event triggered HANGANALYZE (like from ALTER SESSION) nor from “ORADEBUG DUMP HANGANALYZE nn”, only from ORADEBUG HANGANALYZE nn (no DUMP keyword).”

Repeating what was already mentioned in previous posts of this series, if you can’t create a connection to the database with SQL*Plus (even as SYSDBA), because it’s a hang situation, you can use a preliminary connection.

A further post is coming about this. But if you need help ASAP, I’d recommend you read:

  • How to Collect Systemstate Dumps When you Cannot Connect to Oracle (Doc ID 121779.1)
  • Important Customer information about using Numeric Events (Doc ID 75713.1)
Additional References:
  • Troubleshooting Database Hang Issues (Doc ID 1378583.1)
  • How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)
  • Troubleshooting Database Contention With V$Wait_Chains (Doc ID 1428210.1)
  • EVENT: HANGANALYZE – Reference Note (Doc ID 130874.1)
  • Important Customer information about using Numeric Events (Doc ID 75713.1)

I hope this was helpful! If you have questions or thoughts, please leave them in the comments.

You can find other posts in this series here:

email

Interested in working with Matheus? Schedule a tech call.

About the Author

Lead Database Consultant
Well known in the Oracle community in Latin America and Europe where he participates regularly in technology events, Matheus is actually the youngest Oracle ACE Director in the world. Lead Database Consultant at Pythian, Matheus is a Computer Scientist by PUCRS and has been working as an Oracle DBA for the last 10 years.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *