Troubleshooting Oracle’s Auto Service Request

I’ve spent the better part of the day troubleshooting an issue with Oracle’s Auto Service Request (ASR) and wanted to share my results in case it saves someone else some effort.

The ASR manager is designed to be a site-wide aggregation point for ASR alerts, receiving SNMP traps and forwarding them over HTTPS to transport.oracle.com. But if you’re using port 162 for SNMP traps on a Linux system, you may find that such traps are never sent to Oracle.

I was testing this by creating test traps through IPMI:

# ipmitool sunoem cli "set /SP/alertmgmt/rules/1 testrule=true"
 Connected. Use ^D to exit.
 -> set /SP/alertmgmt/rules/1 testrule=true
 Set 'testrule' to 'true'
 -> Session closed
Disconnected

The resulting test trap should be passed on to Oracle and result in an e-mail noting that a test service request had been created. But in my case, no e-mail arrived.

/var/log/messages, however, did show that a test trap had been generated:

Dec 19 16:12:23 asrmgr01 snmptrapd[14527]: 2013-12-19 16:12:23 testdb01.example.com [UDP: [43.218.200.118]:32957]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (51161892) 5 days, 22:06:58.92  SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.42.2.175.103.2.0.63    SNMPv2-SMI::enterprises.42.2.175.103.2.1.1.0 = STRING: "Oracle Database Appliance X3-2 1234ABC12B"      SNMPv2-SMI::enterprises.42.2.175.103.2.1.14.0 = STRING: "1234ABC12B"    SNMPv2-SMI::enterprises.42.2.175.103.2.1.15.0 = STRING: "SUN FIRE X4170 M3"     SNMPv2-SMI::enterprises.42.2.175.103.2.1.20.0 = STRING: "This is a test trap"

But none of the ASR manager logs in /var/opt/SUNWsasm/log showed any indication of activity.

After a lot of digging, including copious logfile reading, straces, and tcpdumps, I found that the ASR manager process is not even listening for SNMP traps:

# lsof -p `pidof java` | grep UDP
java    31318 root   93u  IPv6           23334618      0t0      UDP *:41178

Searching for the process that is holding the SNMP trap port, 162 (which lsof displays under its service name, “snmptrap”):

# lsof | grep UDP | grep ":snmptrap"
snmptrapd 28163 root    8u  IPv4           23357406      0t0      UDP *:snmptrap

It’s a completely different process, snmptrapd:

# ps -ef | grep snmptrapd | grep -v grep
root      4986     1  0 Dec15 ?        00:00:04 /usr/sbin/snmptrapd -Lsd -p /var/run/snmptrapd.pid

Decoding the arguments from the command line, -Lsd sends “L”og messages to “s”yslog at “d”aemon priority. And it was these messages I had seen in /var/log/messages.
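The two lsof searches above can be collapsed into a single query for the port itself. The following is a minimal sketch rather than part of the original investigation: the helper name udp_port_owners is made up for illustration, and it assumes an lsof that accepts the standard -n, -P and -i options.

```shell
#!/bin/sh
# Hypothetical helper (not part of ASR or lsof): read lsof output on stdin
# and print each unique "COMMAND PID" pair, skipping the header line.
udp_port_owners() {
    awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Typical use, run as root so that sockets of all users are visible.
# -n and -P keep addresses and ports numeric, so 162 is not shown as "snmptrap".
#   lsof -nP -iUDP:162 | udp_port_owners
```

On the host described here, this would be expected to name snmptrapd rather than java.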

And a little more digging in the ASR manager logfile /var/opt/SUNWsasm/log/sasm.log does show a telling message:

2013-12-19_16:00:51  command executed:  sasm start-instance
Starting Oracle Automated Service Manager...
Cannot bind to port : 162

Unfortunately, sasm continued to start without reporting anything to stdout. It would have been much easier if it had simply exited on a fatal error like this.
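Since sasm does not fail loudly, the log can be checked after each start instead. This is a rough sketch: the function name check_last_start is hypothetical, and it assumes that every start writes a "command executed: sasm start-instance" line and that a bind failure is always worded as in the excerpt above. Neither has been verified against other ASR manager versions.

```shell
#!/bin/sh
# Hypothetical helper: inspect only the most recent start attempt in sasm.log.
# Returns 0 if no bind failure followed the last start line, 1 if one did
# (printing the offending line), and 2 if the log cannot be read.
check_last_start() {
    log_file="$1"
    [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }
    awk '
        # A new start attempt resets any failure seen in earlier attempts.
        /command executed:.*sasm start-instance/ { failure = "" }
        /Cannot bind to port/                    { failure = $0 }
        END { if (failure != "") { print failure; exit 1 } }
    ' "$log_file"
}

# Typical use:
#   check_last_start /var/opt/SUNWsasm/log/sasm.log || echo "ASR manager is not receiving traps"
```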

Anyway, the fix was quite simple: disable snmptrapd on the ASR manager host and restart the ASR manager:

chkconfig snmptrapd off
service snmptrapd stop
service sasm restart

And then my test traps started generating e-mail alerts.
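To confirm the fix without waiting for an e-mail, one option is to check that the only listener on UDP port 162 is now the ASR manager's java process. The sketch below is an illustration under assumptions: port_owner_is is a made-up helper name, and it relies on the ASR manager appearing in lsof under the command name java, as in the earlier output.

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and succeed only if at least
# one listener is present and every listener has the expected COMMAND name.
port_owner_is() {
    awk -v want="$1" '
        NR > 1 {
            seen = 1
            if ($1 != want) { print "unexpected listener:", $1, $2; bad = 1 }
        }
        END { if (bad || !seen) exit 1 }
    '
}

# Typical use, as root:
#   lsof -nP -iUDP:162 | port_owner_is java && echo "ASR manager owns port 162"
```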

About the Author

Marc is a passionate and creative problem solver, drawing on deep understanding of the full enterprise application stack to identify the root cause of problems and to deploy sustainable solutions. Marc has a strong background in performance tuning and high availability, developing many of the tools and processes used to monitor and manage critical production databases at Pythian. He is proud to be the very first DataStax Platinum Certified Administrator for Apache Cassandra.
