Troubleshooting failed database startup after GRID Out Of Place (OOP) rollback

Posted in: Technical Track

GRID Out Of Place (OOP) patching completed successfully for 18.6.0.0.0.

GRID_HOME=/u01/18.3.0.0/grid_2
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1

Here is an example of the inventory after patching.

+ /u01/18.3.0.0/grid_2/OPatch/opatch lspatches
29302264;OCW RELEASE UPDATE 18.6.0.0.0 (29302264)
29301643;ACFS RELEASE UPDATE 18.6.0.0.0 (29301643)
29301631;Database Release Update : 18.6.0.0.190416 (29301631)
28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619)
28435192;DBWLM RELEASE UPDATE 18.0.0.0.0 (28435192)
27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)
+ /u01/app/oracle/12.1.0.1/db1/OPatch/opatch lspatches
28731800;Database Bundle Patch : 12.1.0.2.190115 (28731800)
28729213;OCW PATCH SET UPDATE 12.1.0.2.190115 (28729213)

Run cluvfy was successful, too.

[[email protected] ~]$ cluvfy stage -post crsinst -n racnode-dc1-1,racnode-dc1-2 -verbose
Post-check for cluster services setup was successful.
CVU operation performed:      stage -post crsinst
Date:                         Apr 30, 2019 8:17:49 PM
CVU home:                     /u01/18.3.0.0/grid_2/
User:                         oracle
[[email protected] ~]$

GRID OOP Rollback Patching completed successfully for node1.

[[email protected] ~]# crsctl check cluster -all
**************************************************************
racnode-dc1-1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode-dc1-2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[[email protected] ~]#
[[email protected] ~]# echo $GRID_HOME
/u01/18.3.0.0/grid_2
[[email protected] ~]# $GRID_HOME/OPatch/opatchauto rollback -switch-clone -logLevel FINEST
OPatchauto session is initiated at Fri May  3 01:06:47 2019
System initialization log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchautodb/systemconfig2019-05-03_01-06-50AM.log.
Session log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchauto/opatchauto2019-05-03_01-08-00AM.log
The id for this session is R47N
Update nodelist in the inventory for oracle home /u01/18.3.0.0/grid.
Update nodelist in the inventory is completed for oracle home /u01/18.3.0.0/grid.
Bringing down CRS service on home /u01/18.3.0.0/grid
CRS service brought down successfully on home /u01/18.3.0.0/grid
Starting CRS service on home /u01/18.3.0.0/grid
CRS service started successfully on home /u01/18.3.0.0/grid
Confirm that all resources have been started from home /u01/18.3.0.0/grid.
All resources have been started successfully from home /u01/18.3.0.0/grid.
OPatchAuto successful.
--------------------------------Summary--------------------------------
Out of place patching clone home(s) summary
____________________________________________
Host : racnode-dc1-1
Actual Home : /u01/18.3.0.0/grid_2
Version:18.0.0.0.0
Clone Home Path : /u01/18.3.0.0/grid
Following homes are skipped during patching as patches are not applicable:
/u01/app/oracle/12.1.0.1/db1
OPatchauto session completed at Fri May  3 01:14:25 2019
Time taken to complete the session 7 minutes, 38 seconds
[[email protected] ~]# crsctl check cluster -all
**************************************************************
racnode-dc1-1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode-dc1-2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[[email protected] ~]# /media/patch/findhomes.sh
   PID NAME                 ORACLE_HOME
 10486 asm_pmon_+asm1       /u01/18.3.0.0/grid/
 10833 apx_pmon_+apx1       /u01/18.3.0.0/grid/
[[email protected] ~]# cat /etc/oratab
#Backup file is  /u01/app/oracle/12.1.0.1/db1/srvm/admin/oratab.bak.racnode-dc1-1 line added by Agent
#+ASM1:/u01/18.3.0.0/grid:N
hawk1:/u01/app/oracle/12.1.0.1/db1:N
hawk:/u01/app/oracle/12.1.0.1/db1:N             # line added by Agent
[[email protected] ~]#

GRID OOP Rollback Patching completed successfully for node2.

[[email protected] ~]# crsctl check cluster -all
**************************************************************
racnode-dc1-1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode-dc1-2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[[email protected] ~]#
[[email protected] ~]# echo $GRID_HOME
/u01/18.3.0.0/grid_2
[[email protected] ~]# $GRID_HOME/OPatch/opatchauto rollback -switch-clone -logLevel FINEST
OPatchauto session is initiated at Fri May  3 01:21:39 2019
System initialization log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchautodb/systemconfig2019-05-03_01-21-41AM.log.
Session log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchauto/opatchauto2019-05-03_01-22-46AM.log
The id for this session is 9RAT
Update nodelist in the inventory for oracle home /u01/18.3.0.0/grid.
Update nodelist in the inventory is completed for oracle home /u01/18.3.0.0/grid.
Bringing down CRS service on home /u01/18.3.0.0/grid
CRS service brought down successfully on home /u01/18.3.0.0/grid
Starting CRS service on home /u01/18.3.0.0/grid
CRS service started successfully on home /u01/18.3.0.0/grid
Confirm that all resources have been started from home /u01/18.3.0.0/grid.
All resources have been started successfully from home /u01/18.3.0.0/grid.
OPatchAuto successful.
--------------------------------Summary--------------------------------
Out of place patching clone home(s) summary
____________________________________________
Host : racnode-dc1-2
Actual Home : /u01/18.3.0.0/grid_2
Version:18.0.0.0.0
Clone Home Path : /u01/18.3.0.0/grid
Following homes are skipped during patching as patches are not applicable:
/u01/app/oracle/12.1.0.1/db1
OPatchauto session completed at Fri May  3 01:40:51 2019
Time taken to complete the session 19 minutes, 12 seconds
[[email protected] ~]#

GRID OOP Rollback completed successfully for 18.5.0.0.0.

GRID_HOME=/u01/18.3.0.0/grid
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1

Here is an example of inventory after the rollback:

+ /u01/18.3.0.0/grid/OPatch/opatch lspatches
28864607;ACFS RELEASE UPDATE 18.5.0.0.0 (28864607)
28864593;OCW RELEASE UPDATE 18.5.0.0.0 (28864593)
28822489;Database Release Update : 18.5.0.0.190115 (28822489)
28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619)
28435192;DBWLM RELEASE UPDATE 18.0.0.0.0 (28435192)
27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)
+ /u01/app/oracle/12.1.0.1/db1/OPatch/opatch lspatches
28731800;Database Bundle Patch : 12.1.0.2.190115 (28731800)
28729213;OCW PATCH SET UPDATE 12.1.0.2.190115 (28729213)

Validation shows database is OFFLINE,

+ crsctl stat res -w '((TARGET != ONLINE) or (STATE != ONLINE)' -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.GHCHKPT.advm
               OFFLINE OFFLINE      racnode-dc1-1            STABLE
               OFFLINE OFFLINE      racnode-dc1-2            STABLE
ora.crs.ghchkpt.acfs
               OFFLINE OFFLINE      racnode-dc1-1            STABLE
               OFFLINE OFFLINE      racnode-dc1-2            STABLE
ora.helper
               OFFLINE OFFLINE      racnode-dc1-1            STABLE
               OFFLINE OFFLINE      racnode-dc1-2            IDLE,STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.hawk.db
      1        ONLINE  OFFLINE                               Instance Shutdown,STABLE
      2        ONLINE  OFFLINE                               Instance Shutdown,STABLE

Start database FAILED.

[[email protected] ~]$ . /media/patch/hawk.env
The Oracle base has been set to /u01/app/oracle
ORACLE_UNQNAME=hawk
ORACLE_SID=hawk2
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1
Oracle Instance not alive for sid "hawk2"
[[email protected] ~]$ srvctl status database -d $ORACLE_UNQNAME -v
Instance hawk1 is not running on node racnode-dc1-1
Instance hawk2 is not running on node racnode-dc1-2
[[email protected] ~]$ srvctl start database -d $ORACLE_UNQNAME
PRCR-1079 : Failed to start resource ora.hawk.db
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-1' failed
CRS-2632: There are no more servers to try to place resource 'ora.hawk.db' on that would satisfy its placement policy
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-2/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-2' failed
[[email protected] ~]$
[[email protected] ~]$ . /media/patch/hawk.env
The Oracle base has been set to /u01/app/oracle
ORACLE_UNQNAME=hawk
ORACLE_SID=hawk1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1
Oracle Instance not alive for sid "hawk1"
[[email protected] ~]$ srvctl start database -d hawk
PRCR-1079 : Failed to start resource ora.hawk.db
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-2/crs/trace/crsd_oraagent_oracle.trc".
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-2' failed
CRS-2632: There are no more servers to try to place resource 'ora.hawk.db' on that would satisfy its placement policy
CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-1' failed
[[email protected] ~]$

Incorrect permissions for the Oracle library was the cause. Changing permissions for $GRID_HOME/bin/oracle (chmod 6751 $GRID_HOME/bin/oracle), and stopping and starting CRS resolved the failure.

[[email protected] dbs]$ ls -lhrt $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle dba 314M Apr 20 16:06 /u01/app/oracle/12.1.0.1/db1/bin/oracle
[[email protected] dbs]$ ls -lhrt /u01/18.3.0.0/grid/bin/oracle
-rwxr-x--x 1 oracle oinstall 396M Apr 20 19:21 /u01/18.3.0.0/grid/bin/oracle
[[email protected] dbs]$ cd /u01/18.3.0.0/grid/bin/
[[email protected] bin]$ chmod 6751 oracle
[[email protected] bin]$ ls -lhrt /u01/18.3.0.0/grid/bin/oracle
-rwsr-s--x 1 oracle oinstall 396M Apr 20 19:21 /u01/18.3.0.0/grid/bin/oracle
[[email protected] ~]# . /media/patch/gi.env
The Oracle base has been set to /u01/app/oracle
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
GRID_HOME=/u01/18.3.0.0/grid
ORACLE_HOME=/u01/18.3.0.0/grid
Oracle Instance alive for sid "+ASM1"
[[email protected] ~]# crsctl stop crs
====================================================================================================
[[email protected] ~]# . /media/patch/gi.env
The Oracle base has been set to /u01/app/oracle
ORACLE_SID=+ASM2
ORACLE_BASE=/u01/app/oracle
GRID_HOME=/u01/18.3.0.0/grid
ORACLE_HOME=/u01/18.3.0.0/grid
Oracle Instance alive for sid "+ASM2"
[[email protected] ~]# ls -lhrt $GRID_HOME/bin/oracle
-rwxr-x--x 1 oracle oinstall 396M Apr 21 01:44 /u01/18.3.0.0/grid/bin/oracle
[[email protected] ~]# chmod 6751 $GRID_HOME/bin/oracle
[[email protected] ~]# ls -lhrt $GRID_HOME/bin/oracle
-rwsr-s--x 1 oracle oinstall 396M Apr 21 01:44 /u01/18.3.0.0/grid/bin/oracle
[[email protected] ~]# crsctl stop crs
====================================================================================================
[[email protected] ~]# crsctl start crs
[[email protected] ~]# crsctl start crs

Reference: RAC Database Can’t Start: ORA-01565, ORA-17503: ksfdopn:10 Failed to open file +DATA/BPBL/spfileBPBL.ora (Doc ID 2316088.1)

In conclusion, while the GRID rollback may have completed successfully, the database was down and resulted in an outage. It may be prudent to check permissions for the Oracle library before a patching or rollback to avoid any disasters.


email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *