Troubleshooting failed database startup after GRID Out Of Place (OOP) rollback

Posted in: Technical Track

GRID Out Of Place (OOP) patching completed successfully for 18.6.0.0.0.

GRID_HOME=/u01/18.3.0.0/grid_2
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1

Here is an example of the inventory after patching.

+ /u01/18.3.0.0/grid_2/OPatch/opatch lspatches
29302264;OCW RELEASE UPDATE 18.6.0.0.0 (29302264)
29301643;ACFS RELEASE UPDATE 18.6.0.0.0 (29301643)
29301631;Database Release Update : 18.6.0.0.190416 (29301631)
28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619)
28435192;DBWLM RELEASE UPDATE 18.0.0.0.0 (28435192)
27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)

+ /u01/app/oracle/12.1.0.1/db1/OPatch/opatch lspatches
28731800;Database Bundle Patch : 12.1.0.2.190115 (28731800)
28729213;OCW PATCH SET UPDATE 12.1.0.2.190115 (28729213)

Run cluvfy was successful, too.

[oracle@racnode-dc1-1 ~]$ cluvfy stage -post crsinst -n racnode-dc1-1,racnode-dc1-2 -verbose

Post-check for cluster services setup was successful.

CVU operation performed:      stage -post crsinst
Date:                         Apr 30, 2019 8:17:49 PM
CVU home:                     /u01/18.3.0.0/grid_2/
User:                         oracle
[oracle@racnode-dc1-1 ~]$

GRID OOP Rollback Patching completed successfully for node1.

[root@racnode-dc1-1 ~]# crsctl check cluster -all
**************************************************************
racnode-dc1-1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode-dc1-2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@racnode-dc1-1 ~]#
[root@racnode-dc1-1 ~]# echo $GRID_HOME
/u01/18.3.0.0/grid_2
[root@racnode-dc1-1 ~]# $GRID_HOME/OPatch/opatchauto rollback -switch-clone -logLevel FINEST

OPatchauto session is initiated at Fri May  3 01:06:47 2019

System initialization log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchautodb/systemconfig2019-05-03_01-06-50AM.log.

Session log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchauto/opatchauto2019-05-03_01-08-00AM.log
The id for this session is R47N

Update nodelist in the inventory for oracle home /u01/18.3.0.0/grid.
Update nodelist in the inventory is completed for oracle home /u01/18.3.0.0/grid.


Bringing down CRS service on home /u01/18.3.0.0/grid
CRS service brought down successfully on home /u01/18.3.0.0/grid


Starting CRS service on home /u01/18.3.0.0/grid
CRS service started successfully on home /u01/18.3.0.0/grid


Confirm that all resources have been started from home /u01/18.3.0.0/grid.
All resources have been started successfully from home /u01/18.3.0.0/grid.


OPatchAuto successful.

--------------------------------Summary--------------------------------
Out of place patching clone home(s) summary
____________________________________________
Host : racnode-dc1-1
Actual Home : /u01/18.3.0.0/grid_2
Version:18.0.0.0.0
Clone Home Path : /u01/18.3.0.0/grid


Following homes are skipped during patching as patches are not applicable:

/u01/app/oracle/12.1.0.1/db1

OPatchauto session completed at Fri May  3 01:14:25 2019
Time taken to complete the session 7 minutes, 38 seconds

[root@racnode-dc1-1 ~]# crsctl check cluster -all
**************************************************************
racnode-dc1-1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode-dc1-2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

[root@racnode-dc1-1 ~]# /media/patch/findhomes.sh
   PID NAME                 ORACLE_HOME
 10486 asm_pmon_+asm1       /u01/18.3.0.0/grid/
 10833 apx_pmon_+apx1       /u01/18.3.0.0/grid/

[root@racnode-dc1-1 ~]# cat /etc/oratab
#Backup file is  /u01/app/oracle/12.1.0.1/db1/srvm/admin/oratab.bak.racnode-dc1-1 line added by Agent
#+ASM1:/u01/18.3.0.0/grid:N
hawk1:/u01/app/oracle/12.1.0.1/db1:N
hawk:/u01/app/oracle/12.1.0.1/db1:N             # line added by Agent
[root@racnode-dc1-1 ~]#

GRID OOP Rollback Patching completed successfully for node2.

[root@racnode-dc1-2 ~]# crsctl check cluster -all
**************************************************************
racnode-dc1-1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode-dc1-2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@racnode-dc1-2 ~]#
[root@racnode-dc1-2 ~]# echo $GRID_HOME
/u01/18.3.0.0/grid_2
[root@racnode-dc1-2 ~]# $GRID_HOME/OPatch/opatchauto rollback -switch-clone -logLevel FINEST

OPatchauto session is initiated at Fri May  3 01:21:39 2019

System initialization log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchautodb/systemconfig2019-05-03_01-21-41AM.log.

Session log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchauto/opatchauto2019-05-03_01-22-46AM.log
The id for this session is 9RAT

Update nodelist in the inventory for oracle home /u01/18.3.0.0/grid.
Update nodelist in the inventory is completed for oracle home /u01/18.3.0.0/grid.


Bringing down CRS service on home /u01/18.3.0.0/grid
CRS service brought down successfully on home /u01/18.3.0.0/grid


Starting CRS service on home /u01/18.3.0.0/grid
CRS service started successfully on home /u01/18.3.0.0/grid


Confirm that all resources have been started from home /u01/18.3.0.0/grid.
All resources have been started successfully from home /u01/18.3.0.0/grid.


OPatchAuto successful.

--------------------------------Summary--------------------------------
Out of place patching clone home(s) summary
____________________________________________
Host : racnode-dc1-2
Actual Home : /u01/18.3.0.0/grid_2
Version:18.0.0.0.0
Clone Home Path : /u01/18.3.0.0/grid


Following homes are skipped during patching as patches are not applicable:

/u01/app/oracle/12.1.0.1/db1


OPatchauto session completed at Fri May  3 01:40:51 2019
Time taken to complete the session 19 minutes, 12 seconds
[root@racnode-dc1-2 ~]#

GRID OOP Rollback completed successfully for 18.5.0.0.0.

GRID_HOME=/u01/18.3.0.0/grid
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1

Here is an example of inventory after the rollback:

+ /u01/18.3.0.0/grid/OPatch/opatch lspatches
28864607;ACFS RELEASE UPDATE 18.5.0.0.0 (28864607)
28864593;OCW RELEASE UPDATE 18.5.0.0.0 (28864593)
28822489;Database Release Update : 18.5.0.0.190115 (28822489)
28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619)
28435192;DBWLM RELEASE UPDATE 18.0.0.0.0 (28435192)
27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171
27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)

+ /u01/app/oracle/12.1.0.1/db1/OPatch/opatch lspatches
28731800;Database Bundle Patch : 12.1.0.2.190115 (28731800)
28729213;OCW PATCH SET UPDATE 12.1.0.2.190115 (28729213)

Validation shows database is OFFLINE,

+ crsctl stat res -w '((TARGET != ONLINE) or (STATE != ONLINE)' -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.GHCHKPT.advm
               OFFLINE OFFLINE      racnode-dc1-1            STABLE
               OFFLINE OFFLINE      racnode-dc1-2            STABLE
ora.crs.ghchkpt.acfs
               OFFLINE OFFLINE      racnode-dc1-1            STABLE
               OFFLINE OFFLINE      racnode-dc1-2            STABLE
ora.helper
               OFFLINE OFFLINE      racnode-dc1-1            STABLE
               OFFLINE OFFLINE      racnode-dc1-2            IDLE,STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.hawk.db
      1        ONLINE  OFFLINE                               Instance Shutdown,STABLE
      2        ONLINE  OFFLINE                               Instance Shutdown,STABLE

Start database FAILED.

[oracle@racnode-dc1-2 ~]$ . /media/patch/hawk.env
The Oracle base has been set to /u01/app/oracle
ORACLE_UNQNAME=hawk
ORACLE_SID=hawk2
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1
Oracle Instance not alive for sid "hawk2"

[oracle@racnode-dc1-2 ~]$ srvctl status database -d $ORACLE_UNQNAME -v
Instance hawk1 is not running on node racnode-dc1-1
Instance hawk2 is not running on node racnode-dc1-2

[oracle@racnode-dc1-2 ~]$ srvctl start database -d $ORACLE_UNQNAME
PRCR-1079 : Failed to start resource ora.hawk.db
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-1/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-1' failed
CRS-2632: There are no more servers to try to place resource 'ora.hawk.db' on that would satisfy its placement policy
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-2/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-2' failed
[oracle@racnode-dc1-2 ~]$


[oracle@racnode-dc1-1 ~]$ . /media/patch/hawk.env
The Oracle base has been set to /u01/app/oracle
ORACLE_UNQNAME=hawk
ORACLE_SID=hawk1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1
Oracle Instance not alive for sid "hawk1"

[oracle@racnode-dc1-1 ~]$ srvctl start database -d hawk
PRCR-1079 : Failed to start resource ora.hawk.db
CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-2/crs/trace/crsd_oraagent_oracle.trc".

CRS-5017: The resource action "ora.hawk.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora'
ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba)
. For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-1/crs/trace/crsd_oraagent_oracle.trc".

CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-2' failed
CRS-2632: There are no more servers to try to place resource 'ora.hawk.db' on that would satisfy its placement policy
CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-1' failed
[oracle@racnode-dc1-1 ~]$

Incorrect permissions for the Oracle library was the cause. Changing permissions for $GRID_HOME/bin/oracle (chmod 6751 $GRID_HOME/bin/oracle), and stopping and starting CRS resolved the failure.

[oracle@racnode-dc1-1 dbs]$ ls -lhrt $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle dba 314M Apr 20 16:06 /u01/app/oracle/12.1.0.1/db1/bin/oracle

[oracle@racnode-dc1-1 dbs]$ ls -lhrt /u01/18.3.0.0/grid/bin/oracle
-rwxr-x--x 1 oracle oinstall 396M Apr 20 19:21 /u01/18.3.0.0/grid/bin/oracle

[oracle@racnode-dc1-1 dbs]$ cd /u01/18.3.0.0/grid/bin/
[oracle@racnode-dc1-1 bin]$ chmod 6751 oracle
[oracle@racnode-dc1-1 bin]$ ls -lhrt /u01/18.3.0.0/grid/bin/oracle
-rwsr-s--x 1 oracle oinstall 396M Apr 20 19:21 /u01/18.3.0.0/grid/bin/oracle

[root@racnode-dc1-1 ~]# . /media/patch/gi.env
The Oracle base has been set to /u01/app/oracle
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
GRID_HOME=/u01/18.3.0.0/grid
ORACLE_HOME=/u01/18.3.0.0/grid
Oracle Instance alive for sid "+ASM1"
[root@racnode-dc1-1 ~]# crsctl stop crs

====================================================================================================

[root@racnode-dc1-2 ~]# . /media/patch/gi.env
The Oracle base has been set to /u01/app/oracle
ORACLE_SID=+ASM2
ORACLE_BASE=/u01/app/oracle
GRID_HOME=/u01/18.3.0.0/grid
ORACLE_HOME=/u01/18.3.0.0/grid
Oracle Instance alive for sid "+ASM2"

[root@racnode-dc1-2 ~]# ls -lhrt $GRID_HOME/bin/oracle
-rwxr-x--x 1 oracle oinstall 396M Apr 21 01:44 /u01/18.3.0.0/grid/bin/oracle

[root@racnode-dc1-2 ~]# chmod 6751 $GRID_HOME/bin/oracle
[root@racnode-dc1-2 ~]# ls -lhrt $GRID_HOME/bin/oracle
-rwsr-s--x 1 oracle oinstall 396M Apr 21 01:44 /u01/18.3.0.0/grid/bin/oracle

[root@racnode-dc1-2 ~]# crsctl stop crs

====================================================================================================

[root@racnode-dc1-2 ~]# crsctl start crs
[root@racnode-dc1-1 ~]# crsctl start crs

Reference: RAC Database Can’t Start: ORA-01565, ORA-17503: ksfdopn:10 Failed to open file +DATA/BPBL/spfileBPBL.ora (Doc ID 2316088.1)

In conclusion, while the GRID rollback may have completed successfully, the database was down and resulted in an outage. It may be prudent to check permissions for the Oracle library before a patching or rollback to avoid any disasters.


email

Interested in working with Michael? Schedule a tech call.

No comments

Leave a Reply

Your email address will not be published. Required fields are marked *