How to Fix the Status of the Oracle GI CRS After a Failed Upgrade

Editor’s Note: Because our bloggers have lots of useful tips, every now and then we update and bring forward a popular post from the past. Today’s post was originally published on April 29, 2019.

Blogger’s Note: The information in this post is also valid for 19.x upgrades, in addition to 18.x upgrades.

A couple of weeks ago I was working on a two-node Oracle Grid Infrastructure upgrade from 12.1 to 18.5. Everything went well, and rootupgrade.sh ran correctly on both nodes. The only step left was the gridSetup.sh -executeConfigTools command, which failed in the rhprepos upgradeSchema section:

[oracle@node1 /u01/app/18.5.0/grid ]$ ./gridSetup.sh -executeConfigTools -responseFile /tmp/gridresponse.rsp -silent 

########################################
# From the upgrade log file :
########################################
INFO: [Apr 9, 2019 3:24:08 PM] Starting 'Upgrading RHP Repository' 
INFO: [Apr 9, 2019 3:24:08 PM] Starting 'Upgrading RHP Repository' 
INFO: [Apr 9, 2019 3:24:08 PM] Executing RHPUPGRADE 
INFO: [Apr 9, 2019 3:24:08 PM] Command /u01/app/18.5.0/grid/bin/rhprepos upgradeSchema -fromversion 12.1.0.2.0 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn.handleProcess() entered. 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn: getting configAssistantParmas. 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn: checking secretArguments. 
INFO: [Apr 9, 2019 3:24:08 PM] No arguments to pass to stdin 
INFO: [Apr 9, 2019 3:24:08 PM] ... GenericInternalPlugIn: starting read loop. 
INFO: [Apr 9, 2019 3:24:11 PM] Completed Plugin named: rhpupgrade 
INFO: [Apr 9, 2019 3:24:11 PM] ConfigClient.saveSession method called 
INFO: [Apr 9, 2019 3:24:11 PM] Upgrading RHP Repository failed. 
INFO: [Apr 9, 2019 3:24:11 PM] Upgrading RHP Repository failed. 
INFO: [Apr 9, 2019 3:24:11 PM] ConfigClient.executeSelectedToolsInAggregate action performed 
...
INFO: [Apr 9, 2019 3:24:11 PM] Validating state <setup> 
WARNING: [Apr 9, 2019 3:24:11 PM] [WARNING] [INS-43080] Some of the configuration assistants failed, were cancelled or skipped 
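
When the installer log is long, it can help to pull out only the steps it reports as failed. The function below is a minimal sketch, not part of any Oracle tooling: the name failed_config_steps is my own, and it assumes the log uses the "INFO: [timestamp] <step> failed." wording seen in the excerpt above.

```shell
# List the distinct configuration steps that the installer log reports as failed.
# Reads the log on stdin. Assumes lines of the form:
#   INFO: [<timestamp>] <step name> failed.
failed_config_steps() {
    sed -n 's/^INFO: \[[^]]*\] \(.*\) failed\.[[:space:]]*$/\1/p' | sort -u
}
```

You would feed it the gridSetup log from your own session, for example failed_config_steps < "$LOG", where LOG is a variable you set to that log's path.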

[oracle@node1 ~]$ crsctl query crs activeversion -f 
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [UPGRADE FINAL]. The cluster active patch level is [2532936542].
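
If you want to check the upgrade state from a script rather than by eye, the state can be cut out of that message. This is a small sketch under the assumption that the message keeps the wording shown above; the function name upgrade_state is hypothetical.

```shell
# Print the cluster upgrade state (for example NORMAL or UPGRADE FINAL)
# from the output of "crsctl query crs activeversion -f" read on stdin.
upgrade_state() {
    sed -n 's/.*The cluster upgrade state is \[\([^]]*\)\].*/\1/p'
}
```

Typical use would be: crsctl query crs activeversion -f | upgrade_state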

After looking for information in MOS (My Oracle Support), I couldn’t find much to help me solve the issue; just a lot of bugs related to the RHP (Rapid Home Provisioning) repository.

The main problem was that the MGMTDB was not upgraded to 18.5 (or 19.5) during the upgrade process and remained at version 12.1. As a result, the RHP repository upgrade failed when it tried to run.

I was lucky enough to get on a call with a good friend (Ricardo Gonzalez) who is the PM of the RHP, and we were able to work through it. Below is the solution for the issue.

The first step is to bring up the MGMTDB from the 12.1 GI_HOME. If the resource is disabled, as it was here, enable it first.

[oracle@node1 ~]$ srvctl start mgmtdb
PRCR-1079 : Failed to start resource ora.mgmtdb
CRS-2501: Resource 'ora.mgmtdb' is disabled
[oracle@node1 ~]$ srvctl enable mgmtdb
[oracle@node1 ~]$ srvctl start mgmtdb 
[oracle@node1 ~]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node node2

Once the MGMTDB is up and running, you need to drop the RHP service that was created during the rootupgrade process. You do this from the 18.5 GI_HOME.

[root@node2 ~]$ env | grep ORA
ORACLE_SID=+ASM2
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[root@node2 ~]$ srvctl remove rhpserver
PRCT-1470 : failed to reset the Rapid Home Provisioning (RHP) repository
 PRCT-1011 : Failed to run "mgmtca". Detailed error: [MGTCA-1005 : Could not connect to the GIMR. 
 ORA-01034: ORACLE not available
 ORA-27101: shared memory realm does not exist
 Linux-x86_64 Error: 2: No such file or directory
 Additional information: 4150
 Additional information: -1526109961
 ]
[root@node2 ~]$ srvctl remove rhpserver -f
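
The same "try normally, then force" sequence can be wrapped in a small function. This is only a sketch: remove_rhpserver is a name I made up, and it assumes that srvctl from the 18.5 GI_HOME is first in PATH and that you run it as root, as in the session above.

```shell
# Try a normal removal first; fall back to -f only if that fails.
# Assumption: srvctl resolves to the binary in the new (18.5) GI home.
remove_rhpserver() {
    if srvctl remove rhpserver; then
        echo "rhpserver removed"
    else
        echo "normal removal failed, retrying with -f" >&2
        srvctl remove rhpserver -f && echo "rhpserver removed (forced)"
    fi
}
```

Keeping the forced removal as a fallback means the errors from the first attempt still show up on screen before anything is forced.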

Now that you have removed the RHP service, you need to remove the MGMTDB in 12.1.

You should do this from the first node. While it’s possible to do it from the other nodes, Oracle highly recommends doing it on the first node. Accordingly, if the MGMTDB is running on any other node, relocate it to the first node.
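
To decide whether a relocation is needed, you can read the hosting node from the srvctl status mgmtdb output. The two helpers below are a sketch with hypothetical names (mgmtdb_node, relocate_hint); they assume the "Instance -MGMTDB is running on node ..." wording shown earlier, and relocate_hint only prints the command rather than running it.

```shell
# Print the node that hosts the MGMTDB instance, parsed from the
# output of "srvctl status mgmtdb" read on stdin.
mgmtdb_node() {
    sed -n 's/^Instance -MGMTDB is running on node \([^[:space:]]*\).*/\1/p'
}

# Print the relocate command if the instance is not on the wanted node.
# $1 = wanted node, $2 = node currently hosting the instance
relocate_hint() {
    if [ "$1" != "$2" ]; then
        echo "srvctl relocate mgmtdb -node $1"
    fi
}
```

For example: relocate_hint node1 "$(srvctl status mgmtdb | mgmtdb_node)" prints nothing when the instance is already on node1.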

########################################
# As root user in BOTH nodes
########################################
#Node 1
[root@node1 ~]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[root@node1 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[root@node1 ~]$ crsctl stop res ora.crf -init
[root@node1 ~]$ crsctl modify res ora.crf -attr ENABLED=0 -init

#Node 2
[root@node2 ~]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[root@node2 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[root@node2 ~]$ crsctl stop res ora.crf -init
[root@node2 ~]$ crsctl modify res ora.crf -attr ENABLED=0 -init

########################################
# As oracle User on Node 1
########################################
[oracle@node1 ~]$ export ORACLE_HOME=/u01/app/12.1.0.2/grid
[oracle@node1 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[oracle@node1 ~]$ srvctl relocate mgmtdb -node node1                                                          
[oracle@node1 ~]$ srvctl stop mgmtdb
[oracle@node1 ~]$ srvctl stop mgmtlsnr
[oracle@node1 ~]$ srvctl remove mgmtdb -force
Remove the database _mgmtdb? (y/[n]) y
########################################
##### Manually Removed the mgmtdb files
##### Verify that the files for MGMTDB match your environment before deleting them
########################################
ASMCMD> cd DBFS_DG/_MGMTDB/DATAFILE
ASMCMD> ls
SYSAUX.257.879563483
SYSTEM.258.879563493
UNDOTBS1.259.879563509
ASMCMD> rm system.258.879563493
ASMCMD> rm sysaux.257.879563483
ASMCMD> rm undotbs1.259.879563509
ASMCMD> cd ../PARAMETERFILE
ASMCMD> rm spfile.268.879563627
ASMCMD> cd ../TEMPFILE
ASMCMD> rm TEMP.264.879563553
ASMCMD> cd ../ONLINELOG
ASMCMD> rm group_1.261.879563549
ASMCMD> rm group_2.262.879563549
ASMCMD> rm group_3.263.879563549
ASMCMD> cd ../CONTROLFILE
ASMCMD> rm Current.260.879563547
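
Because these deletions cannot be undone, one cautious option is to generate the rm commands first and review them before running anything. The function below is a sketch and does not delete anything itself; print_asm_rm is a hypothetical name, and the assumption that asmcmd accepts a full +DISKGROUP path on its command line should be checked against your version.

```shell
# Print (do not run) one "asmcmd rm" command per file name read on stdin.
# $1 = ASM directory, for example +DBFS_DG/_MGMTDB/DATAFILE
print_asm_rm() {
    dir=$1
    while IFS= read -r name; do
        if [ -n "$name" ]; then
            printf 'asmcmd rm %s/%s\n' "$dir" "$name"
        fi
    done
    return 0
}
```

For instance, asmcmd ls +DBFS_DG/_MGMTDB/DATAFILE | print_asm_rm +DBFS_DG/_MGMTDB/DATAFILE would list the commands for the data files, which you can compare against your environment before executing them.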

Once the MGMTDB is deleted, run mdbutil.pl (which you can get from MOS Doc 2065175.1) to add the MGMTDB in the 18.5 GI_HOME.

########################################
# As oracle User on Node 1
########################################
[oracle@node1 ~]$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[oracle@node1 ~]$ ./mdbutil.pl --addmdb --target=+DBFS_DG
mdbutil.pl version : 1.95
2019-04-14 19:11:48: I Starting To Configure MGMTDB at +DBFS_DG...
2019-04-14 19:11:53: I Container database creation in progress... for GI 18.0.0.0.0
2019-04-14 19:20:29: I Plugable database creation in progress...
2019-04-14 19:22:25: I Executing "/tmp/mdbutil.pl --addchm" on node1 as root to configure CHM.
root@node1's password:
2019-04-14 19:23:08: W Not able to execute "/tmp/mdbutil.pl --addchm" on node1 as root to configure CHM.
2019-04-14 19:23:08: I Executing "/tmp/mdbutil.pl --addchm" on node2 as root to configure CHM.
root@node2's password:
2019-04-14 19:23:27: W Not able to execute "/tmp/mdbutil.pl --addchm" on node2 as root to configure CHM.
2019-04-14 19:23:27: I MGMTDB & CHM configuration done!
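
Note that the final "configuration done" line appears even though the CHM step failed on both nodes, so the warnings are easy to miss. If you saved the mdbutil.pl output to a file, a filter like this sketch lists the nodes where --addchm still has to be run manually as root. The function name addchm_failed_nodes is my own, and it assumes the warning wording shown above.

```shell
# Print each node on which mdbutil.pl reported that --addchm could not be run.
# Reads the mdbutil.pl output on stdin.
addchm_failed_nodes() {
    sed -n 's/.* W Not able to execute ".*--addchm" on \([^[:space:]]*\) as root.*/\1/p'
}
```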

########################################
# As root user in BOTH nodes
########################################
[root@node1 ~]$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[root@node1 ~]$ /tmp/mdbutil.pl --addchm ##Only if it failed in the mdbutil.pl execution
[root@node1 ~]$ crsctl modify res ora.crf -attr ENABLED=1 -init
[root@node1 ~]$ crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'node1'
CRS-2676: Start of 'ora.crf' on 'node1' succeeded

[root@node2 ~]$ env | grep ORA
ORACLE_SID=+ASM2
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[root@node2 ~]$ /tmp/mdbutil.pl --addchm ##Only if it failed in the mdbutil.pl execution
[root@node2 ~]$ crsctl modify res ora.crf -attr ENABLED=1 -init
[root@node2 ~]$ crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'node2'
CRS-2676: Start of 'ora.crf' on 'node2' succeeded

########################################
# As oracle User on Node 1
########################################
[oracle@node1 ~]$ srvctl status MGMTDB
Database is enabled
Instance -MGMTDB is running on node tstedbadm01
[oracle@node1 ~]$ srvctl status mgmtlsnr
Listener MGMTLSNR is enabled
Listener MGMTLSNR is running on node(s): tstedbadm01
[oracle@node1 ~]$ srvctl config MGMTDB
Database unique name: _mgmtdb
Database name: 
Oracle home: <CRS home>
Oracle user: oracle
Spfile: +DBFS_DG/_MGMTDB/PARAMETERFILE/spfile.282.1005320705
Password file: 
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Type: Management
PDB name: GIMR_DSCREP_10
PDB service: GIMR_DSCREP_10
Cluster name: test-clu
Database instance: -MGMTDB

Once the MGMTDB has been recreated, you can rerun the gridSetup.sh -executeConfigTools command, and you will see that the cluster status is now NORMAL and everything is running as expected in version 18.5 (or 19.5).

[oracle@node1 ~]$ env | grep ORA
ORACLE_SID=+ASM1
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/18.5.0/grid
[oracle@node1 ~]$ /u01/app/18.5.0/grid/gridSetup.sh -executeConfigTools -responseFile /tmp/gridresponse.rsp -silent 
Launching Oracle Grid Infrastructure Setup Wizard...

You can find the logs of this session at:
/u01/app/oraInventory/logs/GridSetupActions2019-04-11_04-07-18PM

Successfully Configured Software.

[oracle@node1 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2532936542].

[oracle@node1 ~]$ crsctl check cluster -all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
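
On clusters with more nodes, the same check can be summarized per node. The following is a sketch only: cluster_health is a hypothetical name, and it treats a node as healthy when all three "is online" messages shown above (CRS-4537, CRS-4529, CRS-4533) appear for it.

```shell
# Print "<node> OK" or "<node> DEGRADED" for each node in the output of
# "crsctl check cluster -all" read on stdin. A node counts as OK only when
# all three "is online" messages are present for it.
cluster_health() {
    awk '
        function report() { print node, (online == 3 ? "OK" : "DEGRADED") }
        # A node header looks like "node1:" and is not a separator line.
        /^[^*].*:$/ {
            if (node != "") report()
            node = substr($0, 1, length($0) - 1)
            online = 0
            next
        }
        /^CRS-(4537|4529|4533):.* is online/ { online++ }
        END { if (node != "") report() }
    '
}
```

Typical use would be: crsctl check cluster -all | cluster_health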

I hope this blog post helps you solve this issue if you ever face this problem.

Quick note: We were not using the Rapid Home Provisioning feature, and the deletion of the GIMR database did not have any impact on the environment. If you are using RHP, I highly recommend you contact Oracle before running this, to avoid losing the RHP repository.

Oracle also confirmed that this is a bug in the upgrade process of 18.X, so hopefully they will fix it in the future.

Note: This was originally posted on rene-ace.com.

About the Author

Currently I am an Oracle ACE; a speaker at Oracle Open World, Oracle Developers Day, OTN Tour Latin America and APAC region, and IOUG Collaborate; and Co-President of ORAMEX (Mexico Oracle User Group). At the moment I am an Oracle Project Engineer at Pythian. In my free time I like to say that I'm a movie fanatic and music lover, bringing the best from México (Mexihtli) to the rest of the world and photographing it in the process ;)

2 Comments

Such a great article. I upgraded from 12.2 to 19.8, faced this issue, and the fix worked well. The only thing I changed was the step that deletes the old MGMT database: I used the force option because the ora.chad process depended on it.
Thanks for the great level of detail. Appreciated…

Good to know it helped you out. It was painful when it happened to me, with very little information available about it :)
