During the last couple of months, I saw some discussions and questions in different online conferences and user groups about upgrading RAC and Exadata to 188.8.131.52. The questions were mostly about the upgrade procedure, the timing, the things that can happen during the upgrade and the system’s behavior after the upgrade.
I’ve recently done two Exadata upgrades to 184.108.40.206 and want to share the experience. I hope this short note will help someone make the decision, estimate the time required, and prepare for maintenance. I am going to talk about upgrades from version 220.127.116.11 BP10 to 18.104.22.168 BP2.
First, you need to thoroughly read the Oracle Support note [ID 1373255.1] (strongly recommended as a primary guidance for the upgrade), make a general plan, and calculate the estimated time for every upgrade step. Most of the steps can be done in rolling mode and don’t require full downtime for the environment.
The second step is to gather information on your current system and check if your firmware and Exadata software versions fulfill the requirements for 22.214.171.124.
There are several ways to get all the necessary information from your Exadata. The simplest way is to run the exachk utility, which compiles all the information into a single report. Log in to a database box on your Exadata and set up user equivalence from the oracle user to the root account on all database boxes, storage cells, and InfiniBand switches, or just keep the passwords handy and provide them when the script requests them. Another way is to check the firmware and software versions by running some scripts on your Exadata database box:
a) –will show you the version for your Exadata software on a cell box
Here is sample output:
Kernel version: 2.6.18-126.96.36.199.4.el5 #1 SMP Sat Feb 19 03:38:37 EST 2011 x86_64
Cell version: OSS_188.8.131.52.3_LINUX.X64_110616
Cell rpm version: cell-184.108.40.206.3_LINUX.X64_110616-1
Active image version: 220.127.116.11.3.110616
Active image activated: 2011-07-23 17:57:58 -0700
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
In partition rollback: Impossible
Cell boot usb partition: /dev/sdm1
Cell boot usb version: 18.104.22.168.3.110616
Inactive image version: undefined
Rollback to the inactive partitions: Impossible
b) — will show your infiniband switch’s version
Here is sample output:
[INFO] SUCCESS Switch swtch-ib2 has correct software and firmware version:
[INFO] SUCCESS Switch swtch-ib2 has correct opensm configuration:
controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
[INFO] SUCCESS All switches have correct software and firmware version:
[INFO] SUCCESS All switches have correct opensm configuration:
controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 for non spine and 8 for spine switch5
c) — will show your box’s version
dmidecode -s system-product-name
Here is sample output:
SUN FIRE X4270 M2 SERVER
You need to make sure that your Datacenter InfiniBand Switch 36 is running software release 1.3.3-2 or later and that your Exadata Storage Server software is release 22.214.171.124.0 or later. In my case, the InfiniBand switches already had the proper firmware versions, but I still needed to upgrade the Oracle Exadata software. I chose version 126.96.36.199.2 of the Exadata storage software, which was the latest at that moment, and used it to upgrade my cell and database boxes.
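The "release X or later" checks above are easy to get wrong by eye, so here is a minimal Python sketch of how they could be scripted. It assumes release strings made only of numbers separated by dots and dashes (such as 1.3.3-2) and a plain "Name: value" report layout like the cell sample output shown earlier. The function names are my own and are not part of any Oracle tool.

```python
import re


def release_key(release: str) -> tuple:
    """Split a release string such as '1.3.3-2' into a tuple of integers."""
    # Raises ValueError if a component is not numeric.
    return tuple(int(part) for part in re.split(r"[.\-]", release.strip()))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when 'installed' is the same as or newer than 'minimum'."""
    a, b = release_key(installed), release_key(minimum)
    # Pad the shorter tuple with zeros so '1.3.3' is compared as '1.3.3-0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def field(report: str, name: str):
    """Return the value of a 'Name: value' line in a text report, or None."""
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return value.strip()
    return None


if __name__ == "__main__":
    # 1.3.3-2 is the minimum switch release named in the text.
    print(meets_minimum("1.3.3-2", "1.3.3-2"))
    sample = "Active image status: success\nActive system partition on device: /dev/md5"
    print(field(sample, "Active image status"))
```

The same comparison can be used for the cell image version once you have extracted it from the report with field().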
You have to check whether you have applied the fix for ‘Synchronization problem in the IPC state’, unpublished bug 12539000. I also needed to apply patch 13404001 to my existing GI and RDBMS software before the upgrade. I recommend preparing a list of all one-off patches applied to your RDBMS or GI software and checking each of them to make sure it has been fixed in 188.8.131.52.
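The one-off patch review boils down to a set difference: everything applied to the current homes minus everything known to be fixed in the target release. A small sketch of that bookkeeping follows. The patch numbers in the example are made up for illustration, and both lists are assumed to be collected by hand (for example, from your patch inventory and from the fixed-bug list of the target release).

```python
def patches_to_review(applied, fixed_in_target):
    """Return applied one-off patches not known to be fixed in the target release."""
    return sorted(set(applied) - set(fixed_in_target))


if __name__ == "__main__":
    # Hypothetical patch numbers, for illustration only.
    applied = [10000001, 10000002, 10000003]
    fixed_in_target = [10000002]
    for patch in patches_to_review(applied, fixed_in_target):
        print("needs review before upgrade:", patch)
```

Anything the function returns needs a closer look: either a new one-off patch for the target release or confirmation that the fix is no longer required.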
II. Upgrade Oracle Exadata software:
1. The Exadata storage software was updated on all storage cells in rolling mode. It was done online during the day without any impact on running services. It took about 1.5 hours per cell, and the upgrade completed without any issues on most of them.
The only issue I hit, while upgrading one of the cells, was an ORA-600 error that broke the upgrade while the grid disks were being inactivated.
As a result, the patching process aborted midway and could not continue on that cell. I found a workaround for the issue.
Here are the steps I took to fix the MS application and start patching for the cell from scratch:
a) I stopped MS services on the cell
cellcli>alter cell shutdown services MS
b) Deployed MS
sh setup_dynamicDeploy -D
c) Started to apply the patch from scratch
patchmgr -cells cell_group -cleanup
patchmgr -cells cell_group -patch -rolling
2. The minimal pack for database hosts was also applied in rolling mode. It went through without any warnings and took about 1 hour per db box. According to my observations, most of the time was spent upgrading the ILOM software to the new version.
Now, we have new Exadata software on cell and db boxes and are able to start upgrading the Grid Infrastructure and Database homes.
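To turn the per-box timings above into a maintenance window estimate, a rough calculation like the following can help. It is only a sketch: the defaults are the durations I observed (about 1.5 hours per cell and about 1 hour per db box), rolling mode is assumed to process one box at a time, and the 7 cells and 4 database nodes in the example are an assumed configuration, so substitute your own counts.

```python
def exadata_software_estimate(cells, db_nodes,
                              hours_per_cell=1.5, hours_per_db_node=1.0):
    """Estimate elapsed hours for a rolling Exadata software upgrade.

    Rolling mode is assumed to handle one box at a time, so the elapsed
    time is the per-box time multiplied by the number of boxes.
    """
    cell_hours = cells * hours_per_cell
    db_hours = db_nodes * hours_per_db_node
    return {"cells": cell_hours, "db_nodes": db_hours,
            "total": cell_hours + db_hours}


if __name__ == "__main__":
    # Assumed configuration: 7 storage cells and 4 database nodes.
    estimate = exadata_software_estimate(cells=7, db_nodes=4)
    for step, hours in estimate.items():
        print(f"{step}: {hours} h")
```

Add a buffer on top of the total for problems such as the aborted cell patch described above.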
Here are the steps to upgrade Grid Infrastructure to 184.108.40.206:
1. First, you need to check your BP level and install the patch for bug 12539000 if you have BP11 or lower. The bug leads to an Oracle fatal error during the rolling upgrade. The patch can be applied in rolling mode, and it takes about 1 – 1.5 hours to complete patching for 4 database nodes.
2. Second, perform all pre-installation steps described in Oracle Support note [ID 1373255.1].
3. The next step is to install the new 220.127.116.11 GI software. Oracle recommends an “out of place” upgrade for 11gR2.
I used silent installation for GI 18.104.22.168 with the “UPGRADE” option. It can be done during normal business hours without any impact on the environment, and the software installation does not take long. It took about 1 hour for me. You can also apply the BP to the new GI home in advance, before running the rootupgrade.sh script, and save some time during the database upgrade.
Don’t forget to re-link the software with RDS after installation and don’t run the rootupgrade.sh script!
4. The upgrade of the GI itself should be scheduled when the impact from the rolling upgrade would be minimal. All you need to do now is to run the rootupgrade.sh on each node in rolling mode. It completed in about 10-20 minutes per node without any errors.
Now we have the 22.214.171.124.0 GI and can continue with the database software.
Upgrade the database software and databases to 126.96.36.199:
1. Install the new database software to a new directory and recompile it with the RDS option. I used silent install with “INSTALL_DB_SWONLY” and relinked the libraries to use RDS afterwards. You can verify whether the database software is using RDS over IB by running the command $ORACLE_HOME/bin/skgxpinfo (available starting from version 188.8.131.52). The installation was simple and took about 1 hour.
2. The next step is to apply the latest Bundle Patch (BP) to the GI and RDBMS software. We can do it in rolling mode for GI and for the newly installed 184.108.40.206 software. I didn’t have any issues with the BP. It was installed successfully and took about 1 hour to apply.
3. The next step is to upgrade the databases. You need to verify all parameters and run all pre-upgrade checks for the databases.
The Oracle Support note [ID 1373255.1] will help you go through all the steps, and you can use DBUA or a manual upgrade for the databases. This step requires downtime for each database being upgraded. It should not take more than an hour to perform the upgrade and post-upgrade steps on a database. The upgrade ran smoothly for me and completed successfully.
4. I hit a couple of internal errors after the upgrade. One was related to a materialized view, and I resolved it by rebuilding the materialized view. Another was related to a shared server connection and was eventually resolved. I also had a problem with AWR reports. There were no completed reports after the upgrade, and the problem was fixed only after gathering statistics for fixed objects.
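The RDS check from step 1 can be wrapped in a small script if you have several homes to verify. This is a sketch under one assumption that you should confirm on your own system: that skgxpinfo, run without arguments, prints the protocol name as a single word ("rds" when the home is linked with RDS, "udp" otherwise). The home path in the example is hypothetical.

```python
import os
import subprocess


def uses_rds(skgxpinfo_output: str) -> bool:
    """Interpret skgxpinfo output; assumes it prints 'rds' or 'udp'."""
    return skgxpinfo_output.strip().lower() == "rds"


def check_home(oracle_home: str) -> bool:
    """Run skgxpinfo from the given Oracle home and report whether it uses RDS."""
    tool = os.path.join(oracle_home, "bin", "skgxpinfo")
    result = subprocess.run([tool], capture_output=True, text=True, check=True)
    return uses_rds(result.stdout)


if __name__ == "__main__":
    # Hypothetical path; pass your real ORACLE_HOME instead.
    home = os.environ.get("ORACLE_HOME", "/u01/app/oracle/product/new_home")
    try:
        print("RDS in use:", check_home(home))
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not run skgxpinfo:", err)
```

If the check reports that RDS is not in use, relink the home with the RDS option before starting any database from it.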
Finally, you need to adjust the database compatible parameter and the ASM and disk group compatibility attributes, verify that no running process is still using the old binaries, and then detach and remove the old homes to free up space.
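One way to check for processes still running from the old homes on Linux is to look at where each process executable actually lives, since Oracle background processes do not show the binary path in their command line. The sketch below reads the /proc/<pid>/exe links and filters them by a home directory prefix. The old home path is hypothetical, and reading the links of other users' processes generally requires root.

```python
import os


def running_from(exe_by_pid, old_home):
    """Return {pid: exe} for executables located under the old home directory."""
    prefix = old_home.rstrip("/") + "/"
    return {pid: exe for pid, exe in exe_by_pid.items()
            if exe.startswith(prefix)}


def collect_exe_paths(proc_root="/proc"):
    """Map each PID to its executable path by reading /proc/<pid>/exe (Linux only)."""
    paths = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            paths[int(entry)] = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            continue  # process exited or we lack permission to read the link
    return paths


if __name__ == "__main__":
    # Hypothetical old home path; replace it with your real one.
    old_home = "/u01/app/oracle/product/old_home"
    for pid, exe in sorted(running_from(collect_exe_paths(), old_home).items()):
        print(pid, exe)
```

An empty result for both the old GI home and the old database home is a reasonable signal that it is safe to detach and remove them.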