In this post we will walk through the deployment process of the new Exadata X8M.
RoCE issues from the factory
Exadata X8M servers come from the factory with the RoCE private network disabled. If the Field Engineer assigned to the physical setup of the Exadata did not enable the RoCE network, it is your job to do so.
The RoCE network must be enabled on all Compute Nodes and on all Storage Servers.
In Exadata X8M the private network no longer runs on InfiniBand switches; it runs on RoCE (RDMA over Converged Ethernet) fabric switches. The interfaces we see in the operating system are re0 and re1.
When we check the active interfaces, re0 and re1 are not listed:
[[email protected] ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54309  bytes 3744342 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279171  bytes 18019054 (17.1 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9553  bytes 1693920 (1.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff

eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 31847  bytes 2396622 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136405  bytes 6139347 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136405  bytes 6139347 (5.8 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Most of the InfiniBand-related commands and tools no longer work, but ibstat still does, so we can use it to check the state of the private network:
[[email protected] ~]# ibstat | grep -i 'state\|rate'
        State: Down
        Physical state: Disabled
        Rate: 100
        State: Down
        Physical state: Disabled
        Rate: 100
Checking the config of RoCE interface cards:
[[email protected] ~]# cat /etc/sysconfig/network-scripts/ifcfg-re0
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re0
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no
[[email protected] ~]# cat /etc/sysconfig/network-scripts/ifcfg-re1
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re1
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no
Bringing RoCE interface cards up:
[[email protected] ~]# ifup re0
/sbin/ifup-local: /sbin/ifup-local re0:
/sbin/ifup-local: + RoCE configuration...
/sbin/ifup-local: + Matched (wildcard) interface re0.
/sbin/ifup-local: + RoCE Configuration: /bin/roce_config -i re0...
NETDEV=re0; IBDEV=mlx5_0; PORT=1
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
Finished configuring "re0" ヽ(•‿•)ノ
/sbin/ifup-local: + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re0.
[[email protected] ~]# ifup re1
/sbin/ifup-local: /sbin/ifup-local re1:
/sbin/ifup-local: + RoCE configuration...
/sbin/ifup-local: + Matched (wildcard) interface re1.
/sbin/ifup-local: + RoCE Configuration: /bin/roce_config -i re1...
NETDEV=re1; IBDEV=mlx5_0; PORT=2
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
Finished configuring "re1" ヽ(•‿•)ノ
/sbin/ifup-local: + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re1.
Now we can see that the interfaces re0 and re1 are up, but with no IPs assigned:
[[email protected] ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54533  bytes 3767354 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279584  bytes 18051211 (17.2 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9727  bytes 1720009 (1.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff

eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 32071  bytes 2419634 (2.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136804  bytes 6157123 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136804  bytes 6157123 (5.8 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 0c:42:a1:3b:45:12  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 0c:42:a1:3b:45:13  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
We can use ibstat again to confirm the interfaces are enabled:
[[email protected] ~]# ibstat | grep -i 'state\|rate'
        State: Active
        Physical state: LinkUp
        Rate: 100
        State: Active
        Physical state: LinkUp
        Rate: 100
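Since this sequence has to be repeated on every Compute Node and Storage Server, it is worth scripting. Below is a dry-run sketch using dcli; the dbs_group and cell_group group files are assumptions (one hostname per line, as dcli expects), and with DRYRUN=1 (the default) the commands are only printed, so you can review them before setting DRYRUN=0 to run for real:

```shell
# Dry-run sketch: repeat the ifup/ibstat sequence on all nodes via dcli.
# dbs_group and cell_group are assumed dcli group files; with DRYRUN=1
# (the default) commands are printed instead of executed.
DRYRUN=${DRYRUN:-1}

run() {
  if [ "$DRYRUN" = 1 ]; then
    echo "WOULD RUN: $*"
  else
    "$@"   # execute for real when DRYRUN=0
  fi
}

enable_roce_everywhere() {
  local group
  for group in dbs_group cell_group; do
    run dcli -g "$group" -l root "ifup re0; ifup re1"
    run dcli -g "$group" -l root "ibstat | grep -i 'state\|rate'"
  done
}

enable_roce_everywhere
```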
OEDA specifics
To start any Exadata deployment you need the OEDA configuration files, a set of files generated by OEDA (Oracle Exadata Deployment Assistant). OEDA is currently a web-based tool in which the client fills in all the IP addresses and hostnames that will be assigned to the new Exadata. Normally this step is handled by the client with the support of their network team.
Configuration files needed:
- Clientname-clustername.xml
- Clientname-clustername-InstallationTemplate.html
- Clientname-clustername-preconf.csv
The OEDA tool for Linux is also needed and can be downloaded as Patch 30640393. It is recommended to use the latest version available, but if the configuration files were generated with an older version, use that same version to avoid warnings during the execution of onecommand.
Stage the OEDA for Linux in /u01/onecommand/ and unzip it:
[[email protected] ~]# mkdir -p /u01/onecommand/
[[email protected] ~]# unzip -q p30640393_193800_Linux-x86-64.zip -d /u01/onecommand/
[[email protected] ~]# cd /u01/onecommand/linux-x64
Once in the correct directory, run onecommand with the -l option to list the steps, just to make sure it is working:
[[email protected] linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -l
 Initializing
1. Validate Configuration File
2. Setup Required Files
3. Create Users
4. Setup Cell Connectivity
5. Verify Infiniband
6. Calibrate Cells
7. Create Cell Disks
8. Create Grid Disks
9. Install Cluster Software
10. Initialize Cluster Software
11. Install Database Software
12. Relink Database with RDS
13. Create ASM Diskgroups
14. Create Databases
15. Apply Security Fixes
16. Install Autonomous Health Framework
17. Create Installation Summary
18. Resecure Machine
applyElasticConfig.sh preparation and execution
Technical background
applyElasticConfig.sh is a script, provided by Oracle within OEDA, which performs the initial setup of the compute nodes and storage servers: network configuration, IP addresses, hostnames, DNS, and NTP. By default the script works with the factory IP range and hostnames, searching for nodes in the 172.x.x.x network, so if the client has already changed the IP addresses and hostnames the script will not find anything. We found a way to trick it and make it work even in that situation. It is worth mentioning that this is not documented anywhere on docs.oracle.com; the closest references are:
- Configuring Oracle Exadata Database Machine
- ApplyElasticConfig failed during the execution of elasticConfig.sh (Doc ID 2175587.1)
- Bug 23064772 OEDA: applyelasticconfig.sh fails with error unable to locate rack item with ulocation
Even though these documents briefly mention the applyElasticConfig.sh script, they do not explain how to overcome the issue when the IPs and hostnames have already been changed.
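The discovery pass the script performs can be sketched in shell: expand a configured IP range and probe each address. This is a dry-run illustration (scan_range and DRYRUN are hypothetical names, not part of OEDA; the range is the management network used in this post's examples); set DRYRUN=0 to actually ping:

```shell
# Dry-run sketch of the discovery applyElasticConfig.sh performs:
# expand an IP range and probe each address for reachability.
DRYRUN=${DRYRUN:-1}

scan_range() {
  local base=$1 first=$2 last=$3 octet ip
  for octet in $(seq "$first" "$last"); do
    ip="$base.$octet"
    if [ "$DRYRUN" = 1 ]; then
      echo "WOULD PING: $ip"
    else
      # one probe, one-second timeout, report only reachable hosts
      ping -c1 -W1 "$ip" > /dev/null 2>&1 && echo "pingable: $ip"
    fi
  done
}

scan_range 10.201.84 190 206
```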
Preparation
To make the script find the servers after their hostnames and IPs were changed, you have to edit the es.properties file located under /u01/onecommand/linux-x64/properties. Change only the parameters related to IPs, subnets, and hostnames. The variables we care about are ROCEELASTICNODEIPRANGE, ROCEELASTICILOMIPRANGE, ELASTICSUBNETS, and SKIPHOSTNAMECHECK. Set them to the ranges of IPs listed in the Clientname-clustername-InstallationTemplate.html for each network:
- ROCEELASTICNODEIPRANGE expects the range of IPs in the management network.
- ROCEELASTICILOMIPRANGE expects the range of IPs of the ILOM of the servers.
- ELASTICSUBNETS expects the subnet of the management network.
- SKIPHOSTNAMECHECK defaults to false, so if the hostnames were also changed you want to set this to true.
Find some examples below:
[[email protected] linux-x64]# cat properties/es.properties | grep ELASTIC
#ROCEELASTICNODEIPRANGE=192.168.1.1:192.168.1.99
ROCEELASTICNODEIPRANGE=10.201.84.190:10.201.84.206
ROCEELASTICILOMIPRANGE=10.201.84.196:10.201.84.201
ELASTICCONFIGMARKERFILE=/.elasticConfig
ELASTICRACKNAMES=x5,x6,sl6,x7,x8
QINQELASTICCONFIGMINVERION=20.1.0.0.0.200323
#ELASTICSUBNETS=172.16.2:172.16.3:172.16.4:172.16.5:172.16.6:172.16.7
ELASTICSUBNETS=10.201.84
[[email protected] linux-x64]# grep SKIPHOST properties/es.properties
#SKIPHOSTNAMECHECK=false
SKIPHOSTNAMECHECK=true
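These edits can also be scripted. A minimal sketch, assuming the factory defaults shown above; it deliberately works on a temporary copy rather than the real properties/es.properties, commenting out the factory values and appending the site-specific ones:

```shell
# Sketch: script the es.properties edits on a throwaway copy.
set -eu

PROPS=$(mktemp)   # stand-in for properties/es.properties
cat > "$PROPS" <<'EOF'
ROCEELASTICNODEIPRANGE=192.168.1.1:192.168.1.99
ELASTICSUBNETS=172.16.2:172.16.3:172.16.4:172.16.5:172.16.6:172.16.7
SKIPHOSTNAMECHECK=false
EOF

# Comment out the factory values...
sed -i -e 's/^\(ROCEELASTICNODEIPRANGE=.*\)/#\1/' \
       -e 's/^\(ELASTICSUBNETS=.*\)/#\1/' \
       -e 's/^\(SKIPHOSTNAMECHECK=.*\)/#\1/' "$PROPS"

# ...and append the site-specific ones from InstallationTemplate.html.
{
  echo "ROCEELASTICNODEIPRANGE=10.201.84.190:10.201.84.206"
  echo "ELASTICSUBNETS=10.201.84"
  echo "SKIPHOSTNAMECHECK=true"
} >> "$PROPS"

grep -v '^#' "$PROPS"   # show the effective values
```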
Execution
Now that the ELASTIC* parameters in es.properties match your infrastructure, you are ready to execute the applyElasticConfig.sh script. Simply call it, passing in the Clientname-clustername.xml configuration file:
[[email protected] linux-x64]# ./applyElasticConfig.sh -cf /root/config/Client-ex03.xml
 Applying Elastic Config...
 Discovering pingable nodes in IP Range of 10.201.84.190 - 10.201.84.206.....
 Found 6 pingable hosts..[10.201.84.193, 10.201.84.194, 10.201.84.195, 10.201.84.190, 10.201.84.191, 10.201.84.192]
 Validating Hostnames..
 Discovering ILOM IP Addresses..
 Getting uLocations...
 Getting Mac Addressess..
 Getting uLocations...
 Mapping Machines with local hostnames..
 Mapping Machines with uLocations..
 Checking if Marker file exists..
 Updating machines with Mac Address for 6 valid machines.
 Creating preconf..
 Writing host-specific preconf files..
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel02_preconf.csv for ex03cel02 ....
 Preconf file copied to ex03cel02 as /var/log/exadatatmp/firstconf/ex03cel02_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db01_preconf.csv for ex03db01 ....
 Preconf file copied to ex03db01 as /var/log/exadatatmp/firstconf/ex03db01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db03_preconf.csv for ex03db03 ....
 Preconf file copied to ex03db03 as /var/log/exadatatmp/firstconf/ex03db03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel03_preconf.csv for ex03cel03 ....
 Preconf file copied to ex03cel03 as /var/log/exadatatmp/firstconf/ex03cel03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel01_preconf.csv for ex03cel01 ....
 Preconf file copied to ex03cel01 as /var/log/exadatatmp/firstconf/ex03cel01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db02_preconf.csv for ex03db02 ....
 Preconf file copied to ex03db02 as /var/log/exadatatmp/firstconf/ex03db02_preconf.csv
 Running Elastic Configuration on ex03cel02.client.com
 Running Elastic Configuration on ex03db01.client.com
 Running Elastic Configuration on ex03db03.client.com
 Running Elastic Configuration on ex03cel03.client.com
 Running Elastic Configuration on ex03cel01.client.com
 Running Elastic Configuration on ex03db02.client.com
OEDA onecommand preparation and execution
Technical background
OEDA is a set of scripts, files, and a form we use to plan and deploy an Exadata. Sometimes we refer to it as the onecommand utility, because with just one command, the install.sh script, we can deploy everything.
Preparation
To be able to run the install.sh script, a few prerequisites must be in place:
- The switches must have been already set up by the Field Engineer responsible for the physical installation of the hardware.
- The applyElasticConfig.sh script must have been run and completed successfully.
- The files listed in the “Appendix B” of the Clientname-clustername-InstallationTemplate.html must be staged to /u01/onecommand/linux-x64/WorkDir.
Once staged, the WorkDir should look like this:
[[email protected] ~]# ls -lh /u01/onecommand/linux-x64/WorkDir
total X.9G
-rwxr-xr-x 1 root root 355M Jun  9 12:34 ahf_setup
-rw-r--r-- 1 root root 2.9G Jun  9 12:54 V982063-01.zip
-rw-r--r-- 1 root root 2.7G Jun  9 12:57 V982068-01.zip
-rw-r--r-- 1 root root 2.4G Jun  9 12:57 p30805684_190000_Linux-x86-64.zip
-rw-r--r-- 1 root root 600M Jun  9 12:57 p6880880_180000_Linux-x86-64.zip
-rw-r--r-- 1 root root 1.3G Jun  9 12:57 p30899722_190000_Linux-x86-64.zip
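Before launching step 1 it can save a failed run to verify the staging yourself. A minimal sketch (check_workdir is a hypothetical helper, demonstrated here against a throwaway directory rather than the real WorkDir):

```shell
# Sketch: verify every Appendix B file is present before running step 1.
check_workdir() {
  local workdir=$1; shift
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$workdir/$f" ]; then
      echo "MISSING: $f"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "All required files staged."
  fi
  return "$missing"
}

# Demo against a throwaway directory with one file deliberately absent.
demo=$(mktemp -d)
touch "$demo/V982063-01.zip" "$demo/ahf_setup"
check_workdir "$demo" V982063-01.zip V982068-01.zip ahf_setup || true
```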
After all of this is done, you can run step 1 to validate the configuration files against the environment:
[[email protected] linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 1
 Initializing
 Executing Validate Configuration File
 Validating cluster: ex03-clu1
 Locating machines...
 Validating platinum...
 Checking Disk Tests Status....
 Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 Validating nodes for database readiness...
 Completed validation...
 SUCCESS: Ip address: 10.201.84.190 is configured correctly
 SUCCESS: Ip address: 10.201.80.54 is configured correctly
 SUCCESS: Ip address: 10.201.84.191 is configured correctly
 SUCCESS: Ip address: 10.201.80.55 is configured correctly
 SUCCESS: Ip address: 10.201.84.192 is configured correctly
 SUCCESS: Ip address: 10.201.80.56 is configured correctly
 SUCCESS: Ip address: 10.201.80.60 is configured correctly
 SUCCESS: Ip address: 10.201.80.62 is configured correctly
 SUCCESS: Ip address: 10.201.80.61 is configured correctly
 SUCCESS: Ip address: 10.201.80.58 is configured correctly
 SUCCESS: Ip address: 10.201.80.59 is configured correctly
 SUCCESS: Ip address: 10.201.80.57 is configured correctly
 SUCCESS: Validated NTP server 10.248.1.1
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/ahf_setup exists...
 SUCCESS: Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db01
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db02
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on ex03db01
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on ex03db02
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: DatabaseHome Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: Disk size 14000GB on cell ex03cel01.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel02.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel03.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel01.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel02.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel03.client.com matches the value specified in OEDA configuration file
 Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed = 85395 mS [1.0 minutes] Tue Jun 09 22:51:44 PDT 2020]]
If it finishes successfully you are good to move forward.
Execution
Now we just need to execute the remaining steps, either one by one or all in a row. I normally run steps 1 and 2 separately from the others, just because they tend to fail more often. Running all of them in a row does no harm, since execution stops immediately if any step fails. So it is up to you how you would like to run them.
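If you prefer the one-by-one route, the loop is easy to script. Here is a dry-run sketch (run_steps and the per-step log file names are assumptions, not part of OEDA); set DRYRUN=0 to actually invoke install.sh:

```shell
# Dry-run sketch: run install.sh steps one at a time, logging each step
# and stopping at the first failure.
DRYRUN=${DRYRUN:-1}
CONF=/root/config/client-ex03.xml
set -o pipefail   # so the tee pipeline reflects install.sh's exit status

run_steps() {
  local first=$1 last=$2 step
  for step in $(seq "$first" "$last"); do
    if [ "$DRYRUN" = 1 ]; then
      echo "WOULD RUN: ./install.sh -cf $CONF -s $step"
    else
      # keep a per-step log and stop at the first failure
      ./install.sh -cf "$CONF" -s "$step" 2>&1 | tee "step${step}.log" || return 1
    fi
  done
}

run_steps 3 18
```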
In case you need to undo any of the steps, use the -u option with the step you would like to undo. install.sh -h describes the options:
[[email protected] linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -h
 Warning: Invalid input(s) for {-h=null}
 **********************************
 install.sh -cf <config.xml> -l [options]
 install.sh -cf <config.xml> -s <step #> | -r <num-num>
 install.sh
 ARGUMENTS:
  -l                       List all the steps that exist
  -cf                      Use to specify the full path for the config file
  -s <step #>              Run only the specified step
  -r <num-num>             Run the steps one after the other as long as no errors are encountered
  -u <num-num> | <step#>   Undo a range of steps or a particular step
                           For a range of steps, specify the steps in reverse order
  -h                       Print usage information
  -override                Force to run undo steps related to celldisk and grid disk
  -force                   Delete binaries under grid home and database home when uninstalling clusterware and database software
  -delete                  Delete staging area/directories
  -nocalibratecell         Create the installation summary file without running the calibrate cell command
  -noinfinicheck           Create the installation summary file without running InfiniBand verification
  -p                       Prompts for root password for each or all the nodes. This option allows deployments in Exadata environments with non-default and/or different root passwords on each of the nodes in the rack
  -usesu                   Use SU with root account to run commands for grid/oracle users
  -sshkeys                 Run deployment with root SSH Keys that are setup by setuprootssh.sh or oedacli. Must be used with "-usesu"
  -customstep              Run custom actions. Actions can be: updatecellroute: generate cellroute.ora in domUs
  -clustername             Specify the cluster name, or All. Only used with -customstep to specify the cluster on which to run the custom action
  -upgradeNetworkFirmware  X7 Broadcom network card Firmware upgrade
 Version : 200519
For example, to undo step 2:
[[email protected] linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2
Or to undo from step 2 to step 4:
[[email protected] linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2-4
Here is the execution of step 2:
[[email protected] linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 2
 Initializing
 Executing Setup Required Files
 Copying and extracting required files...
 Required files are:
 /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/V982068-01.zip
 /u01/onecommand/linux-x64/WorkDir/V982063-01.zip
 Copying required files...
 Checking status of remote files...
 Checking status of existing files on remote nodes...
 Getting status of local files...
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip at /u01/app/oracle/Oeda/Software/V982063-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip at /u01/app/oracle/Oeda/Software/V982068-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30805684_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30899722_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/Patches/p6880880_180000_Linux-x86-64.zip
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db03.client.com
 Completed copying files...
 Extracting required files...
 Copying resourcecontrol and other required files
 No config Keys in the configuration file..
 Creating databasemachine.xml for EM discovery
 Done Creating databasemachine.xml for EM discovery
 Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 325110 mS [5.0 minutes] Wed Jun 10 12:16:46 CDT 2020]]
Here is the execution of steps from 3 to 8:
[[email protected] linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 3-8
 Initializing
 Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Users
 Creating users...
 Creating users in cluster ex03-clu1
 Validating existing users and groups...
 Creating required directories on nodes in cluster ex03-clu1
 Updating /etc/hosts on nodes in cluster ex03-clu1
 Setting up ssh for users in cluster ex03-clu1
 Creating cell diag collection user CELLDIAG on cell servers..
 Completed creating all users...
 Successfully completed execution of step Create Users [elapsed Time [Elapsed = 77818 mS [1.0 minutes] Wed Jun 10 12:20:31 CDT 2020]]
 Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Setup Cell Connectivity
 Creating cellip.ora and cellinit.ora ...
 Creating cellip.ora for cluster ex03-clu1
 Creating cellinit.ora for cluster ex03-clu1
 Done creating cellip.ora and cellinit.ora...
 Successfully completed execution of step Setup Cell Connectivity [elapsed Time [Elapsed = 14675 mS [0.0 minutes] Wed Jun 10 12:20:52 CDT 2020]]
 Executing Verify Infiniband
 Validating infiniband network with rds-ping...
 Check Admin network connectivity...
 Running infinicheck to verify infiniband fabric for cluster ex03-clu1...
 Running verify topology to verify infiniband network...
 No Infiniband link errors found...
 SUCCESS: Verify topology does not report any errors on node ex03db01.client.com...
 ****************ex03db01*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db02.client.com...
 ****************ex03db02*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db03.client.com...
 ****************ex03db03*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 Successfully completed execution of step Verify Infiniband [elapsed Time [Elapsed = 280227 mS [4.0 minutes] Wed Jun 10 12:25:37 CDT 2020]]
 Executing Calibrate Cells
 Calibrating cells...
 Successfully completed execution of step Calibrate Cells [elapsed Time [Elapsed = 461064 mS [7.0 minutes] Wed Jun 10 12:33:18 CDT 2020]]
 Executing Create Cell Disks
 Validating Self-Signed Certificates on cell servers...
 Fixing Cell Certificates on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Reconfiguring WLS...
 Cell name attribute does not match hostnames
 Cell ex03cel03 has cell name ru06, cell name attribute will be reset to ex03cel03
 Cell ex03cel01 has cell name ru02, cell name attribute will be reset to ex03cel01
 Cell ex03cel02 has cell name ru04, cell name attribute will be reset to ex03cel02
 Checking physical disks for errors before creating celldisks
 Creating cell disks...
 Dropping Flash Cache before enabling WriteBack on cells [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Enable FlashCache mode to WriteBack in [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Creating flashcache on cells...
 Successfully completed execution of step Create Cell Disks [elapsed Time [Elapsed = 218067 mS [3.0 minutes] Wed Jun 10 12:36:56 CDT 2020]]
 Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Grid Disks
 Creating grid disks for cluster ex03-clu1
 Checking Cell Disk status...
 Successfully completed execution of step Create Grid Disks [elapsed Time [Elapsed = 123858 mS [2.0 minutes] Wed Jun 10 12:39:04 CDT 2020]]
[[email protected] linux-x64]#
Here is the execution of steps from 9 to 16:
[[email protected] linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 9-16 Initializing Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Install Cluster Software Installing cluster ex03-clu1 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Writing grid response file for cluster ex03-clu1 Running clusterware installer... Setting up Opatch for cluster ex03-clu1 Patching cluster ex03-clu1... Successfully completed execution of step Install Cluster Software [elapsed Time [Elapsed = 667497 mS [11.0 minutes] Wed Jun 10 12:51:15 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Initialize Cluster Software Initializing cluster ex03-clu1 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Writing grid response file for cluster ex03-clu1 Running root.sh on node ex03db01.client.com Checking file root_ex03db01.client.com_2020-06-10_12-54-03-631071286.log on node ex03db01.client.com Running root.sh on node ex03db02.client.com Checking file root_ex03db02.client.com_2020-06-10_13-02-42-916817198.log on node ex03db02.client.com Running root.sh on node ex03db03.client.com Checking file root_ex03db03.client.com_2020-06-10_13-05-42-659221162.log on node ex03db03.client.com Generating response file for Configuration Tools... Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Writing grid response file for cluster ex03-clu1 Running Configuration Assistants on ex03db01.client.com Checking status of cluster... 
Cluster Verification completed successfully Successfully completed execution of step Initialize Cluster Software [elapsed Time [Elapsed = 1184567 mS [19.0 minutes] Wed Jun 10 13:11:06 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Install Database Software Installing database software ... Validating nodes for database readiness... Installing database software with database home name DbHome1 Installing database software ... Extracting Database Software file /u01/app/oracle/Oeda/Software/V982063-01.zip into /u01/app/oracle/product/19.0.0.0/dbhome_1 Running database installer on node ex03db01.client.com ... Please wait... After running database installer... Patching Database Home /u01/app/oracle/product/19.0.0.0/dbhome_1 Successfully completed execution of step Install Database Software [elapsed Time [Elapsed = 717961 mS [11.0 minutes] Wed Jun 10 13:23:11 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Relink Database with RDS Successfully completed execution of step Relink Database with RDS [elapsed Time [Elapsed = 36009 mS [0.0 minutes] Wed Jun 10 13:23:54 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Create ASM Diskgroups Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin Validating ASM Diskgroups.. 
Successfully completed execution of step Create ASM Diskgroups [elapsed Time [Elapsed = 138147 mS [2.0 minutes] Wed Jun 10 13:26:20 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Create Databases Setting up Huge Pages for Database..[test] Creating database [test]... Patch 30805684 requires specific post-installation steps. Databases will be restarted ... Running datapatch on database [test] Recompiling Invalid Objects (if any) on database [test] Successfully completed execution of step Create Databases [elapsed Time [Elapsed = 1252604 mS [20.0 minutes] Wed Jun 10 13:47:19 CDT 2020]] Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com] Executing Apply Security Fixes Setting up Huge Pages for ASM Instance.. Bouncing clusterware to set required parameters... Checking and enabling turbo mode if required... 
ex03db03.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db03.client.com
ex03db02.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db02.client.com
ex03db01.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db01.client.com
Copying over /root/config/client-ex03.xml to all nodes under /etc/exadata/config
Successfully completed execution of step Apply Security Fixes [elapsed Time [Elapsed = 436720 mS [7.0 minutes] Wed Jun 10 13:54:43 CDT 2020]]
Disabling Exadata AIDE on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
Executing Install Autonomous Health Framework
Copying over AHF to all nodes in the Cluster..[ex03db01, ex03db02, ex03db03]
Configuring Autonomous Health Framework(AHF) on all computes nodes..
AHF has been installed on all compute nodes at: /opt/oracle.ahf . EXAchk can be run by invoking ./exachk
Generating an EXAchk report...
EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_13567.zip
Generating the EXAchk Infrastructure Report...
EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_141143_infrastructure.zip
Successfully completed execution of step Install Autonomous Health Framework [elapsed Time [Elapsed = 2234216 mS [37.0 minutes] Wed Jun 10 14:32:04 CDT 2020]]
[root@ex03db01 linux-x64]#
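Each step's runtime is buried in the "Elapsed = N mS" bracket of its completion line. As a convenience, here is a small sketch (my own helper, not part of OEDA) that summarizes step durations from a saved install.sh log; it assumes the completion lines look exactly like the ones above:

```shell
#!/bin/sh
# Sketch: summarize OEDA step durations from a saved install.sh log file.
# Assumes completion lines of the form:
#   Successfully completed execution of step <name> [elapsed Time [Elapsed = <n> mS ...
summarize_steps() {
  awk '/Successfully completed execution of step/ {
         # step name sits between "step " and " [elapsed"
         name = $0
         sub(/.*execution of step /, "", name)
         sub(/ \[elapsed.*/, "", name)
         # elapsed milliseconds follow "Elapsed = "
         ms = $0
         sub(/.*Elapsed = /, "", ms)
         sub(/ mS.*/, "", ms)
         printf "%-45s %6.1f min\n", name, ms / 60000
       }' "$1"
}
```

Run it as `summarize_steps /path/to/install_step14.log` to get one line per completed step with its duration in minutes.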
Here is the execution of step 17:
[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 17
Initializing
Executing Create Installation Summary
Getting system details...
Generating Installation Summary report: /u01/onecommand2/linux-x64/ExadataConfigurations/client-Development-InstallationReport.xml...
Creating Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html...
Created Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html
All deployment reports are stored in /u01/onecommand2/linux-x64/ExadataConfigurations/client-AK00625423-deploymentfiles.zip
Generating Platinum CSV file and copying it over to /opt/oracle.SupportTools on all compute nodes
Writing platinum file : /u01/onecommand2/linux-x64/WorkDir/client_null-platinum.csv
Successfully completed execution of step Create Installation Summary [elapsed Time [Elapsed = 53311 mS [0.0 minutes] Wed Jun 10 14:36:07 CDT 2020]]
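Since install.sh takes one step number at a time with `-s`, a range of steps can also be driven from a small loop. This is just a sketch of that idea: the `ONECOMMAND` variable is my own override (defaulting to `./install.sh`) so the loop can be dry-run with `echo` instead of a real deployment:

```shell
#!/bin/sh
# Sketch: run a range of OEDA steps sequentially, stopping at the first failure.
# ONECOMMAND defaults to ./install.sh; override it (e.g. ONECOMMAND=echo) to dry-run.
run_steps() {
  cfg=$1 first=$2 last=$3
  for s in $(seq "$first" "$last"); do
    ${ONECOMMAND:-./install.sh} -cf "$cfg" -s "$s" || {
      echo "step $s failed, stopping" >&2
      return 1
    }
  done
}
```

For example, `run_steps /root/config/client-ex03.xml 1 17` would walk every step in order, which mirrors what `-r 1-17` does but lets you pause or log between steps.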
Just a rac-status.sh run to check how the cluster was set up (learn more about rac-status.sh here):
[root@ex03db01 ~]# ./pythian/rac-status.sh -a

                Cluster ex03-clu1 is a X8M-2 Elastic Rack HC 14TB

    Type      |      Name      |   db01   |   db02   |   db03   |
------------------------------------------------------------------
  asm         | asm            |  Online  |  Online  |  Online  |
  asmnetwork  | asmnet1        |  Online  |  Online  |  Online  |
  chad        | chad           |  Online  |  Online  |  Online  |
  cvu         | cvu            |  Online  |    -     |    -     |
  dg          | DATA           |  Online  |  Online  |  Online  |
  dg          | RECO           |  Online  |  Online  |  Online  |
  dg          | SPARSE         |  Online  |  Online  |  Online  |
  network     | net1           |  Online  |  Online  |  Online  |
  ons         | ons            |  Online  |  Online  |  Online  |
  proxy_advm  | proxy_advm     | Offline x| Offline x| Offline x|
  qosmserver  | qosmserver     |  Online  |    -     |    -     |
  vip         | db01           |  Online  |    -     |    -     |
  vip         | db02           |    -     |  Online  |    -     |
  vip         | db03           |    -     |    -     |  Online  |
  vip         | scan1          |    -     |  Online  |    -     |
  vip         | scan2          |    -     |    -     |  Online  |
  vip         | scan3          |  Online  |    -     |    -     |
------------------------------------------------------------------
  x  : Resource is disabled
     : Has been restarted less than 24 hours ago

    Listener     |   Port   |   db01   |   db02   |   db03   |     Type     |
---------------------------------------------------------------------------------
 ASMNET1LSNR_ASM | TCP:1525 |  Online  |  Online  |  Online  |   Listener   |
 LISTENER        | TCP:1521 |  Online  |  Online  |  Online  |   Listener   |
 LISTENER_SCAN1  | TCP:1864 |    -     |  Online  |    -     |     SCAN     |
 LISTENER_SCAN2  | TCP:1864 |    -     |    -     |  Online  |     SCAN     |
 LISTENER_SCAN3  | TCP:1864 |  Online  |    -     |    -     |     SCAN     |
---------------------------------------------------------------------------------
     : Has been restarted less than 24 hours ago

    DB    |    Version     |   db01   |   db02   |   db03   |  DB Type  |
---------------------------------------------------------------------------------
  test    | 19.0.0.0 (1)   |   Open   |   Open   |   Open   |  RAC (P)  |
---------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column:
    1 : /u01/app/oracle/product/19.0.0.0/dbhome_1 oracle oinstall
      : Has been restarted less than 24 hours ago

[root@ex03db01 ~]# ps -ef | grep pmon
root    362094  50259  0 14:40 pts/1    00:00:00 grep --color=auto pmon
oracle  364290       1  0 13:52 ?        00:00:00 asm_pmon_+ASM1
oracle  367756       1  0 13:53 ?        00:00:00 ora_pmon_test1
[root@ex03db01 ~]#
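The pmon check above (one pmon process per running ASM or database instance) can be wrapped in a tiny helper. This is a sketch of my own; it reads `ps -ef`-style output on stdin, so it can be exercised without a live cluster:

```shell
#!/bin/sh
# Sketch: count instance pmon processes in ps output read from stdin.
# pmon process names look like asm_pmon_<SID> or ora_pmon_<SID>, so the
# "grep pmon" process itself is naturally excluded by the pattern.
count_pmon() {
  grep -cE '(asm|ora)_pmon_'
}
```

On the node above, `ps -ef | count_pmon` would report 2: one for +ASM1 and one for the test1 database instance.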
That’s it. The deployment is finished. Now you just need to patch the compute nodes, storage servers, RoCE switches, Grid Infrastructure, and databases up to whatever versions you want to run.
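Before patching, it is worth recording the image version each node starts from. A sketch of that check follows; `imageinfo -ver` is the standard Exadata image query, while the `SSH` override is my own addition purely so the loop can be dry-run without a cluster:

```shell
#!/bin/sh
# Sketch: print the Exadata image version of each node before patching.
# SSH defaults to ssh; override it (e.g. SSH=echo) to dry-run the loop.
image_versions() {
  for h in "$@"; do
    printf '%s: ' "$h"
    ${SSH:-ssh} "root@$h" imageinfo -ver
  done
}
```

For the rack above you would run it against all six nodes, e.g. `image_versions ex03db01 ex03db02 ex03db03 ex03cel01 ex03cel02 ex03cel03`.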
You might be thinking, “what about step 18?” Well, step 18, “Resecure the machine,” hardens the servers by dropping SSH keys, enforcing password complexity, expiring current passwords, implementing a password expiration time, and so on. Those changes can make administration a bit harder, and you may prefer to implement your own security policies instead. So we normally skip this step, but again, it is up to you.
See you next time, sincerely,
Franky Faust
1 Comment
Hi Franky
I have been following your post while setting up an X5-2 Exadata. I keep getting no output from the command uname -a when I try to run applyElasticConfig. Is this something you have seen before?