The Brand New Exadata X8M Deployment Process Revealed


This post walks through the deployment process of the new Exadata X8M.

 

RoCE issues from the factory

Exadata X8M servers come from the factory with the RoCE private network disabled. If the Field Engineer who performed the physical setup of the Exadata did not enable the RoCE network, it is your job to do so.

The RoCE network must be enabled on all Compute Nodes and on all Storage Servers.

In Exadata X8M the private network is not on InfiniBand switches anymore, but on RoCE (RDMA over Converged Ethernet) Fabric switches. The interface cards we see in the operating system are re0 and re1.

When checking the active interface cards we cannot see re0 and re1:

[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54309  bytes 3744342 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279171  bytes 18019054 (17.1 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9553  bytes 1693920 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff
 
eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 31847  bytes 2396622 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14088  bytes 1318384 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136405  bytes 6139347 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136405  bytes 6139347 (5.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Most of the InfiniBand-related commands and tools no longer work, but ibstat still does, so we can use it to check the state of the private network:

[root@ex03db01 ~]# ibstat | grep -i 'state\|rate'
                State: Down
                Physical state: Disabled
                Rate: 100
                State: Down
                Physical state: Disabled
                Rate: 100
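
The same check can be scripted so it is easy to repeat on every node. This is only a minimal sketch: it assumes ibstat prints one "State:" line per port, as in the output above, and the function name count_inactive_ports is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads ibstat output on stdin and prints how many
# ports report a logical state other than "Active".
count_inactive_ports() {
    # Match only the logical "State:" lines, not "Physical state:".
    awk '$1 == "State:" && $2 != "Active" { n++ } END { print n + 0 }'
}

# Usage on a node (not run here):
#   ibstat | count_inactive_ports
```

A result of 0 means every port reported by ibstat is Active. With the factory state shown above, it would print 2.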

Checking the config of RoCE interface cards:

[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re0
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re0
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no
 
[root@ex03db01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-re1
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=re1
BOOTPROTO=none
ONBOOT=no
HOTPLUG=no
IPV6INIT=no

Bringing RoCE interface cards up:

[root@ex03db01 ~]# ifup re0
/sbin/ifup-local: /sbin/ifup-local re0:
/sbin/ifup-local:  + RoCE configuration...
/sbin/ifup-local:  + Matched (wildcard) interface re0.
/sbin/ifup-local:  + RoCE Configuration: /bin/roce_config -i re0...
 
NETDEV=re0; IBDEV=mlx5_0; PORT=1
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
 
Finished configuring "re0" ヽ(•‿•)ノ
 
/sbin/ifup-local:  + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re0.
 
 
[root@ex03db01 ~]# ifup re1
/sbin/ifup-local: /sbin/ifup-local re1:
/sbin/ifup-local:  + RoCE configuration...
/sbin/ifup-local:  + Matched (wildcard) interface re1.
/sbin/ifup-local:  + RoCE Configuration: /bin/roce_config -i re1...
 
NETDEV=re1; IBDEV=mlx5_0; PORT=2
 + RoCE v2 is set as default rdma_cm preference
 + Tos mapping is set
 + Default roce tos is set to 32
 + Trust mode is set to dscp
 + PFC is configured as 0,1,1,1,1,1,0,0
 + Congestion control algo/mask are set as expected
 + Buffers are configured as 32768,229120,0,0,0,0,0,0
 
Finished configuring "re1" ヽ(•‿•)ノ
 
/sbin/ifup-local:  + Non-RoCE Configuration...
/sbin/ifup-local: Non-RoCE Configuration: Nothing to do for re1.

Now we can see that the interfaces re0 and re1 are up, but with no IPs assigned:

[root@ex03db01 ~]# ifconfig
bondeth0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.201.80.54  netmask 255.255.254.0  broadcast 10.201.81.255
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 54533  bytes 3767354 (3.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.201.84.190  netmask 255.255.254.0  broadcast 10.201.85.255
        ether 00:10:e0:ee:c5:6c  txqueuelen 1000  (Ethernet)
        RX packets 279584  bytes 18051211 (17.2 MiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 9727  bytes 1720009 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x9ca00000-9cafffff
 
eth3: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 32071  bytes 2419634 (2.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14414  bytes 1349944 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
eth4: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether bc:97:e1:68:b2:10  txqueuelen 1000  (Ethernet)
        RX packets 22492  bytes 1349520 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 104 (104.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 136804  bytes 6157123 (5.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 136804  bytes 6157123 (5.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
re0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 0c:42:a1:3b:45:12  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
re1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 0c:42:a1:3b:45:13  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

We can use ibstat again to confirm the interfaces are enabled:

[root@ex03db01 ~]# ibstat | grep -i 'state\|rate'
                State: Active
                Physical state: LinkUp
                Rate: 100
                State: Active
                Physical state: LinkUp
                Rate: 100
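
Because the RoCE network must be enabled on every compute node and storage server, the two ifup calls have to be repeated on each host. The sketch below only prints the commands so they can be reviewed first. It assumes root SSH access over the management network, and the function name print_roce_ifup_cmds is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints (does not run) the ssh command that would bring
# up both RoCE interfaces on each host given as an argument.
print_roce_ifup_cmds() {
    local host
    for host in "$@"; do
        printf "ssh root@%s 'ifup re0 && ifup re1'\n" "$host"
    done
}

# Example: review the commands first, then pipe them to "bash" to run them.
#   print_roce_ifup_cmds ex03db01 ex03db02 ex03db03 ex03cel01 ex03cel02 ex03cel03
```

Printing instead of executing keeps the helper safe to run anywhere. After running the commands, the ibstat check should be repeated on each host.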

OEDA specifics

To start any Exadata deployment you need the OEDA configuration files, a set of files generated by the OEDA (Oracle Exadata Deployment Assistant) tool. OEDA is currently a web-based tool in which the client fills in all the IP addresses and hostnames that will be assigned to the new Exadata. Normally the client completes this step with the support of their network team.

Configuration files needed:

  • Clientname-clustername.xml
  • Clientname-clustername-InstallationTemplate.html
  • Clientname-clustername-preconf.csv

The OEDA tool for Linux is also needed and can be downloaded as Patch 30640393. It is recommended to use the latest version available, but if the configuration files were generated with a different or older version, use that same version to avoid warnings while running onecommand.

Stage the OEDA for Linux in /u01/onecommand/ and unzip it:

[root@ex03db01 ~]# mkdir -p /u01/onecommand/
[root@ex03db01 ~]# unzip -q p30640393_193800_Linux-x86-64.zip -d /u01/onecommand/
[root@ex03db01 ~]# cd /u01/onecommand/linux-x64

Once in the correct directory, run onecommand with the -l option to list the steps and confirm it is working:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -l
 Initializing
 
1. Validate Configuration File
2. Setup Required Files
3. Create Users
4. Setup Cell Connectivity
5. Verify Infiniband
6. Calibrate Cells
7. Create Cell Disks
8. Create Grid Disks
9. Install Cluster Software
10. Initialize Cluster Software
11. Install Database Software
12. Relink Database with RDS
13. Create ASM Diskgroups
14. Create Databases
15. Apply Security Fixes
16. Install Autonomous Health Framework
17. Create Installation Summary
18. Resecure Machine
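
The step numbers may not be the same in every OEDA version or configuration, so it can be safer to look a step up by name than to hard-code its number. A minimal sketch, assuming the listing format shown above. The function name step_number is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads the "install.sh -l" listing on stdin and prints
# the number of the step whose name matches the first argument exactly.
# Returns non-zero when no step has that name.
step_number() {
    local name=$1
    awk -v name="$name" '
        /^[0-9]+\. / {
            num = $1; sub(/\.$/, "", num)            # "14." -> "14"
            line = $0; sub(/^[0-9]+\. /, "", line)   # keep only the step name
            if (line == name) { print num; found = 1; exit }
        }
        END { exit found ? 0 : 1 }'
}

# Usage (not run here):
#   ./install.sh -cf /root/config/client-ex03.xml -l | step_number "Create Databases"
```

With the listing above, looking up "Create Databases" would print 14.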

applyElasticConfig.sh preparation and execution

Technical background

applyElasticConfig.sh is a script provided by Oracle within OEDA that performs the initial setup of the compute nodes and storage servers: the network configuration, IP addresses, hostnames, DNS, and NTP. By default the script works with the factory IP range and hostnames and looks for nodes in the 172.x.x.x network, so if the client has already changed the IPs and hostnames, it will not find anything. We found a way to trick it into working in that situation. It is worth mentioning that this is not documented anywhere on docs.oracle.com. The closest references are:

  • Configuring Oracle Exadata Database Machine
  • ApplyElasticConfig failed during the execution of elasticConfig.sh (Doc ID 2175587.1)
  • Bug 23064772 OEDA: applyelasticconfig.sh fails with error unable to locate rack item with ulocation

Even though these documents briefly mention the applyElasticConfig.sh script, they do not explain how to overcome the issue when the IPs and hostnames have already been changed.

Preparation

To make the script find the servers after their hostnames and IPs have been changed, edit the es.properties file located under /u01/onecommand/linux-x64/properties. Change only the parameters related to the IPs, subnets, and hostnames. The variables we care about are ROCEELASTICNODEIPRANGE, ROCEELASTICILOMIPRANGE, ELASTICSUBNETS, and SKIPHOSTNAMECHECK. Set them to the IP ranges found in the Clientname-clustername-InstallationTemplate.html for each network:

  • ROCEELASTICNODEIPRANGE expects the range of IPs in the management network.
  • ROCEELASTICILOMIPRANGE expects the range of IPs of the ILOM of the servers.
  • ELASTICSUBNETS expects the subnet of the management network.
  • SKIPHOSTNAMECHECK defaults to false, so if the hostnames were also changed you want to set this to true.
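
Before editing the file, it can help to confirm that every host IP actually falls inside the range you plan to set, since the script only discovers nodes within that range. A minimal sketch of that check follows. The function names ip_to_int and ip_in_range are made up for illustration, and the range uses the START:END format seen in es.properties.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Return 0 when IP $1 lies inside the inclusive range $2, written START:END
# (the format used by ROCEELASTICNODEIPRANGE in es.properties).
ip_in_range() {
    local ip start end
    ip=$(ip_to_int "$1")
    start=$(ip_to_int "${2%%:*}")
    end=$(ip_to_int "${2##*:}")
    (( ip >= start && ip <= end ))
}

# Usage:
#   ip_in_range 10.201.84.193 10.201.84.190:10.201.84.206 && echo "in range"
```

Comparing the addresses as integers avoids mistakes that string comparison would make, for example treating .99 as greater than .190.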

Find some examples below:

[root@ex03db01 linux-x64]# cat properties/es.properties|grep ELASTIC
#ROCEELASTICNODEIPRANGE=192.168.1.1:192.168.1.99
ROCEELASTICNODEIPRANGE=10.201.84.190:10.201.84.206
ROCEELASTICILOMIPRANGE=10.201.84.196:10.201.84.201
ELASTICCONFIGMARKERFILE=/.elasticConfig
ELASTICRACKNAMES=x5,x6,sl6,x7,x8
QINQELASTICCONFIGMINVERION=20.1.0.0.0.200323
#ELASTICSUBNETS=172.16.2:172.16.3:172.16.4:172.16.5:172.16.6:172.16.7
ELASTICSUBNETS=10.201.84
 
[root@ex03db01 linux-x64]# grep SKIPHOST properties/es.properties
#SKIPHOSTNAMECHECK=false
SKIPHOSTNAMECHECK=true
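
The pattern used in these examples, commenting out the original value and adding the new one right below it, can be scripted. This is only a sketch. The function name set_property is hypothetical, and taking a backup copy of es.properties before running anything like it is assumed.

```shell
#!/bin/bash
# Hypothetical helper: comment out the first active KEY=... line in a
# properties file and add KEY=VALUE right after it, keeping the original
# for reference. If the key is not present, KEY=VALUE is appended at the end.
set_property() {
    local file=$1 key=$2 value=$3
    awk -v key="$key" -v value="$value" '
        !done && index($0, key "=") == 1 {
            print "#" $0
            print key "=" value
            done = 1
            next
        }
        { print }
        END { if (!done) print key "=" value }
    ' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage (values taken from the example above):
#   set_property properties/es.properties ELASTICSUBNETS 10.201.84
#   set_property properties/es.properties SKIPHOSTNAMECHECK true
```

Keeping the old line as a comment makes it easy to return to the factory defaults later.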

Execution

Now that the ELASTIC* parameters in es.properties match your infrastructure, you are ready to execute the applyElasticConfig.sh script. Call it with the Clientname-clustername.xml configuration file as its argument:

[root@ex03db01 linux-x64]# ./applyElasticConfig.sh -cf /root/config/Client-ex03.xml
 Applying Elastic Config...
 Discovering pingable nodes in IP Range of 10.201.84.190 - 10.201.84.206.....
 Found 6 pingable hosts..[10.201.84.193, 10.201.84.194, 10.201.84.195, 10.201.84.190, 10.201.84.191, 10.201.84.192]
 Validating Hostnames..
 Discovering ILOM IP Addresses..
 Getting uLocations...
 Getting Mac Addressess..
 Getting uLocations...
 Mapping Machines with local hostnames..
 Mapping Machines with uLocations..
 Checking if Marker file exists..
 Updating machines with Mac Address for 6 valid machines.
 Creating preconf..
 Writing host-specific preconf files..
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel02_preconf.csv for ex03cel02 ....
 Preconf file copied to ex03cel02 as /var/log/exadatatmp/firstconf/ex03cel02_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db01_preconf.csv for ex03db01 ....
 Preconf file copied to ex03db01 as /var/log/exadatatmp/firstconf/ex03db01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db03_preconf.csv for ex03db03 ....
 Preconf file copied to ex03db03 as /var/log/exadatatmp/firstconf/ex03db03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel03_preconf.csv for ex03cel03 ....
 Preconf file copied to ex03cel03 as /var/log/exadatatmp/firstconf/ex03cel03_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03cel01_preconf.csv for ex03cel01 ....
 Preconf file copied to ex03cel01 as /var/log/exadatatmp/firstconf/ex03cel01_preconf.csv
 Writing host specific file /u01/onecommand2/linux-x64/WorkDir/ex03db02_preconf.csv for ex03db02 ....
 Preconf file copied to ex03db02 as /var/log/exadatatmp/firstconf/ex03db02_preconf.csv
 Running Elastic Configuration on ex03cel02.client.com
 Running Elastic Configuration on ex03db01.client.com
 Running Elastic Configuration on ex03db03.client.com
 Running Elastic Configuration on ex03cel03.client.com
 Running Elastic Configuration on ex03cel01.client.com
 Running Elastic Configuration on ex03db02.client.com
 /////
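
The list of discovered hosts is worth comparing against the InstallationTemplate, because a node missing from it will not be configured. A minimal sketch that pulls the addresses out of the "Found N pingable hosts" line, assuming the line format shown above. The function name pingable_hosts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads applyElasticConfig.sh output on stdin and prints
# the discovered IP addresses, one per line, sorted numerically by octet.
pingable_hosts() {
    sed -n 's/^.*Found [0-9][0-9]* pingable hosts\.\.\[\(.*\)\].*$/\1/p' |
        tr -d ' ' |
        tr ',' '\n' |
        sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
}

# Usage (not run here; the log file name is only an example):
#   ./applyElasticConfig.sh -cf /root/config/Client-ex03.xml | tee elastic.log
#   pingable_hosts < elastic.log
```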

OEDA onecommand preparation and execution

Technical background

OEDA is a set of scripts, files, and a form we use to plan and deploy an Exadata. Sometimes we refer to it as the onecommand utility. It is called onecommand because with just one command we can deploy everything. This onecommand is the install.sh script.

Preparation

Before running the install.sh script, a few prerequisites must be in place in the environment:

  • The switches must have been already set up by the Field Engineer responsible for the physical installation of the hardware.
  • The applyElasticConfig.sh script must have been run and completed successfully.
  • The files listed in the “Appendix B” of the Clientname-clustername-InstallationTemplate.html must be staged to /u01/onecommand/linux-x64/WorkDir.

Once the files are staged, the WorkDir directory should look similar to this:

[root@ex03db01 ~]# ls -lh /u01/onecommand/linux-x64/WorkDir
total X.9G
-rwxr-xr-x 1 root root 355M Jun  9 12:34 ahf_setup
-rw-r--r-- 1 root root 2.9G Jun  9 12:54 V982063-01.zip
-rw-r--r-- 1 root root 2.7G Jun  9 12:57 V982068-01.zip
-rw-r--r-- 1 root root 2.4G Jun  9 12:57 p30805684_190000_Linux-x86-64.zip
-rw-r--r-- 1 root root 600M Jun  9 12:57 p6880880_180000_Linux-x86-64.zip
-rw-r--r-- 1 root root 1.3G Jun  9 12:57 p30899722_190000_Linux-x86-64.zip

After all of this is done, you can run step 1 to validate the configuration file against the environment:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 1
 Initializing
 Executing Validate Configuration File
 Validating cluster: ex03-clu1
  Locating machines...
 Validating platinum...
 Checking Disk Tests Status....
 Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 Validating nodes for database readiness...
 Completed validation...
 
 SUCCESS: Ip address: 10.201.84.190 is configured correctly
 SUCCESS: Ip address: 10.201.80.54 is configured correctly
 SUCCESS: Ip address: 10.201.84.191 is configured correctly
 SUCCESS: Ip address: 10.201.80.55 is configured correctly
 SUCCESS: Ip address: 10.201.84.192 is configured correctly
 SUCCESS: Ip address: 10.201.80.56 is configured correctly
 SUCCESS: Ip address: 10.201.80.60 is configured correctly
 SUCCESS: Ip address: 10.201.80.62 is configured correctly
 SUCCESS: Ip address: 10.201.80.61 is configured correctly
 SUCCESS: Ip address: 10.201.80.58 is configured correctly
 SUCCESS: Ip address: 10.201.80.59 is configured correctly
 SUCCESS: Ip address: 10.201.80.57 is configured correctly
 SUCCESS: Validated NTP server 10.248.1.1
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip exists...
 SUCCESS: Required file /u01/onecommand/linux-x64/WorkDir/ahf_setup exists...
 SUCCESS: Disks Tests are not running/active on any of the Storage Servers or not applicable for this Image Version.
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db01
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db02
 SUCCESS: Required Kernel Version 4.14.35.1902.9.2 for Oracle19c found on ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db01
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db02
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with UEK5 on  ex03db03
 SUCCESS: Cluster Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: DatabaseHome Version 19.7.0.0.200414 is compatible with image version 19.3.6.0.0 on Cluster ex03-clu1
 SUCCESS: Disk size 14000GB on cell ex03cel01.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel02.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Disk size 14000GB on cell ex03cel03.client.com matches the value specified in the OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel01.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel02.client.com matches the value specified in OEDA configuration file
 SUCCESS: Number of physical disks on ex03cel03.client.com matches the value specified in OEDA configuration file
 Successfully completed execution of step Validate Configuration File [elapsed Time [Elapsed = 85395 mS [1.0 minutes] Tue Jun 09 22:51:44 PDT 2020]]

If it finishes successfully you are good to move forward.
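
When the output is long, the completion line is the quickest thing to look for. A minimal sketch, assuming the output was saved to a file and that a finished step always prints the "Successfully completed execution of step" line seen above. The function name step_completed is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads install.sh output on stdin and succeeds only if it
# contains the completion line for the step named in the first argument.
step_completed() {
    grep -qF "Successfully completed execution of step $1 ["
}

# Usage (not run here; the log file name is only an example):
#   ./install.sh -cf /root/config/client-ex03.xml -s 1 | tee step1.log
#   step_completed "Validate Configuration File" < step1.log && echo "step 1 OK"
```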

Execution

Now we just need to execute the remaining steps. You can execute them one by one or all in a row. I normally run step 1 and step 2 separately from the others because they tend to fail more easily. Running all of them in a row does no harm, since execution stops immediately when any step fails. It is up to you how you would like to execute them.

If you need to undo any of the steps, use the -u option followed by the step you would like to undo. The install.sh -h output lists the available options:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -h
 Warning: Invalid input(s) for {-h=null}
 **********************************
 
  install.sh -cf <config.xml> -l [options]
  install.sh -cf <config.xml> -s <step #> | -r <num-num>
  install.sh
  ARGUMENTS:
   -l                 List all the steps that exist
   -cf                Use to specify the full path for the config file
   -s <step #>        Run only the specified step
   -r <num-num>       Run the steps one after the other as long as no errors
                      are encountered
   -u <num-num> | <step#> Undo a range of steps or a particular step
                      For a range of steps, specify the steps in reverse order
   -h                 Print usage information
   -override          Force to run undo steps related to celldisk and grid disk
   -force             Delete binaries under grid home and database home when
                      uninstalling clusterware and database software
   -delete            Delete staging area/directories
   -nocalibratecell   Create the installation summary file without running the calibrate cell command
   -noinfinicheck     Create the installation summary file without running InfiniBand verification
   -p                 Prompts for root password for each or all the nodes. This option allows
                      deployments in Exadata environments with non-default and/or different
                       root passwords on each of the nodes in the rack
   -usesu             Use SU with root account to run commands for grid/oracle users
   -sshkeys           Run deployment with root SSH Keys that are setup by setuprootssh.sh or oedacli. Must be used with "-usesu"
   -customstep        Run custom actions. Actions can be:
                           updatecellroute:  generate cellroute.ora in domUs
   -clustername       Specify the cluster name, or All. Only used with -customstep to specify
                       the cluster on which to run the custom action
   -upgradeNetworkFirmware  X7 Broadcom network card Firmware upgrade
  Version : 200519

To undo a single step, for example step 2:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 2

Or to undo steps 2 through 4 (per the usage text above, a range of steps is specified in reverse order):

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/Client-ex03.xml -u 4-2

Here is the execution of step 2:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 2
 Initializing
 Executing Setup Required Files
 Copying and extracting required files...
 Required files are:
 /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip
 /u01/onecommand/linux-x64/WorkDir/V982068-01.zip
 /u01/onecommand/linux-x64/WorkDir/V982063-01.zip
 Copying required files...
 Checking status of remote files...
 Checking status of existing files on remote nodes...
 Getting status of local files...
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982063-01.zip at /u01/app/oracle/Oeda/Software/V982063-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/V982068-01.zip at /u01/app/oracle/Oeda/Software/V982068-01.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30805684_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30805684_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p30899722_190000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/p30899722_190000_Linux-x86-64.zip
 Creating symbolic link for file /u01/onecommand/linux-x64/WorkDir/p6880880_180000_Linux-x86-64.zip at /u01/app/oracle/Oeda/Software/Patches/p6880880_180000_Linux-x86-64.zip
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db02.client.com
 Copying file: p30805684_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p30899722_190000_Linux-x86-64.zip to node ex03db03.client.com
 Copying file: p6880880_180000_Linux-x86-64.zip to node ex03db03.client.com
 Completed copying files...
 Extracting required files...
 Copying resourcecontrol and other required files
 No config Keys in the configuration file..
 Creating databasemachine.xml for EM discovery
 Done Creating databasemachine.xml for EM discovery
 Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 325110 mS [5.0 minutes] Wed Jun 10 12:16:46 CDT 2020]]

Here is the execution of steps from 3 to 8:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 3-8
 Initializing
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Users
 Creating users...
 Creating users in cluster ex03-clu1
 Validating existing users and groups...
 Creating required directories on nodes in cluster ex03-clu1
 Updating /etc/hosts on nodes in cluster ex03-clu1
 Setting up ssh for users in cluster ex03-clu1
 Creating cell diag collection user CELLDIAG on cell servers..
 Completed creating all users...
 Successfully completed execution of step Create Users [elapsed Time [Elapsed = 77818 mS [1.0 minutes] Wed Jun 10 12:20:31 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Setup Cell Connectivity
 Creating cellip.ora and cellinit.ora  ...
 Creating cellip.ora for cluster ex03-clu1
 Creating cellinit.ora for cluster ex03-clu1
 Done creating cellip.ora and cellinit.ora...
 Successfully completed execution of step Setup Cell Connectivity [elapsed Time [Elapsed = 14675 mS [0.0 minutes] Wed Jun 10 12:20:52 CDT 2020]]
 Executing Verify Infiniband
 Validating infiniband network with rds-ping...
 Check Admin network connectivity...
 Running infinicheck to verify infiniband fabric for cluster ex03-clu1...
 Running verify topology to verify infiniband network...
 No Infiniband link errors found...
 SUCCESS: Verify topology does not report any errors on node ex03db01.client.com...
 ****************ex03db01*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db02.client.com...
 ****************ex03db02*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 SUCCESS: Verify topology does not report any errors on node ex03db03.client.com...
 ****************ex03db03*****************
 Command: /opt/oracle.SupportTools/ibdiagtools/verify-topology
 Verify topology is not supported on RoCE
 ********************************************
 Successfully completed execution of step Verify Infiniband [elapsed Time [Elapsed = 280227 mS [4.0 minutes] Wed Jun 10 12:25:37 CDT 2020]]
 Executing Calibrate Cells
 Calibrating cells...
 Successfully completed execution of step Calibrate Cells [elapsed Time [Elapsed = 461064 mS [7.0 minutes] Wed Jun 10 12:33:18 CDT 2020]]
 Executing Create Cell Disks
 Validating Self-Signed Certificates on cell servers...
 Fixing Cell Certificates on [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Reconfiguring WLS...
 Cell name attribute does not match hostnames
 Cell ex03cel03 has cell name ru06, cell name attribute will be reset to ex03cel03
 Cell ex03cel01 has cell name ru02, cell name attribute will be reset to ex03cel01
 Cell ex03cel02 has cell name ru04, cell name attribute will be reset to ex03cel02
 Checking physical disks for errors before creating celldisks
 Creating cell disks...
 Dropping Flash Cache before enabling WriteBack on cells [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Enable FlashCache mode to WriteBack in [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com]
 Creating flashcache on cells...
 Successfully completed execution of step Create Cell Disks [elapsed Time [Elapsed = 218067 mS [3.0 minutes] Wed Jun 10 12:36:56 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Grid Disks
 Creating grid disks for cluster ex03-clu1
 Checking Cell Disk status...
 Successfully completed execution of step Create Grid Disks [elapsed Time [Elapsed = 123858 mS [2.0 minutes] Wed Jun 10 12:39:04 CDT 2020]]
[root@ex03db01 linux-x64]#
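
The minutes figure in the completion lines is coarse (step 2 above ran for 325110 mS but is reported as 5.0 minutes), so the millisecond field gives a better picture of where the time goes. A minimal sketch that summarizes it, assuming the output was saved to a file. The function name step_durations is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads install.sh output on stdin and prints one line per
# completed step as "<whole seconds> <step name>", using the "Elapsed = N mS"
# field of each completion line.
step_durations() {
    sed -n 's/^.*Successfully completed execution of step \(.*\) \[elapsed Time \[Elapsed = \([0-9][0-9]*\) mS.*$/\2 \1/p' |
        awk '{ ms = $1; $1 = ""; printf "%d%s\n", ms / 1000, $0 }'
}

# Usage (not run here; the log file name is only an example):
#   ./install.sh -cf /root/config/client-ex03.xml -r 3-8 | tee steps3-8.log
#   step_durations < steps3-8.log
```

For the run above, the Calibrate Cells line (461064 mS) would be printed as "461 Calibrate Cells".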

Here is the execution of steps from 9 to 16:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -r 9-16
 Initializing
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Cluster Software
 Installing cluster ex03-clu1
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running clusterware installer...
 Setting up Opatch for cluster ex03-clu1
 Patching cluster ex03-clu1...
 Successfully completed execution of step Install Cluster Software [elapsed Time [Elapsed = 667497 mS [11.0 minutes] Wed Jun 10 12:51:15 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Initialize Cluster Software
 Initializing cluster ex03-clu1
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running root.sh on node ex03db01.client.com
 Checking file root_ex03db01.client.com_2020-06-10_12-54-03-631071286.log on node ex03db01.client.com
 Running root.sh on node ex03db02.client.com
 Checking file root_ex03db02.client.com_2020-06-10_13-02-42-916817198.log on node ex03db02.client.com
 Running root.sh on node ex03db03.client.com
 Checking file root_ex03db03.client.com_2020-06-10_13-05-42-659221162.log on node ex03db03.client.com
 Generating response file for Configuration Tools...
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Writing grid response file for cluster ex03-clu1
 Running Configuration Assistants on ex03db01.client.com
 Checking status of cluster...
 Cluster Verification completed successfully
 Successfully completed execution of step Initialize Cluster Software [elapsed Time [Elapsed = 1184567 mS [19.0 minutes] Wed Jun 10 13:11:06 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Database Software
 Installing database software ...
 Validating nodes for database readiness...
 Installing database software with database home name DbHome1
 Installing database software ...
 Extracting Database Software file /u01/app/oracle/Oeda/Software/V982063-01.zip into /u01/app/oracle/product/19.0.0.0/dbhome_1
 Running database installer on node ex03db01.client.com ... Please wait...
 After running database installer...
 Patching Database Home /u01/app/oracle/product/19.0.0.0/dbhome_1
 Successfully completed execution of step Install Database Software [elapsed Time [Elapsed = 717961 mS [11.0 minutes] Wed Jun 10 13:23:11 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Relink Database with RDS
 Successfully completed execution of step Relink Database with RDS [elapsed Time [Elapsed = 36009 mS [0.0 minutes] Wed Jun 10 13:23:54 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create ASM Diskgroups
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Getting grid disks using utility in /u01/app/19.0.0.0/grid/bin
 Validating ASM Diskgroups..
 Successfully completed execution of step Create ASM Diskgroups [elapsed Time [Elapsed = 138147 mS [2.0 minutes] Wed Jun 10 13:26:20 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Create Databases
 Setting up Huge Pages for Database..[test]
 Creating database [test]...
 Patch 30805684 requires specific post-installation steps. Databases will be restarted ...
 Running datapatch on database [test]
 Recompiling Invalid Objects (if any) on database [test]
 Successfully completed execution of step Create Databases [elapsed Time [Elapsed = 1252604 mS [20.0 minutes] Wed Jun 10 13:47:19 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Apply Security Fixes
 Setting up Huge Pages for ASM Instance..
 Bouncing clusterware to set required parameters...
 Checking and enabling turbo mode if required...
 ex03db03.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db03.client.com
 ex03db02.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db02.client.com
 ex03db01.client.com Command: /opt/oracle.SupportTools/fix_17898503_Enable_Turbo_Mode.sh produced null output but executed successfully on ex03db01.client.com
 Copying over /root/config/client-ex03.xml to all nodes under /etc/exadata/config
 Successfully completed execution of step Apply Security Fixes [elapsed Time [Elapsed = 436720 mS [7.0 minutes] Wed Jun 10 13:54:43 CDT 2020]]
 Disabling Exadata AIDE on  [ex03cel01.client.com, ex03cel02.client.com, ex03cel03.client.com, ex03db01.client.com, ex03db02.client.com, ex03db03.client.com]
 Executing Install Autonomous Health Framework
 Copying over AHF to all nodes in the Cluster..[ex03db01, ex03db02, ex03db03]
 Configuring Autonomous Health Framework(AHF) on all computes nodes..
 AHF has been installed on all compute nodes at: /opt/oracle.ahf . EXAchk can be run by invoking ./exachk
 Generating an EXAchk report...
 EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_13567.zip
 Generating the EXAchk Infrastructure Report...
 EXAchk zip file in ex03db01:/u01/app/oracle.ahf/data/ex03db01/exachk/exachk_ex03db01_test_061020_141143_infrastructure.zip
 Successfully completed execution of step Install Autonomous Health Framework [elapsed Time [Elapsed = 2234216 mS [37.0 minutes] Wed Jun 10 14:32:04 CDT 2020]]
[root@ex03db01 linux-x64]#
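
The log prints each step's elapsed time in milliseconds, plus a minutes figure that appears to be rounded down (36009 mS is shown as 0.0 minutes). If you saved the install.sh output to a file, a small shell function can summarize how long each step actually took. This is only a sketch: the function name oeda_step_times is made up for illustration, and it assumes the "Successfully completed execution of step" lines look exactly like the ones above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OEDA): reads install.sh output on stdin
# and prints "<step name><TAB><minutes>" for every completed step,
# followed by a TOTAL line. Minutes are computed from the mS value.
oeda_step_times() {
  awk '
    /Successfully completed execution of step / {
      # Step name sits between "step " and " [elapsed Time"
      name = $0
      sub(/.*Successfully completed execution of step /, "", name)
      sub(/ \[elapsed Time.*/, "", name)
      # Milliseconds sit between "Elapsed = " and " mS"
      ms = $0
      sub(/.*Elapsed = /, "", ms)
      sub(/ mS.*/, "", ms)
      printf "%s\t%.1f\n", name, ms / 60000
      total += ms
    }
    END { printf "TOTAL\t%.1f\n", total / 60000 }
  '
}

# Demo using two lines copied from the log above
oeda_step_times <<'EOF'
 Successfully completed execution of step Initialize Cluster Software [elapsed Time [Elapsed = 1184567 mS [19.0 minutes] Wed Jun 10 13:11:06 CDT 2020]]
 Successfully completed execution of step Relink Database with RDS [elapsed Time [Elapsed = 36009 mS [0.0 minutes] Wed Jun 10 13:23:54 CDT 2020]]
EOF
```

Against a saved log you would call it as oeda_step_times < your_saved_output.log, where the file name is whatever you redirected the install.sh output to.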

Here is the execution of step 17:

[root@ex03db01 linux-x64]# ./install.sh -cf /root/config/client-ex03.xml -s 17
 Initializing
 Executing Create Installation Summary
 Getting system details...
 Generating Installation Summary report: /u01/onecommand2/linux-x64/ExadataConfigurations/client-Development-InstallationReport.xml...
 Creating Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html...
 Created Installation template /u01/onecommand2/linux-x64/ExadataConfigurations/client-InstallationTemplate.html
 All deployment reports are stored in /u01/onecommand2/linux-x64/ExadataConfigurations/client-AK00625423-deploymentfiles.zip
 Generating Platinum CSV file and copying it over to /opt/oracle.SupportTools on all compute nodes
 Writing platinum file  : /u01/onecommand2/linux-x64/WorkDir/client_null-platinum.csv
 Successfully completed execution of step Create Installation Summary [elapsed Time [Elapsed = 53311 mS [0.0 minutes] Wed Jun 10 14:36:07 CDT 2020]]

Here is a rac-status.sh run to check how the cluster was set up (learn more about rac-status.sh here):

[root@ex03db01 ~]# ./pythian/rac-status.sh -a
 
                Cluster ex03-clu1 is a X8M-2 Elastic Rack HC 14TB
 
        Type      |      Name      |   db01   |   db02   |   db03   |
  ------------------------------------------------------------------
   asm            | asm            |  Online  |  Online  |  Online  |
   asmnetwork     | asmnet1        |  Online  |  Online  |  Online  |
   chad           | chad           |  Online  |  Online  |  Online  |
   cvu            | cvu            |  Online  |     -    |     -    |
   dg             | DATA           |  Online  |  Online  |  Online  |
   dg             | RECO           |  Online  |  Online  |  Online  |
   dg             | SPARSE         |  Online  |  Online  |  Online  |
   network        | net1           |  Online  |  Online  |  Online  |
   ons            | ons            |  Online  |  Online  |  Online  |
   proxy_advm     | proxy_advm     | Offline x| Offline x| Offline x|
   qosmserver     | qosmserver     |  Online  |     -    |     -    |
   vip            | db01           |  Online  |     -    |     -    |
   vip            | db02           |     -    |  Online  |     -    |
   vip            | db03           |     -    |     -    |  Online  |
   vip            | scan1          |     -    |  Online  |     -    |
   vip            | scan2          |     -    |     -    |  Online  |
   vip            | scan3          |  Online  |     -    |     -    |
  ------------------------------------------------------------------
    x  : Resource is disabled
       : Has been restarted less than 24 hours ago
 
      Listener    |      Port      |   db01   |   db02   |   db03   |     Type     |
  ---------------------------------------------------------------------------------
   ASMNET1LSNR_ASM| TCP:1525       |  Online  |  Online  |  Online  |   Listener   |
   LISTENER       | TCP:1521       |  Online  |  Online  |  Online  |   Listener   |
   LISTENER_SCAN1 | TCP:1864       |     -    |  Online  |     -    |     SCAN     |
   LISTENER_SCAN2 | TCP:1864       |     -    |     -    |  Online  |     SCAN     |
   LISTENER_SCAN3 | TCP:1864       |  Online  |     -    |     -    |     SCAN     |
  ---------------------------------------------------------------------------------
       : Has been restarted less than 24 hours ago
 
         DB       |     Version    |   db01   |   db02   |   db03   |    DB Type   |
  ---------------------------------------------------------------------------------
   test           | 19.0.0.0   (1) |   Open   |   Open   |   Open   |    RAC (P)   |
  ---------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column
 
         1 : /u01/app/oracle/product/19.0.0.0/dbhome_1  oracle oinstall
 
       : Has been restarted less than 24 hours ago
 
 
[root@ex03db01 ~]# ps -ef|grep pmon
root     362094  50259  0 14:40 pts/1    00:00:00 grep --color=auto pmon
oracle   364290      1  0 13:52 ?        00:00:00 asm_pmon_+ASM1
oracle   367756      1  0 13:53 ?        00:00:00 ora_pmon_test1
[root@ex03db01 ~]#
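
If you want to repeat the pmon check on every compute node, it helps to reduce the ps output to just the instance names. The function below is a minimal sketch under the assumption that the process names follow the asm_pmon_<instance> and ora_pmon_<instance> pattern shown above; the name list_pmon_instances is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints the
# instance name of every ASM or database pmon process, one per line.
# The grep process itself is skipped because its last field is just "pmon".
list_pmon_instances() {
  awk '$NF ~ /^(asm|ora)_pmon_/ {
         sub(/^(asm|ora)_pmon_/, "", $NF)   # strip the process-name prefix
         print $NF
       }'
}

# Demo using the ps output captured above
list_pmon_instances <<'EOF'
root     362094  50259  0 14:40 pts/1    00:00:00 grep --color=auto pmon
oracle   364290      1  0 13:52 ?        00:00:00 asm_pmon_+ASM1
oracle   367756      1  0 13:53 ?        00:00:00 ora_pmon_test1
EOF
```

On a live node the equivalent call would be ps -ef | list_pmon_instances.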

That’s it. The deployment is finished. Now you just need to patch the compute nodes, storage servers, RoCE switches, GI, and DBs to whatever version you want to run.

You might be thinking, “What about step 18?” Step 18, “Resecure the machine”, hardens the servers by dropping SSH keys, enforcing password complexity, expiring current passwords, setting a password expiration time, and so on. Those changes can make administration a bit harder, and you might prefer to implement your own security policies, so we normally skip this step. Again, it is up to you.

See you next time, sincerely,

Franky Faust

 


About the Author

Senior Oracle Database Consultant
Franky works for Pythian as a Senior Oracle Database Consultant. He has extensive knowledge of Oracle Exadata and High Availability technologies, and of other databases like MySQL, Cassandra and SQL Server. He is always improving his skills, with a focus on researching Oracle performance and HA. Franky has been involved in some major implementations of multinode RAC on AIX, Linux and Exadata, and of multisite DataGuard environments. He is OCP 12c, OCE SQL, OCA 11g, OCS Linux 6 and RAC 12c certified and was nominated Oracle ACE in 2017. He is well known in the Brazilian community for his blog https://loredata.com.br/blog and for his contributions to the Oracle Brazilian community. Franky is also a frequent writer for OTN and a speaker at Oracle and database conferences around the world. Feel free to contact him on social media.
