Orphaned disks in OVM and what to do with them

Posted in: Oracle, Technical Track

Some time ago, while doing maintenance on an OVM cluster, I noticed a significant number of disks without a mapping to any virtual machine (I should mention that the cluster was home to more than 400 VMs). With about 1800 virtual disks, it was easy to miss a few lost disks that were not mapped to any VM. Some of them had probably been created on purpose and forgotten, but most looked like leftovers from automated deployments. I attached several of the disks to a test VM and checked their contents:

[root@vm129-132 ~]# fdisk -l /dev/xvdd

Disk /dev/xvdd: 3117 MB, 3117416448 bytes, 6088704 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@vm129-132 ~]# dd if=/dev/xvdd bs=512 count=100 | strings
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.00220462 s, 23.2 MB/s
[root@vm129-132 ~]#
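With many candidate disks attached, checking each one by hand with dd and strings gets tedious. A quick way to script the blank-disk check is a small shell helper (a generic sketch, not an OVM tool; the `is_blank` name is my own):

```shell
#!/bin/sh
# is_blank: succeed when the first 1 MiB of a device (or image file)
# contains only zero bytes. Generic sketch; helper name is mine.
is_blank() {
    # strip every NUL byte from the first 1 MiB; nothing left means blank
    [ -z "$(head -c 1048576 "$1" | tr -d '\0')" ]
}

# Example: report each candidate disk attached to the test VM
for dev in /dev/xvdd /dev/xvde; do
    [ -e "$dev" ] || continue
    if is_blank "$dev"; then
        echo "$dev looks blank"
    else
        echo "$dev contains data"
    fi
done
```

Reading only the first megabyte is a heuristic, of course, but it is enough to separate never-used disks from ones carrying a partition table or filesystem signature.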

I also checked the disks' other attributes from the OVM CLI:

OVM> show VirtualDisk id=0004fb0000120000d5e0235900f63355.img
Command: show VirtualDisk id=0004fb0000120000d5e0235900f63355.img
Status: Success
Time: 2017-05-19 09:11:26,664 PDT
  Absolute Path = nfsserv:/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000d5e0235900f63355.img
  Mounted Path = /OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000d5e0235900f63355.img
  Max (GiB) = 2.9
  Used (GiB) = 0.0
  Shareable = Yes
  Repository Id = 0004fb0000030000998d2e73e5ec136a  [crepo1]
  Id = 0004fb0000120000d5e0235900f63355.img  [6F4dKi9hT0cYW_db_asm_disk_0 (21)]
  Name = 6F4dKi9hT0cYW_db_asm_disk_0 (21)
  Locked = false
  DeprecatedAttrs = [Assembly Virtual Disk]

The disk was completely empty and, judging by its name and the deprecated attribute "Assembly Virtual Disk", it was clearly a leftover from a deployed assembly. I remembered a past issue where shared disks were not deleted when one of the Oracle RAC assemblies was deployed and later destroyed through the Oracle Enterprise Manager Self Service Portal (OEM SS). It was noticed on OVM 3.2.x with OEM 12c: if two or more VMs worked with the same shared disks, those shared disks were not deleted when all the VMs and their local disks were destroyed. The issue itself was fixed long ago, but the lost disks were left behind.

I created a script to find all the disks without a mapping to any existing VM. It is written in expect on top of the OVM CLI over ssh. To run it, you need an ssh connection to the OVM Manager on port 10000 and expect installed on your machine. I used one of Oracle's sample scripts as a starting point.
Here is the script body:


#!/usr/bin/expect

set username [lindex $argv 0]
set password [lindex $argv 1]
set prompt "OVM> "

set timeout 3
log_user 0

# Assumes the script runs on the OVM Manager itself; replace localhost
# with your OVM Manager hostname if you run it elsewhere
spawn ssh -l $username -p 10000 localhost
expect_after eof {exit 0}

## interact with SSH
expect "yes/no" {send "yes\r"}
expect "password:" {send "$password\r"}

#################### Get the list of virtual disks ##################
expect "OVM> "
set timeout 20
match_max 100000

send "list virtualdisk\r"
expect "OVM> "
set resultdata $expect_out(buffer)
set resultlength [string length $resultdata]
set id ""
set done 0
# Walk through the "id: ... name: ..." pairs in the list output
while {$done != 1} {
    set idindex [string first "id:" $resultdata]
    set nameindex [string first "name:" $resultdata]
    if {$idindex != -1 && $nameindex != -1 && $idindex < $nameindex} {
        set id [string range $resultdata [expr {$idindex+3}] [expr {$nameindex-3}]]
        send "show VirtualDisk id='$id'\r"
        expect "OVM> "
        set getVirtualDiskInfo $expect_out(buffer)
        set doneProcessingVirtualDisk 0
        # A mapped disk shows at least one VmDiskMapping in its details
        while {$doneProcessingVirtualDisk != 1} {
            set getVirtualDiskInfoIndex [string first "VmDiskMapping" $getVirtualDiskInfo]
            if {$getVirtualDiskInfoIndex != -1} {
                puts "Disk with mapping:'$id'\r"
                set doneProcessingVirtualDisk 1
            } else {
                puts "Disk without mapping:'$id'\r"
                set doneProcessingVirtualDisk 1
            }
        }
        # Move past the processed entry and continue with the rest of the list
        set resultdata [string range $resultdata [expr {$nameindex+1}] $resultlength]
        set resultlength [string length $resultdata]
    } else {
        set done 1
    }
}

log_user 1
expect "OVM> "
send "exit\r"

As you can see, the script is simple enough and doesn't take long to write. I redirected its output to a file for analysis:

$ ./dsk_inventory admin password > dsk_inventory.out
$ wc -l dsk_inventory.out
1836 dsk_inventory.out
$ grep "Disk without mapping" dsk_inventory.out | wc -l
482
$
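Before deleting anything, it can help to pull the flagged disk ids out of the report for review. A small shell sketch (the `orphan_ids` helper name is mine; it assumes the "Disk without mapping:" line format produced by the inventory script above):

```shell
#!/bin/sh
# orphan_ids: print just the disk ids flagged as unmapped in the report.
# Sketch; assumes report lines of the form
#   Disk without mapping:'<id>'   or   Disk without mapping:<id>
orphan_ids() {
    grep '^Disk without mapping:' "$1" \
        | sed 's/^Disk without mapping://' \
        | tr -d "'\r "
}
```

Reviewing this list first makes it easy to spot any disk you actually want to keep before switching the script into delete mode.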

As you can see, I had 482 orphaned disks out of 1836. That is more than 25% of all disks, and it was not only wasting space but also had a significant impact on interface performance: every time you added, modified or deleted a disk through OEM SS, there was a long pause while information about the disks was retrieved. I decided to delete all those disks using the same script, with just a couple of lines added to delete a disk if it has no mapping.
Here is the modified section of the script:

        while {$doneProcessingVirtualDisk != 1} {
            set getVirtualDiskInfoIndex [string first "VmDiskMapping" $getVirtualDiskInfo]
            if {$getVirtualDiskInfoIndex != -1} {
                puts "Disk with mapping:'$id'\r"
                set doneProcessingVirtualDisk 1
            } else {
                puts "Disk without mapping:'$id'\r"
                send "delete VirtualDisk id='$id'\r"
                expect "OVM> "
                set doneProcessingVirtualDisk 1
            }
        }
The change is minimal: the script now sends a "delete" command to OVM when a disk has no mapping. Of course, if you want to exclude certain disks, you should add "if" conditions on their ids to prevent them from being deleted.

It is also reasonably safe, since you are using an approved, standard interface that will not let you delete a disk with an active mapping to any VM. If you try to delete a mapped disk, you get an error:

OVM> delete VirtualDisk id=0004fb0000120000493379bb12928c33.img
Command: delete VirtualDisk id=0004fb0000120000493379bb12928c33.img
Status: Failure
Time: 2017-05-19 09:28:13,046 PDT
JobId: 1495211292856
Error Msg: Job failed on Core: OVMRU_002018E crepo1 - Cannot delete virtual device 6F4dKi9hT0cYW_crs_asm_disk_1 (23), it is still in use by [DLTEST0:vm129-132 ]. [Fri May 19 09:28:12 PDT 2017]

I ran the modified script, deleted all the unmapped disks, and repeated the inventory script to verify the results. I found a couple of disks that had not been deleted:

$ ./del_orph_dsk admin Y0u3uck2 > del_dsk_log.out
$ ./dsk_inventory admin Y0u3uck2 > dsk_inventory_after.out
$ wc -l dsk_inventory_after.out
1356 dsk_inventory_after.out
$ grep "Disk without mapping" dsk_inventory_after.out | wc -l
2
$ grep "Disk without mapping" dsk_inventory_after.out
Disk without mapping:0004fb0000120000a2d31cc7ef0c2d86.img
Disk without mapping:0004fb0000120000da746f417f5a0481.img
$

It turned out these disks had no backing files on the repository filesystem. The files had apparently been lost some time ago, perhaps due to a bug or some past issue on the filesystem.

OVM> show VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Command: show VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Status: Success
Time: 2017-05-19 12:35:13,383 PDT
  Absolute Path = nfsserv:/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img
  Mounted Path = /OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img
  Max (GiB) = 40.0
  Used (GiB) = 22.19
  Shareable = No
  Repository Id = 0004fb0000030000998d2e73e5ec136a  [crepo1]
  Id = 0004fb0000120000a2d31cc7ef0c2d86.img  [ovmcloudomsoh (3)]
  Name = ovmcloudomsoh (3)
  Locked = false
  DeprecatedAttrs = [Assembly Virtual Disk]
OVM> delete VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Command: delete VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Status: Failure
Time: 2017-05-19 12:36:39,479 PDT
JobId: 1495222598733
Error Msg: Job failed on Core: OVMAPI_6000E Internal Error: OVMAPI_5001E Job: 1495222598733/Delete Virtual Disk: ovmcloudomsoh (3) from Repository: crepo1/Delete Virtual Disk: ovmcloudomsoh (3) from Repository: crepo1, failed. Job Failure Event: 1495222599299/Server Async Command Failed/OVMEVT_00C014D_001 Async command failed on server: vms01.dlab.pythian.com. Object: ovmcloudomsoh (3), PID: 27092,
Server error: [Errno 2] No such file or directory: '/OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img'
, on server: vms01.dlab.pythian.com, associated with object: 0004fb0000120000a2d31cc7ef0c2d86.img [Fri May 19 12:36:39 PDT 2017]

So we had information about the disks in the repository database, but not the disk files themselves. To make the repository consistent again, I created empty files with the same names as the nonexistent virtual disks and then deleted the disks through the OVM CLI:

root@nfsserv:~# ll /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img
/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img: No such file or directory
root@nfsserv:~# touch /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img
root@nfsserv:~#

OVM> delete VirtualDisk id=0004fb0000120000da746f417f5a0481.img
Command: delete VirtualDisk id=0004fb0000120000da746f417f5a0481.img
Status: Success
Time: 2017-05-23 07:41:43,195 PDT
JobId: 1495550499971

I think it is worth checking from time to time whether you have any disks without a mapping to any VM, especially if your environment has a considerable number of disks and a long history of upgrades, updates and heavy user activity. And a couple of words about the OVM CLI and scripting with expect: as you can see, the combination gives you good options for automating your daily routine maintenance on OVM. It would take ages to find and clear all those disks manually through the GUI.
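For a periodic check, the inventory script can be wrapped so a scheduler notices when orphans appear. A cron-friendly sketch (the `orphan_report_check` name is mine; the report file is assumed to be the output of the dsk_inventory expect script shown earlier):

```shell
#!/bin/sh
# orphan_report_check: warn and return nonzero when an inventory report
# lists any unmapped disks. Sketch for scheduled use; helper name is mine.
orphan_report_check() {
    report="$1"
    # grep -c exits nonzero when the count is 0, so tolerate that
    n=$(grep -c "Disk without mapping" "$report" || true)
    if [ "$n" -gt 0 ]; then
        echo "WARNING: $n orphaned virtual disk(s) listed in $report"
        return 1
    fi
    return 0
}
```

Run from cron after a nightly pass of the inventory script, the nonzero exit status can be wired into whatever alerting you already have.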


About the Author

Regarded by his peers as an Oracle guru, Gleb is known for being able to resolve any problem related to Oracle. He loves the satisfaction of troubleshooting, and his colleagues even say that seeking Gleb’s advice regarding an issue is more efficient than looking it up. Gleb enjoys the variety of challenges he faces while working at Pythian, rather than working on the same thing every day. His areas of speciality include Oracle RAC, Exadata, RMAN, SQL tuning, high availability, storage, performance tuning, and many more. When he’s not working, running, or cycling, Gleb can be found reading.
