I ran into a very interesting issue today with a client who uses Veeam Backup and Replication to replicate their virtual machines to a remote ESX server for disaster recovery. When a Veeam replication job starts, it takes a snapshot of the virtual machine and then replicates the main VMDK disk file to the remote site. When the job finishes, Veeam tells VMware to remove the snapshot until the next scheduled replication runs. Because we replicate our VMs across a slow WAN connection (600Kbps, optimized with Citrix WANScalers), the replication often times out or hangs. Today I noticed that the replication had not updated since last night, so I needed to stop the replication job and restart it. Since the Citrix WANScalers cache as well as compress, restarting a failed replication job is usually fairly quick, because most of the data is already cached on the Citrix boxes. Here are the details of what I found, and how I fixed it…
To make snapshot management easier, I store the VM configuration files on a separate LUN; snapshot delta files are created wherever the VM configuration files are stored. This keeps the LUNs holding the main VMDKs fairly static, without the worry of snapshots filling up our available space. When looking at a specific VM today, I noticed that its list of datastores showed only the Snapshot LUN. This meant that a snapshot had been taken and never removed. This particular VM was not replicating at the time, so I knew that snapshot should not have existed. In normal operation, the list shows both the Snapshot LUN and the VMDK LUN.
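This VM: [screenshot: datastore list showing only the Snapshot LUN]
Working VM: [screenshot: datastore list showing both the Snapshot LUN and the VMDK LUN]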
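The VI Client's datastore list is the quickest way to spot this. As a rough service-console equivalent, the shell function below reads a VM's .vmx file and prints the datastore behind each virtual disk. It is a sketch that assumes disk entries of the form scsi0:0.fileName = "<file>.vmdk", that relative file names live in the .vmx's own folder (the configuration/snapshot LUN in this setup), and that absolute paths contain /vmfs/volumes/<datastore>/. The function name vm_disk_datastores is hypothetical.

```shell
#!/bin/bash
# Sketch: show which datastore each virtual disk of a VM lives on,
# based on the *.fileName entries in its .vmx file (assumed format:
#   scsi0:0.fileName = "exch-000002.vmdk").
vm_disk_datastores() {
    local vmx="$1" vmx_dir disk path ds
    vmx_dir=$(dirname "$vmx")

    # Keep only device lines whose fileName is a .vmdk, then strip to the quoted value
    grep -iE '^[a-z]+[0-9]+:[0-9]+\.fileName *= *".*\.vmdk"' "$vmx" |
        sed -E 's/^[^"]*"([^"]*)".*$/\1/' |
        while read -r disk; do
            # Relative names sit next to the .vmx (the config/snapshot LUN here)
            case "$disk" in
                /*) path="$disk" ;;
                *)  path="$vmx_dir/$disk" ;;
            esac
            # Take the directory right after /vmfs/volumes/ as the datastore name
            ds=$(printf '%s\n' "$path" | sed -nE 's#^.*/vmfs/volumes/([^/]+)/.*$#\1#p')
            printf '%s\t%s\n' "${ds:-unknown}" "$disk"
        done
}
```

Usage: vm_disk_datastores /vmfs/volumes/<datastore>/<vm>/<vm>.vmx. On a VM stuck on a snapshot in this layout, you would expect the affected disks to report the Snapshot LUN instead of the VMDK LUN.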
However, the Snapshot Manager did not show any snapshots for that VM.
I opened the Datastore Browser to check for delta VMDKs, and there were two delta files on my datastore.
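The same Datastore Browser check can be sketched from the service console. This assumes delta extents follow the <vm>-NNNNNN-delta.vmdk naming seen here (for example exch-000002-delta.vmdk) and that they sit directly in the VM's folder; find_delta_disks is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list snapshot delta disks in a VM's folder.
# Assumed naming: <vm>-NNNNNN-delta.vmdk (six-digit snapshot sequence).
find_delta_disks() {
    local vm_dir="$1"
    # -maxdepth 1: only the VM's own folder; sort for stable output
    find "$vm_dir" -maxdepth 1 -type f \
        -name '*-[0-9][0-9][0-9][0-9][0-9][0-9]-delta.vmdk' | sort
}
```

Usage: find_delta_disks /vmfs/volumes/<datastore>/exch. Any output while the Snapshot Manager shows no snapshots is the mismatch described in this post.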
I wanted to confirm that the virtual machine was indeed running off a delta disk. To check, I edited the settings of the virtual machine and looked at the virtual disk object. In this instance it was accessing the disk “exch-000002-delta.vmdk”, one of my delta disks.
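The same check can be made without the VI Client by looking at the disk entries in the VM's .vmx file. This sketch assumes the usual scsiX:Y.fileName = "..." format and flags any entry that points at a -NNNNNN snapshot file; the .vmx typically references the delta's descriptor (such as exch-000002.vmdk), so the -delta suffix is optional in the pattern. The function name vm_on_snapshot is hypothetical.

```shell
#!/bin/bash
# Sketch: print the .vmx disk entries that point at a snapshot file
# (<vm>-NNNNNN.vmdk or <vm>-NNNNNN-delta.vmdk). The exit status is grep's:
# 0 if at least one disk runs from a snapshot, 1 otherwise.
vm_on_snapshot() {
    local vmx="$1"
    grep -iE '\.fileName *= *"[^"]*-[0-9]{6}(-delta)?\.vmdk"' "$vmx"
}
```

For example, if vm_on_snapshot /vmfs/volumes/<datastore>/exch/exch.vmx prints a line, that disk is running on a snapshot delta rather than the base VMDK.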
If you are unable to remove snapshots with the VI Client, you can try removing them from the service console with vmware-cmd, passing the path to the VM’s .vmx configuration file:
/usr/bin/vmware-cmd /vmfs/volumes/<datastore>/<vm>/<vm>.vmx removesnapshots
When I ran this command, I received the following error:
VMControl error -3: Invalid arguments: Virtual machine has no snapshots
While researching on the VMware Communities website, I found a recommendation to create a new snapshot that excludes the VM’s memory and then remove that snapshot. When I created a new snapshot on my virtual machine, I saw something very interesting: an additional snapshot called “Consolidate-Helper-0”.
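The same workaround can be sketched from the service console. This assumes the ESX 3.x vmware-cmd syntax createsnapshot <name> <description> <quiesce> <memory>, so check vmware-cmd's usage output on your host before relying on it. The snapshot name consolidate-workaround and the VMWARE_CMD override are hypothetical; the override only exists so the function can be dry-run with a stub.

```shell
#!/bin/bash
# Sketch of the workaround: take a snapshot WITHOUT memory, then remove
# all snapshots so the leftover deltas get consolidated.
# ASSUMPTION: ESX 3.x syntax  createsnapshot <name> <description> <quiesce> <memory>
VMWARE_CMD="${VMWARE_CMD:-/usr/bin/vmware-cmd}"   # hypothetical override for dry runs

force_consolidate() {
    local vmx="$1"
    # quiesce=0, memory=0: the new snapshot excludes the VM's memory
    "$VMWARE_CMD" "$vmx" createsnapshot consolidate-workaround \
        "temporary snapshot to trigger consolidation" 0 0 || return 1
    # Remove every snapshot; with large deltas this can take a long time
    "$VMWARE_CMD" "$vmx" removesnapshots
}
```

Usage: force_consolidate /vmfs/volumes/<datastore>/<vm>/<vm>.vmx. Like the VI Client route, the removal may keep running on the host long after the client gives up waiting.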
At this point I deleted all the snapshots from the VM and waited for the process to finish. A couple of the snapshots were quite large, so vCenter timed out before the deletions completed. I waited an hour and then confirmed they were gone by checking the virtual disk resource in the VM settings.
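Because vCenter can time out while the host is still committing large deltas, a polling loop is one way to know when the work has actually finished. This sketch is a file-based stand-in for the VM-settings check used here: it watches the VM's folder until no files matching the assumed <vm>-NNNNNN-delta.vmdk naming remain. The function name and the one-hour default timeout are illustrative.

```shell
#!/bin/bash
# Sketch: poll a VM's folder until no snapshot delta files remain.
# Returns 0 once the deltas are gone, 1 if they are still present after
# the timeout. Arguments: <vm-dir> [timeout-seconds] [interval-seconds]
wait_for_consolidation() {
    local vm_dir="$1"
    local timeout_s="${2:-3600}"   # one hour by default
    local interval_s="${3:-60}"
    local waited=0

    while :; do
        # Assumed naming: <vm>-NNNNNN-delta.vmdk
        if [ -z "$(find "$vm_dir" -maxdepth 1 -type f \
                -name '*-[0-9][0-9][0-9][0-9][0-9][0-9]-delta.vmdk')" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
}
```

Usage: wait_for_consolidation /vmfs/volumes/<datastore>/exch 3600 60 && echo "deltas committed". The last check happens after the final sleep, so a consolidation that finishes right at the deadline is still reported as success.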