How do I get out of this situation safely?
Details are as follows:
A Xen server has block devices allocated to VMs, but these devices have also been mounted in the dom0.
In fact, 44 of these block devices have been mounted like this. To make matters worse, each physical device is seen over 4 paths, and each of those is mounted on a separate mountpoint. In other words, the devices are actually mounted 5 times each.
The VM guest OS sees the device via a PowerPath pseudo-device (allocated as a phy: block device to the domU).
Some of the devices are formatted as ext2, others as reiserfs.
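For reference, a quick way to see how many times a filesystem is mounted via the same device node is to count mountpoints per source device in /proc/mounts. A minimal sketch (the here-doc holds made-up sample lines; feed it /proc/mounts itself on the affected host):

```shell
# Count mountpoints per block device and report any device mounted more
# than once. Note this only catches repeats of the SAME device node; the
# 4 multipath aliases of one LUN have different /dev/sdX names, so those
# have to be grouped via PowerPath (powermt display dev=all) first.
awk '$1 ~ /^\/dev\// { count[$1]++ }
     END { for (d in count) if (count[d] > 1) print d, count[d] }' <<'EOF'
/dev/sdf1 /media/disk-12 ext2 rw 0 0
/dev/sdg1 /media/disk-13 ext2 rw 0 0
/dev/sdf1 /media/disk-44 ext2 rw 0 0
EOF
```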
No need to explain to me the file system corruption risks involved here.
I am afraid that even just unmounting the file systems may cause corruption, and I feel that at this point pulling the power from the host is the safest option.
Note that the applications, Oracle databases for the most part, in all the VMs are still running and in use.
I discovered this while investigating high CPU usage on the dom0. There is an unkillable "find" process with cwd -> /media/disk-12, which is mounted from /dev/sdf1, which belongs to /dev/emcpowerr.
Before anybody asks: the one time I've seen processes that cannot be killed and continue to use CPU and RAM (unlike a defunct/zombie process) is when there is outstanding committed I/O, e.g. sync has returned but the data is not physically on disk yet. More commonly this occurs with tape I/O.
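For what it's worth, such a process can be confirmed as being in uninterruptible sleep (state "D", blocked on I/O inside the kernel) rather than a zombie. A sketch, assuming a standard procps `ps`:

```shell
# List processes in uninterruptible sleep ("D" state), together with the
# kernel function they are blocked in (wchan). A D-state process is
# waiting on I/O inside the kernel and ignores all signals, including
# SIGKILL; NR == 1 just keeps the header row.
ps -eo pid,stat,wchan:30,comm | awk 'NR == 1 || $2 ~ /^D/'
```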
P.S. I would have expected devices to be "reserved" once mounted, to prevent this kind of thing? Or is that not possible on Linux?
EDIT: Firstly, I am convinced that KDE (within the hypervisor) is the culprit. It looks like KDE mounts every device it can on login, to create desktop icons. The same thing is not happening on the other Xen servers, but all the other servers are running much older versions of SLES and KDE... KDE 4 appears to be the offending version, with 3.4 behaving better.
Furthermore, two non-critical VMs have become hung. After shutting them down, they would not boot up again due to file system corruption. The main/production VM is still running and the database on it is still working, but clearly this is a time bomb. The customer is attempting to re-build the environment in another VM on another server, but is stuck on issues configuring some of the components, so we are waiting...
In any case, I feel that none of the answers so far have been more than "best practice is always to shut down gracefully", and I am hoping for something more concrete. This situation may warrant some more careful thinking: will shutting down cause outstanding I/O, in particular file system metadata updates from the hypervisor, to be synced, and thereby cause potentially major file system corruption?
If the disks are being written from a single mount point, no harm is being done. Do a clean shutdown (back it up from the suspended state if you will) and fix the mounts. Do not run anything but the bare minimum of apps on the dom0. If, OTOH, partitions are being written from multiple paths, that's BAD and getting worse by the second. Pull the plug.
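One way to check which of the dom0-side mounts are even writable is to look at the mount options in /proc/mounts. A minimal sketch (the /media/disk-* pattern is the KDE automount naming from the question; adjust to match):

```shell
# Print any mountpoint under /media/disk-* that is mounted read-write.
# In /proc/mounts, field 2 is the mountpoint and field 4 the options;
# an "rw" dom0 mount of a device a domU is also writing to is the
# dangerous case.
awk '$2 ~ /^\/media\/disk-/ && $4 ~ /(^|,)rw(,|$)/ { print $2 }' /proc/mounts
```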
I have no concrete evidence, but my gut feeling tells me that the following may be the best approach:
1. Shut down the applications.
2. Copy all data from the VM over the network to a backup location.
3. Un-mount the file systems from within the VM.
4. Shut down the VM. (There is only one VM running on this host now.)
5. Ensure no domUs are set to start automatically.
6. Pull the power on the host to prevent the hypervisor from performing any "closing" actions, syncing of outstanding I/O, etc.
7. Boot the host back up, hoping that the hypervisor itself survived the power-yank.
8. If it fails, re-build the environment. (The VMs' boot disks are file-based, but the data mount points reside on external disks allocated as block devices.)
9. Check whether the hypervisor is mounting any file systems belonging to the domUs. Un-mount these before any domUs are started.
10. Turn KDE auto-mounting off.
11. Start up the VM and force a full FS check.
Alternative to step 11: start up the VM and mount the file systems without a full fsck.
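Checking for and clearing the dom0-side mounts before any domU is started could be sketched like this (the /media/disk-* pattern is the one KDE used here; adjust to match, and treat this as an outline rather than a tested procedure):

```shell
# In the dom0: unmount everything KDE left mounted under /media before
# any domU touches the same devices. A busy mountpoint usually means
# some process (like the stuck "find") still has it as its cwd or has
# files open on it.
for mp in /media/disk-*; do
    if mountpoint -q "$mp"; then
        echo "unmounting $mp"
        umount "$mp" || echo "still busy: $mp (see fuser -vm $mp)"
    fi
done
```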
The reasoning is that I do not want the Xen hypervisor to have any more chance than absolutely necessary to cause corruption on the domU file systems.
I am no Xen expert and have no experience with it yet, but if I were in your place, my approach would be: first, accept that I might lose data (maybe even all of it); second, try to create snapshots and then suspend the VMs, restoring them in a safe, separate environment.
I do not want to give you false hope, but I think you will be lucky if you can recover anything.
Warning: following this advice could make you lose all your data. It is up to you to decide whether it is worth the risk.
With a lot of luck, your applications are still working because the data they are using is all in volatile memory. You should try to take advantage of this (evaluate whether that could be the case on a per-application basis) and export the live data to a network share, if the applications offer such a feature. If any data is on disk, this export could either get "locked up" much like your find process, or crash (and crash the application or OS) because of the changed/corrupted on-disk data.
Then you could try to take a live snapshot, following the instructions in this article: Creating snapshots in Xen. I would go for the byte-by-byte snapshot, although it could get stuck much like your find command... However, I would not put much hope in this.
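For the byte-by-byte variant, a hedged dd sketch (the device and target paths are examples only, and an image taken from a live, written-to device will be internally inconsistent, so pause/suspend the domU first if at all possible):

```shell
SRC=/dev/emcpowerr           # PowerPath pseudo device (example name)
DST=/backup/emcpowerr.img    # backup target with enough free space (example path)
# conv=noerror,sync: keep reading past bad blocks and pad unreadable
# blocks with zeros instead of aborting the whole copy.
if [ -e "$SRC" ]; then
    dd if="$SRC" of="$DST" bs=1M conv=noerror,sync
fi
```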
Before running the previous command, you ought to read this document from Citrix, which helps in understanding snapshots in Xen (PDF).
I wish you good luck.