Friday, May 18, 2012

XenServer losing space on SR, and Snapshots fail due to chain count at 30




I have had 2 issues lately with my XenServer (6.0.2).  First, I was using a snapshot technique to make a complete image backup of each VM.  This was working great... until I reached about 30 backups of any particular VM; then the snapshot would fail with an error that the chain count was too high.  Apparently, coalescing is supposed to put the pieces back together and reset the chain count.  Therefore, I gathered that coalescing was not happening.


As I began to look into this issue more, I found that I had also lost a ton of space on my SR.  It appeared that there were numerous lost or orphaned VDIs that Xen no longer knew about but that were still taking up space on the SR.  I believe these lost VDIs were also a part of the chain count snapshot problem.


I've learned that the xe sr-scan uuid= may be the key to all of this.


When you run this from the console (I believe it should be run from the pool master), it usually returns to the command prompt fairly quickly (within a few seconds).  What I've learned is that running this command simply starts the SR scan.  You then have to monitor the SM log, /var/log/SMlog.
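As a rough sketch of the monitoring step, the helper below prints the most recent garbage-collector exception from the log. The name last_gc_exception is my own, and it assumes each exception is written on a single line containing "gc: EXCEPTION", as in the SMlog excerpt in this post; adjust the pattern if your log looks different.

```shell
# Hypothetical helper: show the latest GC exception recorded in an SMlog-style file.
# Assumes exception lines contain the text "gc: EXCEPTION".
last_gc_exception() {
    log="${1:-/var/log/SMlog}"   # default to the storage manager log mentioned above
    grep 'gc: EXCEPTION' "$log" | tail -n 1
}
```

After starting a scan, calling last_gc_exception every few minutes should show whether the garbage collector has hit a new failure. It prints nothing if no such line is in the log.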


If you check it right away, sr-scan may not have finished.  I found that using XenCenter, I could see a lot of activity on the disk (or network) of the host I was running the scan from.  Obviously, this will also be dependent on the amount of activity on your VMs.  Once you see the stats drop on your host, you may want to check the log again.  When it was finished I was seeing the exception logged as below:


<24565> 2012-05-17 16:45:28.148185               ***********************
<24565> 2012-05-17 16:45:28.148237               *  E X C E P T I O N  *
<24565> 2012-05-17 16:45:28.148288               ***********************
<24565> 2012-05-17 16:45:28.148350      gc: EXCEPTION util.SMException, os.unlink(/var/run/sr-mount/2fb93e09-a88c-e02e-183b-cfb1b3386f6c/7d1b380a-d812-4270-a673-358cfefa8ead.vhd) failed
<24565> 2012-05-17 16:45:28.148402        File "/opt/xensource/sm/cleanup.py", line 2515, in gc
    _gc(None, srUuid, dryRun)
  File "/opt/xensource/sm/cleanup.py", line 2417, in _gc
    _gcLoop(sr, dryRun)
  File "/opt/xensource/sm/cleanup.py", line 2387, in _gcLoop
    sr.garbageCollect(dryRun)
  File "/opt/xensource/sm/cleanup.py", line 1438, in garbageCollect
    self.deleteVDIs(vdiList)
  File "/opt/xensource/sm/cleanup.py", line 1862, in deleteVDIs
    SR.deleteVDIs(self, vdiList)
  File "/opt/xensource/sm/cleanup.py", line 1452, in deleteVDIs
    self.deleteVDI(vdi)


Now, note the os.unlink line.  (I have seen another post that mentions a different error, but the idea is the same: find the VDI that is causing trouble and deal with it.)  It contains a path that you can change into, made up of the SR uuid followed by the uuid of the VHD (the VDI).  So at this point, I knew there was some problem with that VDI.  Then I ran xe vdi-list:


[root@v1 2fb93e09-a88c-e02e-183b-cfb1b3386f6c]# xe vdi-list uuid=7d1b380a-d812-4270-a673-358cfefa8ead

uuid ( RO)                : 7d1b380a-d812-4270-a673-358cfefa8ead
          name-label ( RW): base copy
    name-description ( RW): 
             sr-uuid ( RO): 2fb93e09-a88c-e02e-183b-cfb1b3386f6c
        virtual-size ( RO): 536870912000
            sharable ( RO): false
           read-only ( RO): true

Sometimes this would return nothing.  If this was the case, I would just start the sr-scan again.  If it returned a vdi, as above, then I would check to see if the file existed.  I would look in 

/var/run/sr-mount// (this is the location where valid vdis/vhds should be)

and do an 

ls -l 7d1b380a-d812-4270-a673-358cfefa8ead.vhd

In my case, this file never existed.  So I was pretty certain it was safe to "delete" this vdi, as it didn't really exist anyway.  To do this, I used: xe vdi-forget

[root@v1 2fb93e09-a88c-e02e-183b-cfb1b3386f6c]# xe vdi-forget uuid=7d1b380a-d812-4270-a673-358cfefa8ead

If the VDI had existed when I did the ls, I would have used vdi-destroy instead.  I probably would have backed up the file before doing a destroy, just in case...

Once I had done the vdi-forget, then I re-ran the sr-scan, and dealt with the next error as it arose...
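The manual steps above (find the path in the os.unlink line, check whether the .vhd file exists, then pick forget or destroy) can be sketched as two small shell helpers. The function names are my own invention, and the pattern assumes the exception is on a single line in the form shown in the log excerpt. The second helper only prints a suggestion; it does not run any xe command.

```shell
# Hypothetical helpers; the names are illustrative, not part of XenServer.

# Print "<sr-uuid> <vdi-uuid>" from a log line containing
# os.unlink(/var/run/sr-mount/<sr-uuid>/<vdi-uuid>.vhd).
# Prints nothing if the line does not match.
parse_unlink_path() {
    printf '%s\n' "$1" |
        sed -n 's#.*os\.unlink(/var/run/sr-mount/\([0-9a-f-]*\)/\([0-9a-f-]*\)\.vhd).*#\1 \2#p'
}

# Print which command to consider next: vdi-forget if the .vhd file is
# missing, vdi-destroy (after a backup) if it is present. Runs nothing itself.
suggest_vdi_action() {
    sr_uuid="$1"
    vdi_uuid="$2"
    mount_root="${3:-/var/run/sr-mount}"   # optional third argument, for trying this against another directory
    if [ -e "$mount_root/$sr_uuid/$vdi_uuid.vhd" ]; then
        echo "file exists; back it up, then consider: xe vdi-destroy uuid=$vdi_uuid"
    else
        echo "file missing; consider: xe vdi-forget uuid=$vdi_uuid"
    fi
}
```

I would still run xe vdi-list on the uuid by hand before acting on the suggestion, since the helpers know nothing about what the Xen database thinks of the VDI.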

As I slowly cleaned up more and more errors, sometimes the scan would take 20+ minutes before it gave me an exception.

---

I have been monitoring and doing this for about 24 hours now.  The number of VDIs that are labelled "base copy" has dropped from over 200 to under 80 so far...  The SR is also reporting 600GB less used so far (600GB reclaimed)...  I'm hopeful that once all the errors are cleaned up, sr-scan will run successfully, coalesce the chains (so snapshots work again) and restore some or all of the lost space on the SR...

Also note that, at least so far for me, none of the problem VDIs have actually existed on disk; I guess they were just still listed in the Xen SR database or something...
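To track progress, the "base copy" count can be taken from the vdi-list output. This is only a sketch: the function name is made up, and it assumes the name-label lines look like the ones in the vdi-list output shown earlier in this post.

```shell
# Hypothetical helper: count VDIs labelled "base copy" in `xe vdi-list`
# output read from standard input.
count_base_copies() {
    grep -c 'name-label ( RW): base copy'
}
```

On the host it would be fed from xe, for example: xe vdi-list sr-uuid=2fb93e09-a88c-e02e-183b-cfb1b3386f6c | count_base_copies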


--- 


Update 2012-05-20


After continuing the above process for about 3 days, sr-scan finally completed successfully.  I'm happy to report that my vdi chain-counts have been reset, and I have regained much of my "lost" space from my SR.




Monday, March 26, 2012

Exchange Battery Consumption on Android

[Update II - April 2, 2012]

In my case, it is definitely directly related to certain email messages.

I've had the problem on a Nexus S, with various ROMs, and on a stock Galaxy Nexus.  When I receive a certain email, it hangs the connection and kills the battery.  In my case it is a status email I receive from a server at 7pm every day.  It has some fishy header info, and Outlook reports that it may be a phishing message.  If I move this email into a folder that doesn't sync with my phone, all is good.

I wonder if this is due to an Exchange update, or an update that is in all Android code now...  If it is in the android code, it must be pretty low level, as I've read reports that people using Touchdown are also experiencing the same issue...

[Update - March 30, 2012]
My theories below didn't last.  The issue reared its ugly head again.  I have a new theory, though.

Right about the time this started, I had set up a cron job on a Linux box.  Part of this cron job sent me some emails using the ssmtp app on Linux.  I configured ssmtp to rewrite the email address so that it would look like it was coming from my home domain.  When these messages are received in Outlook/Exchange, they are tagged as potential phishing threats.  When I click on the warning in Outlook, there are various options to resolve the issue.  My latest theory is that Android/Exchange are choking on the transfer of these messages that are marked as phishing threats.  When I mark the message as safe, it seems to transfer normally.  If I leave it tagged as a phishing threat, the Android Exchange client spins and Exchange Services drains the battery...  I'll test this theory for a few days...

---

About 1 month ago, I started noticing heavy battery drain on my phone (Nexus S, Cyanogen mod 9 RC).  I hadn't changed anything in the days leading up to the failure.  This was completely abnormal battery drain - like 30-40% in one hour, the phone getting hot.  Email would no longer sync with my Exchange account.  While in the email client, the syncing icon would just continue to spin.  When I went to battery use, Exchange Services was using approx 50%.

I tried many things to rectify this.  Deleting and recreating the account on the phone.  New/different versions of ICS.  Different versions of GApps.  Disabling/Enabling various combinations of email, contacts, and calendar for syncing.  Changing from push to never and various settings in between.  Nothing seemed to make any difference.

Fortunately for me, I had access to the Exchange server (in my case SBS2003).  I hoped that if I could get at some detailed logging there, I might be able to figure out what was causing the problem.  In trying to enable and configure more verbose logging, I came across a tool called the Microsoft Exchange Server ActiveSync Web Administration Tool.  It can be found here.  I installed this on my Exchange server, hoping it might shed some light, or show me some more log info, or something...

Once I had it installed and working (I had to use FF to view the page; it wouldn't work in IE for me), it showed many ActiveSync accounts for my mailbox.  These were maybe 10-15 "accounts" dating back 5+ years, showing many different phones that I had had.

I should note here that the idea of this tool is to allow remote wiping (for example, if a phone was stolen).  Anyway, I started deleting all these accounts from the oldest to the newest.  Once I had deleted them all, it seemed my phone recreated the ActiveSync account, and since then all has been well.  My phone is behaving normally again.  Exchange email/contacts/calendar are all syncing once again, and not draining my battery.

Maybe this will help some others as well...

Monday, March 12, 2012

XenServer - Install Tools from linux cmd line

Make sure the virtual CD drive has the XenTools image in it.
mkdir /mnt/xs-tools
mount /dev/xvdd /mnt/xs-tools
cd /mnt/xs-tools/Linux/
bash install.sh

XenServer backups

These articles detail how to backup XenServer VMs on the fly.

http://www.8layer8.com/?p=260

http://blog.andyburton.co.uk/index.php/2009-11/updated-citrix-xenserver-5-5-automatic-vm-backup-scripts/

---


Daily backups are still the best way to get a virtual machine back on its feet, and then restore it the rest of the way with CDP if you have it.
Backing up a live server in Xen is super simple, and restoring it is as well. The trick with backups is always getting them to run consistently, automatically, and verifiably.
There is a great script here from Andy Burton that I use daily to export ~20 virtual servers to a USB disk attached to one of the Xen Servers.
I have tweaked it very slightly and have some cleanup scripts to handle disk remounts, removal of older backup images, and some logic to not back up if the backup drive is not present and mounted.
audit.sh - A plaintext dump of all the info needed to figure out what used to be connected to what and where it used to live, all the SR, VM, VIF, UUID's etc. are here in a reasonably readable format if needed.
cleanup.sh - unmounts and remounts the backup disk, and then cleans it up so that we only have the last two backups on it. Needs some logic to abort if the drive isn't, or can't be, mounted.
crontab.txt - My listing of jobs and order of them to run. Times are up to you.
meta-backup.sh - Backs up the metadata of the Xen Pool in a restorable format. Backs up the host machines over to the backup drive as well.
mailheader.txt - The simple header for outbound emails
and be sure to download the xenserver_backup.tar.gz script from above too.
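The missing "abort if the drive isn't mounted" logic mentioned for cleanup.sh could look something like the sketch below. It is not taken from the scripts above; the function name is hypothetical, and it assumes a Linux-style /proc/mounts table where the second field of each line is the mount point.

```shell
# Hypothetical guard for a backup script: succeed only if the given
# directory appears as a mount point in a /proc/mounts-style table.
is_mounted() {
    target="$1"
    table="${2:-/proc/mounts}"   # optional second argument, for trying this against a sample file
    awk -v t="$target" '$2 == t { found = 1 } END { exit found ? 0 : 1 }' "$table"
}
```

Near the top of a script it might be used as: is_mounted /mnt/backup || { echo "backup drive not mounted" >&2; exit 1; }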
You will need somewhere to put the backups, it can be an NFS share, SMB share, USB disk, flash drive, or anything else you can get mounted up.
DO NOT BACKUP VM'S TO THE XENSERVER "/" PARTITION! It does not have enough space to backup more than the tiniest VM and you WILL crash your XenServer and have to spin it up with a live CD and clear out whatever 2GB+ file you just accidentally made!
Note that you can back up all the VM's, Xen hosts, and metadata from a single Xen host, so you only need to set this up on one machine. I use a KingWin USB "Toaster" style dock to keep a 320GB SATA disk in and storing daily backups, which currently is enough for two days worth. It's a two slot toaster, so I bring in a 1.5TB disk monthly for extra copies of snapshot VM's as well as encrypted file-by-file backups of the file servers.
Setting up an external disk(s):
fdisk -l (that's an L)
(look for the backup drive, probably /dev/sdb but not always)
Partitioning
fdisk /dev/sdb     (no -l this time; with -l, fdisk only lists the partitions and exits)
Press p to see current partitions
If there are any existing partitions, delete them with d
select the highest number partition
keep going until they are all gone
Press n for new partition
p for Primary
Partition 1
defaults on the rest
Press w to write the changes and exit
Formatting
mkfs.ext3 /dev/sdb1
(this will take a little while for a large drive)
Make it usable in the filesystem:
mkdir /mnt/backup
mount /dev/sdb1 /mnt/backup
From here, you can export and import by hand:
**************************************
Export
(Tip! Use TAB to auto complete it all!)
[root@xenshuttle ~]# xe vm-export         {TAB}{TAB}
filename=              preserve-power-state=  vm=
[root@xenshuttle ~]# xe vm-export vm=  {[TAB][TAB]}
{A list of servers appears, We want to export Pokey Server}
[root@xenshuttle ~]# xe vm-export vm=Pokey\ Server  {Type in Pokey then [TAB][TAB]}
filename=              preserve-power-state=  vm=
[root@xenshuttle ~]# xe vm-export vm=Pokey\ Server filename=
[root@xenshuttle ~]# xe vm-export vm=Pokey\ Server filename=/mnt/backup/pokey.xva {Type in /mnt/backup/pokey.xva then press ENTER}
After a bit, the pokey machine is backed up onto the external hard disk. This is a fairly quick operation, usually far faster than a file-by-file backup (500MB/minute on average) and note that it compresses the backup on the fly.
Setting up the automatic backups:
***************************************
From a XenServer with a USB drive attached:
chmod +x *.sh
chmod +x dbutil
cp ./dbutil /sbin/dbutil
chmod +x /sbin/dbutil
tar -zxvf xenserver_backup.tar.gz
nano meta-backup.sh
(change the names and filenames to the names you have for your XenServers)
Control-X to exit, answer Y to save changes
nano vm_backup.cfg
Edit the log_path to be "/mnt/backup/vm_backup.log"
Edit backup_dir to be "/mnt/backup"
Edit backup_vms to be "all"
Enable email on the XenServer:
nano /etc/ssmtp/ssmtp.conf
Change mailhub=mail to mailhub=(your mail server goes here)
Change rewriteDomain=yourdomain.com
Save and Exit
Schedule the backups:
Note that cron times are written minute first, then hour (24-hour clock), so 0 19 * * *  means run at 19:00 every day. Change them if you want, just keep them sequential top to bottom time-wise so the scripts run in the right order.
crontab -e
(Press i to insert text)
Paste in: (be sure to use your email address!)
MAILTO="myaddress@mydomain.com"
0 19 * * * /root/cleanup.sh
10 19 * * * /root/audit.sh
11 19 * * * /root/meta-backup.sh
0 20 * * * /root/vm_backup.sh
Then press Escape
Type in :wq [enter]
Done.
Check the backups tomorrow and for several days to make sure that they rotate properly.
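If the minute-then-hour order of the crontab entries is confusing, a throwaway helper like this one turns an entry into a clock time. It is just an illustration with a made-up name, and it only handles plain numeric minute and hour fields like the ones used in the schedule above.

```shell
# Hypothetical helper: print the HH:MM at which a daily crontab entry fires.
# Only handles plain numeric minute and hour fields (no ranges, lists or */n).
cron_time() {
    printf '%s\n' "$1" | awk '{ printf "%02d:%02d\n", $2, $1 }'
}
```

For example, cron_time '10 19 * * * /root/audit.sh' prints 19:10. Keep the entry in single quotes so the shell does not expand the asterisks.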
Restoring from a backup is very simple:
Locate the backup you want to restore from, probably in /mnt/backup/thisserver_09_25_2010.xva
Locate the storage repository you want to restore it into: run xe sr-list, and find the storage you need to use, note the first 4 characters of the UUID (ex: 28f2)
Run xe vm-import like this:
xe vm-import filename=/mnt/backup/thisserver_09_25_2010.xva sr-uuid=28f25ea1-4c49-5346-4a86-d37560bd07b7 [ENTER]

Wednesday, March 7, 2012

XenServer upgrade via NFS/HTTP/FTP

I couldn't get this to work with NFS, so I tried ftp.

I did get it to work, but only by using this format:

user@/Xen

and leaving the user and pass fields blank.  If I entered the username on the dedicated line, it did not work.

Referenced this post:

http://forums.citrix.com/thread.jspa?threadID=290031

Monday, March 5, 2012

How to get account creation date/time in AD

In Server 2003 you can install ACCTINFO.dll.

In Server 2008 and later, there is a built-in way:

http://blogs.technet.com/b/askds/archive/2011/04/12/you-probably-don-t-need-acctinfo2-dll.aspx

XenServer 6.x removes auto-start vm option!?

Yes, they removed this feature.  So basically, if your physical machine reboots, none of the VMs will come up.  Anyway, there are various solutions, but the simplest one is to add to the bottom of


/etc/rc.d/rc.local

the following 2 lines:


sleep 30
xe vm-start tags=autostart --multiple

Then, add a tag to each VM, autostart.
This info comes from the post here: