
VMware ESXi 3.5u4, Intel SATA, and local datastores


This morning I rebooted my test box running VMware ESXi 3.5 to complete the upgrade from Update 3 to Update 4. The hypervisor came back up, but no guests were running, and when I popped open the VI Client it indicated that there were no datastores configured and it could not find any of the virtual machines I had in inventory. It saw the internal disks and that they were formatted VMFS, but would not allow me to do anything other than format them over again.

Normally this would have simply annoyed me. I would have lost my test VMs, but they don’t take long to build, so I’d have just reformatted the disks and gone on with my day. Unfortunately, within the last week we had temporarily moved a critical application’s VM to this box and had not yet properly reconfigured its backup. I could restore from the week-old backup, but there would be hell to pay.

Since the VMFS partitions were clearly visible I felt I had a chance, but I’m still new to ESX/ESXi, so my first step was to flip over to my always-running irssi session (if you use IRC and do not use screened irssi, go Google it now and enjoy) and ask for help in #shsc and #vmware. #shsc always has a few guys idling who work on large VMware installs, and #vmware is the obvious place to ask. While waiting for input from IRC, I went to Google for my next step. I knew ESXi can be accessed via SSH but that it’s disabled by default, so I looked up how to turn it on. A few minutes later, after bringing a monitor over to the machine and rebooting it, I had SSH access and could go through the system logs from the comfort of my laptop.

In /var/log/messages I found two entries referencing my SATA controller that looked interesting:
May 5 14:34:35 vmkernel: 0:00:06:39.406 cpu0:3616)ALERT: LVM: 4482: vmhba000:0:0:3 may be snapshot: disabling access. See resignaturing section in SAN config guide.
May 5 14:34:35 vmkernel: 0:00:06:39.408 cpu0:3616)ALERT: LVM: 4482: vmhba0:0:0:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
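If you’d rather not eyeball a long log for these, a few lines of Python can pull out the affected device paths. This is only a rough sketch: the regular expression is inferred from the two lines quoted above and nothing else, and the names ALERT_RE, find_snapshot_alerts and SAMPLE are my own, not anything VMware ships. Other builds may word the alert differently.

```python
import re

# Pattern inferred from the two alert lines quoted above (an assumption);
# other ESX/ESXi builds may format this message differently.
ALERT_RE = re.compile(
    r"ALERT: LVM: \d+: (?P<device>vmhba\d+:\d+:\d+:\d+) may be snapshot"
)


def find_snapshot_alerts(lines):
    """Return device paths flagged as possible snapshots, in first-seen order."""
    devices = []
    for line in lines:
        match = ALERT_RE.search(line)
        if match and match.group("device") not in devices:
            devices.append(match.group("device"))
    return devices


# The two entries from /var/log/messages, copied as they appear in this post.
SAMPLE = [
    "May 5 14:34:35 vmkernel: 0:00:06:39.406 cpu0:3616)ALERT: LVM: 4482: "
    "vmhba000:0:0:3 may be snapshot: disabling access. "
    "See resignaturing section in SAN config guide.",
    "May 5 14:34:35 vmkernel: 0:00:06:39.408 cpu0:3616)ALERT: LVM: 4482: "
    "vmhba0:0:0:1 may be snapshot: disabling access. "
    "See resignaturing section in SAN config guide.",
]

if __name__ == "__main__":
    for device in find_snapshot_alerts(SAMPLE):
        print(device)
```

To run it against a real log you would pass the open file in place of SAMPLE, since the function accepts any iterable of lines.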

A quick trip to Google with this information led me to VMware’s SAN configuration guide, which describes similar issues occurring on SANs, so I tried enabling the resignaturing option and, as if by magic, my datastores reappeared. After renaming them back to their original names and turning the resignaturing option back off, I had all my data and was able to download the disk images and VMX files, so I was safe in the event of a major problem.
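For anyone who would rather do the toggle from the SSH session, here is roughly how the sequence might look, written as a dry run that only prints the commands. Treat it as a sketch under assumptions: this post doesn’t record whether the option was changed in the VI Client or from the shell, I am assuming the esxcfg-advcfg and esxcfg-rescan tools are present on your build and that the option lives at /LVM/EnableResignature, and "vmhba0" is simply the adapter name from the log lines above. resignature_plan is a made-up helper name.

```python
def resignature_plan(adapter="vmhba0"):
    """Build the command sequence as argv lists. Nothing is executed here."""
    setting = "/LVM/EnableResignature"  # assumed option path, check your build
    return [
        ["esxcfg-advcfg", "-s", "1", setting],  # turn resignaturing on
        ["esxcfg-rescan", adapter],             # rescan so the VMFS volumes show up
        # ...rename the datastores back to their original names here...
        ["esxcfg-advcfg", "-s", "0", setting],  # turn resignaturing back off
    ]


if __name__ == "__main__":
    # Print the plan for review instead of running anything on the host.
    for argv in resignature_plan():
        print(" ".join(argv))
```

Printing rather than executing is deliberate: on a box holding the only copy of a critical VM, it’s worth reading each command before typing it.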

At this point I could see my VMs, but the VI inventory was still convinced that they were on the “old drives”. After a bit more time on Google I discovered the Import feature within the datastore browser, and I was able to bring the VMs back in and get them booting up.

Screenshot showing my datastores and two VMs running


After confirming that the VMs I really needed were booting and operational, I shut everything down to move the server back to its spot in my rack. Fortunately everything came right back up, so the pressure was now off.

Now my concerns shifted. If this happened once, what’s to stop it from happening again? I needed to figure out why it happened. Fortunately, at nearly the exact moment I started thinking about this, IRC came through for me. “jidar” in #shsc linked to this thread on VMware’s forum with exactly the same symptoms. A few posts down was a link to this page, which again matched my experience exactly. It says that U4 updated a number of SATA drivers, including the one for the ICH9 controller in my PowerEdge, and changed the way the disks appear to the hypervisor, which led to it not recognizing the drives for what they are.

Right now I’m moderately annoyed that an update too small to earn even a minor version number bump, on a piece of software intended for enterprise use, included a change with the potential to cause this. On the other hand, I don’t expect anyone who really cares about reliability to be using SATA local storage. Ah well, I learned a bit about navigating around ESXi’s internals.

