Despite My Best Intentions…

This is a quick addendum to the recent posts on my home lab, showing what can happen when you don’t think all the way through a migration project!

As part of building out the new lab, I needed to move all of my internal management VMs over. They had been residing on an HP DL360 G5 with an external storage shelf attached, so part of this process included a storage vMotion over to my trusty ReadyNAS NV+ which I’ve had for years. I didn’t expect a lot of performance out of it, but it did well enough that I ended up moving ALL of my lab VMs over to it. I can definitely put it in the dirt if I’m not careful, but overall it does well, and it meant I didn’t have to spring for new storage!

So the SvMotion went fine, and everything is working perfectly. Both AD controllers, vCenter, vCOps, vMA, everything runs smoothly. I even anticipate one issue and add the DNS name of the NAS array to the hosts file on the host, since DNS may not be available when it first boots up.

Once everything is moved over, I pull all of the old gear out, shut the new hosts down, rack and cable everything, and then boot the management host back up. It comes up, and there’s no connection to the NAS. I make sure I can ping the array by name from the host, and there’s no problem there. I start digging through the vmkwarning log, and I see that the host is getting a “permission denied” error when it tries to connect. It also looks like the array is trying to query LDAP? Strange. Let’s go look at the array.

The array looks fine: nothing has changed, and I didn’t even reboot it, so I’m puzzled. I see that the NFS share the array is using has been added to the AD domain and also allows the ESXi hosts root NFS access, so nothing strange there. Very perplexing. Some digging on Google finally reveals the issue, and it’s a doozy…

The basic problem is that the ReadyNAS NV+ can be added to an AD domain, but it doesn’t CACHE any of the credentials! This means that when the domain is unavailable, so are the files on every share that uses CIFS sharing, including all of the VMs stored there. Including the AD controller… #facepalm
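In hindsight, this is a circular boot dependency: the share needs the domain, and the domain controller lives on the share. Here is a rough Python sketch of how a check for that kind of loop might look before a migration. The dependency map, the service names and the `find_cycle` helper are all made up for illustration; none of this comes from ReadyNAS or VMware tooling, and a real lab would need its own map.

```python
from typing import Dict, List, Optional

# Hypothetical boot-dependency map: each entry lists what must be up first.
BOOT_DEPS: Dict[str, List[str]] = {
    "esxi-host": [],
    "nas-share": ["ad-controller"],               # no cached credentials, so the share needs a live domain
    "ad-controller": ["esxi-host", "nas-share"],  # the VM's files live on the share
    "vcenter": ["esxi-host", "nas-share", "ad-controller"],
}


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency loop as a list of names, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # We walked back into something we are still resolving: a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if state.get(node, UNSEEN) == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    cycle = find_cycle(BOOT_DEPS)
    print(" -> ".join(cycle) if cycle else "no circular dependencies")
```

With the map above, the check reports the loop between `nas-share` and `ad-controller`. If the share no longer depended on the domain (cached credentials, or NFS access that doesn’t go through AD), it would come back clean.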

Now I have two issues: one, how do I fix the permissions enough to get the files OFF the NAS, and two, once I do, how do I get the AD controller working? It took a little hacking, a download off the ReadyNAS community site and a reboot, but I was able to access the NAS via SSH. Then I was able to manually change all of the permissions on the files that corresponded to my AD controller. Then I enabled FTP and pulled all of the files to my desktop. Once there, I was able to boot my AD controller using a copy of VMware Workstation. After it booted, the host was able to mount the NFS share again, and I was able to bring the rest of the environment up. Once the secondary AD controller was up, I stopped the one on my desktop and booted it back where it belonged…

The moral of the story for me was to make sure my next array can cache AD credentials in case of an outage. And to keep a copy of my AD controller up-to-date on my desktop, just in case!