What are good and bad Active Directory snapshots
In this post I would like to explain a little bit more about Active Directory snapshots, and how you can or can’t use them.
First of all, let’s make one thing very clear: VM snapshots of domain controllers are not supported!
Let me say that again: if you have a virtualized domain controller and you’re planning to take a snapshot of that VM from your host (Hyper-V, VMware, or any public cloud provider), just stop!
Microsoft states this officially at http://technet.microsoft.com/en-us/library/virtual_active_directory_domain_controller_virtualization_hyperv(WS.10).aspx:
‘Do not take or use a Snapshot of a virtual domain controller.’
The same goes for copying the virtual hard disk of a domain controller, using differencing disks, or any other yet-to-be-invented way of rolling the VM itself back in time without using a supported backup and restore method.
Now, let’s talk about the why.
When domain controllers replicate, they use a mechanism called the up-to-dateness vector to keep track of the changes they have already replicated.
Say we have two domain controllers, DC1 and DC2, both in the corp.contoso.com domain. Each of them has a local update sequence number (USN), which tracks the changes made in its local database. USNs are local and are not replicated, so they differ between the two domain controllers.
Think of USNs as counters that count the changes performed locally on each DC. A real-life example would be a security guard at the entrance to a concert counting the people coming in. Now imagine this concert has two doors, with two security guards. Would their numbers be the same? Definitely not!
So let’s say DC1 has a highest committed USN of 5000, and DC2 has a highest committed USN of 6000. (Going back to the security guards example, security guard #1 has counted 5000 people coming in and security guard #2 has counted 6000.)
So what does this have to do with the up-to-dateness vector? Well, the UTD vector is a list that each DC keeps locally, recording the highest committed USN it saw from each partner in the previous replication. So, if we trigger replication in the current state, after it finishes successfully the UTD vector on DC1 would show DC2 with a highest committed USN of 6000.
And the UTD vector on DC2 would show DC1 with a highest committed USN of 5000.
Note: you can display the up-to-dateness vector with the repadmin command:
Repadmin /showutdvec DCName PartitionDN
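If you want to script this, here is a small sketch of how the arguments could be put together for the example domain. It only builds the command line and does not run anything. The function names are made up for illustration, and the conversion assumes the partition you are after is the domain partition itself; the /nocache switch leaves GUIDs in raw form instead of translating them to names.

```python
from typing import List


def domain_to_partition_dn(dns_name: str) -> str:
    """Turn a DNS domain name into the DN of its domain partition."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    return ",".join(f"DC={label}" for label in labels)


def showutdvec_command(dc_name: str, dns_name: str, nocache: bool = False) -> List[str]:
    """Build the argument list for repadmin /showutdvec (hypothetical helper)."""
    args = ["repadmin", "/showutdvec", dc_name, domain_to_partition_dn(dns_name)]
    if nocache:
        # Show raw GUIDs instead of translated names.
        args.append("/nocache")
    return args


if __name__ == "__main__":
    # prints: repadmin /showutdvec DC1 DC=corp,DC=contoso,DC=com
    print(" ".join(showutdvec_command("DC1", "corp.contoso.com")))
```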
So, back to the security guards example: at some point they give each other a call. One says, ‘Hey, I counted up to 5000 people, here are their names,’ and the other says, ‘Great, I’ve counted up to 6000, here are their names.’
Now let’s look at what happens during a snapshot recovery (rolling the server back in time).
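To make the bookkeeping concrete, here is a toy model of the two DCs in Python. It is only a sketch: the class and method names are invented for this post, the vector is keyed by DC name for readability, and it ignores the fact that applying replicated changes also consumes local USNs on a real domain controller.

```python
class DomainController:
    """Toy model of a DC: a local USN counter plus an up-to-dateness vector."""

    def __init__(self, name, highest_committed_usn=0):
        self.name = name
        self.highest_committed_usn = highest_committed_usn
        # partner name -> highest committed USN seen from that partner
        self.utd_vector = {}

    def commit_local_changes(self, count):
        """Each local change bumps this DC's own counter only."""
        self.highest_committed_usn += count

    def replicate_from(self, partner):
        """Pull from a partner and remember how far the partner had counted."""
        self.utd_vector[partner.name] = partner.highest_committed_usn


dc1 = DomainController("DC1", 5000)
dc2 = DomainController("DC2", 6000)

# The "phone call" between the two security guards.
dc1.replicate_from(dc2)
dc2.replicate_from(dc1)

print(dc1.utd_vector)  # {'DC2': 6000}
print(dc2.utd_vector)  # {'DC1': 5000}
```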
Say we recover DC1 to a point in time where its highest committed USN was 3000. Now remember, the UTD vector on DC2 still stores DC1’s highest committed USN as 5000. When they try to replicate, DC1 says, ‘Hey, I’ve got changes up to 3000.’ DC2 looks at him funny and says, ‘But last time we replicated you had changes up to 5000. What’s wrong?’
And the security guards example? After that first phone call comes another one, where security guard #1 says, ‘Hey, I counted up to 3000 people.’ Security guard #2’s response? ‘Last time we talked you said you had counted up to 5000. What’s wrong?’
See the problem here?
This situation is called a USN rollback, and at that point the DCs stop talking to each other and stop replicating any information.
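Here is a rough sketch of why this hurts, using the numbers from the example. DC2 only asks DC1 for changes above the USN it already recorded, so anything DC1 writes after the rollback with a USN at or below 5000 never reaches DC2. The function names and the sample change are hypothetical, and the real replication engine is considerably more involved than this.

```python
def changes_to_send(source_changes, destination_watermark):
    """Return the source's changes that the destination has not seen yet.

    source_changes: dict mapping local USN -> description of the change.
    destination_watermark: highest USN the destination recorded for this source.
    """
    return {usn: change for usn, change in source_changes.items()
            if usn > destination_watermark}


def looks_like_usn_rollback(source_highest_usn, destination_watermark):
    """The destination remembers more than the source now claims to have."""
    return source_highest_usn < destination_watermark


# DC2 last replicated from DC1 when DC1's highest committed USN was 5000.
dc2_watermark_for_dc1 = 5000

# DC1 is rolled back to USN 3000 by a VM snapshot, then makes one new change.
dc1_changes = {usn: f"change {usn}" for usn in range(1, 3001)}
dc1_changes[3001] = "create user alice"  # made-up change for illustration
dc1_highest_usn = max(dc1_changes)

print(looks_like_usn_rollback(dc1_highest_usn, dc2_watermark_for_dc1))  # True
print(changes_to_send(dc1_changes, dc2_watermark_for_dc1))              # {}
```

The empty result is the nasty part: the new user exists on DC1, but DC2 believes it already has everything DC1 could offer.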
So how is this solved?
When you restore using a supported method, the restore process changes how the domain controller appears in the up-to-dateness vector by changing the GUID of the database (also called the InvocationID). The entries in the up-to-dateness vector aren’t actually DC names, but rather the database GUIDs of the domain controllers being replicated for the specific naming context.
Note: running Repadmin /showutdvec DCName PartitionDN /nocache will display the GUIDs mentioned.
So when a DC is restored using a supported method, the GUID of its database changes, and DC2 knows that it is no longer replicating with DC1 but with ‘DC1a’ (which still has the same computer name, server GUID, etc., but a different AD database, which is what matters for replication).
So, in that scenario it’s just as if DC2 had started replicating with a brand-new domain controller, and no USN rollback happens.
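The difference between the two kinds of restore can be sketched like this, with DC2’s vector keyed by database GUID rather than by name. Again, this is a simplified model with invented class and function names, not how the directory service is actually implemented.

```python
import uuid


class DomainController:
    def __init__(self, name, highest_committed_usn):
        self.name = name
        self.highest_committed_usn = highest_committed_usn
        # Identifies the database, not the server.
        self.invocation_id = uuid.uuid4()

    def restore_supported(self, usn_at_backup):
        """Supported restore: data rolls back AND the database gets a new GUID."""
        self.highest_committed_usn = usn_at_backup
        self.invocation_id = uuid.uuid4()

    def restore_vm_snapshot(self, usn_at_snapshot):
        """Snapshot revert: data rolls back, the database GUID stays the same."""
        self.highest_committed_usn = usn_at_snapshot


def is_usn_rollback(partner_utd_vector, dc):
    """True if the partner remembers a higher USN for this database GUID."""
    remembered = partner_utd_vector.get(dc.invocation_id)
    return remembered is not None and dc.highest_committed_usn < remembered


# Case 1: VM snapshot revert. DC2 still finds the old GUID with USN 5000.
snapshot_dc = DomainController("DC1", 5000)
dc2_vector = {snapshot_dc.invocation_id: 5000}
snapshot_dc.restore_vm_snapshot(3000)
print(is_usn_rollback(dc2_vector, snapshot_dc))  # True

# Case 2: supported restore. The new GUID has no entry in DC2's vector.
restored_dc = DomainController("DC1", 5000)
dc2_vector = {restored_dc.invocation_id: 5000}
restored_dc.restore_supported(3000)
print(is_usn_rollback(dc2_vector, restored_dc))  # False
```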
That’s the full explanation of why VM snapshots are not supported!
Now, with that clearly out of the way, what are good snapshots?
Good snapshots are the ones made with the ntdsutil command, which uses VSS under the hood to create a snapshot of the Active Directory database. Those snapshots cannot be used to restore the database (for exactly the reason explained above), but they can be mounted and inspected, and information can be extracted from them and inserted into the production database with tools such as ldifde.exe. More on AD snapshots in later posts.
One important takeaway from this article: use system state backup to back up your domain controllers, and make sure you know how to restore from it in every possible scenario.