This is part two of a blog post I wrote earlier. The premise of part one was to better understand the options companies face should their Active Directory be compromised. How can they get back up and running as quickly as possible? And how can they do so with as much assurance as possible that they are not restoring malware into the recovered Active Directory Forest?
For those of you who have already read part one, thank you for the great feedback. All good points and comments.
So, let’s start by reviewing the use case I left off with in part one…
Everything is encrypted. Parking lots are empty. Factories are silent, and the CEO is on the phone every 30 seconds saying something “colorful” about your ancestry. Unfortunately, this isn’t a scenario I fabricated. It’s a conversation I used to have monthly and now have pretty much weekly. It’s part of the business I am in, and lately it’s a very tough, real-world dilemma.
The comments I received on part one fell into two areas. The first suggested using a more secure permission management system. Always a good idea, but in this case it’s too late, though we can look at that as part of the recovery process. More on that later. The second was a partial rebuild into a separate Forest. This second Forest would essentially be a “VIP” Forest where all of the company’s important assets and information would sit. Maybe. But how long would this take? Does the company have the resources to rearchitect an entire Forest on the fly? Bear in mind that I need to be up and running yesterday.
If the above scenario is an attack and you know when you got infected and how the attackers got in, then in my eyes the answer is reasonably straightforward. It’s a matter of taking a backup of your Active Directory from the day before the attack and doing a Forest restoration. (I will get to the rootkit challenge in a minute.) Then the only question is: did you invest in a solution that automates a Forest restoration and brings it back in a couple of hours, or is your backup more of a traditional system state Active Directory backup, with no specialized automated process to restore Active Directory in its entirety? If it is the latter, then you should plan to be down four or five business days while you manually reconstruct your Forest back to its full capacity.
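To make the "backup from before the attack" rule concrete, here is a minimal sketch of picking a restore point when the compromise time is known. The function name `pick_restore_point` and the backup timestamps are hypothetical, made up for illustration; this is not any product's API, just the selection logic.

```python
from datetime import datetime
from typing import Optional


def pick_restore_point(backups: list[datetime],
                       compromised_at: datetime) -> Optional[datetime]:
    """Return the most recent backup taken strictly before the compromise.

    Returns None when every backup is at or after the compromise time,
    which means no backup can be assumed clean.
    """
    clean = [b for b in backups if b < compromised_at]
    return max(clean) if clean else None


# Hypothetical nightly backups and a hypothetical known compromise time.
backups = [
    datetime(2024, 3, 1, 2, 0),
    datetime(2024, 3, 2, 2, 0),
    datetime(2024, 3, 3, 2, 0),
]
compromised_at = datetime(2024, 3, 2, 14, 30)

restore_point = pick_restore_point(backups, compromised_at)
print(restore_point)  # 2024-03-02 02:00:00
```

The important case is the `None` result: if you cannot name a backup that predates the intrusion, this simple rule no longer applies.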
In reality, some of the above attack scenarios could involve an embedded rootkit. It’s an interesting riddle: in that scenario, how do you remove a rootkit from a Domain Controller? It’s tough because to restore you need to bring over System State, where you could very well be reintroducing the attacker. This also means that a bare metal backup and restore will not work, since it carries the same risk.
Does everyone remember the Die Hard movie where the bad guys know that in hostage situations the FBI cuts off the power, and that is ultimately how they get into the vault? It’s the same drill here. I know, and you know, what it takes to restore a DC, but the real problem is that so do the bad guys. They are going to rely on everyone following Microsoft’s “standard operating procedure”. What better way to stay persistent?
OK. So, what would I do in the scenario where the attacker has been in your system for a while or, as in most cases, you aren’t really sure when or how they got in?
The first step would be to reset all passwords, or at least those of all the privileged/sensitive accounts. Yes, a very challenging task, but I would argue a far easier and faster one than a full or partial rebuild. Bear in mind the clock is ticking. Of course, it should go without saying that as part of the permission restoration you embed best-practice permission setting, or in this case resetting.
The next step would be to adopt an automated Forest Recovery solution. To be clear this needs to be in place before any of this happens. It’s not a tool that you can bring in the day after. Multiple-day recovery versus a couple of hours speaks for itself in terms of the benefit that it brings. Another consideration that many do not realize is that for many of your other DR solutions to work, AD needs to be functioning, because they often rely on AD for authentication and authorization to gain access to the DR solution.
Built into our Forest Recovery solution is the answer to the rootkit problem. Part of our Forest Restoration process is something we call “clean restore”. I won’t go into the how (that’s a whole other blog), but the result is that only the AD portions of system state are restored, not the OS itself. You can now take a new hardware server, load a fresh Windows ISO, and bring over AD without the risk of reintroducing the malware.
There is another benefit built into the Forest Recovery solution worth mentioning: the ability to be hardware agnostic when restoring AD to a Domain Controller. Prior to this functionality, if you wanted to restore AD you had to restore it to the same hardware, the same OS, the same patch levels and the same service pack levels. If not, you were likely to experience a blue screen during your restoration and spend a few hours figuring out missing drivers and the like. Today, as part of our Forest Recovery solution, we are agnostic to the hardware, service packs or patch levels of the targeted domain controller. The only item that needs to match is the Operating System version: AD needs to be restored to the same version it was backed up on.
This leads to a lot more flexibility for scenarios like physical-to-virtual restoration, lab creation and restoring AD to the cloud.
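As a rough illustration of that single matching rule, the sketch below checks a restore target against the machine a backup came from. The `ServerProfile` type, its fields and the sample values are all hypothetical and are not taken from any product; the only thing it encodes is that the OS version has to match while hardware and patch level do not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerProfile:
    """Hypothetical description of a Domain Controller host."""
    os_version: str
    hardware: str
    patch_level: str


def can_restore(source: ServerProfile, target: ServerProfile) -> bool:
    """Only the OS version must match; hardware and patch level may differ."""
    return source.os_version == target.os_version


# Hypothetical example: physical DC backed up, virtual machine as the target.
backed_up_dc = ServerProfile("Windows Server 2019", "physical-rack-01", "2023-11")
new_vm = ServerProfile("Windows Server 2019", "virtual-machine", "2024-02")
older_vm = ServerProfile("Windows Server 2016", "virtual-machine", "2024-02")

print(can_restore(backed_up_dc, new_vm))    # True
print(can_restore(backed_up_dc, older_vm))  # False
```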
I mentioned at the beginning of part one that I wanted to write about this because, while there is some documentation and some “guidance” out there on what to do, I found it to be slow, incomplete and in many cases unrealistic. The other motivation was timing. When I started looking at this a few years ago, it was more of an academic or theoretical discussion. A paper here or there. Now, IMO, it’s gone from an interesting edge case to a regular “what if” contingency planning question. Organizations are beginning to realize that if they’ve invested in an SLA of, let’s say, a day for their disaster recovery planning, they may very well have to add on many more days, since nothing will work without the AD service up and running.
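The arithmetic behind that SLA point is simple enough to sketch. The example below assumes the rest of the DR plan cannot start until AD is back, so the two durations add up, and it treats the "five business days" and "couple of hours" figures from earlier in this post as 120 hours and 2 hours. Both the function name and that simplification are mine, for illustration only.

```python
def effective_recovery_hours(dr_sla_hours: int, ad_recovery_hours: int) -> int:
    """Total time to recover when DR can only begin once AD is restored."""
    return ad_recovery_hours + dr_sla_hours


DR_SLA_HOURS = 24               # the one-day DR SLA used as the example
MANUAL_AD_HOURS = 5 * 24        # assumption: five days of manual Forest rebuild
AUTOMATED_AD_HOURS = 2          # assumption: "a couple of hours" automated

manual_total = effective_recovery_hours(DR_SLA_HOURS, MANUAL_AD_HOURS)
automated_total = effective_recovery_hours(DR_SLA_HOURS, AUTOMATED_AD_HOURS)

print(manual_total)     # 144
print(automated_total)  # 26
```

Under these assumptions, a one-day SLA quietly becomes six days with a manual Forest rebuild, versus a little over a day with an automated one.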
Finally, the stakes have never been higher. Entire companies’ survival is on the line. Don’t get me wrong: I don’t have THE “silver bullet”. No one does. But hopefully I’ve added some new considerations to a very difficult problem.
If you would like to learn more about our technology, or if you just want to debate me, feel free to reach out.