Darren Mar-Elia | Principal Security Strategist

What does identity resilience actually mean?

Our industry loves buzzwords. After 40+ years in this business, that is something I can say with a fair bit of understatement. The latest buzzword darling is resilience. I see it everywhere, pertaining to everything. I even hear it discussed with respect to athletic performance nowadays!

But when it comes to a topic that is near and dear to my heart—identity—the idea of identity resilience actually does have meaning. A resilient identity system is one that can “take a licking and keep on ticking” (to borrow a tagline from the past).

Resilience means that your identity system, once compromised and even potentially wiped out, can be not just recovered—but recovered in a trustworthy way.


Identity resilience requires a unique kind of recovery

Understanding the specialized nature of identity recovery is important because, as we all now know, most organizations live and die by their identity systems.

When identity is down, the organization is down. Full stop. So your ability to recover that identity system as quickly as possible, to a level that can service the core capabilities of the business, is one of the most important tasks you will be entrusted with. This is career-making (or career-ending) stuff, and I’ve seen both happen time and again.

When it comes to Active Directory (AD), the core identity system for most of the planet, identity resilience has a very specific size and shape. In the 8+ years I’ve been talking to customers about this topic, the point I make is one that is easily understood by anyone who has ventured down the rabbit hole of actually performing a full AD forest recovery:

Recovering AD is not like recovering a file server, or a database server, or an application server, or…you get the idea.

AD is a complex, multi-master, replicated database. You must recover it in a very precise way, following a set of very precise steps, in the right order…or you start over again.

And if you are a reasonably large organization with lots of geographic locations, multiple domains, and lots of dependencies on external systems, the problem becomes even more challenging.


Choosing an identity resilience solution: What’s most important?

I had the privilege of being at Semperis at the very beginning—as we were developing what became the first cyber-aware AD forest recovery solution.

That solution involved rethinking both the backup and the recovery of AD: how they had been done for about 16 years at that point, and why that established process wasn’t sufficient against the growing threat of AD-based attacks.

What I learned was that you can’t think about AD disaster recovery as “just another backup problem” to be solved by “just another backup vendor.”

Expertise matters. You wouldn’t want your primary care doctor to perform your brain surgery. Likewise, you must ensure that your AD disaster recovery solution provides true identity resilience—not just backup.


Checklist for selecting your identity resilience solution

With that in mind, I’ve put together this list of requirements that should be non-negotiable if your job is to provide identity resilience for critical identity systems like Active Directory.


1. Start with this understanding: Identity recovery is not a “checkbox” item.

Saying “OK, I’ve backed up AD, I’m good to go” is like hope: it’s not a strategy.


2. Recovery is the most important thing to prove—not backup.

This is a corollary to #1. Anyone can back up AD with a set of scripts using in-the-box tools. Not everyone can recover AD from a cyber event in a way that meets a complex organization’s needs (more on this in #4).


3. When selecting an identity resilience solution—don’t just take the vendor’s word for it.

If your organization is going to live and die by what they are selling you, make them prove it. Do a proof of concept (PoC—aka proof of value) with a realistic representation of your production AD. Make sure the solution can recover AD within your specified Recovery Time Objective (RTO).


4. Define your actual RTO.

As a corollary to #3, your RTO should include enough AD capacity to serve your most critical line of business applications—what we call the Minimum Viable Company (MVC). If your vendor’s RTO is getting one domain controller functional…that’s not your RTO.

If the initial domain restore doesn’t include those critical business capabilities, and you have to spend another day or two bringing up enough capacity to restore business functions, then the solution’s real RTO runs until those functions are back, extra day or two included.
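To make that arithmetic concrete, here is a minimal Python sketch. It assumes, purely for illustration, that MVC capacity can be expressed as a number of domain controllers back in service; the milestone names, hour values, and DC counts are hypothetical and not drawn from any real recovery. The point it shows is that the clock stops when MVC capacity is reached, not when the first DC comes up.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    """A hypothetical recovery milestone, timed in hours from the start of recovery."""
    name: str
    hours_from_start: float
    dcs_online: int  # domain controllers serving clients at this point (illustrative)


def effective_rto(milestones: list[Milestone], dcs_needed_for_mvc: int) -> float:
    """Return the elapsed hours until enough DCs are online to support the MVC.

    Assumption: MVC capacity is modeled as a simple DC count. Raises
    ValueError if the plan never reaches that capacity.
    """
    # Walk milestones in time order and stop at the first one that meets MVC needs.
    for m in sorted(milestones, key=lambda m: m.hours_from_start):
        if m.dcs_online >= dcs_needed_for_mvc:
            return m.hours_from_start
    raise ValueError("recovery plan never reaches Minimum Viable Company capacity")


# Hypothetical plan: the numbers are made up for illustration only.
plan = [
    Milestone("first DC restored", 4.0, 1),
    Milestone("regional DCs promoted", 30.0, 6),
    Milestone("all critical sites covered", 52.0, 12),
]

print(effective_rto(plan, dcs_needed_for_mvc=1))   # prints 4.0
print(effective_rto(plan, dcs_needed_for_mvc=12))  # prints 52.0
```

With these made-up numbers, a vendor could honestly report a 4-hour RTO for getting one DC functional, while the RTO that matters to the business is 52 hours. In a real multi-domain forest, capacity would likely need to be tracked per domain and per site rather than as a single count.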


5. Know what compromised AD looks like.

And also make sure that your chosen solution understands what it means to have a compromised AD.

AD can be compromised in at least two ways. Attackers can compromise the OS on which AD runs, and they can compromise AD itself—by creating persistence and backdoors within the AD database.

A solution that claims to be “cyber-first” should be able to help you address both.

NOTE: Please don’t rely on AV scanning to handle an attacker compromise. If I had a dollar for every time an AV failed to detect a novel piece of malware…

And yes, the solution should also be able to help you hunt down those persistence elements once AD has been recovered; but in truth, this is also where expertise matters. Make sure you are partnering with someone who really understands how AD can be attacked. If your vendor just learned about AD yesterday because AD cyber recovery is the cool new thing to offer, then you will have the privilege of experiencing that lack of experience when it matters the most—and not in a good way.


6. Finally—and this is important: Just as identity resilience is important, so is resilience in your identity resilience solution.

When you are in the $h*t (and IT folks can totally relate to this), everything that can go wrong will.

With the entire management team breathing down your neck, the last thing you want to tell them is, “Sorry, a step failed in our recovery so we have to start over.”

Your recovery solution needs to be tolerant of the vagaries of disasters. That tolerance comes from experience: from a vendor who understands what can go wrong in a recovery and is fully able to help you work around unexpected blockers. You can’t predict those blockers, but you can simulate that sort of chaos and put the solution to the test.

If your vendor can’t handle recovery in a PoC, do you really think they can when a real cyber crisis hits?


Prove you’re ready when it matters most

I truly hope you never have to recover your AD during a crisis.

But if you’ve chosen an identity resilience solution that meets these requirements, you should be able to recover with confidence, reduce the impact to your business—and even use the experience to strengthen your identity security posture against the next threat.

That’s true resilience.

