Sean Deuby

Microsoft continues to work on a sore spot in its hybrid identity strategy: the challenge of deploying the identity bridge between on-premises Active Directory Domain Services (AD DS) and Azure Active Directory in the cloud. This bridge consists of AD FS for federation and a succession of utilities, culminating in Azure AD Connect, for identity synchronization. Azure AD Connect recently became generally available, and it makes connecting your on-premises AD DS (and other identity stores such as SQL Server or LDAP) to Azure AD easier than it was with its predecessors, DirSync and Azure AD Sync.

AD Connect’s new capabilities

AD Connect is a configuration utility that simplifies setting up identity synchronization between a short list of popular identity sources (including Active Directory Domain Services, of course) and Azure AD, and optionally single sign-on with AD FS. It also provides new capabilities such as writeback of device-related attributes from Azure AD to AD DS.

One new capability that’s not received much mention, however, is one that practically all my enterprise customers ask for: The Azure AD Connect staging server option. To understand this option, let’s first review the situation it’s designed to mitigate.

How Microsoft identity synchronization works

These synchronization utilities (DirSync, its successor AADSync, and finally AD Connect) all use a trimmed-down version of Microsoft’s metadirectory service, variously named MIIS, ILM, FIM, and, as of a few months ago, Microsoft Identity Manager (MIM). The service has connectors to both AD DS and Azure AD that pull attributes into connector spaces. A set of synchronization rules determines whether these attributes are added to a metaverse that holds the join of on-premises identity attributes and Azure AD attributes. Finally, outbound sync rules determine which attributes are written to Azure AD.
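The import, sync, and export flow described above can be sketched as a toy model. This is only an illustration: the class, method, and attribute names are hypothetical and do not correspond to the sync engine's real interfaces, and the rules are reduced to simple attribute allow-lists.

```python
from dataclasses import dataclass, field


@dataclass
class SyncEngine:
    """Toy model of the import -> sync -> export flow; all names are hypothetical."""
    inbound_attrs: set    # attributes the inbound rules admit to the metaverse
    outbound_attrs: set   # attributes the outbound rules export to Azure AD
    connector_space: dict = field(default_factory=dict)  # staged copy of AD DS objects
    metaverse: dict = field(default_factory=dict)        # joined identity view

    def import_from_ad(self, ad_objects):
        # Import: copy source objects into the connector space unchanged.
        self.connector_space = {k: dict(v) for k, v in ad_objects.items()}

    def synchronize(self):
        # Inbound rules: project only the allowed attributes into the metaverse.
        for anchor, attrs in self.connector_space.items():
            entry = self.metaverse.setdefault(anchor, {})
            entry.update({k: v for k, v in attrs.items() if k in self.inbound_attrs})

    def pending_export(self):
        # Outbound rules: select which metaverse attributes go to Azure AD.
        return {
            anchor: {k: v for k, v in attrs.items() if k in self.outbound_attrs}
            for anchor, attrs in self.metaverse.items()
        }


ad_ds = {"u1": {"mail": "ann@example.com", "displayName": "Ann", "notes": "internal"}}
engine = SyncEngine(inbound_attrs={"mail", "displayName"}, outbound_attrs={"mail"})
engine.import_from_ad(ad_ds)
engine.synchronize()
export = engine.pending_export()
```

In this sketch the `notes` attribute reaches the connector space but never the metaverse, and only `mail` is queued for export.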

Figure 1: AD Connect Service Architecture (Image courtesy of Microsoft)


The sync service requires a server with connectors configured, rules defined, and a database engine (SQL Server Express or full SQL Server) to hold the connector spaces and the metaverse. There is no fault tolerance or redundancy built into this on-server database, and clustering is not recommended. And if you’re thinking, “I’ll just restore the database from the backups I take,” that’s not supported either. Basically, if the server dies you have to either uninstall and reinstall the sync service or recover the entire server from backups.

This isn’t quite as bad as it seems. Microsoft rightly says that although sync is certainly a mission-critical service, it’s not particularly time sensitive; the default and recommended sync interval is three hours. So although it may take a few hours to rebuild the sync service after a failure, you’ll probably miss one sync cycle at most. Enterprise IT shops, however, don’t like grey areas in the recovery time of their production systems. In most large companies, a system deemed high-importance must have some kind of fault tolerance or enhanced availability built in. “Are you serious?” is a comment I’ve received more than once when a company was contemplating DirSync or AADSync’s high availability capabilities.
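As a rough check on the "one sync cycle" estimate, the sketch below bounds how many scheduled cycles can fall inside an outage. It assumes cycles fire on a fixed schedule at the three-hour interval and that the outage is a single continuous window; the function name is hypothetical.

```python
import math


def missed_cycle_bounds(outage_hours, interval_hours=3.0):
    """Return (fewest, most) scheduled sync cycles that can fall inside an outage."""
    ratio = outage_hours / interval_hours
    # Depending on where the outage starts relative to the schedule, the
    # count is either the floor or the ceiling of outage / interval.
    return math.floor(ratio), math.ceil(ratio)


low, high = missed_cycle_bounds(2.0)   # a two-hour rebuild
```

Under these assumptions a two-hour rebuild misses zero or one cycle, which matches the estimate above; a rebuild that drags on for seven hours would miss two or three.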

How the staging server option works

The staging server option was developed to address this shortcoming. Though it isn’t true high availability, implementing a staging server lets you resume identity sync within a few minutes (once you’ve decided to fail over to it).

To set up a staging server (Figure 2), you go through the same Azure AD Connect setup as you did for the primary AD Connect server, with one exception: at the very last step, you check the staging server option. This prevents the sync results computed by the staging server’s service from being written to Azure AD or back to AD DS. It also means that password writeback and password hash sync are disabled, because the former requires changes to be written to AD DS while the latter requires changes to be written to Azure AD.
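The effect of the staging option can be illustrated with a small model: import and sync still run, but the export step is suppressed. This is a sketch of the behavior as described here, not the product's implementation; `SyncServer`, `run_cycle`, and the server names are hypothetical.

```python
class SyncServer:
    """Hypothetical stand-in for an AD Connect server; not the product's API."""

    def __init__(self, name, staging_mode):
        self.name = name
        self.staging_mode = staging_mode
        self.pending = []   # changes computed by import + sync

    def run_cycle(self, source_changes, target):
        # Import and sync always run, so a staging server stays current.
        self.pending.extend(source_changes)
        if self.staging_mode:
            return 0        # export suppressed: nothing reaches the target
        exported = len(self.pending)
        target.extend(self.pending)
        self.pending.clear()
        return exported


azure_ad = []
primary = SyncServer("sync01", staging_mode=False)
standby = SyncServer("sync02", staging_mode=True)
changes = ["add user ann", "update user bob"]
primary_exported = primary.run_cycle(changes, azure_ad)
standby_exported = standby.run_cycle(changes, azure_ad)
```

Both servers process the same changes, but only the primary writes them to the target; the standby holds its results ready for the moment staging mode is turned off.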

Figure 2: AD Connect Architecture with Staging Server


Recovering from AD Connect failure

In the event of a failure of the primary AD Connect server, you simply run through the AD Connect setup wizard again on the staging server and uncheck the staging server option (and re-enable password writeback or password hash synchronization if you use them). This enables updates to Azure AD and AD DS. Obviously you want to be careful that only one AD Connect server is fully enabled at a time!
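The "only one fully enabled server" rule lends itself to a simple guard in whatever runbook or script tracks your sync servers. The sketch below is a hypothetical bookkeeping helper, not a call into AD Connect; it assumes you record each server's staging flag in a dictionary.

```python
def fail_over(servers, failed, standby):
    """Promote `standby` after `failed` is lost; names and structure are hypothetical."""
    servers = dict(servers)      # name -> True if the server is in staging mode
    servers.pop(failed)          # the failed primary leaves the pool
    servers[standby] = False     # staging option unchecked on the standby
    active = [name for name, staging in servers.items() if not staging]
    if len(active) != 1:
        raise RuntimeError(f"expected exactly one exporting server, found {active}")
    return servers


before = {"sync01": False, "sync02": True}
after = fail_over(before, failed="sync01", standby="sync02")
```

The check raises an error if the promotion would leave two servers exporting at once, which is the situation the warning above is meant to prevent.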

The staging server option in AD Connect is a new capability that will be welcomed by large enterprises. It provides a much shorter service recovery time than its predecessors, and it’s simple to implement with no special hardware or software requirements. I’m sure it will become a standard deployment practice for many companies.