How can I check the replication topology for problems?

Active Directory (AD) replication depends entirely on an accurate replication topology: a map of which domain controllers exchange replicated changes with one another. The topology is generated by the AD Knowledge Consistency Checker (KCC), which builds a topology for both intrasite and intersite replication.

Generally, the first symptom of a topology problem is that one or more domain controllers, or an entire site, fail to replicate AD changes. You might, for example, notice that user password resets aren't being replicated properly or that new users and groups aren't consistently available throughout the domain. Often your first response to this problem is to use the Active Directory Sites and Services console to force replication, but doing so is useless if the underlying replication topology isn't correct.

Troubleshooting topology issues requires a methodical approach. If possible, start by determining whether you're dealing with an intersite or intrasite topology problem, as you'll need to troubleshoot them separately. To help make that determination, connect to a specific domain controller by using Active Directory Users and Computers. Make a change to AD, such as creating a new user group or organizational unit (OU). Then check whether the change appears on another domain controller within the same site and on a domain controller in another site. Keep in mind that intrasite replication can take as long as 5 minutes or so to complete under normal conditions.

Intersite replication is dependent upon the replication schedule you configured for the site links.
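The determination described above can be written down as a small decision table. The following Python sketch is illustrative only: the function name and the result labels are hypothetical, and it assumes you have already waited long enough for normal replication (including the site link schedule) before recording what you observed.

```python
def classify_replication_problem(seen_in_same_site: bool, seen_in_other_site: bool) -> str:
    """Classify a replication problem from where a test change showed up.

    seen_in_same_site:  the change appeared on another DC in the same site.
    seen_in_other_site: the change appeared on a DC in a different site.
    The returned labels are illustrative, not AD terminology.
    """
    if seen_in_same_site and seen_in_other_site:
        return "none"        # the change replicated everywhere you checked
    if seen_in_same_site:
        return "intersite"   # stuck at the site boundary
    if seen_in_other_site:
        return "intrasite"   # left the site but missed a local DC
    return "both"            # nothing replicated; investigate both paths
```

For example, a new OU that appears on a second domain controller in the same site but never on one in the remote site yields "intersite".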

Intersite Replication

When intersite replication seems to be a problem, check the obvious first:

  • Make sure your site links are configured with the correct replication schedule. If you have multiple site links between two sites, check the schedule on each. It's possible that one link has failed, and that AD was switched to an alternative link that uses a different schedule.
  • Check the network connectivity between the two sites to make sure that your network isn't creating the problem.

Next, figure out which domain controllers are acting as the bridgehead servers in each site. Keep in mind that each domain in each site will have at least one designated bridgehead server. Sites with connections to multiple other sites might have multiple bridgehead servers per domain, with each bridgehead handling connections to another site. You can find the bridgehead servers by looking at the connection objects in Active Directory Sites and Services, and noting which domain controller is configured with connection objects to a domain controller in another site.

If you can't find a connection object on any domain controller in one site to a domain controller in another site, and both sites contain domain controllers in the same domain, then you have a topology problem. Troubleshoot the intersite topology generator (ISTG).
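If you record the connection objects you see in Active Directory Sites and Services, both checks above reduce to simple set operations. The Python sketch below assumes a hypothetical Connection record that you fill in by hand (it is not read from AD), and a list of site pairs that you expect to replicate directly because they share a domain.

```python
from typing import NamedTuple

class Connection(NamedTuple):
    """One connection object: dest_dc holds it and pulls changes from source_dc."""
    dest_dc: str
    dest_site: str
    source_dc: str
    source_site: str

def bridgeheads(connections):
    """Map each site to the DCs that hold a connection object from another site."""
    result = {}
    for c in connections:
        if c.dest_site != c.source_site:
            result.setdefault(c.dest_site, set()).add(c.dest_dc)
    return result

def unlinked_site_pairs(connections, expected_pairs):
    """Return the expected site pairs that have no connection object in either direction."""
    linked = {
        frozenset((c.dest_site, c.source_site))
        for c in connections
        if c.dest_site != c.source_site
    }
    return [pair for pair in expected_pairs if frozenset(pair) not in linked]
```

Any pair returned by unlinked_site_pairs is a candidate for the topology problem described above. The site and server names you pass in are your own; nothing here queries the directory.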

On each bridgehead server, ensure that you can ping the bridgehead servers at the other site(s). If you can't, a network issue is causing the problem and must be resolved.
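When there are several bridgehead servers to test, the ping check can be scripted. This is a minimal Python sketch, assuming the ping utility is on the path and that ICMP is allowed between the sites; the function names are hypothetical. It treats a nonzero exit code as a failure, which is only a rough signal, so confirm any reported failure by hand.

```python
import platform
import subprocess

def ping_command(host, count=2):
    """Build a ping command line; Windows uses -n for the count, most other systems use -c."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]

def unreachable_hosts(hosts, run=subprocess.run):
    """Return the hosts whose ping exited with a nonzero status.

    `run` is injectable so the logic can be exercised without a network.
    """
    failed = []
    for host in hosts:
        result = run(ping_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Run it from each bridgehead server with the names of the bridgehead servers at the other sites.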

If network connectivity between the bridgehead servers is OK, it's possible that the bridgehead servers aren't able to handle the replication traffic. Although this situation is rare, you can test the theory by manually creating connection objects to different domain controllers, thus creating new bridgehead connections. Delete the automatically configured bridgehead connection objects. If this action resolves the problem, the former bridgehead domain controllers are at fault and should be examined for functionality and capacity.

If no steps thus far have solved the problem or indicated a possible cause, you might have a serious replication problem. Check the Directory Service, System, and Security event logs on your domain controllers for any events that might provide clues to the source of the problem.

ISTG

Each site has a designated ISTG, which generates the intersite topology for that site. You can discover which domain controller is the ISTG by using Active Directory Sites and Services. To do so, open the console, select the site in question, and in the details pane, right-click NTDS Site Settings and choose Properties.

As Figure 20.1 shows, you'll see the name of the server acting as ISTG.

Figure 20.1: Server BRAINCORENET is the ISTG for this site.

After you've located the ISTG, ensure that it's functioning properly (services are all started and you can log on) and connected to the network. Next, force it to regenerate the intersite replication topology by running

repadmin /kcc

from the server's console. If the new topology doesn't solve your problem, consider taking the domain controller offline or restarting it. AD will choose a new ISTG within about an hour, and the new domain controller might have better luck generating the correct topology.

Intrasite Replication

The intrasite replication topology is generated by the KCC running on each domain controller in the site. Intrasite replication generally occurs automatically and constantly throughout each domain in the site.

If one or more domain controllers within the site don't seem to be replicating properly, check the replication partners of each. You can do so by running

repadmin /showreps

at each domain controller's console, or by running

repadmin /showreps servername

from a single console, providing the appropriate servername.

Document the incoming and outgoing partners for each involved domain controller, and ensure that the domain controller has proper network connectivity (such as ping) to its replication partners. If network connectivity between any domain controller and one or more of its replication partners isn't available, troubleshoot and resolve that problem.
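One way to document the partners is to note each inbound relationship as a (destination, source) pair and derive the outgoing side from it. The Python sketch below assumes you transcribe those pairs by hand from the repadmin output; it does not parse that output, and the function names are hypothetical.

```python
from collections import defaultdict

def partner_map(inbound_pairs):
    """Build incoming/outgoing partner sets from (destination_dc, source_dc) pairs."""
    partners = defaultdict(lambda: {"incoming": set(), "outgoing": set()})
    for dest, source in inbound_pairs:
        partners[dest]["incoming"].add(source)   # dest pulls changes from source
        partners[source]["outgoing"].add(dest)   # so source sends changes to dest
    return dict(partners)

def dcs_without_inbound(partners, all_dcs):
    """Return the DCs, sorted by name, that have no incoming replication partner."""
    return sorted(dc for dc in all_dcs if not partners.get(dc, {}).get("incoming"))
```

A domain controller that appears in dcs_without_inbound receives changes from no one, which makes it a natural first place to look.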

If network connectivity appears to be OK, try to force each involved domain controller to generate a new replication topology by running

repadmin /kcc

on each domain controller. The KCC normally reruns automatically (every 15 minutes by default), but it's possible that a new topology problem hasn't yet been corrected by the automatic process.
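If many domain controllers are involved, the command can be issued for each one from a single console. The Python sketch below assumes that your version of repadmin accepts a domain controller name after /kcc and that a zero exit code means the request was accepted; verify both against your own tools before relying on it. The function names are hypothetical.

```python
import subprocess

def kcc_command(dc):
    """Build the command line; passing the DC name after /kcc is an assumption."""
    return ["repadmin", "/kcc", dc]

def run_kcc_on(dcs, run=subprocess.run):
    """Run the KCC request against each DC and report which calls exited with 0.

    `run` is injectable so the loop can be exercised without repadmin installed.
    """
    results = {}
    for dc in dcs:
        completed = run(kcc_command(dc), capture_output=True, text=True)
        results[dc] = completed.returncode == 0
    return results
```

Any domain controller mapped to False deserves a manual run at its own console so that you can read the full output.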

If a new topology doesn't correct the problem, try restarting each domain controller involved. If that doesn't fix the problem, you have a more serious AD replication issue that does not involve the replication topology; check the Directory Service, System, and Security event logs for messages that provide clues as to the source of the problem.