We’ve recently resolved a couple of instances of a very specific Web Application Proxy post-install configuration failure. The conditions under which you would see this are quite specific, but if you do encounter it, it is not an easy problem to either identify or resolve, so we wanted to share our findings and hopefully save some of you time and pain.
What are the Symptoms of the Issue?
After installing the Web Application Proxy feature you then need to run through the Web Application Proxy post-installation configuration wizard. After completing all the necessary details the post-install configuration tasks will then run.
Under certain conditions, which we will come to a bit later, retrieving the AD FS Proxy configuration may fail with the error:-
“AD FS Proxy could not be configured. Unable to retrieve proxy configuration data from the Federation Server.”
You will also see the following logged in the AD FS event log on the Web Application Proxy server:-
Log Name: AD FS/Admin
Unable to retrieve proxy configuration data from the Federation Service.
Trust Certificate Thumbprint:
Status Code:
What’s going on in the Background to cause this?
As part of responding to the AD FS Proxy configuration request, the AD FS server has to determine whether Device Registration has been initialized and enabled. This is so the AD FS Proxy service on the Web Application Proxy server knows whether it needs to enable the Device Registration endpoints and certificate bindings, and for which UPN suffixes.
As part of the DRS discovery logic, if the DRS configuration objects, which are stored in Active Directory, cannot be found in the local domain, the AD FS 2012 R2 server will try to connect to a DC in the root domain to enumerate all GCs under the CN=Sites container that host the root domain. It then iterates over this list until it finds a reachable GC in which to try to locate the DRS configuration.
In a complex, segregated network environment the AD FS server may not have network access to many of these Global Catalog servers which leads to delays while the TCP connection attempts to these GC servers time out.
The Web Application Proxy server's call to retrieve the AD FS Proxy configuration has a 100 second timeout, so if the cumulative delays due to the GC network timeouts exceed this timeout the post-install configuration will fail. Web Application Proxy will retry the configuration call a number of times but, if you are seeing this issue, each call will be subject to the same delays and will fail.
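To make the arithmetic concrete, here is a minimal sketch of how the delays add up. It is a simplified model, not the actual AD FS logic: it assumes each unreachable GC costs a fixed 21 seconds (the default Windows SYN retransmit behavior covered later in this post) and that discovery stops at the first GC that answers. The function and constant names are hypothetical.

```python
# Simplified model of the GC discovery delay; not the actual AD FS implementation.
WAP_TIMEOUT_SECONDS = 100     # timeout of the WAP configuration call
PER_GC_TIMEOUT_SECONDS = 21   # assumed cost of one unreachable GC with default TCP settings


def discovery_delay(gc_reachable, per_gc_timeout=PER_GC_TIMEOUT_SECONDS):
    """Return the seconds lost to unreachable GCs before the first reachable one.

    gc_reachable is an ordered list of booleans, one per GC that is tried.
    """
    delay = 0
    for reachable in gc_reachable:
        if reachable:
            break  # discovery stops at the first GC that answers
        delay += per_gc_timeout
    return delay


def config_call_times_out(gc_reachable, timeout=WAP_TIMEOUT_SECONDS):
    """Return True if the cumulative delay alone reaches the WAP call timeout."""
    return discovery_delay(gc_reachable) >= timeout


# Example: four unreachable GCs tried before a reachable one, then five.
for unreachable in (4, 5):
    order = [False] * unreachable + [True]
    print(unreachable, discovery_delay(order), config_call_times_out(order))
```

In this model four unreachable GCs still fit inside the 100 second window, while a fifth pushes the call over it.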
We are aware that this behavior may cause issues in complex network environments, and we are working with the AD FS Product Group to review it and look at how we can modify it moving forward. That’s not going to help you if you’re hitting this issue now, though.
How can I check if I am hitting this issue?
The first thing to check for is the presence of the “System.Net.WebException: The operation has timed out” Event ID 422 error in the AD FS event logs on the Web Application Proxy server. This is the primary indicator that you may be hitting this problem.
If you believe you may be hitting this issue you can also run a Network Monitor trace (or use your network capture tool of choice) on the AD FS 2012 R2 server while attempting to run the Web Application Proxy post-installation configuration.
Network Monitor 3.4 can be downloaded from the following link - http://www.microsoft.com/en-us/download/details.aspx?id=4865
You can then apply the following display filter in Network Monitor - property.TCPSynRetransmit
This filter will show re-transmitted TCP SYN packets.
A TCP SYN is the first packet used to establish a TCP connection. Re-transmitted SYNs mean that the expected SYN ACK response was not received, so the OS retries the SYN packet. You can think of this a bit like trying to phone someone, not getting an answer, waiting a few minutes and then trying again.
We’re specifically interested in unanswered SYN packets to ports 139 and 445, which are the ports that will be used to talk to the Global Catalog server.
If you are hitting this issue then, with the above filter applied, you would see re-transmitted SYN packets to ports 139 and 445 on one or more Global Catalog server IP addresses.
Note – as the Web Application Proxy server will make a number of attempts to retrieve the configuration you will see multiple cycles of attempts. For the purposes of solving this issue we just need to bring one cycle under the 100 second timeout, so we’re interested in unique Global Catalog servers and not how many times we try to connect to them.
A Short Diversion into TCP/IP SYN Re-transmissions
The TCP/IP SYN re-transmission behavior is very specific and worth a short discussion, as tuning it provides a possible workaround for the issue.
As noted above, if a TCP SYN connection attempt does not receive a response the OS will re-transmit the SYN. A couple of key parameters control the behavior here:-
- TcpMaxConnectRetransmissions – default value 2 – this controls how many times the TCP stack will retry the initial SYN packet if a SYN ACK response is not received. The following KB talks in more detail about this setting - http://support.microsoft.com/kb/2786464
- InitialRTO – default value 3000 msec – this is the initial retransmit timeout period and gets doubled between each successive retry. The following KB talks in more detail about this setting - http://support.microsoft.com/kb/2472264
For example, with default settings you will see the following behavior:-
| SYN attempt | Timeout period | Cumulative time |
| --- | --- | --- |
| Initial SYN | 3 seconds | 3 seconds |
| 1st SYN re-transmit | 6 seconds | 9 seconds |
| 2nd SYN re-transmit | 12 seconds | 21 seconds |
These settings are relevant as they give us a means to control how long the timeout takes for each unavailable Global Catalog server.
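The doubling behavior described above can be worked through in a few lines. This is only a sketch of the arithmetic, assuming the timeout starts at InitialRTO and doubles after every unanswered SYN. The function name is hypothetical and the code does not read or change any OS setting.

```python
def syn_timeout_seconds(initial_rto_ms=3000, max_connect_retransmissions=2):
    """Total time before a connection attempt to an unresponsive host is abandoned."""
    total_ms = 0
    wait_ms = initial_rto_ms
    # One wait for the initial SYN plus one for each retransmission.
    for _ in range(max_connect_retransmissions + 1):
        total_ms += wait_ms
        wait_ms *= 2  # the timeout doubles between successive attempts
    return total_ms / 1000


# Per-server timeout for the default InitialRTO and two lower values.
for rto_ms in (3000, 1000, 500):
    print(rto_ms, syn_timeout_seconds(rto_ms))
```

With the defaults this gives the 21 seconds per unreachable server shown above (3 + 6 + 12).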
Potential Workarounds
Now that we know what’s causing the timeout let’s look at potential workarounds for this.
1) Allow the Network Traffic to one of the Global Catalog servers
It’s easy for me to sit here and write this but I’m not the one who has to go and talk to the network/Firewall guys and convince them :-)
Seriously though, if you are able to open either TCP port 139 or 445 to one of the first Global Catalog servers that AD FS can’t reach this should resolve the issue as the AD FS server would stop at the point it successfully connects to a GC. Hopefully this blog will help explain the issue to your network team.
As mentioned above you can use Network Monitor to identify which Global Catalog servers the AD FS server is trying to connect to and then look at allowing Network Access to the first of the IP addresses you find.
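If a capture is not practical, a quick reachability probe run from the AD FS server can give a similar answer. The following is a minimal sketch, assuming Python is available on that server and that you already have a list of candidate Global Catalog addresses. The helper names are hypothetical and the probe simply attempts a TCP connection to ports 445 and 139.

```python
import socket


def can_connect(host, port, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


def first_reachable(hosts, ports=(445, 139), timeout_seconds=5.0):
    """Return the first (host, port) pair that accepts a connection, or None."""
    for host in hosts:
        for port in ports:
            if can_connect(host, port, timeout_seconds):
                return (host, port)
    return None
```

Calling `first_reachable([...])` with the GC addresses in the order seen in the trace returns the first one the AD FS server can already reach, or `None` if every attempt fails. Keep the timeout short, as each unreachable host costs up to the full timeout for each port.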
2) Initialize Device Registration
Initializing Device Registration would create the DRS configuration containers in Active Directory and should avoid the Global Catalog discovery steps that lead to the delay.
Device Registration can be initialized by running the following PowerShell cmdlet on one of the AD FS 2012 R2 servers:-
Initialize-ADDeviceRegistration
Note – this needs to be run with Enterprise Administrator credentials and requires that the Windows Server 2012 R2 AD schema update has been deployed. For further details on this cmdlet please see the following article - http://technet.microsoft.com/en-us/library/dn479332.aspx
This cmdlet does *not* enable device authentication or the device registration service on the AD FS servers. Enabling Device Registration is a second step in the DRS set-up process.
3) Tune the TCP SYN Retransmit behavior
If you are unable to implement either of the above resolutions then tuning the TCP SYN retransmit behavior can also be used to work around the problem.
As discussed previously, we will incur a 21 second delay for each Global Catalog server we try to connect to that we cannot reach. If we can tune the per-server timeout down so that the cumulative delay comes in under 100 seconds (let’s say 80 seconds to give some time for other required activities) then the AD FS server should respond to the configuration request within the 100 second timeout.
- The default setting for TcpMaxConnectRetransmissions is 2 and this is also the minimum value so we cannot tune this further down.
- This leaves us with tuning InitialRTO down from its default value of 3000 msec. The valid range is 300-3000 msec. This can be tuned using the following netsh command, replacing 3000 (the default, shown here) with the value you need:-
netsh interface tcp set global initialRto=3000
It’s hard to give a precise value to set this to, as how low you will need to go depends on how many Global Catalog servers the AD FS server is trying to reach. You can use the following table to get an idea of the timeout per server you will get for various initialRto settings:-
| initialRTO setting (in milliseconds) | Total timeout period (per failed GC connection) |
| --- | --- |
| 3000 (default) | 21 seconds |
| 1000 | 7 seconds |
| 500 | 3.5 seconds |
This would only be a viable option for up to around 20 unreachable Global Catalog servers. Anything above this number is going to start to reach the 100 second overall timeout even with a 3.5 second total timeout per server.
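As a rough guide, the number of unreachable Global Catalog servers that fit inside a given time budget can be estimated as follows. This is a sketch under stated assumptions: two SYN retransmissions, a timeout that doubles on each attempt, and the 80 second working budget suggested above. The function names are hypothetical.

```python
def per_gc_timeout_ms(initial_rto_ms, retransmissions=2):
    """Time lost per unreachable GC: rto + 2*rto + 4*rto + ... for each attempt."""
    return initial_rto_ms * (2 ** (retransmissions + 1) - 1)


def max_unreachable_gcs(initial_rto_ms, budget_seconds=80):
    """Largest number of unreachable GCs whose combined timeouts fit in the budget."""
    return (budget_seconds * 1000) // per_gc_timeout_ms(initial_rto_ms)


# Estimate for the three initialRto values in the table above.
for rto_ms in (3000, 1000, 500):
    print(rto_ms, per_gc_timeout_ms(rto_ms) / 1000, max_unreachable_gcs(rto_ms))
```

With an 80 second budget this works out to 3 servers at the default setting, 11 at 1000 msec and 22 at 500 msec, which is in line with the figure of around 20 given above.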
Note – lowering the initialRTO could potentially cause the AD FS server connectivity issues in high latency environments and while we would not expect this to be the case for the majority of deployments please just keep this in mind. This is also a global OS level setting so affects any outbound TCP connections made by the AD FS server. It has no impact on inbound connections.
Note – reducing the initialRTO is not something that we would normally recommend. If you have needed to tune this down, we’d recommend that you revert to the default setting once we have a fuller resolution for this issue.
Summary
Under certain conditions the Web Application Proxy post-installation configuration task may fail with the following error:-
“AD FS Proxy could not be configured. Unable to retrieve proxy configuration data from the Federation Server.”
The AD FS event log on the Web Application Proxy server will also show an Event ID 422 error with “System.Net.WebException: The operation has timed out” in the error details.
We do not expect the issue we’ve covered here to be a common one but, given the difficulties in diagnosing it and understanding the resolution options, we wanted to share our findings.
In the cases we have seen, this timeout occurs because the AD FS server is trying to locate the Device Registration configuration containers in Active Directory. As part of this discovery the AD FS server may try to connect to GCs across the local Forest.
If there is a complex, segregated network infrastructure in place these connection attempts may fail, incurring a 21 second delay per inaccessible Global Catalog server and leading to the Web Application Proxy server's 100 second timeout being exceeded.
There are a number of workarounds you can consider to alleviate this issue:-
- Allow AD FS connectivity to one of the relevant Global Catalog servers
- Initialize Device Registration
- Tune the TCP SYN retransmit timer (initialRTO) to reduce the time delay per GC server
We are working with the AD FS Product Group on a solution for this issue and will update this article when we have more information. In the meantime, hopefully the above information will help unblock any of you who do hit this issue.
Ian Parramore, Senior Escalation Engineer, Web Application Proxy support team
Thanks also to the following for helping put all the pieces of this jigsaw together:-
Billy Price, Senior Escalation Engineer, Web Application Proxy support team
Rainier Amara, Support Escalation Engineer, Web Application Proxy support team