TFS Warm Standby, Friendly Names and Incorrect Documentation

I’ve been working with a client to set up a high-availability TFS installation. They decided that a warm-standby App Tier would be beneficial, so we cracked open the TFS Installation Guide to get one configured. Everything went well until we got to the second-to-last step.

On the primary application-tier computer, rename the primary server to use the virtual server name. By following this step, you can point to either computer through the DNS method that you chose.

We needed to change TFS so that it used a Friendly Name (Virtual Server Name/HOST record) instead of the primary App Tier’s machine name.  Luckily these steps are documented in the Install Guide.

Configuring Team Foundation Server to Use the Virtual Server Name

You can configure both the standby and the primary servers to use the virtual server name as soon as the name is available in the system. Any existing clients using the name of the primary server can still connect. However, after you activate the standby server, the clients need to reconnect using the virtual server name.

To configure the primary computer to use the virtual server name

  • On the primary computer, activate the virtual server name by using the ActivateAT command of the TfsAdminUtil command-line utility.

    For example, the following command activates TFS_AT as the application-tier server, which is using the same IP Address as the primary server.

    TfsAdminUtil ActivateAT TFS_AT

Unfortunately, this didn’t work. We wound up with all kinds of errors being thrown, and we couldn’t connect to either the friendly name or the machine name from Team Explorer (local or remote). We could, however, rename the AT back by passing the machine name to TfsAdminUtil ActivateAT, so at least we weren’t totally broken.

We tried a few different things including running the Best Practices Analyzer but still had no luck. 

At this point I did what any self-respecting consultant would do…I had the client call Microsoft Premier Support to get it fixed. Oops, I forgot to mention that we had to have it all up and running for Production use the next day. 🙂

The Microsoft folks were quite helpful but the steps that they gave us were “less than obvious”.  I will recount them below for anyone that may come across a similar issue.  I’m not sure why TfsAdminUtil ActivateAT didn’t work, but these steps did.

1. Configure the TFS connections using the Walkthrough: Setting up Team Foundation Server to Require HTTPS and Secure Sockets Layer (SSL) page on MSDN (http://msdn.microsoft.com/en-us/library/aa833873.aspx). Scroll about halfway down the article until you find the section marked “To update configuration information for Team Foundation Server”. This section tells you to use TfsAdminUtil ConfigureConnections to make the necessary name changes. Here’s the basic command line that I used (wrapped here for readability; enter it as a single line). Of course, you will need to replace each [FriendlyName] placeholder with your friendly name. You may also have to change the port number on the SharepointAdminUri property if yours is different.

TfsAdminUtil ConfigureConnections
    /ATUri:http://[FriendlyName]:8080
    /SharepointUri:http://[FriendlyName]
    /SharepointSitesUri:http://[FriendlyName]/Sites
    /SharepointAdminUri:http://[FriendlyName]:17012
    /ReportsUri:http://[FriendlyName]/Reports
    /ReportServerUri:http://[FriendlyName]/ReportServer/ReportService.asmx
    /SharepointUnc:\\[FriendlyName]\Sites
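If you have more than one environment to rename, it can help to generate that command line rather than hand-edit it. Here is a minimal Python sketch that does only that. The function name and the TFS_AT example value are mine, the default ports are the ones from the command above, and the UNC form \\[FriendlyName]\Sites is my assumption about what the share path should look like. It builds the argument list and prints it; it does not run TfsAdminUtil.

```python
def build_configure_connections(friendly_name, at_port=8080, sharepoint_admin_port=17012):
    """Return the TfsAdminUtil ConfigureConnections command as a list of arguments.

    friendly_name is the virtual server name (HOST record) that clients will use.
    The port defaults mirror the command shown above; change them if yours differ.
    """
    base = f"http://{friendly_name}"
    return [
        "TfsAdminUtil",
        "ConfigureConnections",
        f"/ATUri:{base}:{at_port}",
        f"/SharepointUri:{base}",
        f"/SharepointSitesUri:{base}/Sites",
        f"/SharepointAdminUri:{base}:{sharepoint_admin_port}",
        f"/ReportsUri:{base}/Reports",
        f"/ReportServerUri:{base}/ReportServer/ReportService.asmx",
        # Assumed UNC form: \\<friendly name>\Sites
        f"/SharepointUnc:\\\\{friendly_name}\\Sites",
    ]


if __name__ == "__main__":
    # Print the command as a single line, ready to paste into a command prompt.
    print(" ".join(build_configure_connections("TFS_AT")))
```

Printing the command instead of executing it keeps the sketch harmless: you can review the line before pasting it into a prompt on the App Tier.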

2. After we updated the connections, things were working a bit better. We were able to access the AT from Team Explorer using the friendly name, but we were still not completely running: some of the services were returning permission errors. This, we found, was caused by a Windows Server 2003 SP1 security fix that is discussed in KB 926642 – Error message when you try to access a server locally by using its FQDN or its CNAME alias after you install Windows Server 2003 Service Pack 1: “Access denied” or “No network provider accepted the given network path” (http://support.microsoft.com/kb/926642). We needed to change a registry setting on the server to get around this. We used Method #2 from the KB article, followed by a reboot.

NOTE: Please note the warning at the bottom of the block below. Making this change leaves you open to a specific security vulnerability. Please try Method #1 first or, better yet, don’t do this step if you can help it.

Method 2: Disable the authentication loopback check

Re-enable the behavior that exists in Windows Server 2003 by setting the DisableLoopbackCheck registry entry in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa registry subkey to 1. To set the DisableLoopbackCheck registry entry to 1, follow these steps on the client computer:

  1. Click Start, click Run, type regedit, and then click OK.
  2. Locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
  3. Right-click Lsa, point to New, and then click DWORD Value.
  4. Type DisableLoopbackCheck, and then press ENTER.
  5. Right-click DisableLoopbackCheck, and then click Modify.
  6. In the Value data box, type 1, and then click OK.
  7. Exit Registry Editor.
  8. Restart the computer.

Note You must restart the server for this change to take effect. By default, loopback check functionality is turned on in Windows Server 2003 SP1, and the DisableLoopbackCheck registry entry is set to 0 (zero).

The security is reduced when you disable the authentication loopback check, and you open the Windows Server 2003 server for man-in-the-middle (MITM) attacks on NTLM. (emphasis mine)
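If you need to repeat the registry change on a second server (the standby, for example), a .reg file is easier to review than eight clicks in regedit. The sketch below is my own, not part of the KB article: it only produces the text of such a file for the DisableLoopbackCheck value described above. The function name is hypothetical, and the same security warning applies to anything you import with it.

```python
# Registry subkey named in KB 926642, Method 2.
LSA_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa"


def loopback_check_reg(disable=True):
    """Return .reg file text that sets DisableLoopbackCheck to 1 (or back to 0)."""
    value = 1 if disable else 0
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{LSA_KEY}]\n"
        # DWORD values in .reg files are written as eight hex digits.
        f'"DisableLoopbackCheck"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # Print the file contents for review; save them as a .reg file to import.
    print(loopback_check_reg(disable=True))
```

Calling loopback_check_reg(disable=False) gives you the matching file to put the default back. Either way, the server still needs the restart mentioned in the note above.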

3. That’s it for the AT.  Now we need to update all of the clients so that they have the correct metadata in their cache.  We can clear the cache by doing the following:
    a. Close Visual Studio
    b. Clear TFS Client Cache by deleting the folder below:
        XP – C:\Documents and Settings\[user]\Local Settings\Application Data\Team Foundation Server\2.0
        Vista – C:\Users\[user]\AppData\Local\Microsoft\Team Foundation\2.0
    c. Restart Visual Studio and connect to [FriendlyName] in Team Explorer
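
With more than a handful of developers, deleting the cache folder by hand gets old quickly. Here is a rough Python sketch of step b that each user could run with Visual Studio closed. The function name is mine, and the default location is an assumption based on the Vista path above (it relies on the LOCALAPPDATA environment variable); on XP you would pass the XP folder explicitly.

```python
import os
import shutil
from pathlib import Path


def clear_tfs_client_cache(cache_dir):
    """Delete the given TFS client cache folder.

    Returns True if the folder existed and was removed, False if there
    was nothing to delete.
    """
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    # Assumed Vista-style location: %LOCALAPPDATA%\Microsoft\Team Foundation\2.0
    local_app_data = os.environ.get("LOCALAPPDATA")
    if local_app_data is None:
        print("LOCALAPPDATA is not set; pass the cache folder explicitly.")
    else:
        target = Path(local_app_data) / "Microsoft" / "Team Foundation" / "2.0"
        removed = clear_tfs_client_cache(target)
        print(f"{target}: {'removed' if removed else 'not found'}")
```

The function takes the folder as a parameter so the same code covers both the XP and Vista paths; only the caller decides which one applies.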

That did it for us.  I hope this post is helpful to someone else struggling with this.  The next step is to actually do a fail-over test to the warm standby.  Should be interesting…