TFS Team Build can run forever

Problem: The TFS server can lose track of a remote team build and believe that it is still running, even if the build machine has not raised an event back to the TFS server in a very long time (>36 hrs). By default, the TFS server appears to have no build timeout and will wait forever.

Scenario: I was testing a team build that went awry. My build script deleted a large portion of the C: folder and hosed the machine (stop laughing). To stop the build (prior to learning that there is a TFSBUILD STOP command), I shut down the build server. The machine needed to be re-imaged. About 36 hrs later, I reviewed “All Builds” within Team Explorer and noticed that the TFS server thought the build was still running. So even after 36+ hrs, TFS hadn’t failed the build on a timeout.

Fix: I had to use the “TFSBuild.exe Stop” command to tell the TFS server that the build should be aborted. You should also run tf workspaces /owner:* /server:[MYSERVER] on both the build server and the client machine that initiated the build. This updates the local workspace cache, which cleans up any stray workspaces.
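If you have to do the cache refresh on several machines, a small wrapper can save some typing. The following is only a rough sketch, not something from my actual cleanup: the function names are made up for illustration, it assumes tf.exe is on the PATH and takes slash-prefixed options (/owner, /server), and it defaults to a dry run that only prints the command.

```python
import shutil
import subprocess


def workspace_refresh_command(server, owner="*"):
    """Build the argv for the `tf workspaces` cache refresh (hypothetical helper)."""
    if not server:
        raise ValueError("server name is required")
    return ["tf", "workspaces", f"/owner:{owner}", f"/server:{server}"]


def refresh_workspace_cache(server, dry_run=True):
    """Print the command; run it only when dry_run is False and tf is on PATH."""
    argv = workspace_refresh_command(server)
    print(" ".join(argv))
    if dry_run:
        return None
    # Assumption: tf.exe is installed and reachable through PATH.
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("tf was not found on PATH")
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # MYSERVER is a placeholder, as in the command above.
    refresh_workspace_cache("MYSERVER")  # dry run: only prints the command
```

Run it once on the build server and once on the client that queued the build, passing dry_run=False when you are happy with the printed command.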

Comment: If a remote build machine fails during a build, you need to ensure that the build is cancelled on the TFS server. Otherwise you may get a workspace collision when a new build runs once the machine is back up, because the workspace’s local path is still considered “in use”. This “in use” status comes from the build machine’s or client’s workspace cache being out of sync with the Team Foundation Server’s database. The tf workspaces command above updates the local cache from the server and cleans this up.

Mike Ruminer has posted a step-by-step listing of the entire event on his blog.