You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have some agents which are shutting down, but not being deallocated.
This is a similar, if not the same, issue as #64 however the fix given there doesn't work in my case. We did originally have a different idle timeout to our current one, but the agents which are failing to deallocate where created after the timeout was last changed.
Here is the relevant server log, where you can see that tcagent3 has been found to be stopped. You can also see an agent which was successfully deallocated (tcagent2) gets the Stopping virtual machine tcagent2 message. There is not a Stopping virtual machine tcagent3 message.
[2017-11-04 01:30:39,254] INFO [ey'{id=arm-1} 2] - base.tasks.UpdateInstancesTask - Updated instance 'tcagent3' status to Stopped based on API information
[2017-11-04 01:30:39,254] INFO [ey'{id=arm-1} 2] - uds.base.AbstractCloudInstance - Changing tcagent3(247c3648) status from Running to Stopped
[2017-11-04 01:30:40,647] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopped' state, profile 'Azure'{id=arm-1}, AzureCloudInstance{myName='tcagent4'}
[2017-11-04 01:30:40,647] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud agent Unregistered agent "tcagent4" {id=298} is no longer recognized as cloud agent because instance has gone: profileId=arm-1, imageId=tcagent, instanceId=tcagent4
[2017-11-04 01:30:40,647] INFO [uled executor 5] - ver.instances.AgentRemoverImpl - Removing cloud agent Unregistered agent "tcagent4" {id=298} because: Cloud instance (id=tcagent4, imageId=tcagent, profileId=arm-1) has gone (is not reported by cloud profile)
[2017-11-04 01:30:40,663] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopped' state, profile 'Azure'{id=arm-1}, AzureCloudInstance{myName='tcagent3'}
[2017-11-04 01:30:40,665] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud agent Unregistered agent "tcagent3" {id=299} is no longer recognized as cloud agent because instance has gone: profileId=arm-1, imageId=tcagent, instanceId=tcagent3
[2017-11-04 01:30:40,666] INFO [uled executor 5] - ver.instances.AgentRemoverImpl - Removing cloud agent Unregistered agent "tcagent3" {id=299} because: Cloud instance (id=tcagent3, imageId=tcagent, profileId=arm-1) has gone (is not reported by cloud profile)
[2017-11-04 01:34:44,822] INFO [uled executor 5] - ances.KillIdleRunningInstances - Terminating instance AzureCloudInstance{myName='tcagent2'}, profile 'Azure'{id=arm-1} with idle time 15 minutes (idle time is 15 minutes)
[2017-11-04 01:34:44,834] INFO [uled executor 5] - l.instances.StopInstanceAction - Terminating instance: AzureCloudInstance{myName='tcagent2'}, profile 'Azure'{id=arm-1}, reason: idle timeout of 15 minutes reached
[2017-11-04 01:34:44,837] INFO [uled executor 5] - r.impl.DBCloudStateManagerImpl - Image: AzureCloudImage{myName='tcagent'}, profile: profile 'Azure'{id=arm-1} was marked to CONTAIN agent
[2017-11-04 01:34:44,840] INFO [uled executor 5] - uds.base.AbstractCloudInstance - Changing tcagent2(47caa6e6) status from Running to Scheduled to stop
[2017-11-04 01:34:44,840] INFO [onPool-worker-1] - ouds.azure.arm.AzureCloudImage - Stopping virtual machine tcagent2
[2017-11-04 01:34:45,787] INFO [uled executor 1] - .server.impl.CloudEventsLogger - Cloud instance entered 'scheduled to stop' state, profile 'Azure'{id=arm-1}, AzureCloudInstance{myName='tcagent2'}
[2017-11-04 01:34:52,681] INFO [nio-80-exec-188] - ver.instances.AgentRemoverImpl - Removing cloud agent "tcagent2" {id=294} because: Instance has been terminated. AzureCloudInstance{myName='tcagent2'}, profile 'Azure'{id=arm-1}
[2017-11-04 01:34:52,708] INFO [nio-80-exec-188] - .server.impl.CloudEventsLogger - Cloud agent Unregistered agent "tcagent2" {id=294} is no longer recognized as cloud agent because instance has gone: profileId=arm-1, imageId=tcagent, instanceId=tcagent2
[2017-11-04 01:34:52,709] INFO [nio-80-exec-188] - ver.instances.AgentRemoverImpl - Removing cloud agent Unregistered agent "tcagent2" {id=294} because: Cloud instance (id=tcagent2, imageId=tcagent, profileId=arm-1) has gone (is not reported by cloud profile)
[2017-11-04 01:37:15,210] INFO [onPool-worker-1] - uds.base.AbstractCloudInstance - Changing tcagent2(47caa6e6) status from Scheduled to stop to Stopped
[2017-11-04 01:37:15,210] INFO [onPool-worker-1] - ouds.azure.arm.AzureCloudImage - Virtual machine tcagent2 has been successfully terminated
[2017-11-04 01:37:15,826] INFO [uled executor 2] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopped' state, profile 'Azure'{id=arm-1}, AzureCloudInstance{myName='tcagent2'}
[2017-11-04 02:44:45,752] INFO [uled executor 4] - ances.KillIdleRunningInstances - Terminating instance AzureCloudInstance{myName='tcagent5'}, profile 'Azure'{id=arm-1} with idle time 15 minutes (idle time is 15 minutes)
[2017-11-04 02:44:45,757] INFO [uled executor 4] - l.instances.StopInstanceAction - Terminating instance: AzureCloudInstance{myName='tcagent5'}, profile 'Azure'{id=arm-1}, reason: idle timeout of 15 minutes reached
[2017-11-04 02:44:45,759] INFO [uled executor 4] - r.impl.DBCloudStateManagerImpl - Image: AzureCloudImage{myName='tcagent'}, profile: profile 'Azure'{id=arm-1} was marked to CONTAIN agent
[2017-11-04 02:44:45,761] INFO [uled executor 4] - uds.base.AbstractCloudInstance - Changing tcagent5(68ab4c2e) status from Running to Scheduled to stop
Here is the agent log from tcagent3 where we can see it has decided to shutdown. This log is a bit odd since our timeout is currently 15 minutes.
[2017-11-04 01:24:33,067] INFO - l.directories.DirectoryMapImpl - Cleaning up old checkout directories. Default lifetime = 192 hour(s)
[2017-11-04 01:29:32,582] WARN - rver.clouds.azure.IdleShutdown - Agent was idle for 25 minutes. Cloud profile timeout was 25 minutes. Instance will be shut down
[2017-11-04 01:29:32,582] WARN - rver.clouds.azure.IdleShutdown - Agent was idle for 25 minutes. Cloud profile timeout was 25 minutes. Instance will be shut down
[2017-11-04 01:29:32,582] INFO - rver.clouds.azure.IdleShutdown - To change this command define 'teamcity.agent.shutdown.command' property with proper shutdown command in the buildAgent.properties file
[2017-11-04 01:29:32,582] INFO - rver.clouds.azure.IdleShutdown - To change this command define 'teamcity.agent.shutdown.command' property with proper shutdown command in the buildAgent.properties file
[2017-11-04 01:29:32,582] INFO - rver.clouds.azure.IdleShutdown - Shutdown instance commands for windows server 2012 r2
[2017-11-04 01:29:32,582] INFO - rver.clouds.azure.IdleShutdown - Shutdown instance commands for windows server 2012 r2
[2017-11-04 01:29:32,582] INFO - rver.clouds.azure.IdleShutdown - Shutting down agent with command: shutdown -s -t 1 -c "TeamCity Agent Azure Instance shutdown on idle time" -f
[2017-11-04 01:29:32,582] INFO - rver.clouds.azure.IdleShutdown - Shutting down agent with command: shutdown -s -t 1 -c "TeamCity Agent Azure Instance shutdown on idle time" -f
[2017-11-04 01:29:32,798] INFO - rver.clouds.azure.IdleShutdown - Shutdown exit code:0
[2017-11-04 01:29:32,798] INFO - rver.clouds.azure.IdleShutdown - Shutdown call:
[2017-11-04 01:29:32,798] INFO - rver.clouds.azure.IdleShutdown - Shutting down agent with command: shutdown /s /t 1 /c "TeamCity Agent Azure Instance shutdown on idle time" /f
[2017-11-04 01:29:32,798] INFO - rver.clouds.azure.IdleShutdown - Shutdown exit code:1190
[2017-11-04 01:29:32,798] INFO - rver.clouds.azure.IdleShutdown - Shutdown call:
A system shutdown has already been scheduled.(1190)
[2017-11-04 01:29:32,798] INFO - rver.clouds.azure.IdleShutdown - Shutting down agent with command: shutdown /s /t 1 /c "TeamCity Agent Azure Instance shutdown on idle time" /f
[2017-11-04 01:29:32,813] INFO - rver.clouds.azure.IdleShutdown - Shutdown exit code:1190
[2017-11-04 01:29:32,813] INFO - rver.clouds.azure.IdleShutdown - Shutdown call:
A system shutdown has already been scheduled.(1190)
[2017-11-04 01:29:32,813] INFO - rver.clouds.azure.IdleShutdown - Shutdown exit code:1190
[2017-11-04 01:29:32,813] INFO - rver.clouds.azure.IdleShutdown - Shutdown call:
A system shutdown has already been scheduled.(1190)
[2017-11-04 01:29:33,048] INFO - r.artifacts.impl.HttpDiskCache - Cleaning up items with life time in cache greater than 172800 seconds
[2017-11-04 01:29:33,048] INFO - r.artifacts.impl.HttpDiskCache - Finished cleaning up expired items, 0 items removed
[2017-11-04 01:29:33,048] INFO - dDirectoryBasedCleanupRegistry - Removing files from C:\BuildAgent\temp\.old
[2017-11-04 01:29:33,048] INFO - dDirectoryBasedCleanupRegistry - Removing files from C:\BuildAgent\work\.old
[2017-11-04 01:29:33,067] INFO - l.directories.DirectoryMapImpl - Cleaning up old checkout directories. Default lifetime = 192 hour(s)
[2017-11-04 01:29:35,152] INFO - jetbrains.buildServer.AGENT - Received stop command from launcher
[2017-11-04 01:29:35,152] INFO - jetbrains.buildServer.AGENT - Stop command was not performed. No build to stop with reason: BuildInterruptReason.AGENT_SHUTDOWN
[2017-11-04 01:29:35,152] INFO - jetbrains.buildServer.AGENT - Starting agent shutdown sequence, reason: Stop command called
[2017-11-04 01:29:35,168] INFO - agent.impl.AgentPortFileWriter - Delete agent runtime file from C:\BuildAgent\logs\buildAgent.xmlRpcPort
[2017-11-04 01:29:35,168] INFO - jetbrains.buildServer.AGENT - Unregistering from the server. Agent id: 298
[2017-11-04 01:29:35,230] INFO - jetbrains.buildServer.AGENT - Shutdown agent WebServer start
[2017-11-04 01:29:35,261] INFO - jetbrains.buildServer.AGENT - Shutdown agent WebServer finish
[2017-11-04 01:29:35,261] INFO - jetbrains.buildServer.AGENT - Shutdown agent finish
[2017-11-04 01:29:35,261] INFO - jetbrains.buildServer.AGENT - Shutdown agent and exit with code DIE_USER_COMMAND
[2017-11-04 01:29:35,261] INFO - ent.impl.upgrade.AgentExitCode - Agent exited. Stop command called
[2017-11-04 01:29:35,261] INFO - buildServer.agent.AgentMain2$2 - Closing jetbrains.buildServer.agent.AgentMain2$2@2286778: startup date [Sat Nov 04 01:03:59 UTC 2017]; root of context hierarchy
Since the Stopping virtual machine tcagent3 message never appears in the server log, the terminateInstance method doesn't seem to be getting called for it, despite the agent deciding to shutdown. The shutdown of the agent VM causes it to be in the 'Stopped (incurring charges)' state but since the server-side plugin didn't enter terminateInstance (and then stopVmAsync), the deallocation never happens.
So the bug here seems to be that the agent is shutting down when the server-side plugin isn't in terminateInstance. I don't know why this is the case. Let me know if I can provide further logs which might be of use.
The text was updated successfully, but these errors were encountered:
@tom-dudley, I see that you are using the ARM plugin to start these machines, but in the agent log from vm tcagent3 mentioned IdleShutdown which is not used in the latest plugin builds.
Could you please verify that you're using latest version of ARM plugin and check that you don't have installed Azure Classic plugin? You could do that on the Administration -> Plugins List page.
I have some agents which are shutting down, but not being deallocated.
This is a similar, if not the same, issue as #64 however the fix given there doesn't work in my case. We did originally have a different idle timeout to our current one, but the agents which are failing to deallocate where created after the timeout was last changed.
Here is the relevant server log, where you can see that
tcagent3
has been found to be stopped. You can also see an agent which was successfully deallocated (tcagent2
) gets theStopping virtual machine tcagent2
message. There is not aStopping virtual machine tcagent3
message.Here is the agent log from
tcagent3
where we can see it has decided to shutdown. This log is a bit odd since our timeout is currently 15 minutes.Since the
Stopping virtual machine tcagent3
message never appears in the server log, theterminateInstance
method doesn't seem to be getting called for it, despite the agent deciding to shutdown. The shutdown of the agent VM causes it to be in the 'Stopped (incurring charges)' state but since the server-side plugin didn't enterterminateInstance
(and thenstopVmAsync
), the deallocation never happens.So the bug here seems to be that the agent is shutting down when the server-side plugin isn't in
terminateInstance
. I don't know why this is the case. Let me know if I can provide further logs which might be of use.The text was updated successfully, but these errors were encountered: