Job failed to delete workspace #161
I saw another instance of this problem here: https://ci.ros2.org/job/ci_windows/5940/console
Please post the event log of the build which wasn't cleaned up somewhere so we can debug the cause - either here or on colcon/colcon-core#157.
Once the necessary debug information has been obtained, revert the hack from ros2/ci#242 by merging ros2/ci#243.
I think I see the same issue on
"Access is denied" when uninstalling colcon
I see a colcon process that has been up for a long time, not using much CPU, and it has spawned a child cmd.exe process in working directory
with the command
I think this process came from the Jenkins log for
The last thing I see in the colcon event log is starting the job
There is another colcon process on
with command
However, that event log looks very different. I don't see nightly_win_rel_1096_events.log
Maybe
icecube also has 4 stuck colcon processes, from different jobs. These are interesting because of what their child processes are. Stuck process from
Just for the record, the order of the failed jobs and their summaries:
The pattern is always the same: one job fails because Jenkins loses the connection to the node and, as a consequence, doesn't terminate the
For a more detailed analysis of the internal state of
While this problem could be Windows specific, we would likely not notice if the same happened on other platforms, since their second build wouldn't fail since
Thinking about it, this sounds like a problem in Jenkins - something we have dealt with in the past and worked around in
Two more instances of this today, with both portable and icecube:
Another instance on
Another instance on windshield today: https://ci.ros2.org/job/ci_windows/7520/console
All 3 of the Windows hosts (icecube, windshield, and portable) had this problem today. Just killing off
Thanks @clalancette for getting the farm turning again (we needed that). However, if anyone else sees this in the future, please leave one instance aside so I can try out a script that I might add to CI so it can resolve this itself in the future.
Another instance of a failure to delete the workspace on icecube: https://ci.ros2.org/job/ci_windows/7683/console There were two colcon processes and one launch_test process running. One colcon process was for
This issue has been recurring frequently. For recent occurrences I've been trying to use Process Explorer to find out which extant processes are still holding file handles open. I wasn't able to find one this morning, but while cleaning up after aborting a hung job on windshield, I found a ros2 daemon process still running and holding on to dynamic libraries.
Process info:
Open file handles matching the Jenkins workspace path:
I don't know if the daemon process was created during the aborted CI run. This might be a one-off, or it might be part of why we're still having issues even after ros2/ci#323. I'll keep looking into new instances of this.
I've been reading up on process control mechanisms available on Windows that might help us combat this. If we can wrap our process tree in a Job Object, closing all handles to the job should kill all processes that are part of that job, including their descendant processes. I tried to get an experiment together, but today my Windows VM is causing my whole machine to lock up, so I haven't been able to explore further.
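For illustration, the Job Object idea has a close POSIX analogue: launch the build command in its own process group and signal the whole group, so orphaned grandchildren (the stuck colcon/cmd.exe trees described above) get reaped too. The sketch below is my own minimal example of that technique under the assumption of a POSIX host - it is not the ros2/ci implementation, and the helper names (`start_in_new_group`, `kill_process_tree`) are hypothetical. On Windows the equivalent would be `CreateJobObject` with `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`.

```python
import os
import signal
import subprocess
import time


def start_in_new_group(cmd):
    # start_new_session=True (POSIX only) puts the child in its own session,
    # and therefore its own process group, so every descendant it spawns can
    # later be signalled in one shot.
    return subprocess.Popen(cmd, start_new_session=True)


def kill_process_tree(proc, timeout=5.0):
    # Politely SIGTERM the whole group, then SIGKILL any survivors.
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # group already gone
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return
        time.sleep(0.1)
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass


if __name__ == "__main__":
    # Spawn a shell that forks a long-running child, then reap the whole tree.
    p = start_in_new_group(["sh", "-c", "sleep 300 & wait"])
    time.sleep(0.5)
    kill_process_tree(p)
    p.wait(timeout=5)
    print("tree terminated, returncode:", p.returncode)
```

The key design point is the same on both platforms: the supervisor holds a handle to the *group* (process group or Job Object), not to individual PIDs, so a worker that forks before dying cannot leave untracked children holding workspace file handles.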
ros2/ci#374 brings in the subprocess_reaper.py script, as recommended by @dirk-thomas.
We aren't using this repository for buildfarm issues anymore, so I'm going to archive it, and thus I'm closing out this issue. If you continue to have problems, please report another bug against https://github.com/ros2/ros2. Thank you.
Similar to #45 (Linux) and #85 (macOS), which are also errors about failing to delete a folder.
Failing nightlies on windows_portable with this error: windows_portable failed to delete the workspace at the start of a job.