Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When sharing the terminal with child nodes, wait for the children to terminate before exiting ourselves. #6053

Merged
merged 19 commits into from
Feb 25, 2021
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions scripts/Deploy-MSBuild.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ Write-Host "Existing MSBuild assemblies backed up to $BackupFolder"
if ($runtime -eq "Desktop") {
$targetFramework = "net472"
} else {
$targetFramework = "netcoreapp2.1"
$targetFramework = "net5.0"
}

$bootstrapBinDirectory = "artifacts\bin\MSBuild.Bootstrap\$configuration\$targetFramework"
Expand All @@ -74,11 +74,11 @@ $filesToCopyToBin = @(
FileToCopy "$bootstrapBinDirectory\Microsoft.Managed.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.Managed.Before.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.Managed.After.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.Net.props"
FileToCopy "$bootstrapBinDirectory\Microsoft.NetFramework.CurrentVersion.props"
FileToCopy "$bootstrapBinDirectory\Microsoft.NetFramework.CurrentVersion.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.NetFramework.props"
FileToCopy "$bootstrapBinDirectory\Microsoft.NetFramework.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.NET.props"
tmds marked this conversation as resolved.
Show resolved Hide resolved
FileToCopy "$bootstrapBinDirectory\Microsoft.NETFramework.CurrentVersion.props"
FileToCopy "$bootstrapBinDirectory\Microsoft.NETFramework.CurrentVersion.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.NETFramework.props"
FileToCopy "$bootstrapBinDirectory\Microsoft.NETFramework.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.VisualBasic.CrossTargeting.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.VisualBasic.CurrentVersion.targets"
FileToCopy "$bootstrapBinDirectory\Microsoft.VisualBasic.targets"
Expand Down
6 changes: 3 additions & 3 deletions src/Build/BackEnd/BuildManager/BuildManager.cs
Original file line number Diff line number Diff line change
Expand Up @@ -629,7 +629,7 @@ void Callback(object state)
}
}

ShutdownConnectedNodesAsync(true /* abort */);
ShutdownConnectedNodes(true /* abort */);
CheckForActiveNodesAndCleanUpSubmissions();
}
}
Expand Down Expand Up @@ -774,7 +774,7 @@ public void EndBuild()
try
{
_noActiveSubmissionsEvent.WaitOne();
ShutdownConnectedNodesAsync(false /* normal termination */);
ShutdownConnectedNodes(false /* normal termination */);
_noNodesActiveEvent.WaitOne();

// Wait for all of the actions in the work queue to drain. Wait() could throw here if there was an unhandled exception
Expand Down Expand Up @@ -1937,7 +1937,7 @@ public void Dispose()
/// Asks the nodeManager to tell the currently connected nodes to shut down and sets a flag preventing all non-shutdown-related packets from
/// being processed.
/// </summary>
private void ShutdownConnectedNodesAsync(bool abort)
private void ShutdownConnectedNodes(bool abort)
{
_shuttingDown = true;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@
using Microsoft.Build.Utilities;

using BackendNativeMethods = Microsoft.Build.BackEnd.NativeMethods;
using Task = System.Threading.Tasks.Task;
using DotNetFrameworkArchitecture = Microsoft.Build.Shared.DotNetFrameworkArchitecture;

namespace Microsoft.Build.BackEnd
{
Expand All @@ -50,6 +52,11 @@ internal abstract class NodeProviderOutOfProcBase
/// </summary>
private const int TimeoutForNewNodeCreation = 30000;

/// <summary>
/// The amount of time to wait for an out-of-proc node to exit.
/// </summary>
private const int TimeoutForWaitForExit = 30000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this this is the right timeout? Node creation takes longer than exiting, I would have thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends if we expect it to be common for nodes to fail to shut down when asked to.
If we think in the common case they should be able to shut down, we can keep the long timeout.
If in the common case they won't shut down when asked to and we don't care about the reason, we can use a lower timeout.


/// <summary>
/// The build component host.
/// </summary>
Expand Down Expand Up @@ -96,9 +103,29 @@ protected void ShutdownConnectedNodes(List<NodeContext> contextsToShutDown, bool
// Send the build completion message to the nodes, causing them to shutdown or reset.
_processesToIgnore.Clear();

// We wait for child nodes to exit to avoid them changing the terminal
// after this process terminates.
bool waitForExit = !enableReuse &&
!Console.IsInputRedirected &&
tmds marked this conversation as resolved.
Show resolved Hide resolved
Traits.Instance.EscapeHatches.EnsureStdOutForChildNodesIsPrimaryStdout;
Forgind marked this conversation as resolved.
Show resolved Hide resolved

Task[] waitForExitTasks = waitForExit && contextsToShutDown.Count > 0 ? new Task[contextsToShutDown.Count] : null;
int i = 0;
foreach (NodeContext nodeContext in contextsToShutDown)
{
nodeContext?.SendData(new NodeBuildComplete(enableReuse));
if (nodeContext is null)
{
continue;
}
nodeContext.SendData(new NodeBuildComplete(enableReuse));
if (waitForExit)
tmds marked this conversation as resolved.
Show resolved Hide resolved
{
waitForExitTasks[i++] = nodeContext.WaitForExitAsync();
tmds marked this conversation as resolved.
Show resolved Hide resolved
}
}
if (waitForExitTasks != null)
{
Task.WaitAll(waitForExitTasks);
}
}

Expand Down Expand Up @@ -139,7 +166,7 @@ protected void ShutdownAllNodes(bool nodeReuse, NodeContextTerminateDelegate ter
{
// If we're able to connect to such a process, send a packet requesting its termination
CommunicationsUtilities.Trace("Shutting down node with pid = {0}", nodeProcess.Id);
NodeContext nodeContext = new NodeContext(0, nodeProcess.Id, nodeStream, factory, terminateNode);
NodeContext nodeContext = new NodeContext(0, nodeProcess, nodeStream, factory, terminateNode);
nodeContext.SendData(new NodeBuildComplete(false /* no node reuse */));
nodeStream.Dispose();
}
Expand Down Expand Up @@ -205,7 +232,7 @@ protected NodeContext GetNode(string msbuildLocation, string commandLineArgs, in
{
// Connection successful, use this node.
CommunicationsUtilities.Trace("Successfully connected to existed node {0} which is PID {1}", nodeId, nodeProcess.Id);
return new NodeContext(nodeId, nodeProcess.Id, nodeStream, factory, terminateNode);
return new NodeContext(nodeId, nodeProcess, nodeStream, factory, terminateNode);
}
}
}
Expand Down Expand Up @@ -243,20 +270,20 @@ protected NodeContext GetNode(string msbuildLocation, string commandLineArgs, in
#endif

// Create the node process
int msbuildProcessId = LaunchNode(msbuildLocation, commandLineArgs);
_processesToIgnore.Add(GetProcessesToIgnoreKey(hostHandshake, msbuildProcessId));
Process msbuildProcess = LaunchNode(msbuildLocation, commandLineArgs);
_processesToIgnore.Add(GetProcessesToIgnoreKey(hostHandshake, msbuildProcess.Id));

// Note, when running under IMAGEFILEEXECUTIONOPTIONS registry key to debug, the process ID
// gotten back from CreateProcess is that of the debugger, which causes this to try to connect
// to the debugger process. Instead, use MSBUILDDEBUGONSTART=1

// Now try to connect to it.
Stream nodeStream = TryConnectToProcess(msbuildProcessId, TimeoutForNewNodeCreation, hostHandshake);
Stream nodeStream = TryConnectToProcess(msbuildProcess.Id, TimeoutForNewNodeCreation, hostHandshake);
if (nodeStream != null)
{
// Connection successful, use this node.
CommunicationsUtilities.Trace("Successfully connected to created node {0} which is PID {1}", nodeId, msbuildProcessId);
return new NodeContext(nodeId, msbuildProcessId, nodeStream, factory, terminateNode);
CommunicationsUtilities.Trace("Successfully connected to created node {0} which is PID {1}", nodeId, msbuildProcess.Id);
return new NodeContext(nodeId, msbuildProcess, nodeStream, factory, terminateNode);
}
}

Expand Down Expand Up @@ -392,7 +419,7 @@ private Stream TryConnectToProcess(int nodeProcessId, int timeout, Handshake han
/// <summary>
/// Creates a new MSBuild process
/// </summary>
private int LaunchNode(string msbuildLocation, string commandLineArgs)
private Process LaunchNode(string msbuildLocation, string commandLineArgs)
{
// Should always have been set already.
ErrorUtilities.VerifyThrowInternalLength(msbuildLocation, nameof(msbuildLocation));
Expand Down Expand Up @@ -491,7 +518,7 @@ private int LaunchNode(string msbuildLocation, string commandLineArgs)
}

CommunicationsUtilities.Trace("Successfully launched {1} node with PID {0}", process.Id, exeName);
return process.Id;
return process;
}
else
{
Expand Down Expand Up @@ -549,7 +576,7 @@ out processInfo
}

CommunicationsUtilities.Trace("Successfully launched {1} node with PID {0}", childProcessId, exeName);
return childProcessId;
return Process.GetProcessById(childProcessId);
}
}

Expand Down Expand Up @@ -598,9 +625,9 @@ internal class NodeContext
private int _nodeId;

/// <summary>
/// The process id
/// The node process.
/// </summary>
private int _processId;
private readonly Process _process;

/// <summary>
/// An array used to store the header byte for each packet when read.
Expand All @@ -613,14 +640,14 @@ internal class NodeContext
private byte[] _smallReadBuffer;

/// <summary>
/// Event indicating the node has terminated.
/// Delegate called when the context terminates.
/// </summary>
private ManualResetEvent _nodeTerminated;
private NodeContextTerminateDelegate _terminateDelegate;

/// <summary>
/// Delegate called when the context terminates.
/// Node was requested to terminate.
/// </summary>
private NodeContextTerminateDelegate _terminateDelegate;
private bool _isExiting;

/// <summary>
/// Per node read buffers
Expand All @@ -630,18 +657,17 @@ internal class NodeContext
/// <summary>
/// Constructor.
/// </summary>
public NodeContext(int nodeId, int processId,
public NodeContext(int nodeId, Process process,
Stream nodePipe,
INodePacketFactory factory, NodeContextTerminateDelegate terminateDelegate)
{
_nodeId = nodeId;
_processId = processId;
_process = process;
_clientToServerStream = nodePipe;
_serverToClientStream = nodePipe;
_packetFactory = factory;
_headerByte = new byte[5]; // 1 for the packet type, 4 for the body length
_smallReadBuffer = new byte[1000]; // 1000 was just an average seen on one profile run.
_nodeTerminated = new ManualResetEvent(false);
_terminateDelegate = terminateDelegate;
_sharedReadBuffer = InterningBinaryReader.CreateSharedBuffer();
}
Expand Down Expand Up @@ -761,7 +787,7 @@ public void SendData(INodePacket packet)
#else
_serverToClientStream.WriteAsync(writeStreamBuffer, i, lengthToWrite);
#endif
return;
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as an FYI, this will conflict with another PR:
https://github.com/dotnet/msbuild/pull/6023/files#diff-f0e57cbf17e0f7fb207f255deb11445afccbb5c7bf0499a24b8065ff057fcdf3L764

I'm happy to resolve conflicts if needed in either PR (depending on whichever one goes first).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also keep in mind @tmds that the logic in SendData here is as follows: for large packets, we break them up in 1 MB chunks, and we write all chunks synchronously, except for the last one. The last chunk is fire-and-forget (the WriteAsync starts the write and returns). For small packets (the vast majority), the write is async and we never wait for the write to finish. In fact we see cases where the next invocation comes in and issues another write before the previous write has finished. We're just getting lucky that the framework seems to be good at serializing this.

In my PR #6023 I optimize memory allocations, which required to make all the writes synchronous (so we can reuse the byte array buffer that we pass to Write). To compensate for that, we made the entire SendData method fire-and-forget.

I haven't look into this PR deeply enough to decide whether one needs to await writing packets to the socket connected to the node process before any shutdown/killing happens. Before my PR 6023 it would be a bit harder to await each async write before killing or shutting down. After my PR you can just await the _packetWriteDrainTask here: https://github.com/dotnet/msbuild/pull/6023/files#diff-f0e57cbf17e0f7fb207f255deb11445afccbb5c7bf0499a24b8065ff057fcdf3R774 But again, I'm not sure if blocking or awaiting here is needed or not. Can we be in a situation where we still have outstanding packet writes to the node and the node dies? Do we need a try/catch somewhere?

Copy link
Member Author

@tmds tmds Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we shouldn't set _isExiting when we were unable to flush all pending data.
If we get to that point, we assume the node will act on our request to terminate.
And if it doesn't we'll end up waiting TimeoutForWaitForExit.

Let's look at how this needs to be changed after #6023 is merged.

}
else
{
Expand All @@ -777,6 +803,7 @@ public void SendData(INodePacket packet)
#endif
}
}
_isExiting = packet is NodeBuildComplete buildCompletePacket && !buildCompletePacket.PrepareForReuse;
}
catch (IOException e)
{
Expand All @@ -792,7 +819,7 @@ public void SendData(INodePacket packet)
/// <summary>
/// Closes the node's context, disconnecting it from the node.
/// </summary>
public void Close()
private void Close()
{
_clientToServerStream.Dispose();
if (!object.ReferenceEquals(_clientToServerStream, _serverToClientStream))
Expand All @@ -802,6 +829,40 @@ public void Close()
_terminateDelegate(_nodeId);
}

/// <summary>
/// Waits for the child node process to exit.
/// </summary>
public async Task WaitForExitAsync()
tmds marked this conversation as resolved.
Show resolved Hide resolved
{
// Wait for the process to exit.
if (_isExiting)
{
CommunicationsUtilities.Trace("Waiting for node with pid = {0} to exit", _process.Id);

// .NET 5 introduces a real WaitForExitAsyc.
// This is a poor man's implementation that uses polling.
int timeout = TimeoutForWaitForExit;
int delay = 5;
while (timeout > 0)
{
bool exited = _process.WaitForExit(milliseconds: 0);
if (exited)
{
return;
}
timeout -= delay;
await Task.Delay(delay).ConfigureAwait(false);

// Double delay up to 500ms.
delay = Math.Min(delay * 2, 500);
}
}

// Kill the child and do a blocking wait.
CommunicationsUtilities.Trace("Killing node with pid = {0}", _process.Id);
tmds marked this conversation as resolved.
Show resolved Hide resolved
tmds marked this conversation as resolved.
Show resolved Hide resolved
tmds marked this conversation as resolved.
Show resolved Hide resolved
_process.KillTree(timeout: -1);
}

#if FEATURE_APM
/// <summary>
/// Completes the asynchronous packet write to the node.
Expand All @@ -823,17 +884,16 @@ private bool ProcessHeaderBytesRead(int bytesRead)
{
if (bytesRead != _headerByte.Length)
{
CommunicationsUtilities.Trace(_nodeId, "COMMUNICATIONS ERROR (HRC) Node: {0} Process: {1} Bytes Read: {2} Expected: {3}", _nodeId, _processId, bytesRead, _headerByte.Length);
CommunicationsUtilities.Trace(_nodeId, "COMMUNICATIONS ERROR (HRC) Node: {0} Process: {1} Bytes Read: {2} Expected: {3}", _nodeId, _process.Id, bytesRead, _headerByte.Length);
try
{
Process childProcess = Process.GetProcessById(_processId);
if (childProcess?.HasExited != false)
if (_process.HasExited)
{
CommunicationsUtilities.Trace(_nodeId, " Child Process {0} has exited.", _processId);
CommunicationsUtilities.Trace(_nodeId, " Child Process {0} has exited.", _process.Id);
}
else
{
CommunicationsUtilities.Trace(_nodeId, " Child Process {0} is still running.", _processId);
CommunicationsUtilities.Trace(_nodeId, " Child Process {0} is still running.", _process.Id);
}
}
catch (Exception e) when (!ExceptionHandling.IsCriticalException(e))
Expand Down
1 change: 1 addition & 0 deletions src/Build/BackEnd/TaskExecutionHost/TaskExecutionHost.cs
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
using Microsoft.Build.Utilities;

using TaskItem = Microsoft.Build.Execution.ProjectItemInstance.TaskItem;
using Task = System.Threading.Tasks.Task;

namespace Microsoft.Build.BackEnd
{
Expand Down
1 change: 1 addition & 0 deletions src/Build/Microsoft.Build.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,7 @@
<Compile Include="..\Shared\NodeBuildComplete.cs" />
<Compile Include="..\Shared\NodeShutdown.cs" />
<Compile Include="..\Shared\NamedPipeUtil.cs" />
<Compile Include="..\Shared\ProcessExtensions.cs" />
<Compile Include="..\Shared\TaskLoggingHelper.cs">
<Link>BackEnd\Components\RequestBuilder\IntrinsicTasks\TaskLoggingHelper.cs</Link>
<ExcludeFromStyleCop>True</ExcludeFromStyleCop>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
using System.IO;
using Microsoft.Build.Shared;

namespace Microsoft.Build.Utilities
namespace Microsoft.Build.Shared
{
internal static class ProcessExtensions
{
Expand Down Expand Up @@ -77,10 +77,21 @@ private static void GetAllChildIdsUnix(int parentId, ISet<int> children)

private static void KillProcessUnix(int processId)
{
RunProcessAndWaitForExit(
"kill",
$"-TERM {processId}",
out string _);
try
{
using Process process = Process.GetProcessById(processId);
process.Kill();
}
catch (ArgumentException)
{
// Process already terminated.
tmds marked this conversation as resolved.
Show resolved Hide resolved
return;
}
catch (InvalidOperationException)
{
// Process already terminated.
return;
}
}

private static int RunProcessAndWaitForExit(string fileName, string arguments, out string stdout)
Expand All @@ -102,8 +113,13 @@ private static int RunProcessAndWaitForExit(string fileName, string arguments, o
}
else
{
process.Kill();

try
{
process.Kill();
}
catch (InvalidOperationException)
{ }

// Kill is asynchronous so we should still wait a little
//
process.WaitForExit((int) TimeSpan.FromSeconds(1).TotalMilliseconds);
Expand Down
Loading