Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make service wait on its state before stopping #84447

Merged
merged 1 commit into from
Apr 7, 2023

Conversation

ericstj
Copy link
Member

@ericstj ericstj commented Apr 6, 2023

Fix: #84438

Make sure the service allows SCM to transition to Running and Stopped before exiting the test service.

@ghost
Copy link

ghost commented Apr 6, 2023

Tagging subscribers to this area: @dotnet/area-extensions-hosting
See info in area-owners.md if you want to be subscribed.

Issue Details

Fix: #84438

Make sure the service allows SCM to transition to Running and Stopped before exiting the test service.

Author: ericstj
Assignees: -
Labels:

area-Extensions-Hosting

Milestone: -

@buyaa-n
Copy link
Contributor

buyaa-n commented Apr 6, 2023

Make sure the service allows SCM to transition to Running and Stopped before exiting the test service.

Host was stopped before it was completely started? Just curious what would happen if it happened to a real service? (the test failure log looks weird, could not tell for sure, looks like it was not stopped)

@buyaa-n buyaa-n closed this Apr 6, 2023
@buyaa-n buyaa-n reopened this Apr 6, 2023
@buyaa-n
Copy link
Contributor

buyaa-n commented Apr 6, 2023

buyaa-n closed this

Oops sorry, wrong button clicked

Copy link
Contributor

@buyaa-n buyaa-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT

@ericstj
Copy link
Member Author

ericstj commented Apr 6, 2023

Host was stopped before it was completely started?

Yes, that's more or less what I suspect.

Just curious what would happen if it happened to a real service?

I wouldn't expect a real service to be doing this - stopping itself immediately after starting. Probably it would always do some work before stopping. It's possible there is still some bug here in how StopAsync works -- it's possible SCM itself has a bug -- that was never tested before -- not even on ServiceBase. We should keep an eye out for how this test behaves.

(the test failure log looks weird, could not tell for sure, looks like it was not stopped)

The service was left running and we timed out waiting for the status to change to stopped. The remotexecutor infrastructure then gathered a dump of the process when it was disposed. You can fetch the dump and console log with runfo get-helix-payload.
The dump shows that the service process was waiting for dispatcher routine to return. My best guess is that SCM didn't cleanly enter the stopped state when the service state was set to stopped.

000000BB5F9FF410 00007ff894fe3c04 [InlinedCallFrame: 000000bb5f9ff410] Interop+Advapi32.g____PInvoke|35_0(IntPtr)
000000BB5F9FF410 00007ff81538b4d9 [InlinedCallFrame: 000000bb5f9ff410] Interop+Advapi32.g____PInvoke|35_0(IntPtr)
000000BB5F9FF3E0 00007ff81538b4d9 ILStubClass.IL_STUB_PInvoke(IntPtr)
000000BB5F9FF4D0 00007ff81538b141 Interop+Advapi32.StartServiceCtrlDispatcher(IntPtr)
000000BB5F9FF520 00007ff815389cb2 System.ServiceProcess.ServiceBase.Run(System.ServiceProcess.ServiceBase[])
000000BB5F9FF670 00007ff815389403 System.ServiceProcess.ServiceBase.Run(System.ServiceProcess.ServiceBase)
000000BB5F9FF6C0 00007ff815389130 Microsoft.Extensions.Hosting.WindowsServices.WindowsServiceLifetime.Run()
000000BB5F9FF730 00007ff815388fea System.Threading.Thread+StartHelper.RunWorker() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 73]
000000BB5F9FF780 00007ff815388e2b System.Threading.Thread+StartHelper.Run() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 54]
000000BB5F9FF7C0 00007ff815388c86 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
000000BB5F9FFA50 00007ff874ddb1e3 [DebuggerU2MCatchHandlerFrame: 000000bb5f9ffa50] 

The goal of this test was to prove that StopAsync works when initiated by the process, so I'm OK with introducing this wait to make that a more representative test.

@buyaa-n
Copy link
Contributor

buyaa-n commented Apr 6, 2023

Thank you for the broad explanation, makes sense!

@ericstj
Copy link
Member Author

ericstj commented Apr 6, 2023

After writing this up and being annoyed with how little information I got from the dump, I tried a different debugger:

0:000> !dumpasync
Statistics:
              MT    Count    TotalSize Class Name
00007ff8159632d0        1           80 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.WindowsServiceLifetimeTests+<>c+<<ServiceCanStopItself>b__4_0>d, Microsoft.Extensions.Hosting.WindowsServices.Tests]]
00007ff815962f30        1           80 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.Internal.Host+<StopAsync>d__14, Microsoft.Extensions.Hosting]]
00007ff815962cf0        1           80 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.WindowsServices.WindowsServiceLifetime+<StopAsync>d__18, Microsoft.Extensions.Hosting.WindowsServices]]
Total 3 objects
         Address               MT     Size      State Description
000001c44b0352c0 00007ff815962cf0       80          1 Microsoft.Extensions.Hosting.WindowsServices.WindowsServiceLifetime+<StopAsync>d__18
000001c44b035310 00007ff815962f30       80          1 Microsoft.Extensions.Hosting.Internal.Host+<StopAsync>d__14
000001c44b035360 00007ff8159632d0       80          0 Microsoft.Extensions.Hosting.WindowsServiceLifetimeTests+<>c+<<ServiceCanStopItself>b__4_0>d

So that means we are waiting at the following states:

await _serviceDispatcherStopped.Task.ConfigureAwait(false);

await _hostLifetime.StopAsync(token).ConfigureAwait(false);

This agrees with what I shared above. We've already Stopped the ServiceBase from inside the process and we're waiting for the dispatcher to exit.

I can dump what state the service believes it's in:

0:004> !DumpObj /d 000001c44b01e4d0
Name:        Microsoft.Extensions.Hosting.WindowsServiceLifetimeTests+LoggingWindowsServiceLifetime
MethodTable: 00007ff81565c218
EEClass:     00007ff81568f890
Tracked Type: false
Size:        200(0xc8) bytes
File:        C:\h\w\B1BC0A2A\w\A109091A\e\Microsoft.Extensions.Hosting.WindowsServices.Tests.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000000000000000  400001c        8 ...ponentModel.ISite  0 instance 0000000000000000 _site
0000000000000000  400001d       10 ....EventHandlerList  0 instance 0000000000000000 _events
00007ff815129518  400001b       98        System.Object  0   static 0000000000000000 s_eventDisposed
00007ff81565bc60  4000010       64 ...32+SERVICE_STATUS  1 instance 000001c44b01e534 _status
00007ff8152e9e38  4000011       50        System.IntPtr  1 instance 000001C447C774B0 _statusHandle
00007ff8157511d8  4000012       18 ...ControlCallbackEx  0 instance 000001c44b021048 _commandCallbackEx
00007ff815750ea8  4000013       20 ...rviceMainCallback  0 instance 000001c44b021008 _mainCallback
00007ff815753b30  4000014       28 ....ManualResetEvent  0 instance 000001c44b0230c0 _startCompletedSignal
00007ff815459158  4000015       30 ...ptionDispatchInfo  0 instance 0000000000000000 _startFailedException
00007ff81524f4e0  4000016       58         System.Int32  1 instance                5 _acceptedCommands
00007ff8153a1cb0  4000017       38        System.String  0 instance 000001c44b023008 _serviceName
00007ff81524c140  4000018       5c       System.Boolean  1 instance                1 _nameFrozen
00007ff81524c140  4000019       5d       System.Boolean  1 instance                1 _commandPropsFrozen
00007ff81524c140  400001a       5e       System.Boolean  1 instance                0 _disposed
00007ff81524c140  400001b       5f       System.Boolean  1 instance                1 _initialized
00007ff815129518  400001c       40        System.Object  0 instance 000001c44b01e6b0 _stopLock
00007ff8158e8238  400001d       48 ...gnostics.EventLog  0 instance 000001c44b02f3a8 _eventLog
00007ff81524c140  400001f       60       System.Boolean  1 instance                1 <AutoLog>k__BackingField
00007ff8157114b0  4000005       80 ...Private.CoreLib]]  0 instance 000001c44b01e5c8 _delayStart
00007ff8157114b0  4000006       88 ...Private.CoreLib]]  0 instance 000001c44b01e628 _serviceDispatcherStopped
00007ff815711940  4000007       90 ...ualResetEventSlim  0 instance 000001c44b01e688 _delayStop
00007ff8156537d0  4000008       98 ...sting.HostOptions  0 instance 000001c44b01e830 _hostOptions
00007ff81524c140  4000009       b8       System.Boolean  1 instance                1 _serviceStopRequested
00007ff815516928  400000a       a0 ...plicationLifetime  0 instance 000001c44b01acc0 <ApplicationLifetime>k__BackingField
00007ff815628940  400000b       a8 ....IHostEnvironment  0 instance 000001c44b00dea0 <Environment>k__BackingField
00007ff815658380  400000c       b0 ...s.Logging.ILogger  0 instance 000001c44b01e6e0 <Logger>k__BackingField
0:004> !DumpVC /d 00007ff81565bc60 000001c44b01e534
Name:        Interop+Advapi32+SERVICE_STATUS
MethodTable: 00007ff81565bc60
EEClass:     00007ff81568f788
Size:        48(0x30) bytes
File:        C:\h\w\B1BC0A2A\w\A109091A\e\System.ServiceProcess.ServiceController.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ff81524f4e0  400016e        0         System.Int32  1 instance               16 serviceType
00007ff81524f4e0  400016f        4         System.Int32  1 instance                4 currentState
00007ff81524f4e0  4000170        8         System.Int32  1 instance                5 controlsAccepted
00007ff81524f4e0  4000171        c         System.Int32  1 instance                0 win32ExitCode
00007ff81524f4e0  4000172       10         System.Int32  1 instance                0 serviceSpecificExitCode
00007ff81524f4e0  4000173       14         System.Int32  1 instance                0 checkPoint
00007ff81524f4e0  4000174       18         System.Int32  1 instance                0 waitHint

So 4 currentState means that it's

So if we look at where we set it to RUNNING, it's here:

private void ServiceQueuedMainCallback(object state)
{
string[] args = (string[])state;
try
{
OnStart(args);
WriteLogEntry(SR.StartSuccessful);
_status.checkPoint = 0;
_status.waitHint = 0;
_status.currentState = ServiceControlStatus.STATE_RUNNING;
}
catch (Exception e)
{
WriteLogEntry(SR.Format(SR.StartFailed, e), EventLogEntryType.Error);
_status.currentState = ServiceControlStatus.STATE_STOPPED;
// We capture the exception so that it can be propagated
// from ServiceBase.Run.
// We also use the presence of this exception to inform SCM
// that the service failed to start successfully.
_startFailedException = ExceptionDispatchInfo.Capture(e);
}
_startCompletedSignal!.Set();
}

Right before that, it called OnStart which completes the task that was returned by WaitForStartAsync

protected override void OnStart(string[] args)
{
_delayStart.TrySetResult(null);
base.OnStart(args);
}

Note at this point the main thread waiting on host.Start() is free to proceed as if the service is started, but it's not.

The OnStart call was happening on a background thread and completes, after which, the main dispatcher thread waits for that background thread to signal, then sets the service to RUNNING from that ServiceMainCallback thread:

_startCompletedSignal.WaitOne();
if (_startFailedException != null)
{
// Inform SCM that the service could not be started successfully.
// (Unless the service has already provided another failure exit code)
if (_status.win32ExitCode == 0)
{
_status.win32ExitCode = ServiceControlStatus.ERROR_EXCEPTION_IN_SERVICE;
}
}
statusOK = SetServiceStatus(_statusHandle, pStatus);
if (!statusOK)

If we were to stop the service in this time (between OnStop() and ServiceMainCallback completing), it's somewhat unpredictable what would happen -- since SCM is still waiting for the service's main routine to register it's handler function and return. I would place a bet that's what's happening.

Essentially ServiceBase provides no way for a derived service to know once it's actually fully started. OnStart happens some time before it's actually done telling SCM that it's started. I think you could have the same problem with ServiceBase alone if you queued work in OnStart that finished super fast, then called Stop.

TL;DR - waiting for service to enter RUNNING state might be the right thing to do regardless.

I might see if there's any way to keep WaitForStartAsync blocked a little longer to guarantee that the service is started.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing this.

@ericstj ericstj merged commit dcc0d25 into dotnet:main Apr 7, 2023
ericstj added a commit to ericstj/runtime that referenced this pull request May 2, 2023
ericstj added a commit to ericstj/runtime that referenced this pull request May 2, 2023
ericstj added a commit that referenced this pull request May 4, 2023
…5661)

* Make WindowsServiceLifetime gracefully stop (#83892)

* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout

* Fix test condition

# Conflicts:
#	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs

* Enable M.E.H.WindowsServices and S.SP.ServiceController for servicing

* Make service wait on its state before stopping (#84447)

* Fix WindowsService Tests where RemoteExecutor is unsupported

# Conflicts:
#	src/libraries/Microsoft.Extensions.Hosting.WindowsServices/tests/UseWindowsServiceTests.cs

* Port changes to 6.0 codebase

* Version Microsoft.Windows.Compatibility

---------

Co-authored-by: Vladimir Sadov <vsadov@microsoft.com>
ericstj added a commit that referenced this pull request May 4, 2023
…5656)

* Make WindowsServiceLifetime gracefully stop (#83892)

* Make WindowsServiceLifetime gracefully stop

WindowsServiceLifetime was not waiting for ServiceBase to stop the service.  As a result
we would sometimes end the process before notifying service control manager that the service
had stopped -- resulting in an error in the eventlog and sometimes a service restart.

We also were permitting multiple calls to Stop to occur - through SCM callbacks, and through
public API.  We must not call SetServiceStatus again once the service is marked as stopped.

* Alternate approach to ensuring we only ever set STATE_STOPPED once.

* Avoid calling ServiceBase.Stop on stopped service

I fixed double-calling STATE_STOPPED in ServiceBase, but this fix will
not be present on .NETFramework.  Workaround that by avoiding calling
ServiceBase.Stop when the service has already been stopped by SCM.

* Add tests for WindowsServiceLifetime

These tests leverage RemoteExecutor to avoid creating a separate service
assembly.

* Respond to feedback and add more tests.

This better integrates with the RemoteExecutor component as well,
by hooking up the service process and fetching its handle.

This gives us the correct logging and exitcode handling from
RemoteExecutor.

* Honor Cancellation in StopAsync

* Fix bindingRedirects in RemoteExecutor

* Use Async lambdas for service testing

* Fix issue on Win7 where duplicate service descriptions are disallowed

* Respond to feedback

* Fix comment and add timeout

* Fix test condition

* Enable M.E.H.WindowsServices and S.SP.ServiceController for servicing

* Make service wait on its state before stopping (#84447)

* Fix WindowsService Tests where RemoteExecutor is unsupported

* Enable MS.W.C for servicing

* Reference latest Microsoft.Extensions.Logging.Abstractions

This package has been serviced and we compile against the serviced
version of its assemblies.

None of the directly referenced projects have been serviced so our
package doesn't restore the serviced versions.

Lift up the dependency on Logging.Abstractions to ensure we reference
the serviced package.

---------

Co-authored-by: Vladimir Sadov <vsadov@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators May 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WindowsServiceLifetimeTests.ServiceCanStopItself fails in CI
3 participants