Slow/inconsistent startup time for SQL Server LocalDB on windows-2022 image #8164
Comments
Hello @cremor, we will take a look. However, we cannot promise to start the MSSQLLocalDB service after Microsoft-hosted runner allocation, since it could affect the performance of other customers.
@cremor, you can check the event log by running PowerShell, for example:
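The command from the original comment did not survive in this copy of the thread. Here is a minimal sketch of such a step. It assumes the goal is to print recent Application log entries that mention LocalDB. The step name and the `*LocalDB*` filter are illustrative assumptions; drop the `Where-Object` line to see all recent entries.

```yaml
# Hypothetical diagnostic step: print recent Application event log entries
# whose provider name or message mentions LocalDB. condition: always() makes
# it run even when the LocalDB start step before it failed.
- powershell: |
    Get-WinEvent -LogName Application -MaxEvents 200 |
      Where-Object { $_.ProviderName -like '*LocalDB*' -or $_.Message -like '*LocalDB*' } |
      Format-List TimeCreated, ProviderName, Id, LevelDisplayName, Message
  displayName: 'Dump LocalDB entries from the Application event log'
  condition: always()
```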
I don't know how the agent images work, but couldn't you start the LocalDB instance before you create the image, so that it is already running when a new agent is started from that image? Of course, those are just some ideas. If there is a way to "simply" fix the creation/startup performance, then none of this is needed.
@cremor, did you have a chance to collect the Windows Application event log?
No, I've only seen that LocalDB creation error twice over a timespan of about a month. I'd have to add this to all my pipelines and hope that it happens again, which I don't really want to do.
OK, most probably it will be something like "the mentioned service was not started for .... seconds". But let's add it and see. On my side, I'll try running your steps.
@criemen, I created a sample workflow
and ran it 50 times.
LocalDB started all 50 times; I was not able to reproduce your case where the service did not start.
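The sample workflow itself is not preserved here. To get numbers that are comparable across runs, a timed start step along these lines could be used. This is a sketch; the step name and the output format are assumptions.

```yaml
# Sketch: time the LocalDB start and print the duration in the log.
# Measure-Command swallows output, so it is piped to Out-Host to stay visible.
- powershell: |
    $elapsed = Measure-Command { sqllocaldb start MSSQLLocalDB | Out-Host }
    Write-Host ("LocalDB start took {0:N1} s" -f $elapsed.TotalSeconds)
    # Propagate a failed start as a failed step.
    if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }
  displayName: 'Start SQL Server LocalDB (timed)'
```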
My actual pipeline calls the LocalDB start script after a .NET build step. But even in my test pipeline with that single LocalDB start step, the step alone sometimes took over a minute. I didn't catch extremely long (multi-minute) durations with the simple test pipeline, but that might just have been bad (good? 😁) luck. As I wrote in my initial report, the time is very inconsistent. Sometimes it's fast, sometimes it's slow. The error has only happened twice for me as far as I know, so it is very rare. And I suspect that the error is just some timeout, because when it happened the step duration was around 5 minutes. So fixing the performance might also fix the error. Btw, I'm in the Azure DevOps region "West Europe", in case the region affects performance.
I'd say the bottleneck is the VM size. Standard runners are … I'd suggest moving the LocalDB start to the very beginning.
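Here is a sketch of that ordering. It assumes a .NET project built and tested with the `dotnet` CLI; the build and test commands are placeholders for your own steps.

```yaml
steps:
# Start LocalDB first, before checkout and build compete for disk I/O.
- script: sqllocaldb start MSSQLLocalDB
  displayName: 'Start SQL Server LocalDB'
- checkout: self
# Placeholder build/test steps; replace with your own.
- script: dotnet build --configuration Release
  displayName: 'Build'
- script: dotnet test --configuration Release --no-build
  displayName: 'Test'
```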
But if the same project (same build step) still results in LocalDB startup times that vary between 20s and 5m, then the build can't be the factor that slows down the LocalDB startup, right? Also, I've seen the 5-minute LocalDB startup time for very simple Web API projects (not much more than a single controller and a bit of EF Core infrastructure code). But I'll try a few things and will get back to you.
I do not observe 5m. I suspect that you have some other component that slows things down. On a clean build, I observe startup times between 12s and 60s.
Anyway, please provide detailed repro steps showing how to observe the 5-minute startup time, so we can investigate. With the repro steps provided, I see 12s-60s.
@cremor, I modified my workflow as
I really wonder whether THAT difference is because of "script" <--> "task" semantics, but my numbers are ...
Well, I cannot reproduce your numbers, so I cannot take measurements on my side. Something like
this should collect WPT telemetry in tmp.etl and publish it as an artifact.
I'll provide the WPT telemetry file once I manage to reproduce the slow startup in my test pipeline again. Right now the LocalDB startup always takes less than 1 minute. It's really annoying that this is so inconsistent (maybe dependent on outside factors?).
I'm not familiar with LocalDB, how
OK, I'll try to run 50 builds with createdb + telemetry.
I got a 3m 7s start time. Here is the WPT telemetry file:
Good, good. I made a small mistake in the YAML; glad you were able to fix it.
Well, in theory we can invoke
That would be great. It seems like most of the time the "start" only takes a few seconds when "create" was executed before. Only rarely "start" alone takes long too. (If I catch it, I will provide separate trace files.)
@cremor, I ran some tests. Unfortunately, "sqllocaldb create MSSQLLocalDB" will take a bit more effort on our side. The good news: it can be run many times, so if I add it to image generation, you can still run it from your pipeline and no error will be emitted. Also, a few thoughts:
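According to the comment above, the create command can be repeated without an error. A pipeline can therefore keep its own create step even if the image later ships a pre-created instance. This is a sketch that relies on the `-s` switch of `SqlLocalDB create`, which starts the instance right after creating it. The `info` call only prints the resulting state.

```yaml
# Sketch: create (safe to repeat, per the comment above) and start the
# default instance, then print its state for the log.
- script: |
    sqllocaldb create MSSQLLocalDB -s
    sqllocaldb info MSSQLLocalDB
  displayName: 'Create and start SQL Server LocalDB'
```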
Is this also true for other databases that are preinstalled on the images? Or do the data files of those other databases use the D: drive?
If this (or some other background process) is indeed the problem, then maybe the correct solution would be to fix that resource usage instead of doing something special for LocalDB?
Just wanted to let you know... thank god we are not the only ones with this issue. It has been bugging us on and off for a few months. But recently (as in, last week and the week before that) it has gotten significantly worse. We weren't using the workaround @cremor implemented, so our unit tests keep failing now and then. So at least that is something we can implement on our end to keep the builds green.
Yeah, Microsoft-hosted agent performance has been all over the place recently. I have seen steps (not LocalDB, something completely different) that usually take less than 5 seconds and now sometimes take nearly 2 minutes 😞
@cremor, sorry for the long delay. While I'm still looking into how to properly "warm up" things, it seems that I've found the cause of the occasional high disk IO: we haven't disabled the following scheduled task … We'll disable it; that's the easy part.
I don't think so. Even though I don't know which time zone this would be, I've seen slow LocalDB startup at all times during a (work) day. But maybe the trigger is executed as soon as the agent starts because its last scheduled time was missed? If that is the case, then it could indeed be the cause of all the performance problems of the agent.
@cremor, I've found that deleting the Defender tasks does not help; they are recreated silently. We are going to add … If you have spare time, can you verify that adding …
I've now run a few more tests:
So I'd say I can't see an improvement with that command 😉 I ran the command as early as possible (even before checkout). Here is my full test pipeline:

```yaml
pool:
  vmImage: 'windows-latest'

steps:
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: 'Set-MpPreference -ScanScheduleDay 8'
- checkout: self
- task: CmdLine@2
  inputs:
    script: |
      choco install windows-performance-toolkit
      "C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit\xperf.exe" -start -on LOADER+PROC_THREAD+DISK_IO+HARD_FAULTS+DPC+INTERRUPT+CSWITCH+PERF_COUNTER+FILE_IO_INIT+REGISTRY
- script: 'sqllocaldb start MSSQLLocalDB'
  displayName: 'Start SQL Server LocalDB'
- task: CmdLine@2
  inputs:
    script: |
      "c:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit\xperf.exe" -d $(Build.ArtifactStagingDirectory)\tmp.etl
- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)\tmp.etl'
    ArtifactName: 'etl'
    publishLocation: 'Container'
```
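To confirm that the `Set-MpPreference -ScanScheduleDay 8` step in the pipeline above actually took effect, a step along these lines could be appended. This is a sketch; for this setting, the value 8 means "never".

```yaml
# Sketch: print the Defender scheduled-scan day to verify the earlier
# Set-MpPreference call (8 = never scan on a schedule).
- powershell: Get-MpPreference | Select-Object ScanScheduleDay
  displayName: 'Show Defender scan schedule'
```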
I'm back with some good news and some bad news. Bad news: LocalDB wants its database to be placed in %USERPROFILE%. Neither #8435 nor ilia-shipitsin@25d45e4 convinced LocalDB to use a moved database; it is created from scratch and placed in %USERPROFILE%. Good news: during the investigation we improved several things:
Given that we cannot warm up LocalDB due to its nature (it uses %USERPROFILE%), we hope that the other improvements will help reduce disk IO (especially disabling StorSvc).
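To see where a given agent actually keeps the instance, a diagnostic step like the following could be added. This is a sketch. It assumes the usual LocalDB instance folder under `%LOCALAPPDATA%`, which lives inside `%USERPROFILE%`; adjust the path if your instance is elsewhere.

```yaml
# Sketch: show the instance state and list its files. The folder below is an
# assumption (the usual per-user LocalDB instance location).
- powershell: |
    sqllocaldb info MSSQLLocalDB
    $instanceDir = Join-Path $env:LOCALAPPDATA 'Microsoft\Microsoft SQL Server Local DB\Instances\MSSQLLocalDB'
    if (Test-Path $instanceDir) {
      Get-ChildItem $instanceDir | Format-Table Name, Length, LastWriteTime
    } else {
      Write-Host "No instance folder at $instanceDir (instance not created yet?)"
    }
  displayName: 'Inspect LocalDB instance files'
```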
Thanks for the information.
My measurements show that StorSvc was the most significant disk IO consumer. The last change (related to the PowerShell first-run warm-up) is not merged yet, but it is rather cosmetic: it warms up the first invocation. If you add a dummy first step with "something in PowerShell", that step will do the warm-up for you.
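Here is a minimal sketch of such a dummy warm-up step placed first in the pipeline. The step name and message are arbitrary.

```yaml
steps:
# Trivial first PowerShell step: pays the first-run PowerShell cost here,
# not in the LocalDB step you want to measure.
- powershell: Write-Host 'PowerShell warm-up'
  displayName: 'Warm up PowerShell'
- checkout: self
- powershell: sqllocaldb start MSSQLLocalDB
  displayName: 'Start SQL Server LocalDB'
```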
Description
I’m running database integration tests in Azure DevOps Pipelines with the Microsoft-hosted Windows agent. The tests use MS SQL Server Express LocalDB. The time it takes to start the LocalDB instance is very inconsistent. Sometimes it takes just 20 seconds (which is still slow compared to my local dev machine), but sometimes it takes more than 5 minutes!
For that reason, I can't just use LocalDB in my tests and rely on automatic startup of the DB, as I'm used to on my local dev machine, because then the tests time out.
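For comparison, relying on automatic startup means the first connection to `(localdb)\MSSQLLocalDB` starts the instance on demand. Here is a sketch that triggers this from a pipeline step with an explicit login timeout. It assumes `sqlcmd` is available on the image's PATH, and the 300-second value is an arbitrary example.

```yaml
# Sketch: the first connection auto-starts the LocalDB instance. -l sets the
# login timeout in seconds, -b makes sqlcmd exit with an error code on failure.
- script: |
    sqlcmd -S "(localdb)\MSSQLLocalDB" -l 300 -b -Q "SELECT @@VERSION"
  displayName: 'Connect to LocalDB (auto-start)'
```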
As a workaround, I now have an explicit step in my pipeline:
This works most of the time, but as mentioned, this single command alone can take multiple minutes.
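Here is a sketch of such an explicit start step with a step-level timeout and an automatic retry, so that a rare failed start does not immediately fail the run. `timeoutInMinutes` and `retryCountOnTaskFailure` are standard Azure Pipelines step properties; the values are arbitrary examples.

```yaml
# Sketch: explicit LocalDB start with a timeout and up to two retries
# if the step fails.
- script: sqllocaldb start MSSQLLocalDB
  displayName: 'Start SQL Server LocalDB'
  timeoutInMinutes: 10
  retryCountOnTaskFailure: 2
```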
In rare cases, the LocalDB startup even fails. It then prints the following to the log:
(I can't check the Windows Application event log for details because this is a Microsoft-hosted agent. Enabling system diagnostics for the build also doesn't provide any more useful logs.)
Please fix this so that the LocalDB startup is consistently fast.
Maybe it would be possible to already start it when the image is created so that each new build run already gets a started LocalDB instance?
Platforms affected
Runner images affected
Image version and build link
Image: windows-2022
Version: 20230804.1.0
Is it regression?
unknown
Expected behavior
MS SQL Server Express LocalDB creation/startup time should be fast and consistent.
Ideally I could rely on the auto-start feature and not have to manually start the LocalDB with a script step.
Actual behavior
MS SQL Server Express LocalDB creation/startup time varies between a few seconds and multiple minutes.
Repro steps