Build Worker sometimes fails to install a new SQL Server instance #2804
Comments
It would be great if you could ping us as soon as you see it stuck, so we can investigate while it is happening, before the build fails. Alternatively, if you increase the sleep before the restart to, say, 10 seconds, the chances of it getting stuck will be much lower. But I would prefer to wait for the next occurrence and investigate. Out of curiosity: why do you need to restart the VM? I do not see anything happening beforehand that requires a restart...
@IlyaFinkelshteyn I will report back if I see it happen again. Where would it be best to ping you: in this issue, on Twitter, or somewhere else? I would rather not restart it, and I haven't needed to before. But it seems that a SQL Server instance might have been installed without the image being rebooted afterwards, so the worker sometimes fails to install a new instance (we install a default instance as part of our integration tests, which test the SqlSetup DSC resource).
@IlyaFinkelshteyn For reference, this is the issue for which I added the restart workaround: dsccommunity/SqlServerDsc#1260. This is the error that occurred (it no longer occurs now that the restart workaround has been added): https://ci.appveyor.com/project/johlju/sqlserverdsc/builds/21302743?fullLog=true#L2641 (see line 2660 for the SQL Server setup error message).
@johlju I would rather root-cause the SQL issue than stay with the reboot workaround. Can you send links to a number of randomly failed and successful builds? We will try to find some commonalities. Also, can you create a simplified, fast repro that should fail after some number of repetitions? It is not a problem for us to run a lot of repetitions, as we can do that on an internal account with a lot of concurrent jobs, but it would be great if the repro itself is fast and simple. Regarding the reboot issue, you can email team@appveyor.com with high importance and reference this issue. Most of us work in PST, though often after normal working hours too. But again, I believe we can root-cause the problem.
I tested yesterday to make sure the SQL issue still exists by running all the tests without the restart workaround; 1 of 4 test runs failed. I'm now trying to reproduce the SQL issue with a simplified branch (with most other tests removed). It will probably take a day or so until I see whether this fails as well.
I saw yesterday that a contributor got the same SQL issue with the restart workaround applied, so the restart workaround only mitigates the SQL issue. I am renaming this issue to focus on the SQL issue.
@IlyaFinkelshteyn Yesterday I created a simplified branch with a minimum of tests to see if I could reproduce the issue, but after running the tests 16 times, none failed. This leads me to believe that this might be a memory problem when running with all tests 🤔 I have seen the build worker adding more total memory as it goes; maybe the VM does not get more memory fast enough. 🤔 Maybe we have outgrown the (free) AppVeyor build worker?
It is an interesting coincidence that starting from Jan 8th we instantiate VMs with 5000 MB of memory and allow Hyper-V dynamic memory to grow up to 6000 MB. It was 1400 MB to 4000 MB before. We have not announced it yet because we are still monitoring how it goes to see if we have to make some adjustments. If your tests are indeed that memory-hungry, you should have seen an improvement during the last 2 days. Another option, which I would recommend, is to use parallel testing (which is actually a special case of the build matrix) to segment tests into smaller groups that run as separate build jobs against the same commit.
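To illustrate the idea of segmenting tests into smaller groups, here is a minimal sketch in Python. It is not AppVeyor's mechanism and not how SqlServerDsc runs its tests; the function name `split_into_groups` and the test file names are hypothetical, and the sketch only shows one simple way (round-robin) to decide which build job would run which tests.

```python
from typing import List, Sequence


def split_into_groups(test_files: Sequence[str], group_count: int) -> List[List[str]]:
    """Distribute test files round-robin across `group_count` groups.

    Sorting first makes the assignment deterministic, so every build job
    that computes the split for the same commit gets the same answer.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    groups: List[List[str]] = [[] for _ in range(group_count)]
    for index, name in enumerate(sorted(test_files)):
        groups[index % group_count].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical test file names, for illustration only.
    example_tests = [
        "Alpha.Integration.Tests.ps1",
        "Beta.Integration.Tests.ps1",
        "Gamma.Integration.Tests.ps1",
        "Delta.Integration.Tests.ps1",
    ]
    for job_number, group in enumerate(split_into_groups(example_tests, 2)):
        print(job_number, group)
```

Each build job would then run only the group matching its own job number, which keeps the memory footprint of any single worker smaller at the cost of repeating the setup in every job.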
That is great news that you have raised the memory! I will re-test with the full test suite and see if the memory increase helps in my case.
@IlyaFinkelshteyn In my tests I could not find the actual reason for the failures. I evaluated whether it could be a problem with the downloaded media, but the media has the same hash both when a test run passes and when a test run fails. I have switched from using the containers to running two parallel (sequential on the free account) build workers, with unit tests in one and integration tests in the other, and it seems stable. No errors so far, but the time it takes to run the full test suite doubled. I need to rewrite our test framework to get the containers working in this new parallel scenario; that would speed up the testing again. I'm closing this issue for now, as I think the test suite, the way it was run previously, overwhelmed the build worker. /cc @PlagueHO (FYI)
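For readers who want to repeat the media check, a minimal sketch in Python follows. The thread does not say which hash algorithm or tool was used, so SHA-256 and the helper name `file_sha256` are assumptions; the idea is simply to hash the downloaded media in a passing run and in a failing run and compare the two values.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Reading in chunks avoids loading a large installation image into
    memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digests from a passing and a failing run are equal, a corrupted download can reasonably be ruled out as the cause, which matches the conclusion above.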
I have seen the build worker not restarting correctly sometimes. Curious if this is a bug, known problem or us doing something wrong.
For this PR dsccommunity/SqlServerDsc#1246 it happened twice for commit 95bf616 and commit 6502e54.
See example here.
https://ci.appveyor.com/project/PowerShell/sqlserverdsc/builds/21423039
And the YAML:
https://github.com/PowerShell/SqlServerDsc/blob/dev/appveyor.yml