Remote targets: tarballs and source repos #5351
This obsoletes #5333, so I closed that PR. Obv, can be reopened if anyone has an issue with that. |
The win32 failures were btw of the following variety:
|
First guess about the Windows CI failure: after each test, once the git program has terminated, we delete the repo; however Windows is lazy with releasing file locks so the files git had open are still locked for a few milliseconds after it terminates. So a solution might be to do a |
It's only lazy if the process exits without releasing the locks. In that the interrupt to release the lock I think it's likely that there is just an existing open handle to the file/folder preventing the cleanup from happening. The I/O manager should know to process interrupts in order, so the delete should never be sent before the unlock if they are both queued. It seems that Appveyor may have Windows Defender still enabled https://help.appveyor.com/discussions/questions/2898-spuriously-failing-test-due-to-seemingly-stray-open-handles they say it's seemingly disabled in the A common workaround is to have a retry loop (what postgres etc do, and what we do in the ghc testsuite) or to use |
Windows is so bonkers: https://superuser.com/questions/918476/in-windows-do-file-locks-stay-after-a-process-was-terminated-via-taskkill I mean, seriously, the only thing a parent process can sanely wait on to make sure it doesn't trample on shared resources is the child process terminating, and then Win32 goes and removes that option. |
The behavior is still there if the child process is well behaved. If the
child process exits abnormally or doesn't clean up its resources the only
thing it guarantees is that eventually the resources will be freed.
…On Wed, May 30, 2018, 10:43 Duncan Coutts ***@***.***> wrote:
Windows is so bonkers:
https://superuser.com/questions/918476/in-windows-do-file-locks-stay-after-a-process-was-terminated-via-taskkill
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365202.aspx
I mean, seriously, the only thing a parent process can sanely wait on to
make sure it doesn't trample on shared resources is the child process
terminating, and then Win32 goes and removes that option.
|
the VS 2015 image-type was used; I've switched it to VS 2017 and triggered a rebuild of this PR; let's see if this changes anything. |
If it doesn't we can turn up the verbosity in that test and track down at what point the failure occurs. I'm just guessing currently (though I think it's a pretty reasonable guess). |
@dcoutts I'm afraid we need to go for that verbosity increase... |
@dcoutts with increased verbosity, we now get (the git config error is orthogonal):
|
Is anyone testing on non-CI windows? I mean, not that it fixes the issue entirely but that could help figure out a solution/if it’s a CI config issue.
…On Fri, Jun 01, 2018 at 8:03 AM Herbert Valerio Riedel ***@***.***> wrote:
@dcoutts ( https://github.com/dcoutts ) with increased verbosity, we now
get (the git config error is orthogonal):
Running 1 test suites...
Test suite unit-tests: RUNNING...
File modification time resolution calibration completed, maximum delay observed: 125.003 ms. Will be using delay of 250.006 for test runs.
"C:\Program Files\Git\mingw64\bin\git.exe" "--version"
Warning: cannot determine version of C:\Program Files\Git\mingw64\bin\git.exe : "git version 2.16.2.windows.1\n"
"C:\Program Files\Git\mingw64\bin\git.exe" "init"
Initialized empty Git repository in C:/Users/appveyor/AppData/Local/Temp/vcstest-1344/src/.git/
"C:\Program Files\Git\mingw64\bin\git.exe" "add" "file/C"
"C:\Program Files\Git\mingw64\bin\git.exe" "commit" "--all" "--message=a patch" "--author=Author ***@***.***>"
*** Please tell me who you are.
Run
git config --global user.email ***@***.***"
git config --global user.name "Your Name"
to set your account's default identity.
Omit --global to set the identity only in this repository.
fatal: unable to auto-detect email address (got ***@***.***(none)')
Unit Tests
"C:\Program Files\Git\mingw64\bin\git.exe" "--version"
UnitTests.Distribution.Client.VCS
check VCS test framework
git: FAIL (0.18s)
*** Failed! Exception: 'removeDirectoryRecursive:removeContentsRecursive:removePathRecursive:removeContentsRecursive:removePathRecursive:removeContentsRecursive:removePathRecursive:removeContentsRecursive:removePathRecursive:removeContentsRecursive:removePathRecursive:DeleteFile "C:\\Users\\appveyor\\AppData\\Local\\Temp\\vcstest-1344\\src\\.git\\objects\\90\\781888239b7a5c4e9011438395ed68f2425f4f": permission denied (Access is denied.)' (after 1 test):
BranchingRepoRecipe [Left (TaggedCommits "tag_A" [Commit [FileUpdate "file/C" ***@***.***"]])]
Use --quickcheck-replay=696078 to reproduce.
|
I have Windows installed, can run tests for you guys. |
@dcoutts I incremented the sleep-time, and it didn't change anything; in fact I don't think the threadDelay is in the code-path we care about, as I noticed |
Ah, ok, so git fails (due to the author thing) and we propagate an exception, the bracket handler executes the cleanup and so we end up with the same problem, that we didn't wait between the end of the (failing) test and removing the dir contents. So we should 1. adjust the git flags to provide the user name and email or whatever git is complaining about and 2. move the thread delay inside the remove dir cleanup handler so that it delays in both the success and failing cases (otherwise any future failures will also be masked by this confusing and annoying failure). Or switch from simple delay to the try-multiple-times dir cleanup. |
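The "try-multiple-times dir cleanup" idea, run from inside the cleanup handler so it applies to both passing and failing tests, could look roughly like the sketch below. This is only an illustration, not the code in this PR: `removeDirRetrying` and `withTestDir` are hypothetical names, and the 30 tries with a one-second pause are just the numbers mentioned later in this thread.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, bracket, try)
import System.Directory (createDirectoryIfMissing, removeDirectoryRecursive)

-- | Hypothetical helper: try to remove a directory tree, retrying up to
-- @attempts@ times and sleeping @delayMicros@ microseconds between tries.
-- Returns the last IOException if every attempt failed.
removeDirRetrying :: Int -> Int -> FilePath -> IO (Either IOException ())
removeDirRetrying attempts delayMicros dir = go attempts
  where
    go n = do
      result <- try (removeDirectoryRecursive dir) :: IO (Either IOException ())
      case result of
        -- Failed, but we still have tries left: wait and go again.
        Left _ | n > 1 -> threadDelay delayMicros >> go (n - 1)
        -- Either it worked, or we are out of tries: report what happened.
        _              -> return result

-- | Hypothetical wrapper: create a scratch directory, run the action, and
-- always clean up afterwards. Because the retry lives in the release
-- handler of 'bracket', it runs whether the action succeeds or throws.
withTestDir :: FilePath -> (FilePath -> IO a) -> IO a
withTestDir dir action =
  bracket (createDirectoryIfMissing True dir >> return dir) cleanup action
  where
    -- Assumed policy: 30 tries, one second apart; rethrow the final error.
    cleanup d = removeDirRetrying 30 1000000 d >>= either ioError return
```

Keeping the wait inside the release handler means a failing git command no longer skips the delay, so a cleanup failure does not mask the original test failure.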
@Mistuke, so even the generous 10s delay seems to leave this issue unaffected; any ideas/suggestions what we can/should do? I still don't fully understand what's going on here. |
This is seriously messed up.
This means it tried 30 times, waiting a second between each go. This is clearly more than just waiting for file locks from a dead process to be released by the kernel. Two plausible scenarios I can think of:
|
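So apparently the theory is that on windows git's emulated exec impl causes the git process to terminate while a git child process is still running, executing the command. But this doesn't seem to fit the facts.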
|
This is looking like enough of a mess that I’d almost consider just letting this one fail on Windows.
…On Wed, Jun 06, 2018 at 1:06 PM Duncan Coutts ***@***.***> wrote:
So apparently the theory is that on windows git's emulated exec impl causes
the git process to terminate while a git child process is still running,
executing the command.
But this doesn't seem to fit the facts.
* We don't generally have a problem waiting for git commands to terminate.
These tests run lots and lots of git commands in sequence and then check
the state of the filesystem afterwards. If the git process was terminating
before the work was actually complete we would see test failures much
earlier.
* even if the child process took a moment to complete its work and
terminate at the end of the sequence of calls, are we really saying it'd
take 30 seconds?
|
Also, git uses a lock file to detect concurrent execution on the same repo and complain. That situation is not being reported by git in this test. So again this suggests that git really is not running concurrently and has really terminated, but something else is holding files open. |
It occurred to me that this may not be a file locking issue at all. It may simply be, as the error states, a permissions one. I forgot that another thing we account for in the GHC testsuite is that these pseudo-POSIX tools of msys2 sometimes set the In the GHC testsuite we work around it by if the error is a Googling around a bit, some people doing git-automation have run into the same issue loot/loot@01d60ce and solved it the same way. e.g. this can be verified by setting https://www.appveyor.com/docs/how-to/rdp-to-build-worker/, logging into the machine and trying to delete the folder manually (from the commandline, not using explorer; explorer clears the flag first so it works) or checking the file permissions or just implement a call to |
Ok, looking into it some more, this makes a lot more sense now. git's internal object database is immutable, because once an object is created, it can never be changed https://git.wiki.kernel.org/index.php/Git#Implementation e.g. amending a commit creates a new hash. So replaying the actions you'll find
and a subsequent
So indeed, it's not a file locking issue (though for robustness in the presence of AVs we should still leave that code that retries in) it is simply a permissions one and a difference with the unix APIs in that Win32 won't allow the removal of read-only files. |
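To make the read-only point concrete, here is a minimal sketch of clearing the read-only flag on everything under a directory before deleting it. It is not the fix used in this PR; `makeTreeWritable`, `removeReadOnlyTree` and `demo` are hypothetical names, and it assumes the `directory` and `filepath` packages. It also does not special-case symbolic links.

```haskell
import Control.Monad (forM_, when)
import System.Directory
  ( createDirectoryIfMissing, doesDirectoryExist, getPermissions
  , listDirectory, removeDirectoryRecursive, setOwnerWritable
  , setPermissions )
import System.FilePath ((</>))

-- | Hypothetical helper: mark a path, and everything below it if it is a
-- directory, as writable by the owner (this clears the read-only
-- attribute on Windows).
makeTreeWritable :: FilePath -> IO ()
makeTreeWritable path = do
  perms <- getPermissions path
  setPermissions path (setOwnerWritable True perms)
  isDir <- doesDirectoryExist path
  when isDir $ do
    entries <- listDirectory path
    forM_ entries $ \entry -> makeTreeWritable (path </> entry)

-- | Hypothetical helper: clear read-only flags first, then delete the tree.
removeReadOnlyTree :: FilePath -> IO ()
removeReadOnlyTree dir = makeTreeWritable dir >> removeDirectoryRecursive dir

-- | Small demonstration: build a tree containing one read-only file (a
-- stand-in for a git object), remove it, and report whether it is gone.
demo :: IO Bool
demo = do
  let root = "gr-readonly-demo"
      file = root </> "objects" </> "blob"
  createDirectoryIfMissing True (root </> "objects")
  writeFile file "immutable object"
  perms <- getPermissions file
  setPermissions file (setOwnerWritable False perms)
  removeReadOnlyTree root
  not <$> doesDirectoryExist root
```

As far as I know, recent versions of `directory` also provide `removePathForcibly`, which adjusts permissions itself when an entry cannot be removed, so a hand-rolled helper like this may be unnecessary there.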
@Mistuke @dcoutts so all we need is to use |
@hvr yeah, that looks like it'll do the job! |
Seems to be a different issue now. |
It's definitely something about the new tests that's making it hang. |
Looks very much like |
I might have jumped the gun earlier on pointing the finger at the VCS tests and assuming it had hung, and killed the test run too quickly. |
OK, I think we've got a handle on the problem here: the VCS tests are catastrophically slow on AppVeyor.
I initially misdiagnosed this as a hang because I'm calling it quits for tonight. I hope that this is enough for you to go on in chasing the real problem down, @typedrat. |
In the appveyor file you have:
notice the order, you're putting msys before the git. My guess is if you do I assume the git in If not try actually installing
But I think that's what they pre-installed. |
Btw if it's still slow after this enable git tracing by exporting |
Now it's a different error 😂 I think the final directory is too long because of the fully qualified package name. Add a |
@Mistuke @typedrat @quasicomputational Now we're at...
|
Still have one failing test. |
I wish I knew why it was failing sometimes on the manpage thing. That's either a bug in |
I don't know if that is caused by this patch or not. |
appveyor.yml
Outdated
- appveyor-retry cabal new-build exe:cabal exe:cabal-tests --only-dependencies
- cabal %CABOPTS% new-build exe:cabal
- cabal %CABOPTS% new-run cabal-tests -- -j3 --with-cabal=dist-newstyle\build\x86_64-windows\ghc-8.0.2\cabal-install-2.3.0.0\build\cabal\cabal.exe
- appveyor-retry cabal new-build cabal-install:tests --only-dependencies
This requires %CABOPTS%, otherwise it won't try to use the cache for components we just built. The other retry commands too, I think.
build_script:
- cd Cabal
- ghc --make -threaded -i -i. Setup.hs -Wall -Werror -XRank2Types -XFlexibleContexts
environment:
You're missing APPVEYOR_SAVE_CACHE_ON_ERROR: True. The issue is that right now it's not using the cache at all because the build doesn't complete, so it's in a bit of a circular dependency: we need the cache in order to finish on time, but the build needs to finish to create the cache. You can see that it currently still marks all dependencies as needing download and build.
I think we can always save the cache so that option should be safe. See https://www.appveyor.com/docs/build-cache/
That should be fixed, but it wouldn't matter. The owner of the Appveyor account (@23Skidoo, I believe) needs to enable setting the cache from PR builds for it to have any effect before this is merged.
Ah, indeed. It should probably be enabled for a bit so we get the initial cache.
It's not currently using the cache, which is why we're still running out of time. With the cache the build should only take about 10-15 minutes.
Tests work. 🙏 |
🚢 |
Good job! Only 8 mins to spare though on the build time 😂😂 |
Should get better as soon as this gets merged. What's that waiting on, now? |
Do it! |
VCS tests are *very* slow; it took a substantial amount of work (this is a squash of 35 commits by two authors with debugging help from @Mistuke as well) to get the test suite and build process fast enough that they can run along with the rest of it in under an hour (the Appveyor limit). Co-authored-by: quasicomputational <quasicomputational@gmail.com>
This is for review, I'll be adding more commits as I tidy things up from the wip/remote-targets branch.
So far it's just got the VCS and Get change, a preview of which was in PR #5333. The version here should address the review comments from that PR.
And the standard checklist:
The VCS and Get modules have quite good tests. The remaining integration has only been tested manually so far...