Tests (and Lint?) freeze on Win64 #7942
Comments
Can you force a backtrace (either from gdb or WinDbg)? |
I can try, but will have to get better at working gdb and/or windbg to figure out how. |
|
Will give that a try. I'm clearly not in the habit of running the tests with Edit: only in win64, and still happens after deleting |
All I see with |
oh, right. usually it does backtrace on the wrong thread. need to first switch back to |
This at all meaningful to you? https://gist.github.com/9985064c916005f0cb04 In this case there were 2 processes stuck each using 100% of a core. |
(Copied from JuliaStrings/utf8proc#18) I have managed to have a single process get stuck here, now. Dominant call stack: The second-place finisher has VirtualQuery at the top of the call stack. Full Very Sleepy profile: https://dl.dropboxusercontent.com/u/16873321/capture.sleepy |
So after a number of crashes (of VirtualBox) I finally managed to compile Julia on 64-bit Windows 7 in VirtualBox. @tkelman How do you usually reproduce this? (commit? which test?) |
master, just run all the tests in a loop until something freezes. Watch in task manager to see whether a julia process looks stuck without memory consumption changing at all. |
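The loop described above could be automated roughly as follows. This is only a sketch: the function name `run_until_hang`, the 30-minute default timeout, and the example command are assumptions for illustration, not something used in this thread. A run that exceeds the timeout is left alive on purpose so that gdb or WinDbg can be attached to it.

```python
import subprocess


def run_until_hang(cmd, max_runs=100, timeout_s=1800.0):
    """Run `cmd` repeatedly until one run exceeds `timeout_s` seconds.

    Returns (run_index, process) for the first run that times out, with the
    process left running so a debugger can attach, or (None, None) if every
    run finished in time. A timeout only suggests a hang; it does not check
    CPU or memory usage the way watching Task Manager does.
    """
    for i in range(1, max_runs + 1):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return i, proc
    return None, None


# Hypothetical usage (the exact command line is an assumption):
# index, proc = run_until_hang(["julia", "runtests.jl", "all"])
# if proc is not None:
#     print(f"run {index} looks stuck, pid {proc.pid}")
```

The returned process is for the top-level command only; with parallel tests the stuck process may be one of its workers, so Task Manager is still needed to find the right PID.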
And |
Depends where you got gdb from, but that's what I was using above (usually with cygwin's gdb IIRC). |
OK. I'll see if I can get it to work... BTW thanks for the windows compilation manual, it's super clear. |
Before you start anything too time-intensive, did you build with |
I realized that I'm not, after the LLVM compilation was done... If you have gotten it to freeze with a debug build of LLVM, I guess I'll just abort the current test and build LLVM again with debug on, especially since the terminal is frozen for the test right now... |
I haven't yet caught a freeze in gdb with a debug llvm, but I'm trying that now, and presumably if you want to watch llvm local variables and step through llvm code you'll need the debug info anyway. |
I see. Anyway, I'm building the llvm with debug on now and will see if it helps. Given how long it took me to get the well documented compilation working, it might take even longer to get the debugging working = =.... |
@tkelman Interestingly, it seems to hang the whole system (the host!!) when the freeze happens. |
Oh my. I wonder whether that teaches us anything other than LLVM and Virtualbox are having a bad interaction? I had some similar wackiness when trying to run the cross-compiled build under wine under docker. Apparently we're venturing a bit beyond the design parameters, or something. |
Actually to be precise, I'm not sure if it is the freeze we expect. I just see the system become very slow for a minute while the vbox display was stuck at the |
That could possibly be related to the broken pipe warning we've been seeing since #12144? You could try checking out the immediate predecessor on master right before that was merged. |
A broken pipe warning crashes vbox? Hmm... Running the full test is a little too resource-consuming when I want to work on something else at the same time. I saw one instance of freezing in |
Related question. Is the llvm version used on AppVeyor a debugging build or can it be a debug version? |
It's a release+assertions build IIRC, but I could upload a debug build if you want to test things out. Would need to tweak a couple lines in |
No need for now. Maybe if I couldn't get anything out of a local test (and if I still want to spend more time on it at that point). |
Thanks for trying, it's much appreciated to have anyone else looking at these long-standing serious bugs. |
I've gotten this to happen on 2 different Win64 computers, one Sandy Bridge, one Haswell. Happens when running `runtests.jl all` in parallel, seemingly more often the more cores I use for the tests. One of the workers gets stuck on its first test - so usually linalg, but I just got it to happen even on the strings test - while the rest of the workers happily finish everything else, waiting right before running `parallel` at the very end like they're supposed to.
This isn't just the usual linalg slowness, I've left these going on multiple computers for half an hour or longer. The offending processes are stuck at 100% of a single core, but the memory consumption isn't changing at all.
This doesn't happen on Win32, or when `JULIA_CPU_CORES=1`. Any ideas how to narrow this down? OpenBLAS interaction? Win64 codegen problem? Something to do with libuv and task spawning? Ignore it and hope it doesn't show up in normal code?
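One way to put numbers on the "more cores, more freezes" observation would be to repeat the run at several `JULIA_CPU_CORES` settings and count timeouts. The sketch below assumes a timeout is a good enough stand-in for a freeze; the helper names, the command, and the parameters are hypothetical, and only the `JULIA_CPU_CORES` variable comes from the report above.

```python
import os
import subprocess


def env_with_cores(n, base=None):
    """Copy the environment and set JULIA_CPU_CORES to n."""
    env = dict(os.environ if base is None else base)
    env["JULIA_CPU_CORES"] = str(n)
    return env


def timed_out(cmd, cores, timeout_s):
    """Run cmd once with the given core count; True if it exceeds timeout_s."""
    proc = subprocess.Popen(cmd, env=env_with_cores(cores))
    try:
        proc.wait(timeout=timeout_s)
        return False
    except subprocess.TimeoutExpired:
        # Kills only the direct child; stuck worker processes may need
        # to be cleaned up by hand.
        proc.kill()
        proc.wait()
        return True


def hang_counts(cmd, core_counts, runs_per_count, timeout_s):
    """Map each core count to the number of runs that timed out."""
    return {
        n: sum(timed_out(cmd, n, timeout_s) for _ in range(runs_per_count))
        for n in core_counts
    }


# Hypothetical usage (command and numbers are assumptions):
# print(hang_counts(["julia", "runtests.jl", "all"], [1, 2, 4, 8], 10, 1800.0))
```

Unlike a harness meant for attaching a debugger, this one kills timed-out runs so the sweep can continue unattended.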