reading output from readandwrite freezes on Windows #7082

Closed
tkelman opened this issue Jun 2, 2014 · 26 comments
Labels
bug: Indicates an unexpected problem or unintended behavior
io: Involving the I/O subsystem: libuv, read, write, etc.
system:windows: Affects only Windows
upstream: The issue is with an upstream dependency, e.g. LLVM

Comments

@tkelman
Contributor

tkelman commented Jun 2, 2014

test/repl.jl was added in #6955; most of the file is @unix_only at the moment, except for a few lines at the very end. Those lines freeze for me on Windows, even though they only use process commands that you'd think should work.

  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+3381 (2014-06-02 13:19 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit ddf4197* (0 days old master)
|__/                   |  x86_64-w64-mingw32

julia> exename = joinpath(JULIA_HOME, "julia")
"D:\\code\\msys64\\home\\Tony\\julia\\usr\\bin\\julia"

julia> outs, ins, p = readandwrite(`$exename -f --quiet`)
(Pipe(active, 0 bytes waiting),Pipe(open, 0 bytes waiting),Process(`'D:\code\msy
s64\home\Tony\julia\usr\bin\julia' -f --quiet`, ProcessRunning))

julia> write(ins,"1\nquit()\n")
9

julia> ins
Pipe(open, 0 bytes waiting)

julia> outs
Pipe(active, 2 bytes waiting)

julia> readall(outs)

This last command never returns; I have to manually kill the julia process(es).
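For anyone who wants a test like this to fail instead of freezing, here is a minimal sketch of a read guarded by a timeout. It assumes Julia 1.x, where `readandwrite` and `readall` from the session above are no longer available and `open(cmd, "r+")` with `read(io, String)` are used instead. `read_with_timeout` is a hypothetical helper (not a Base function), and the child script is made up for illustration; this is not the fix that was eventually applied in libuv.

```julia
# Sketch only: assumes Julia 1.x. `read_with_timeout` is a hypothetical helper.
function read_with_timeout(cmd::Cmd, input::AbstractString; timeout::Real=30.0)
    child = open(cmd, "r+")                 # process with both stdin and stdout piped
    write(child, input)                     # send the commands to the child's stdin
    reader = @async read(child, String)     # read until the child closes its stdout
    # Poll until the reader task finishes or the timeout elapses.
    if timedwait(() -> istaskdone(reader), Float64(timeout)) === :ok
        return fetch(reader)
    end
    kill(child)                             # give up instead of hanging forever
    return nothing
end

# Illustrative child: read one line, print that number plus one, then exit.
script = "print(parse(Int, readline()) + 1)"
cmd = `$(Base.julia_cmd()) --startup-file=no -e $script`

result = read_with_timeout(cmd, "41\n")
println(result === nothing ? "child timed out" : result)
```

The guard does not address the underlying pipe behaviour; it only turns a hang into a visible failure.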


@vtjnash
Member

vtjnash commented Jun 4, 2014

It should work, but the julia repl subprocess is waiting to read one more character before it dies. @Keno

@tkelman
Contributor Author

tkelman commented Jun 28, 2014

So any chance of fixing #7174 for the release, or do we @unix_only this test for 0.3.0?

@vtjnash
Member

vtjnash commented Jun 28, 2014

It's on the list for 0.3, if someone has any idea how to patch it. This seems an especially hard Windows bug to work around well.

@vtjnash
Member

vtjnash commented Jul 13, 2014

duplicate of #7174

@vtjnash vtjnash closed this as completed Jul 13, 2014
@vtjnash
Member

vtjnash commented Jul 13, 2014

this is only an issue for Mintty, correct?

@tkelman
Contributor Author

tkelman commented Jul 13, 2014

Happens outside of Mintty too.

@vtjnash
Member

vtjnash commented Jul 14, 2014

oh. i guess it could happen to any NamedPipe

if anyone wants to fix this, I've posted several useful links on the upstream issue page:
joyent/libuv#1313 (comment)

@tkelman
Contributor Author

tkelman commented Jul 14, 2014

Read those links; they were interesting but over my head. Would love to help fix this, but I don't think I'm capable of it.

Last time writing a Windows kernel module was mentioned (#6144 (comment)), I thought it was a joke. That sounds like the kind of thing that could (?) complicate the Julia installation process on Windows in terms of permissions or antivirus false-positives, which are thankfully in good shape right now.

@Keno
Member

Keno commented Jul 14, 2014

There will be no writing of julia kernel modules.

@vtjnash
Member

vtjnash commented Jul 14, 2014

That one might have been a joke, but this "feature" is a lot more annoying. It doesn't sound like it triggers antivirus detection, but it does apparently trigger hacking detection in some games (according to the FAQ http://processhacker.sourceforge.net/faq.php). It apparently loads from memory when launching the program, thereby avoiding issues with installation or needing permissions. If we didn't mind linking with GPL3 code, it would be as simple as copying the relevant file from their project (perhaps their developers would be willing to relicense it?)

@Keno, it would be a libuv kernel module, not a Julia kernel module :)

@vtjnash
Member

vtjnash commented Jul 15, 2014

#7611 made me realize that a simple kernel module for replacing NtQueryInformationFile would be utterly insufficient to fix this bug. Actually we just need to reimplement NamedPipes with a less buggy ReadFile implementation (I consider all possible deadlock situations to be bugs, especially ones that the user can't actually avoid). Perhaps we could just ship with Qemu instead, or wubi?

@Keno
Member

Keno commented Jul 15, 2014

Are we sure we can't just get the CancelRead approach to work?

@tkelman
Contributor Author

tkelman commented Jul 15, 2014

Are Node, Rust, etc capable of spawning themselves, sending commands, and successfully reading the resulting output back on Windows? (cc @jhasse, looks like you also use Rust, do you know the answer to this?) We can't be the only ones gnashing teeth about this.

What about the non-libuv languages, Python, Ruby, Perl? Do they not use named pipes? Are their default Windows implementations built with Visual Studio and using a less-broken .NET version of named pipes (presumably PowerShell is also not this broken)?

@vtjnash
Member

vtjnash commented Jul 15, 2014

it's pretty rare to have a program that demands as much of its process spawn system as Julia. despite having roughly the same backend as node, Keno and I significantly patched the spawn code about a year ago to fix a number of libuv bugs which they haven't fully noticed yet (aside from our pull request to incorporate it back into the trunk).

and it's actually pretty trivial to spawn yourself, send commands, and read the result back, if you have full control of both ends of the pipe (because it's easy to use only the APIs that work). it's also trivial to use process-blocking I/O (since the guarantee of synchronous operation only breaks if there happens to be a call to CancelIo). those are very boring situations, however. But Julia has tests for lots of boundary cases, and the set of @unix_only tests vs. @windows_only vs. @osx_only tests hints at the difference in the quality of the design of these operating systems for avoiding these edge case pitfalls.

@Keno yes, I'm attempting a version with CancelRead
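
To make the "boring" case above concrete, here is a rough sketch of spawning Julia from Julia when you control both ends of the pipe. It assumes Julia 1.x (`open(cmd, "r+")` rather than the 0.3-era `readandwrite`), and the child script and its `echo: ` prefix are invented for the example. The child reads until EOF and the parent closes its write end before reading, so neither side is left waiting for input that never arrives.

```julia
# Sketch only: assumes Julia 1.x. The child script is illustrative, not from the thread.
# The child echoes every line it receives and exits when stdin reaches EOF.
script = "for line in eachline(stdin); println(\"echo: \", line); end"
child = open(`$(Base.julia_cmd()) --startup-file=no -e $script`, "r+")

# Send a few commands, one per line.
for msg in ("one", "two", "three")
    println(child, msg)
end

close(child.in)              # signal EOF so the child's read loop terminates
replies = readlines(child)   # collect everything the child wrote before exiting
wait(child)                  # reap the process

foreach(println, replies)
```

The design choice is the explicit `close(child.in)`: the reading side is only guaranteed to finish once the writer has signalled that no more data is coming.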

@vtjnash vtjnash reopened this Jul 15, 2014
@tkelman
Contributor Author

tkelman commented Jul 15, 2014

Three cheers for @vtjnash, don't say "too hard to fix" around this guy. Or maybe do.

It's not 100% fixed yet, but I have a feeling it's most of the way there. The newer two blocks in the repl test, having to do with control characters and history issues, are still freezing. The first block is freezing at sendrepl("\"Hello REPL\""), the second at LineEdit.history_prev(s, hp). The original readall from a readandwrite is almost working, except for a troubling access violation possibly having to do with unicode trying to lock after the process has already exited?

D:\cygwin64\home\Tony\julia32\test>..\usr\bin\julia runtests.jl repl
     * repl
Please submit a bug report with steps to reproduce this fault, and any error mes
sages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x77bc8e19 -- RtlIntegerToUnicodeString
 at ???:2008845849
RtlIntegerToUnicodeString at ???:2008845849
RtlIntegerToUnicodeString at ???:2008845608
uv_mutex_lock at ???:1645065090
uv_idle_invoke at ???:1645017386
vsnprintf at ???:2009044242
TpCallbackIndependent at ???:2008957993
BaseThreadInitThunk at ???:2003055498
RtlInitializeExceptionChain at ???:2008850290
RtlInitializeExceptionChain at ???:2008850245
    SUCCESS

(this was running the test from cmd - it seems okay inside mintty, unless I run the tests with julia-debug)

In 64 bit, from cmd, it looks a little different:

julia> exename=joinpath(JULIA_HOME,(ccall(:jl_is_debugbuild,Cint,())==0?"julia":
"julia-debug"))
"D:\\cygwin64\\home\\Tony\\julia\\usr\\bin\\julia"

julia> outs, ins, p = readandwrite(`$exename -f --quiet`)
(Pipe(active, 0 bytes waiting),Pipe(open, 0 bytes waiting),Process(`'D:\cygwin64
\home\Tony\julia\usr\bin\julia' -f --quiet`, ProcessRunning))

julia> write(ins,"1\nquit()\n")
9

julia> Please submit a bug report with steps to reproduce this fault, and any er
ror messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x779fe4e4 -- RtlDeNormalizeProcessPara
ms at ???:2006967524
RtlDeNormalizeProcessParams at ???:2006967524
RtlDeNormalizeProcessParams at ???:2006967307
uv_idle_invoke at ???:1829839936
TpCallbackMayRunLong at ???:2006783451
RtlRealSuccessor at ???:2006781014
BaseThreadInitThunk at ???:2004507117
RtlUserThreadStart at ???:2006828353
julia>

@jhasse
Contributor

jhasse commented Jul 15, 2014

@tkelman Sorry, I don't know enough about Rust.

@vtjnash
Member

vtjnash commented Jul 15, 2014

@tkelman please delete your sys.dll file before posting backtraces

@Keno, can you code review this. I'm wondering if I need to check handle->flags & UV_HANDLE_PIPESERVER in more places (especially cleanup)

edit: I see the problem: in uv_pipe_cleanup, we delete the mutex, but in uv_pipe_zero_readfile_thread_proc we need to reacquire that mutex to ensure we don't delete the thread handle too early. We need to delay the final deletion of this mutex until uv_pipe_endgame

@tkelman
Contributor Author

tkelman commented Jul 15, 2014

@vtjnash whoops, sorry, forgot about that. backtraces don't look any different here without sys.dll.

vtjnash added a commit that referenced this issue Jul 15, 2014
@tkelman
Contributor Author

tkelman commented Jul 15, 2014

Very nice, JuliaLang/libuv@88b6ceb fixed the access violations. Still have to figure out the other 2 freezes.

@tkelman
Contributor Author

tkelman commented Jul 16, 2014

Here are the 2 freezing cases, unfolded a bit to determine exactly which line freezes:
https://gist.github.com/tkelman/bf8627f8143d276c5ce0
https://gist.github.com/tkelman/fe40666f451d9d328f9e

In both cases it's a write command to a fake REPL. @Keno should this be able to work on Windows?

@vtjnash
Member

vtjnash commented Jul 16, 2014

those examples share a common trait that they are made with link_pipe(...,true,...,true), where "true" means "unaffected by the recent libuv changes"

but also:

stdin_read,stdin_write = (Base.Pipe(C_NULL), Base.Pipe(C_NULL));
Base.link_pipe(stdin_read,true,stdin_write,true);
write(stdin_write,"hi")
 #= Julia completely frozen =#

cause unknown, but we appear to be stuck in a (supposedly) non-blocking syscall to the Windows API function GetQueuedCompletionStatusEx (which we called to determine if there is anything that is ready for us to do without blocking...)
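
For readers trying this on a current release, a rough modern analogue of the snippet above follows. It assumes Julia 1.x, where a pipe pair is created with `Pipe()` and initialized with the unexported `Base.link_pipe!` (an internal function whose keywords may change between versions). It is not the 0.3-era call shown in the thread, and it makes no claim about whether the original freeze reproduces.

```julia
# Sketch only: assumes Julia 1.x. `Base.link_pipe!` is internal and unexported.
p = Pipe()
# Ask for both ends to support asynchronous (non-blocking) operation,
# which corresponds to the two `true` flags in the original snippet.
Base.link_pipe!(p; reader_supports_async=true, writer_supports_async=true)

# Write from a separate task so the main task stays free to read.
writer = @async begin
    write(p.in, "hi")
    close(p.in)              # closing the write end delivers EOF to the reader
end

data = read(p.out, String)   # returns once the write end has been closed
wait(writer)
println(data)
```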

@vtjnash
Member

vtjnash commented Jul 16, 2014

oh, well, i guess the cause wasn't actually /that/ unknown

@tkelman
Contributor Author

tkelman commented Jul 18, 2014

ping @ihnorton I think we need to force a libuv rebuild for the Win32 nightlies

@ihnorton
Member

Done.

@vtjnash
Member

vtjnash commented Jul 18, 2014

not to be a pain, but I just bumped the libuv version again :P

@ihnorton
Member

Done again.

tkelman added a commit that referenced this issue Apr 26, 2015
These freeze otherwise, the fix in libuv for #7082 does not work on XP
tkelman added a commit that referenced this issue Apr 27, 2015
repl tests freeze otherwise, the fix in libuv for #7082 does not work on XP

dllist tests also fail on xp since oddly Libdl.dllist()
is only listing file names, not full absolute paths

udp recvfrom tests give ERROR: LoadError: bind: address family not supported (EAFNOSUPPORT)
tkelman added a commit that referenced this issue Apr 28, 2015
repl tests freeze otherwise, the fix in libuv for #7082 does not work on XP

(cherry picked from commit 8cf5881)
mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
repl tests freeze otherwise, the fix in libuv for JuliaLang#7082 does not work on XP

dllist tests also fail on xp since oddly Libdl.dllist()
is only listing file names, not full absolute paths

udp recvfrom tests give ERROR: LoadError: bind: address family not supported (EAFNOSUPPORT)
JeffBezanson added the io label Jul 15, 2015