Skip to content

random fails/segfaults from the io-upstream merge when RUST_THREADS=2 on Mac #7797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pnkfelix opened this issue Jul 15, 2013 · 9 comments
Closed
Labels
I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-macos Operating system: macOS

Comments

@pnkfelix
Copy link
Member

Summary: there is some non-deterministic crash while doing make check that is easier for me to reproduce when I set environment variable RUST_THREADS=2 during the make check. This is on a build with both --enable-debug and --enable-optimize; I am not sure if removing those options actually makes the bug go away, or merely makes it harder to reproduce. (Update: --enable-optimize is not relevant; it can be toggled on/off and the bug persists.)

It appears to have been injected by commit 41dcec2

This is a merge commit by bors; its parent supplied by bors is: 137d1fb

Things seem to work fine on the parent.


I found during my attempts to bisect the history that I could not trust our build infrastructure to reliably update LLVM source and/or rebuild the object code (perhaps for LLVM or perhaps elsewhere), so I got in the habit of deleting the both the LLVM source tree and the entire build tree (and re-running configure) during the bisection. This of course makes the build take a bit longer. (But maybe now that I have narrowed it to this particular merge, perhaps one can be less conservative with how much cleanup one does before testing.)

So, to reproduce, here is the command I was running in the build tree:

export RUST_THREADS=2 ; git log -1 ; echo ; echo ; rm -rf ../src/llvm/ ; rm -rf * ; echo ; echo ; /Users/pnkfelix/Dev/Mozilla/rust.git/configure --enable-debug --enable-optimize  --enable-clang --prefix=~/opt/rust-dbg ; remake clean clean-llvm ; time make check 

(There is certainly steps that can be omitted from the above, like the git log -1, which was for my own logging purposes while I bisected. remake is the fork of GNU make I often use: remake, but that is almost certainly orthogonal.)

I wish I had better information (either a more specific commit from the originating branch, or a narrow test test), but as of yet I have not been able to gather such. I'm mostly filing this so that I keep a log of the crashes and any other detail I manage to gather.

A few of the crashes I saw occurred while testing rustpkg, and included this as the output:

  rustc: for the -arm-enable-ehabi option: may only occur zero or one times!
  rustc: for the -arm-enable-ehabi-descriptors option: may only occur zero or one times!

shortly before crashing; from what I can tell from skimming the source code, there is no way we should be seeing that message from tests on an Intel Mac, unless some internal state has been corrupted. But the failure is non-deterministic, so this does not happen reliably.

A few samples of the sorts of crashes that one sees while running make check, some while using RUST_LOG to get more info (I have more complete logs available on request); I had to post process some of the output, replacing raw bytes unacceptable to this text box e.g. \342\200\224 with character strings representing those byte sequences:

test rt::io::net::tcp::test::bind_error ... ignored
test rt::io::extensions::test::read_bytes ... ok
test result::tests::chain_failure ... ok
rust: ~"waking up task for 140313321592696"
test rt::io::extensions::test::push_bytes_partial ... ok
test rt::io::mem::test::test_mem_writer ... okrust: ~"waking up task for 140313322615688"
rust: ~"waking up task for 140313318415048"
error opening /dev/urandom: Too many open files
test rt::io::extensions::test::read_to_end ... ok
test rt::io::extensions::test::read_byte_0_bytes ... ok
rust: ~"blocked = 0 this = 7f9e45c08be0"
rust: ~"blocked = 7f9e45c08be0 this = 7f9e45c08be0 old_task = 0"
rust: ~"woke up, p.state = Full"
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7f9e45c08be0 as *()), buffer: (0x7f9e45a038c0 as *())}, payload: None}, going to sleep."
task failed at 'assertion failed: handle.is_not_null()', /Users/pnkfelix/Dev/Mozilla/rust.git/src/libstd/rt/uv/mod.rs:100
e
There was a night when winds from unknown spaces whirled us irresistibly into
limitless vacum beyond all thought and entity. Perceptions of the most
maddeningly untransmissible sort thronged upon us; perceptions of infinity
which at the time convulsed us with joy, yet which are now partly lost to my
memory and partly incapable of presentation to others.

fatal runtime error: assertion failed: void_sched.is_not_null()
test rt::comm::test::oneshot_single_thread_send_then_recv ... ok
test rt::io::mem::test::test_mem_reader ... ok
rust: ~"blocked = 0 this = 7fea01519590"
rust: ~"blocked = 7fea01519590 this = 7fea01519590 old_task = 0"
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7fea01519590 as *()), buffer: (0x7fea0150ec00 as *())}, payload: None}, going to sleep."
rust: ~"waking up task for 140643020157512"
test rt::io::net::tcp::test::connect_error ... ok
rust: ~"blocked = 0 this = 7fea02a022a0"
test rt::io::extensions::test::push_bytes_partial ... ok
rust: ~"blocked = 7fea02a022a0 this = 7fea02a022a0 old_task = 0"
rust: ~"blocked = 0 this = 7fea02b15670"
test rt::comm::test::oneshot_single_thread_send_port_close ... ok
rust: ~"blocked = 0 this = 7fea02a04750"
rust: ~"blocked = 7fea02b15670 this = 7fea02b15670 old_task = 0"
test rt::io::extensions::test::read_to_end_error ... ok
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7fea02a022a0 as *()), buffer: (0x7fea01418ce0 as *())}, payload: None}, going to sleep."
rust: ~"blocked = 7fea02a04750 this = 7fea02a04750 old_task = 0"
rust: ~"blocked = 0 this = 7fea01519890"
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7fea02b15670 as *()), buffer: (0x7fea02b154e0 as *())}, payload: None}, going to sleep."
test rt::io::extensions::test::read_bytes ... okrust: ~"blocked = 7fea01519890 this = 7fea01519890 old_task = 0"
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7fea02a04750 as *()), buffer: (0x7fea02a00800 as *())}, payload: None}, going to sleep."
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7fea01519890 as *()), buffer: (0x7fea01609fa0 as *())}, payload: None}, going to sleep."
test rt::comm::test::oneshot_single_thread_close_port_first ... ok
rust: ~"blocked = 0 this = 7fea01601c60"
rust: ~"blocked = 7fea01601c60 this = 7fea01601c60 old_task = 0"
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7fea01601c60 as *()), buffer: (0x7fea0280f020 as *())}, payload: None}, going to sleep."
(libuv) Failed to create kqueue (24)


Instead of the poems I had hoped for, there came only a shuddering blackness
and ineffable loneliness; and I saw at last a fearful truth which no one had
ever dared to breathe before \342\200\224 the unwhisperable secret of secrets \342\200\224 The fact
that this city of stone and stridor is not a sentient perpetuation of Old New
York as London is of Old London and Paris of Old Paris, but that it is in fact
quite dead, its sprawling body imperfectly embalmed and infested with queer
animate things which have nothing to do with it as it was in life.

fatal runtime error: no scheduler
test rt::io::extensions::test::push_bytes_eof ... ok
test rt::comm::test::oneshot_single_thread_try_send_closed ... ok
test rt::io::file::super_simple_smoke_test_lets_go_read_some_files_and_have_a_good_time ... ignored
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7f81dbe030a0 as *()), buffer: (0x7f81ddc011b0 as *())}, payload: None}, going to sleep."
test rt::io::flate::test::smoke_test ... ignored
rust: ~"\"Trap: pushing handler to TLS\""
rust: ~"blocked = 0 this = 7f81ddb03a20"
rust: ~"blocked = 7f81ddb03a20 this = 7f81ddb03a20 old_task = 0"
rust: ~"\"Condition.raise: found handler\""
rust: ~"\"Guard: popping handler from TLS\""
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7f81ddb03a20 as *()), buffer: (0x7f81ddb025e0 as *())}, payload: None}, going to sleep."
rust: ~"woke up, p.state = Full"
test rt::comm::test::megapipe_stress ... okrust: ~"blocked = 0 this = 7f81dd901580"
rust: ~"blocked = 0 this = 7f81dda01dd0"
rust: ~"woke up, p.state = Full"
rust: ~"\"read 0 bytes. trying again\""
s ... ok
test rt::io::extensions::test::read_bytes_eof ... ok
rust: ~"blocked = 7f81dd901580 this = 7f81dd901580 old_task = 0"
test rt::comm::test::oneshot_single_thread_try_recv_open ... okrust: ~"blocked = 7f81dda01dd0 this = 7f81dda01dd0 old_task = 0"
rust: ~"\"Trap: pushing handler to TLS\""
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7f81dd901580 as *()), buffer: (0x7f81ddc028a0 as *())}, payload: None}, going to sleep."
rust: ~"\"Condition.raise: found handler\""
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7f81dda01dd0 as *()), buffer: (0x7f81dbe06a10 as *())}, payload: None}, going to sleep."
rust: ~"\"Condition.raise: found handler\""
rust: ~"\"Guard: popping handler from TLS\""
rust: ~"\"Trap: pushing handler to TLS\""
rust: ~"\"Condition.raise: found handler\""
rust: ~"\"Condition.raise: found no handler\""
rust: task failed at 'Unhandled condition: read_error: {kind: OtherIoError, desc: "Placeholder error. You shouldn\'t be seeing this", detail: None}', /Users/pnkfelix/Dev/Mozilla/rust.git/src/libstd/condition.rs:43
test rt::io::net::http::test::smoke_test ... ignored
test rt::io::net::tcp::test::bind_error ... ignored
test rt::io::extensions::test::read_byte_0_bytes ... ok
test rt::comm::test::oneshot_multi_task_recv_then_close ... ok
rust: ~"blocked = 0 this = 7f81dbe08970"
rust: ~"blocked = 7f81dbe08970 this = 7f81dbe08970 old_task = 0"
test rt::io::extensions::test::read_byte ... ok
rust: ~"no data available on &mut {header: {state: Blocked, blocked_task: (0x7f81dbe08970 as *()), buffer: (0x7f81ddb07b80 as *())}, payload: None}, going to sleep."
rust: ~"waking up task for 140195716426264"
~"\"Trap: pushing handler to TLS\""
@pnkfelix
Copy link
Member Author

Did not reproduce atop Linux. So this may be Mac specific (how much is platform dependent here? I guess we are at the mercy of the kernel's thread scheduling?)

@graydon
Copy link
Contributor

graydon commented Jul 15, 2013

Raise ulimit on open file descriptors? We've seen that error recently

@pnkfelix
Copy link
Member Author

Can someone in California with a Mac attempt to reproduce this via the command line I provided in the description? I have reproduced atop Mac OS X 10.7.5 and some variant of 10.8 (I don't have that machine at hand at the moment). I just want to make sure I'm not the only one capable of reproducing the issue.

@pnkfelix
Copy link
Member Author

(mm, I'm a little embarrassed to note that the command I listed in the description says remake clean clean-llvm check rather than something along the lines of remake clean ; remake clean-llvm ; remake check. I think in general one needs to keep the cleaning separate from the building; I am pretty sure that this bug is not related to that (as in, the fact that it reached the test running state means that this is not related), but I will need to double check that.)

@pnkfelix
Copy link
Member Author

I have now confirmed that the problem arises with and without --enable-optimize

@pnkfelix
Copy link
Member Author

@graydon I tried to increase the max number of open files, namely by using this command sysctl -w kern.maxfilesperproc=20000 as found on stackoverflow. It did not help.

Note also that when using RUST_THREADS=1, the problem does not occur. I suppose I could believe that by running two threads instead of one, we might be going over some threshold for number of open files. But given the nature of some of the errors I saw, I cannot be certain that was what was happening; note in particular the bit about -arm-enable-ehabi that I noted in the description.

@graydon
Copy link
Contributor

graydon commented Jul 16, 2013

I just remember brian had to reboot the mac builders recently to raise the fd limit, I don't remember if that's how he did it. Ask him when he's back?

@brson
Copy link
Contributor

brson commented Jul 17, 2013

Same problem as #7772

@pnkfelix
Copy link
Member Author

yeah, okay, I'm willing to believe that the majority of the issues in the description are from #7772. So I'm closing as a duplicate

That -arm-enable-ehabi message still scares the heck out of me, and I believe it is a different bug, but I don't know how to reliably reproduce it, so I'll let this close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-macos Operating system: macOS
Projects
None yet
Development

No branches or pull requests

3 participants