Valgrind throwing lots of errors #15

critiqjo · 2015-12-15T20:08:06Z

When the number of coroutines spawned is greater than a certain value, a lot of "use of uninitialized value" errors are thrown by valgrind.

Errors are thrown only when 15 or more coroutines are spawned:

extern crate coio;

use coio::{Scheduler, sleep_ms};

fn main() {
    Scheduler::new()
        .with_workers(1)
        .run(|| {
            for _ in 0..15 {
                Scheduler::spawn(move || {
                    println!("Heil Hydra");
                    sleep_ms(100);
                });
            }
        })
        .unwrap();
}

If there are more workers, I can spawn more coroutines without errors, but I do reach a limit (eg. 19 for 2 workers). Despite these errors, the program gives the expected output, and exits without any segfaults.

Here are a few lines (out of 11901) from the log:

...
==13148== Warning: client switching stacks?  SP change: 0x5f134e0 --> 0x407bff8
==13148==          to suppress, use: --max-stackframe=32077032 or greater
==13148==          further instances of this message will not be shown.
==13148== Thread 2 Worker 0:
...
==13148== Invalid read of size 8
==13148==    at 0x1451AF: runtime::processor::_$LT$impl$GT$::running::hbdb031c50523a379rtc (processor.rs:190)
==13148==  Address 0x5f13f58 is in a rw- anonymous segment
==13148== 
==13148== Invalid read of size 8
==13148==    at 0x1451B9: runtime::processor::_$LT$impl$GT$::running::hbdb031c50523a379rtc (processor.rs:190)
==13148==  Address 0x5f13f60 is in a rw- anonymous segment
==13148== 
==13148== Invalid read of size 8
==13148==    at 0x1451EA: _$LT$impl$GT$::insert::insert::h14411986088127779969 (lib.rs:133)
==13148==  Address 0x5f14028 is in a rw- anonymous segment
==13148== 
...
==13148== Conditional jump or move depends on uninitialised value(s)
==13148==    at 0x56E26EF: pthread_join (in /usr/lib/libpthread-2.22.so)
==13148==    by 0x140862: thread::_$LT$impl$GT$::join::join::h17510934181584830106 (mod.rs:574)
==13148==    by 0x1407DE: thread::_$LT$impl$GT$::join::join::h10846901181898427765 (mod.rs:604)
==13148==    by 0x113AA5: scheduler::_$LT$impl$GT$::run::run::h12081195173303708932 (scheduler.rs:210)
==13148==    by 0x11219A: main::h28feaf366a048100iaa (main.rs:6)
==13148==    by 0x196324: sys_common::unwind::try::try_fn::h10038815114197766760 (in /home/john/dev/coio-test/target/debug/coio-test)
==13148==    by 0x193B78: __rust_try (in /home/john/dev/coio-test/target/debug/coio-test)
==13148==    by 0x195FC6: rt::lang_start::hbdf4b213a64b8c7394x (in /home/john/dev/coio-test/target/debug/coio-test)
==13148==    by 0x146DE9: main (in /home/john/dev/coio-test/target/debug/coio-test)
==13148== 
...

The text was updated successfully, but these errors were encountered:

zonyitoo · 2015-12-16T01:11:55Z

Ah, it seems that nearly every call in this crate will generate a "Invalid read" or "Use of uninitialised value".

Could you please first try to use valgrind to test this simple example program?
https://github.com/zonyitoo/context-rs/blob/master/examples/simple.rs

Let's see whether the problem is in the context-rs crate.

critiqjo · 2015-12-16T16:05:54Z

Could you please first try to use valgrind to test this simple example program?

Both simple.rs and asymmetric.rs runs fine without errors. But I think that should be expected since the errors are observed only when there are 15 or more coroutines in the coio example.

I also tried out a slightly modified asymemetric.rs example:

fn main() {
    let mut coros: Vec<Coroutine<i32>> = (0..20).map(|_| Coroutine::spawn(|me| {
        for num in 0..2 {
            me.yield_with(num);
        }
    })).collect();

    for num in 0..3 {
        for coro in coros.iter_mut() {
            println!("{:?}", coro.next());
        }
    }
}

which also runs without errors...

zonyitoo · 2015-12-16T16:18:22Z

That is weird. Is it possible that because of the user space context switching confuses the Valgrind?

critiqjo · 2015-12-16T16:26:23Z

I suppose valgrind should be smarter than that! But I'm not experienced in either domains, so can't say for sure!

zonyitoo · 2015-12-16T16:30:44Z

I am now focusing on refactoring the I/O module, so I will be happy if you can help to debug :)

critiqjo · 2015-12-16T17:10:00Z

I will be happy to help. But, currently, I have no idea how context is implemented. ~~Where do you suggest I should look to get a high-level view of the design, before jumping deep?~~ (never mind! :) your documentation/comments is pretty extensive!)

critiqjo · 2015-12-16T20:25:31Z

I ran it again with --track-origins=yes option (info source). And now, most of the errors are followed by these 2 lines:

==27720==  Uninitialised value was created by a stack allocation
==27720==    at 0x174D91: ??? (_context.S:121)

which is this statement:

mov (RUSTRT_RSP*8)(ARG1), %rsp

And a very few others such as:

==27720==  Uninitialised value was created by a stack allocation
==27720==    at 0x140DE4: scheduler::_$LT$impl$GT$::spawn_opts::spawn_opts::h12050465489478016694 (scheduler.rs:140)
...
==27720==  Uninitialised value was created by a stack allocation
==27720==    at 0x187540: sync::once::_$LT$impl$GT$::call_once::call_once::h13033172402953422293 (once.rs:72)
...
==27720==  Uninitialised value was created by a stack allocation
==27720==    at 0x1413F0: sync::mpsc::channel::channel::h3345162500116368437 (mpsc.rs:108)

The last two are not even related to context.

zonyitoo · 2015-12-17T01:17:51Z

mov (RUSTRT_RSP*8)(ARG1), %rsp

This line is used for restoring the %rsp register for swapping stacks. I can't see any obvious mistakes in this piece of code.

scheduler::_$LT$impl$GT$::spawn_opts::spawn_opts is called when spawning new coroutines, it is in pure Rust code. In this function, the spawn_opts, it will call the sync::mpsc::channel::channel.
The sync::once::_$LT$impl$GT$::call_once::call_once call may be related to the thread_local! macro, which used to store the Processor object and called by spawn_opt.

So these three errors are all related. But I don't know which one is uninitialised.

critiqjo · 2015-12-17T19:41:05Z

Another observation: if I reduce the DEFAULT_STACK to 64 * 1024 (instead of 128kB), the number of coroutines can go up to 30 without errors (doubled!). ~~So I doubt it has something to do with the stack pool?~~ (or not, since the modified asymmetric example used more much more stacks-allocations: 2MB x 20)

zonyitoo · 2015-12-18T03:30:18Z

Why not try to disable the stack cache.

https://github.com/zonyitoo/context-rs/blob/master/src/stack.rs#L157

Just try to set RUST_MAX_CACHED_STACKS=0, so it will call mmap to create a new stack on every spawns.

critiqjo · 2015-12-18T21:26:37Z

Why not try to disable the stack cache.

Unfortunately, increasing it or disabling it made no difference.

The total stack size without "errors" is just nearly 2 MB; or more precisely, 2 "stacks" less than 2 MB for 256 KB (6 coroutines without errors), 128 KB (14) and 64 KB (30) stack sizes; but 3 "stacks" less for 32 KB (61) stack size.

zonyitoo · 2015-12-19T10:51:21Z

Well, I have no idea what happened right now. Help wanted.

lhecker · 2016-01-04T13:57:34Z

@critiqjo @zonyitoo Did you guys use debug or release builds while testing this with valgrind? In the latter case this might be the issue: rust-lang/rust#5856 (comment)

zonyitoo · 2016-01-05T01:02:33Z

If you use release builds for testing, you would find no symbols in the output logs...

Or you could use --opt-level=3 with -g

critiqjo · 2016-01-09T06:55:31Z

Sorry for the late reply. I'm not able to test the original example in the machine I use now -- panics on sleep_ms call (backtrace).

Though, if I remember correctly, I got those valgrind errors for both debug and release builds, and I definitely didn't use --opt-level=3 and -g together.

zonyitoo · 2016-01-09T08:39:12Z

panicked at 'Box<Any>' should reason of triggering force unwinding .. I don't know exactly why it happens.

zonyitoo · 2016-02-13T14:55:00Z

In the latest commit in feature-migrate_context1.0 branch, valgrind doesn't show any Unitialized value or Invalid Read or Invalid Write logs. Including the test program in the top of this issue.

lhecker · 2016-02-13T14:55:49Z

Well that's kinda unexpected but still very very nice. 👍

zonyitoo · 2016-02-13T14:58:27Z

I think the previous logs about Invalid Reads and Invalid Writes are actually exist because we write to some variables allocated on the stack of the Coroutine after we dropped the stack, as described in that issue. But I think the problem is already solved during the migration to context-rs 1.0.

zonyitoo added help wanted question labels Dec 19, 2015

lhecker mentioned this issue Jan 31, 2016

Bug: Sync channel returns RecvError #31

Open

zonyitoo mentioned this issue Feb 10, 2016

Migrate to context-rs v1.0 #40

Closed

zonyitoo added the in progress label Feb 13, 2016

zonyitoo mentioned this issue Feb 15, 2016

Migrate to context 1.0 #43

Merged

8 tasks

lhecker closed this as completed in #43 Mar 4, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Valgrind throwing lots of errors #15

Valgrind throwing lots of errors #15

critiqjo commented Dec 15, 2015

zonyitoo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

zonyitoo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

zonyitoo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

zonyitoo commented Dec 17, 2015

critiqjo commented Dec 17, 2015

zonyitoo commented Dec 18, 2015

critiqjo commented Dec 18, 2015

zonyitoo commented Dec 19, 2015

lhecker commented Jan 4, 2016

zonyitoo commented Jan 5, 2016

critiqjo commented Jan 9, 2016

zonyitoo commented Jan 9, 2016

zonyitoo commented Feb 13, 2016

lhecker commented Feb 13, 2016

zonyitoo commented Feb 13, 2016

Valgrind throwing lots of errors #15

Valgrind throwing lots of errors #15

Comments

critiqjo commented Dec 15, 2015

zonyitoo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

zonyitoo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

zonyitoo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

critiqjo commented Dec 16, 2015

zonyitoo commented Dec 17, 2015

critiqjo commented Dec 17, 2015

zonyitoo commented Dec 18, 2015

critiqjo commented Dec 18, 2015

zonyitoo commented Dec 19, 2015

lhecker commented Jan 4, 2016

zonyitoo commented Jan 5, 2016

critiqjo commented Jan 9, 2016

zonyitoo commented Jan 9, 2016

zonyitoo commented Feb 13, 2016

lhecker commented Feb 13, 2016

zonyitoo commented Feb 13, 2016