Triangle example triggers OutOfHostMemory with gtx 1080 #627

Closed
brandonson opened this issue Jul 8, 2017 · 14 comments

Comments

@brandonson

brandonson commented Jul 8, 2017

I'm running on an old box with an upgraded GPU. Some details:

Arch Linux
RAM: 12GB
CPU: i7-4770
GPU: GTX 1080

With 8-9GB of memory free, the triangle example runs rampant and triggers an OutOfHostMemory error within a few seconds (which seems fast, but I don't know enough to be sure).

Waiting for the fence at line 380 appears to fix the issue, presumably by letting things finish and be cleaned up at the top of the loop.

Not sure what the right fix (if any) is for this, but it certainly seems like a bug.
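
To be concrete, the change I mean looks roughly like this (a sketch using the example's own variable names; the method chain is what the example already does, only the wait() call is new, and I don't know if this is the right long-term fix):

    // End of the example's main loop (examples/src/bin/triangle.rs):
    let future = previous_frame_end.join(acquire_future)
        .then_execute(queue.clone(), command_buffer).unwrap()
        .then_swapchain_present(queue.clone(), swapchain.clone(), image_num)
        .then_signal_fence_and_flush().unwrap();

    // Adding this wait is what appears to fix the issue for me.
    future.wait(None).unwrap();

    previous_frame_end = Box::new(future) as Box<_>;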

@tomaka
Member

tomaka commented Jul 9, 2017

In my experience, these out-of-memory errors are almost always a problem of some sort with the driver, but obviously I don't know if that's the case here.

@brandonson
Author

brandonson commented Jul 10, 2017

I've got a backtrace now, if it helps:

0  core::result::unwrap_failed<vulkano::OomError> (msg=..., error=vulkano::OomError::OutOfHostMemory) at /checkout/src/libcore/result.rs:860
1  0x00005555555ce13e in core::result::Result<vulkano::sync::fence::Fence<alloc::arc::Arc<vulkano::device::Device>>, vulkano::OomError>::unwrap<vulkano::sync::fence::Fence<alloc::arc::Arc<vulkano::device::Device>>,vulkano::OomError> (self=...) at /checkout/src/libcore/result.rs:738
2  0x00005555556842a9 in vulkano::sync::future::fence_signal::then_signal_fence<vulkano::swapchain::swapchain::PresentFuture<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::join::JoinFuture<alloc::boxed::Box<GpuFuture>, vulkano::swapchain::swapchain::SwapchainAcquireFuture>, vulkano::command_buffer::auto::AutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>>>> (future=..., behavior=...) at /home/brandon/.cargo/registry/src/github.com-1ecc6299db9ec823/vulkano-0.5.2/src/sync/future/fence_signal.rs:40
3  0x0000555555684e22 in vulkano::sync::future::GpuFuture::then_signal_fence<vulkano::swapchain::swapchain::PresentFuture<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::join::JoinFuture<alloc::boxed::Box<GpuFuture>, vulkano::swapchain::swapchain::SwapchainAcquireFuture>, vulkano::command_buffer::auto::AutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>>>> (self=...) at /home/brandon/.cargo/registry/src/github.com-1ecc6299db9ec823/vulkano-0.5.2/src/sync/future/mod.rs:216
4  0x0000555555685086 in vulkano::sync::future::GpuFuture::then_signal_fence_and_flush<vulkano::swapchain::swapchain::PresentFuture<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::join::JoinFuture<alloc::boxed::Box<GpuFuture>, vulkano::swapchain::swapchain::SwapchainAcquireFuture>, vulkano::command_buffer::auto::AutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>>>> (self=...) at /home/brandon/.cargo/registry/src/github.com-1ecc6299db9ec823/vulkano-0.5.2/src/sync/future/mod.rs:226
5  0x00005555556b30c2 in forever_protocol::main () at src/main.rs:194

Looks like it can't create a new fence.

Is this necessarily a driver issue? I'll admit I lack the experience to really debug this, but hypothetically, could it be that there's no reason on the driver/GPU side to block, and my memory/CPU just can't keep up? (It does pin a CPU to 100%, and the process as a whole uses a bit more than 1GB of memory before bailing out... not enough to use up all my memory, but maybe enough to max out some memory pool?)

Shooting in the dark, really, but maybe the backtrace will help.

@tomaka
Member

tomaka commented Jul 11, 2017

Could you add a bunch of printlns in the code in order to find out how many times the loop in the example runs before the OOM happens?

Knowing whether it panics at the first run, second run, third run, or afterwards, would be very useful.
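
Something as simple as this would be enough (a rough sketch; frame_count is just a name I made up):

    // Before the render loop:
    let mut frame_count: u64 = 0;

    loop {
        frame_count += 1;
        println!("frame {}", frame_count);

        // ... rest of the example's loop body unchanged ...
    }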

@cormac-obrien

On my machine the loop runs anywhere from 900 to 1000 iterations before the OOM occurs.

Specs:
Arch Linux
8GB RAM
i5-4690k
GTX 970

@tomaka
Member

tomaka commented Jul 12, 2017

Right now vulkano creates and destroys many fences and semaphores, which is discouraged.
The fact that the OOM triggers only after a lot of iterations makes me think that there's some sort of leak either in vulkano or in the driver.

@brandonson
Author

That would make sense, though it's interesting that what should be a similar number of iterations does not expose this issue if a wait is included. (I'll try to show what I mean tomorrow - it's too late to work on this tonight, and I need to confirm that the iteration counts get high enough with waits.)

Assuming that waits do fix the issue and I haven't missed something, I'd suspect that something just doesn't get cleaned up soon enough, rather than a leak. The example uses FIFO present mode, which as far as I'm aware requires every frame to get displayed. There's no way my monitor can keep up with my graphics card for something this simple - could the example just be building up a massive chain of futures?

I'll test if/when I get a chance, but please let me know if I'm completely on the wrong track.

@tomaka
Member

tomaka commented Jul 12, 2017

In FIFO mode the call to acquire_next_image should already act as a wait (a "wait for vsync"), so I don't think adding another wait would fix it.

The call to cleanup_finished() is here to ensure that the resources used by previous frames are cleaned up: https://github.com/tomaka/vulkano/blob/606427f55f8ab6bae10d02462f76322837b26610/examples/src/bin/triangle.rs#L325

On my machine (and on many people's machines I presume) I can let the examples run for a very long time with no increase in memory, so I don't think there's anything leaking in vulkano itself.
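
For reference, that call is the first thing in the example's loop body; roughly:

    loop {
        // Frees the resources (fences, semaphores, buffers, ...) kept alive by
        // previous_frame_end once the GPU has actually finished with them.
        previous_frame_end.cleanup_finished();

        // ... acquire, build the command buffer, submit, present ...
    }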

@brandonson
Author

brandonson commented Jul 14, 2017

Per this reddit thread and other searching, it looks like this is an Nvidia driver bug - acquire_next_image is not blocking the way it should (potentially only under X11). So it seems like things are in fact never getting cleaned up, but if the driver worked properly they would be.

Experimentation indicates this can be worked around by either waiting on the fence returned by then_signal_fence_and_flush (in non-vulkano code) or by waiting for the fence passed to acquire_next_image_raw2 (in vulkano, src/swapchain/swapchain.rs). Using PresentMode::Immediate for the swapchain also appears to fix things, though I suspect there'd be issues if the image actually changed. PresentMode::Mailbox isn't available for my setup yet, unfortunately - theory/google says it should work as well.

Anyway, for the time being I can work around this with a queue of futures, waiting for the oldest one when I've used all the images (which I believe should approximate the behaviour I'd get with a properly blocking image acquire, though it's not really ideal). It would be nice if vulkano could expose the fence for acquire_next_image, even if only in an unsafe form - it would allow for a better workaround, and from looking at acquire_next_image_raw2 it seems to be planned anyway. Even better would be some change that makes acquire_next_image work as expected, but that may not be possible without negatively affecting perf on other setups.
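
In case it helps anyone else, my workaround looks roughly like this (a sketch, not my exact code; in_flight and image_count are my own names, and the queue takes over the role of previous_frame_end, so the start of the loop changes accordingly):

    use std::collections::VecDeque;

    // Before the render loop: one fence-signalling future per frame that has
    // been submitted but may still be in flight.
    let mut in_flight = VecDeque::new();
    let image_count = images.len();

    // Inside the loop, after then_signal_fence_and_flush():
    in_flight.push_back(future);
    if in_flight.len() >= image_count {
        if let Some(oldest) = in_flight.pop_front() {
            // Plain fence wait on the oldest frame; this approximates what a
            // properly blocking acquire_next_image would give.
            oldest.wait(None).unwrap();
        }
    }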

@tomaka
Member

tomaka commented Jul 14, 2017

Thanks for the investigation!

either waiting on the fence returned by then_signal_fence_and_flush (in non-vulkano code)

Note that you can do that in vulkano: https://docs.rs/vulkano/0.5.2/vulkano/sync/struct.FenceSignalFuture.html#method.wait

It would be nice if vulkano can expose the fence for acquire_next_image, even if only in an unsafe form

Yes, in general I think the acquire_next_image API needs some rework to allow fences.

@brandonson
Author

Hmm, reading it again, my comment was a bit unclear. I meant non-vulkano as in the wait happens outside of vulkano itself, and therefore doesn't require a change to the library. It still uses the API exposed by vulkano (the method you mentioned).

I'd meant to highlight the difference in location compared to the second workaround, which worked by changing code in the vulkano library.

@anxiousmodernman

Same here. I also experience high (100%) CPU usage on the triangle example, with this system:

Ubuntu 16.04; Intel i7; GTX 1060

I have another small PC with Arch and an Intel m5 (integrated graphics), and everything is swell.

On the GTX 1060, I add this snippet of code just before here, and things settle down.

        // <hacks>
        // (needs `use std::time::Duration;` at the top of the file)
        match future.wait(Some(Duration::from_millis(100))) {
            Ok(x) => x,  // type unit
            Err(err) => println!("err: {:?}", err), // never see this
        }
        // </hacks>
        previous_frame_end = Box::new(future) as Box<_>;

Playing with the duration passed to wait:

  • None - works
  • 100 milliseconds - works
  • 10 milliseconds - no error printed, but a panic occurs: backtrace

@benyb9

benyb9 commented Sep 6, 2017

Updating the crossbeam dependency to 0.3 seems to have fixed the leaks I was having.

@ia0

ia0 commented Dec 31, 2017

I have the same problem (after ~500 iterations) with a GeForce GTX 970. It would indeed be nice to have a way to wait on the next image to be acquired at each iteration instead of building a huge pile of futures. Artificially slowing down the loop (with a 50 ms pause) works for me as a workaround.
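
Concretely, just something like this at the end of the loop body:

    // Crude workaround: stall each iteration so futures don't pile up
    // faster than they can be cleaned up.
    std::thread::sleep(std::time::Duration::from_millis(50));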

@AustinJ235
Member

Please refer to #1247
