Triangle example triggers OutOfHostMemory with gtx 1080 #627

Closed
brandonson opened this issue Jul 8, 2017 · 14 comments

Comments

@brandonson

brandonson commented Jul 8, 2017

I'm running on an old box with an upgraded GPU. Some details:

Arch Linux
RAM: 12GB
CPU: i7-4770
GPU: GTX 1080

With 8-9GB of memory free, the triangle example runs rampant and triggers an OutOfHostMemory error within a few seconds (which seems fast, but I don't know enough to be sure).

Waiting for the fence at line 380 appears to fix the issue, presumably by letting things finish and be cleaned up at the top of the loop.

Not sure what the right fix (if any) is for this, but it certainly seems like a bug.
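
To be concrete, the change I mean looks roughly like this (a sketch using the example's own variable names; the method chain is what the example already does, only the wait() call is new, and I don't know if this is the right long-term fix):

    // End of the example's main loop (examples/src/bin/triangle.rs):
    let future = previous_frame_end.join(acquire_future)
        .then_execute(queue.clone(), command_buffer).unwrap()
        .then_swapchain_present(queue.clone(), swapchain.clone(), image_num)
        .then_signal_fence_and_flush().unwrap();

    // Adding this wait is what appears to fix the issue for me.
    future.wait(None).unwrap();

    previous_frame_end = Box::new(future) as Box<_>;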

@tomaka
Member

tomaka commented Jul 9, 2017

In my experience, these out-of-memory errors are almost always a problem of some sort with the driver, but obviously I don't know if that's the case here.

@brandonson
Author

brandonson commented Jul 10, 2017

I've got a backtrace now, if it helps:

0  core::result::unwrap_failed<vulkano::OomError> (msg=..., error=vulkano::OomError::OutOfHostMemory) at /checkout/src/libcore/result.rs:860
1  0x00005555555ce13e in core::result::Result<vulkano::sync::fence::Fence<alloc::arc::Arc<vulkano::device::Device>>, vulkano::OomError>::unwrap<vulkano::sync::fence::Fence<alloc::arc::Arc<vulkano::device::Device>>,vulkano::OomError> (self=...) at /checkout/src/libcore/result.rs:738
2  0x00005555556842a9 in vulkano::sync::future::fence_signal::then_signal_fence<vulkano::swapchain::swapchain::PresentFuture<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::join::JoinFuture<alloc::boxed::Box<GpuFuture>, vulkano::swapchain::swapchain::SwapchainAcquireFuture>, vulkano::command_buffer::auto::AutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>>>> (future=..., behavior=...) at /home/brandon/.cargo/registry/src/github.com-1ecc6299db9ec823/vulkano-0.5.2/src/sync/future/fence_signal.rs:40
3  0x0000555555684e22 in vulkano::sync::future::GpuFuture::then_signal_fence<vulkano::swapchain::swapchain::PresentFuture<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::join::JoinFuture<alloc::boxed::Box<GpuFuture>, vulkano::swapchain::swapchain::SwapchainAcquireFuture>, vulkano::command_buffer::auto::AutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>>>> (self=...) at /home/brandon/.cargo/registry/src/github.com-1ecc6299db9ec823/vulkano-0.5.2/src/sync/future/mod.rs:216
4  0x0000555555685086 in vulkano::sync::future::GpuFuture::then_signal_fence_and_flush<vulkano::swapchain::swapchain::PresentFuture<vulkano::command_buffer::traits::CommandBufferExecFuture<vulkano::sync::future::join::JoinFuture<alloc::boxed::Box<GpuFuture>, vulkano::swapchain::swapchain::SwapchainAcquireFuture>, vulkano::command_buffer::auto::AutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>>>> (self=...) at /home/brandon/.cargo/registry/src/github.com-1ecc6299db9ec823/vulkano-0.5.2/src/sync/future/mod.rs:226
5  0x00005555556b30c2 in forever_protocol::main () at src/main.rs:194

Looks like it can't create a new fence.

Is this necessarily a driver issue? I'll admit I lack the experience to really debug this, but hypothetically, could it be that there's no reason on the driver/GPU side to block, and my memory/CPU just can't keep up? (It does pin a CPU to 100%, and the process as a whole uses a bit more than 1GB of memory before bailing out... not enough to use up all my memory, but maybe enough to max out some memory pool?)

Shooting in the dark, really, but maybe the backtrace will help.

@tomaka
Member

tomaka commented Jul 11, 2017

Could you add a bunch of printlns in the code in order to find out how many times the loop in the example runs before the OOM happens?

Knowing whether it panics at the first run, second run, third run, or afterwards, would be very useful.
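
Something as simple as this would be enough (a rough sketch; frame_count is just a name I made up):

    // Before the render loop:
    let mut frame_count: u64 = 0;

    loop {
        frame_count += 1;
        println!("frame {}", frame_count);

        // ... rest of the example's loop body unchanged ...
    }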

@cormac-obrien

On my machine the loop runs anywhere from 900 to 1000 iterations before the OOM occurs.

Specs:
Arch Linux
8GB RAM
i5-4690k
GTX 970

@tomaka
Member

tomaka commented Jul 12, 2017

Right now vulkano creates and destroys many fences and semaphores, which is discouraged.
The fact that the OOM triggers only after a lot of iterations makes me think that there's some sort of leak either in vulkano or in the driver.

@brandonson
Author

That would make sense, though it's interesting that what should be a similar number of iterations does not expose this issue if a wait is included. (I'll try to show what I mean tomorrow - it's too late to work on this tonight, and I need to confirm that the iteration counts get high enough with waits.)

Assuming that waits do fix the issue and I haven't missed something, I'd suspect that something just doesn't get cleaned up soon enough, rather than a leak. The example uses FIFO present mode, which as far as I'm aware requires every frame to get displayed. There's no way my monitor can keep up with my graphics card for something this simple - could the example just be building up a massive chain of futures?

I'll test if/when I get a chance, but please let me know if I'm completely on the wrong track.

@tomaka
Member

tomaka commented Jul 12, 2017

In FIFO mode the call to acquire_next_image should already act as a wait (a "wait for vsync"), so I don't think adding another wait would fix it.

The call to cleanup_finished() is here to ensure that the resources used by previous frames are cleaned up: https://github.com/tomaka/vulkano/blob/606427f55f8ab6bae10d02462f76322837b26610/examples/src/bin/triangle.rs#L325

On my machine (and on many people's machines I presume) I can let the examples run for a very long time with no increase in memory, so I don't think there's anything leaking in vulkano itself.
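
For reference, that call is the first thing in the example's loop body; roughly:

    loop {
        // Frees the resources (fences, semaphores, buffers, ...) kept alive by
        // previous_frame_end once the GPU has actually finished with them.
        previous_frame_end.cleanup_finished();

        // ... acquire, build the command buffer, submit, present ...
    }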

@brandonson
Author

brandonson commented Jul 14, 2017

Per this reddit thread and other searching, it looks like this is an Nvidia driver bug - acquire_next_image is not blocking the way it should (potentially only under X11). So it seems like things are in fact never getting cleaned up, but if the driver worked properly they would be.

Experimentation indicates this can be worked around by either waiting on the fence returned by then_signal_fence_and_flush (in non-vulkano code) or by waiting for the fence passed to acquire_next_image_raw2 (in vulkano, src/swapchain/swapchain.rs). Using PresentMode::Immediate for the swapchain also appears to fix things, though I suspect there'd be issues if the image actually changed. PresentMode::Mailbox isn't available for my setup yet, unfortunately - theory/google says it should work as well.

Anyway, for the time being I can work around this with a queue of futures, waiting for the oldest one when I've used all the images (which I believe should approximate the behaviour I'd get with a properly blocking image acquire, though it's not really ideal). It would be nice if vulkano could expose the fence for acquire_next_image, even if only in an unsafe form - it would allow for a better workaround, and from looking at acquire_next_image_raw2 it seems to be planned anyway. Even better would be some change that makes acquire_next_image work as expected, but that may not be possible without negatively affecting perf on other setups.
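
In case it helps anyone else, my workaround looks roughly like this (a sketch, not my exact code; in_flight and image_count are my own names, and the queue takes over the role of previous_frame_end, so the start of the loop changes accordingly):

    use std::collections::VecDeque;

    // Before the render loop: one fence-signalling future per frame that has
    // been submitted but may still be in flight.
    let mut in_flight = VecDeque::new();
    let image_count = images.len();

    // Inside the loop, after then_signal_fence_and_flush():
    in_flight.push_back(future);
    if in_flight.len() >= image_count {
        if let Some(oldest) = in_flight.pop_front() {
            // Plain fence wait on the oldest frame; this approximates what a
            // properly blocking acquire_next_image would give.
            oldest.wait(None).unwrap();
        }
    }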

@tomaka
Member

tomaka commented Jul 14, 2017

Thanks for the investigation!

either waiting on the fence returned by then_signal_fence_and_flush (in non-vulkano code)

Note that you can do that in vulkano: https://docs.rs/vulkano/0.5.2/vulkano/sync/struct.FenceSignalFuture.html#method.wait

It would be nice if vulkano can expose the fence for acquire_next_image, even if only in an unsafe form

Yes, in general I think the acquire_next_image API needs some rework to allow fences.

@brandonson
Author

Hmm, reading it again, my comment was a bit unclear. I meant non-vulkano as in the wait happens outside of vulkano itself, and therefore doesn't require a change to the library. It still uses the API exposed by vulkano (the method you mentioned).

I'd meant to highlight the difference in location compared to the second workaround, which worked by changing code in the vulkano library.

@anxiousmodernman

Same here. I also experience high (100%) CPU usage on the triangle example, with this system:

Ubuntu 16.04; Intel i7; GTX 1060

I have another small PC with Arch and an Intel m5 (integrated graphics), and everything is swell.

On the GTX 1060, I add this snippet of code just before here, and things settle down.

        // <hacks>
        // (needs `use std::time::Duration;` at the top of the file)
        match future.wait(Some(Duration::from_millis(100))) {
            Ok(x) => x,  // type unit
            Err(err) => println!("err: {:?}", err), // never see this
        }
        // </hacks>
        previous_frame_end = Box::new(future) as Box<_>;

Playing with the duration passed to wait:

  • None - works
  • 100 milliseconds - works
  • 10 milliseconds - no error printed, but a panic occurs: backtrace

@benyb9

benyb9 commented Sep 6, 2017

Updating the crossbeam dependency to 0.3 seems to have fixed the leaks I was having.

@ia0

ia0 commented Dec 31, 2017

I have the same problem (after ~500 iterations) with a GeForce GTX 970. It would indeed be nice to have a way to wait on the next image to be acquired at each iteration instead of building a huge pile of futures. Artificially slowing down the loop (with a 50 ms pause) works for me as a workaround.
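
Concretely, just something like this at the end of the loop body:

    // Crude workaround: stall each iteration so futures don't pile up
    // faster than they can be cleaned up.
    std::thread::sleep(std::time::Duration::from_millis(50));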

@AustinJ235
Member

Please refer to #1247
