Large number of page faults occurring in `fn rav1d_submit_frame` #1358
Comments
We could use other allocators like [...]
It's possible that if we just changed this to use an explicitly zeroed allocation, it would be optimized enough. If the performance of that is good, that's much easier than pooling it again and managing the lifetime for that.
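For concreteness, here is a minimal sketch of what an explicitly zeroed allocation could look like, assuming an all-zero bit pattern is a valid value of the element type (the helper name is made up for illustration; this is not rav1d's actual code):

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Allocate a boxed slice directly from zeroed memory, skipping the
/// per-element writes that a `collect()` over a range performs.
/// Hypothetical helper, not an existing rav1d API.
///
/// Safety: the caller must guarantee that all-zero bytes are a valid
/// `T` (true for plain-old-data structs).
unsafe fn zeroed_boxed_slice<T>(n: usize) -> Box<[T]> {
    let layout = Layout::array::<T>(n).expect("size overflow");
    // Keep the sketch simple: zero-sized layouts must not be passed
    // to the allocator, so they are not handled here.
    assert!(layout.size() > 0);
    let ptr = unsafe { alloc_zeroed(layout) }.cast::<T>();
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n)) }
}
```

The memory then comes back from the allocator already zeroed -- typically as untouched copy-on-write zero pages -- instead of every page being dirtied by an element-by-element fill.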
The optimization Rust does for zeroed allocations is only done for types that implement [...]. @ivanloz, if you want to see if doing this for [...]
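For context on that optimization (this is my understanding of the standard library's behavior, not something spelled out in the thread): `vec![value; n]` only routes through `alloc_zeroed` when the element type is covered by the standard library's private zero-detection specialization (primitives and a few other types); arbitrary user structs fall back to a per-element fill:

```rust
fn main() {
    const N: usize = 1 << 20;

    // Primitive zero: the standard library can lower this to a single
    // alloc_zeroed call, so the kernel hands back zero pages that are
    // not touched until first use.
    let bytes = vec![0u8; N];

    // A user-defined struct, even an all-zero one, is not covered by
    // that specialization: this allocates and then writes every slot,
    // faulting in each backing page up front.
    #[derive(Clone, Copy, Default)]
    struct Block {
        mv: [i16; 2], // made-up fields, for illustration only
        rf: i8,
    }
    let blocks = vec![Block::default(); N];

    assert_eq!(bytes.len(), N);
    assert_eq!(blocks.len(), N);
}
```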
While I've got experience tracking down performance issues, I'm a bit of a novice when it comes to actually writing Rust. This sounds like it should be a relatively straightforward change to put together, but I haven't had much luck implementing it. If you (or someone else) can provide the patch, I'd be happy to test it out and report the results.
Sure, I'll work on it when I get some more time. If there's not much of a rush, we'll probably also want to wait for [...]
Hey @ivanloz, I'm trying to look into this further now and fix it. Are you measuring the page-faults, [...]
Thanks @kkysen -- I'm testing on an Android device, so I'm using [...]. I collected the raw number of context-switches and page-faults with [...]. To see where they were occurring, I used [...]. Android provides inferno to inspect the data, and I believe [...]. As for the [...]
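The exact measurement commands were lost from the quote above, so as a stand-in for anyone reproducing this: fault counts can also be read from inside the process via getrusage(2). A sketch using the libc crate on Linux/Android (a different technique from the profiling tools discussed in this thread):

```rust
// Read the process's own page-fault counters via getrusage(2).
// Requires the `libc` crate; works on Linux and Android.
fn fault_counts() -> (libc::c_long, libc::c_long) {
    let mut ru: libc::rusage = unsafe { std::mem::zeroed() };
    let rc = unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut ru) };
    assert_eq!(rc, 0, "getrusage failed");
    (ru.ru_minflt, ru.ru_majflt) // (minor, major) faults
}

fn main() {
    let (min0, maj0) = fault_counts();
    // Stand-in workload; in a real measurement, decode frames here.
    let buf = vec![1u8; 64 << 20];
    std::hint::black_box(&buf);
    let (min1, maj1) = fault_counts();
    println!("minor: {}, major: {}", min1 - min0, maj1 - maj0);
}
```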
@ivanloz, thanks! Also, which files were you testing on? We've been testing mostly on Chimera 10-bit 1080. I'm also curious whether different kinds of videos make a difference here, especially if they do things like change frame sizes and stuff like that.
I'm testing with the Chimera set as well, primarily the 10-bit 1080 6191kbps file, but I believe I've seen this with all of the files I've tested from that set.

I went ahead and tested this on my Linux x86-64 host as well, just to make sure it wasn't some artifact of running on Android or ARM64, and I'm seeing the same behavior. It's less pronounced (maybe because of the greater resources), but page-faults grow with frames processed in rav1d, versus dav1d largely holding stable. Again, this is with the Chimera 10-bit 1080 6191kbps sample on Linux x86-64.
I tested with #1368 and saw no difference, unfortunately.
(Copied from #1294 (comment) -- @kkysen for visibility, thanks!)
I've been looking into what remaining sources of overhead might be and wanted to chime in with some of my findings.
One major discrepancy I noticed when running benchmarks between dav1d and rav1d was an order-of-magnitude difference in the number of madvise calls and page-faults. I also saw a larger number of context-switches in rav1d than in dav1d (likely related to the page-faults?). This may explain at least some of the performance difference seen.

Digging into this, I found the majority (~82%) of the page-faults in rav1d are coming from rav1d::src::decode::rav1d_submit_frame. Specifically, the stack trace points to:

```
<alloc::boxed::Box<[rav1d::src::refmvs::RefMvsTemporalBlock]> as core::iter::traits::collect::FromIterator<rav1d::src::refmvs::RefMvsTemporalBlock>>::from_iter::<core::iter::adapters::map::Map<core::ops::range::Range<usize>, rav1d::src::decode::rav1d_submit_frame::{closure#1}>>
```
I believe that corresponds to this closure: rav1d/src/decode.rs, lines 5223 to 5227 at 7d72409.
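For readers without the permalink handy, the shape of the allocation can be inferred from the from_iter symbol in the stack trace: a Range mapped by a closure and collected into a boxed slice. A reconstruction of that shape (not the literal code at those lines, and with stand-in fields):

```rust
// Inferred from the `from_iter` symbol above; stand-in fields, not the
// actual definition in src/refmvs.rs.
#[derive(Clone, Copy, Default)]
struct RefMvsTemporalBlock {
    mv: [i16; 2],
    rf: i8,
}

fn alloc_ref_mvs(n_blocks: usize) -> Box<[RefMvsTemporalBlock]> {
    // Each element is written individually, so every backing page is
    // touched -- and soft page-faulted in -- at allocation time, every
    // time a frame is submitted.
    (0..n_blocks)
        .map(|_| RefMvsTemporalBlock::default())
        .collect()
}
```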
This is the equivalent operation in dav1d: rav1d/src/decode.c, lines 3623 to 3624 at 7d72409.
Here dav1d_submit_frame allocates using dav1d_ref_create_using_pool. This then calls into dav1d_mem_pool_pop, which allocates from pooled memory (initialized in dav1d_mem_pool_init). This likely reduces the number of allocator calls.

The switch away from pooled memory in rav1d looks to have been introduced as part of 6420e5a, PR #984.
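For anyone weighing a fix: dav1d's pool amounts to a free list of previously returned buffers that dav1d_mem_pool_pop hands back instead of calling the allocator, so their pages are already faulted in from earlier frames. A rough Rust sketch of the same idea (names and structure are mine, not dav1d's API or a proposed rav1d design):

```rust
use std::sync::Mutex;

/// Minimal buffer pool in the spirit of dav1d_mem_pool_{init,pop,push}.
/// Illustrative only; not an actual rav1d or dav1d API.
struct BufPool<T> {
    free: Mutex<Vec<Box<[T]>>>,
}

impl<T: Default + Clone> BufPool<T> {
    fn new() -> Self {
        BufPool { free: Mutex::new(Vec::new()) }
    }

    /// Reuse a returned buffer of the right size if one is available;
    /// its pages are already faulted in from earlier use. Note that a
    /// reused buffer keeps its old contents, so callers must reset it.
    fn pop(&self, len: usize) -> Box<[T]> {
        let mut free = self.free.lock().unwrap();
        if let Some(pos) = free.iter().position(|b| b.len() == len) {
            return free.swap_remove(pos);
        }
        drop(free);
        // Cold path: a fresh allocation, which is where first-touch
        // page faults come from.
        vec![T::default(); len].into_boxed_slice()
    }

    /// Return a buffer to the pool when the frame is done with it.
    fn push(&self, buf: Box<[T]>) {
        self.free.lock().unwrap().push(buf);
    }
}
```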