fix(static-files): break Arc reference cycle #6795

Merged
shekhirin merged 26 commits into feat/static-files from alexey/fix-state-tests
Feb 27, 2024

@shekhirin shekhirin commented Feb 26, 2024

Ethereum State Tests were failing on st_bad_opcode and st_time_consuming because CI ran out of memory. The cause was a memory leak: StaticFileProvider and StaticFileProviderRW held strong references to each other, so neither was ever freed.

We fix that by making the reference from StaticFileProviderRW to StaticFileProvider weak; see https://doc.rust-lang.org/std/sync/struct.Arc.html#breaking-cycles-with-weak.

pub struct StaticFileProviderRW {
    /// Reference back to the provider. We need [Weak] here because [StaticFileProviderRW] is
    /// stored in a [dashmap::DashMap] inside the parent [StaticFileProvider], which is an [Arc].
    /// If we were to use an [Arc] here, we would create a reference cycle.
    reader: Weak<StaticFileProviderInner>,
    // ...
}

pub struct StaticFileProviderInner {
    /// Maintains a map which allows for concurrent access to different `NippyJars`, over different
    /// segments and ranges.
    map: DashMap<(BlockNumber, StaticFileSegment), LoadedJar>,
    // ...
}

/// [`StaticFileProvider`] manages all existing [`StaticFileJarProvider`].
#[derive(Debug, Default, Clone)]
pub struct StaticFileProvider(pub(crate) Arc<StaticFileProviderInner>);
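
For illustration, here is a minimal, std-only sketch of the same pattern. `Inner`, `Writer`, `BackRef` and `inner_drops_after_release` are hypothetical stand-ins, not the reth types: the parent owns its writers, and each writer points back at the parent either strongly or weakly. A drop counter shows whether the parent is actually freed once the last external handle goes away.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, Weak};

// Hypothetical stand-in for the parent that owns its writers.
struct Inner {
    writers: Mutex<Vec<Writer>>,
    drops: Arc<AtomicUsize>,
}

impl Drop for Inner {
    fn drop(&mut self) {
        // Record that the parent was actually freed.
        self.drops.fetch_add(1, Ordering::SeqCst);
    }
}

// The back-reference from a writer to its parent: strong (cycle) or weak (no cycle).
#[allow(dead_code)]
enum BackRef {
    Strong(Arc<Inner>),
    Weak(Weak<Inner>),
}

// Hypothetical stand-in for a writer stored inside the parent.
struct Writer {
    _reader: BackRef,
}

/// Builds a parent holding one writer, releases the only external handle,
/// and returns how many times `Inner` was dropped.
fn inner_drops_after_release(use_weak: bool) -> usize {
    let drops = Arc::new(AtomicUsize::new(0));
    let provider = Arc::new(Inner {
        writers: Mutex::new(Vec::new()),
        drops: Arc::clone(&drops),
    });
    let back = if use_weak {
        BackRef::Weak(Arc::downgrade(&provider))
    } else {
        BackRef::Strong(Arc::clone(&provider))
    };
    provider.writers.lock().unwrap().push(Writer { _reader: back });
    drop(provider);
    drops.load(Ordering::SeqCst)
}

fn main() {
    // Strong back-reference: the cycle keeps `Inner` alive, so it is leaked.
    println!("strong: {} drop(s)", inner_drops_after_release(false)); // strong: 0 drop(s)
    // Weak back-reference: `Inner` is freed as soon as the last handle is gone.
    println!("weak:   {} drop(s)", inner_drops_after_release(true)); // weak:   1 drop(s)
}
```

With the strong variant, the parent's strong count never reaches zero because the writer it owns keeps it alive. That is the kind of leak described above.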

Additionally, I added parallelism to Ethereum State Tests. Some test directories contain tens of thousands of cases (the largest has about 26k, see the counts below), so we can leverage parallel runs here.

reth (alexey/fix-state-tests) for dir in testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/st*; do echo -n $dir" "; jq -s '[flatten[] | length] | add' $dir/*.json; done | sort -k2 -n -r | head -n 10
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stTimeConsuming 25950
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stBadOpcode 13122
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stEIP1559 5536
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stPreCompiledContracts 4261
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stZeroKnowledge 4000
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stMemoryTest 2847
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stZeroKnowledge2 2595
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stStaticCall 2390
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stSStoreTest 2376
testing/ef-tests/ethereum-tests/BlockchainTests/GeneralStateTests/stStackTests 1875

It should be fine memory-wise: tests currently peak at 600MB, and our CI machine has 4 cores and 16GB of RAM. Even if each of four concurrent runs hit that peak, the total would be about 2.4GB.
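
As a rough sketch of what running the cases in parallel could look like, the following uses only `std::thread::scope` and a shared atomic cursor. This is not necessarily how the test runner in this PR does it. `run_case`, `run_parallel` and `peak_memory_mb` are hypothetical names, and the memory estimate assumes the 600MB figure applies to each concurrent run.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for executing a single state test case.
/// Here every case passes except multiples of 1000.
fn run_case(case: u64) -> bool {
    case % 1000 != 0
}

/// Runs `cases` on `workers` threads and returns the number of failed cases.
fn run_parallel(cases: &[u64], workers: usize) -> usize {
    let next = AtomicUsize::new(0); // index of the next case to claim
    let failures = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Each worker claims the next unprocessed index until none are left.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(case) = cases.get(i) else { break };
                if !run_case(*case) {
                    failures.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    failures.load(Ordering::Relaxed)
}

/// Worst-case estimate, assuming every worker hits the per-run peak at once.
fn peak_memory_mb(workers: u64, per_run_mb: u64) -> u64 {
    workers * per_run_mb
}

fn main() {
    // 25,950 is the stTimeConsuming count from the listing above.
    let cases: Vec<u64> = (1..=25_950).collect();
    println!("failures: {}", run_parallel(&cases, 4));
    println!("estimated peak: {} MB", peak_memory_mb(4, 600)); // estimated peak: 2400 MB
}
```

The work-stealing cursor keeps all workers busy even when individual cases take very different amounts of time, which matters for directories with a few slow cases.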


Valgrind also successfully detects a memory leak on a minimal repro:

ubuntu@reth4:~/reth$ valgrind --leak-check=full --num-callers=500 target/debug/examples/static-files
==2952982== Memcheck, a memory error detector
==2952982== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2952982== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2952982== Command: target/debug/examples/static-files
==2952982==
==2952982==
==2952982== HEAP SUMMARY:
==2952982==     in use at exit: 9,048,887 bytes in 12 blocks
==2952982==   total heap usage: 42 allocs, 30 frees, 9,084,662 bytes allocated
==2952982==
==2952982== 9,048,887 (14,336 direct, 9,034,551 indirect) bytes in 1 blocks are definitely lost in loss record 12 of 12
==2952982==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2952982==    by 0x14BE7F: alloc (alloc.rs:98)
==2952982==    by 0x14BE7F: alloc::alloc::Global::alloc_impl (alloc.rs:181)
==2952982==    by 0x14C768: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:241)
==2952982==    by 0x156D4B: alloc::raw_vec::RawVec<T,A>::allocate_in (raw_vec.rs:199)
==2952982==    by 0x139356: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter (raw_vec.rs:145)
==2952982==    by 0x13C6AD: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter (spec_from_iter.rs:33)
==2952982==    by 0x13C528: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter (mod.rs:2791)
==2952982==    by 0x144AFD: core::iter::traits::iterator::Iterator::collect (iterator.rs:2054)
==2952982==    by 0x1474EF: <alloc::boxed::Box<[I]> as core::iter::traits::collect::FromIterator<I>>::from_iter (boxed.rs:2043)
==2952982==    by 0x144AD9: core::iter::traits::iterator::Iterator::collect (iterator.rs:2054)
==2952982==    by 0x16CF4C: dashmap::DashMap<K,V,S>::with_capacity_and_hasher_and_shard_amount (lib.rs:284)
==2952982==    by 0x16C7D5: dashmap::DashMap<K,V,S>::with_capacity_and_hasher (lib.rs:229)
==2952982==    by 0x16C5D4: dashmap::DashMap<K,V,S>::with_hasher (lib.rs:212)
==2952982==    by 0x16C3F0: <dashmap::DashMap<K,V,S> as core::default::Default>::default (lib.rs:118)
==2952982==    by 0x12EE72: reth_provider::providers::static_file::manager::StaticFileProviderInner::new (manager.rs:91)
==2952982==    by 0x12EC5E: reth_provider::providers::static_file::manager::StaticFileProvider::new (manager.rs:52)
==2952982==    by 0x12E608: static_files::main (static-files.rs:9)
==2952982==    by 0x12E2CA: core::ops::function::FnOnce::call_once (function.rs:250)
==2952982==    by 0x12D67D: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:155)
==2952982==    by 0x12EB70: std::rt::lang_start::{{closure}} (rt.rs:166)
==2952982==    by 0x253C40: std::rt::lang_start_internal (function.rs:284)
==2952982==    by 0x12EB49: std::rt::lang_start (rt.rs:165)
==2952982==    by 0x12E80D: main (in /home/ubuntu/reth/target/debug/examples/static-files)
==2952982==
==2952982== LEAK SUMMARY:
==2952982==    definitely lost: 14,336 bytes in 1 blocks
==2952982==    indirectly lost: 9,034,551 bytes in 11 blocks
==2952982==      possibly lost: 0 bytes in 0 blocks
==2952982==    still reachable: 0 bytes in 0 blocks
==2952982==         suppressed: 0 bytes in 0 blocks
==2952982==
==2952982== For lists of detected and suppressed errors, rerun with: -s
==2952982== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

@shekhirin shekhirin changed the title bump state tests timeout to 10m fix(static-files): resolve Arc reference cycle Feb 27, 2024
@shekhirin shekhirin added C-bug An unexpected or incorrect behavior A-static-files Related to static files labels Feb 27, 2024
@shekhirin shekhirin marked this pull request as ready for review February 27, 2024 12:19
@shekhirin shekhirin changed the title fix(static-files): resolve Arc reference cycle fix(static-files): break Arc reference cycle Feb 27, 2024
@mattsse mattsse left a comment

this seems fine

let start = Instant::now();

let static_file_provider = StaticFileProvider(reader.upgrade().unwrap());

Collaborator

this needs an .expect

and a note that this is fine because this is only accessible as a RefMut, so the owned type always exists,
in other words it's not possible to detach the StaticFileProviderRW from the StaticFileProvider
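
A small std-only sketch of the suggestion, with hypothetical `Inner` and `Writer` types standing in for the real ones: `upgrade()` returns `None` only if the parent has already been dropped, so `expect` documents the invariant that the parent outlives its writers, where `unwrap` would fail silently.

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-in for the parent provider.
struct Inner {
    name: &'static str,
}

// Hypothetical stand-in for a writer holding a weak back-reference.
struct Writer {
    reader: Weak<Inner>,
}

impl Writer {
    /// Upgrades the weak back-reference, stating the invariant if it is ever violated.
    fn provider(&self) -> Arc<Inner> {
        self.reader
            .upgrade()
            .expect("the parent provider must outlive its writers")
    }
}

/// While the parent is alive, upgrading succeeds.
fn provider_name() -> &'static str {
    let provider = Arc::new(Inner { name: "static-files" });
    let writer = Writer { reader: Arc::downgrade(&provider) };
    let name = writer.provider().name;
    name
}

/// Once the parent is gone, `upgrade()` yields `None`; this is the case `expect` guards.
fn dangling_upgrade_is_none() -> bool {
    let provider = Arc::new(Inner { name: "static-files" });
    let writer = Writer { reader: Arc::downgrade(&provider) };
    drop(provider);
    writer.reader.upgrade().is_none()
}

fn main() {
    println!("{}", provider_name()); // static-files
    println!("{}", dangling_upgrade_is_none()); // true
}
```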

Collaborator Author

added

@@ -1,21 +1,17 @@
# Runs unit tests.

name: unit

Member

What are these for?

Collaborator Author

lsp formatting, reverted

}
// Drop the provider without committing to the database.
drop(provider);
// TODO: replace with `tempdir` usage, so the temp directory is removed

Member

todo in this PR?


@shekhirin shekhirin merged commit 4937c00 into feat/static-files Feb 27, 2024
25 checks passed
@shekhirin shekhirin deleted the alexey/fix-state-tests branch February 27, 2024 17:37