Skip to content

Conversation

LesnyRumcajs
Copy link
Contributor

This exposes an alternative iteration method without caching. Without it, the memory usage for iterating over deal proposals is too high (my 64 GiB machine gets knocked out under a minute). For reference, this works fine in Go's AMT implementation.

With for_each_cacheless, I can iterate the proposals in ~90s (no-op iteration).

The logic is mostly taken from the for_each_while_mut.

@github-project-automation github-project-automation bot moved this to 📌 Triage in FilOz Jul 31, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.56%. Comparing base (9273717) to head (602ac18).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2189      +/-   ##
==========================================
+ Coverage   77.50%   77.56%   +0.06%     
==========================================
  Files         147      147              
  Lines       15743    15789      +46     
==========================================
+ Hits        12201    12247      +46     
  Misses       3542     3542              
Files with missing lines Coverage Δ
ipld/amt/src/amt.rs 83.63% <100.00%> (+0.74%) ⬆️
ipld/amt/src/node.rs 87.56% <100.00%> (+1.22%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LesnyRumcajs
Copy link
Contributor Author

CI blocked by #2187

@rvagg
Copy link
Member

rvagg commented Aug 1, 2025

The only concern I have here is that this reportedly reads twice as many blocks as the cached version which seems off to me. I'd like to figure out what's going on there first.

@LesnyRumcajs
Copy link
Contributor Author

Three iterations in the tests lead to this block read count.

  1. Dirty cache - no block reads at all

    // Iterate over amt with dirty cache
    let mut x = 0;
    a.for_each_cacheless(|_, _: &BytesDe| {
    x += 1;
    Ok(())
    })
    .unwrap();

  2. 1st pass

new_amt
.for_each_cacheless(|i, _: &BytesDe| {
if i != indexes[x] {
panic!(
"for each cacheless found wrong index: expected {} got {}",
indexes[x], i
);
}
x += 1;
Ok(())
})
.unwrap();

  1. 2nd pass

new_amt.for_each_cacheless(|_, _: &BytesDe| Ok(())).unwrap();

Now, given that the only block read operation (and cache read) occurs here, it makes sense.

Link::Cid { cid, cache: _ } => {
let node = bs
.get_cbor::<CollapsedNode<V>>(cid)?
.ok_or_else(|| Error::CidNotFound(cid.to_string()))?
.expand(bit_width)?;
node.for_each_cacheless(bs, height - 1, bit_width, offs, f)?;

All entries are "normally" cached on the 1st pass, but in the for_each_cacheless there is none, so the 2nd pass has to re-read all blocks (and not grab it from the cache) as it is here:

match cache.get_or_try_init(|| {
self.blockstore
.get_cbor::<CollapsedNode<V>>(cid)?
.ok_or_else(|| Error::CidNotFound(cid.to_string()))?
.expand(self.bit_width)
.map(Box::new)
}) {

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs docs, but lgtm
can be rebased on master because the other PR is merged now

@LesnyRumcajs LesnyRumcajs force-pushed the amt-for-each-cacheless branch from bf6fc44 to d782519 Compare August 1, 2025 09:16
@LesnyRumcajs LesnyRumcajs force-pushed the amt-for-each-cacheless branch from 1ced61e to 9b21dbd Compare August 1, 2025 10:14
//!
//! Data structure reference:
//! https://github.com/ipld/specs/blob/51fab05b4fe4930d3d851d50cc1e5f1a02092deb/data-structures/vector.md
//! <https://github.com/ipld/specs/blob/51fab05b4fe4930d3d851d50cc1e5f1a02092deb/data-structures/vector.md>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: this URL is not a hyperlink
 --> ipld/amt/src/lib.rs:8:5
  |
8 | //! https://github.com/ipld/specs/blob/51fab05b4fe4930d3d851d50cc1e5f1a02092deb/data-structures/vector.md
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: bare URLs are not automatically turned into clickable links
  = note: `#[warn(rustdoc::bare_urls)]` on by default
help: use an automatic link instead
  |
8 | //! <https://github.com/ipld/specs/blob/51fab05b4fe4930d3d851d50cc1e5f1a02092deb/data-structures/vector.md>
  |     +                                                                                                     +

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other doc links issues in the repo; I'll have a look later and see if there can be a friendly CI check to ensure this doesn't happen in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LesnyRumcajs LesnyRumcajs marked this pull request as ready for review August 1, 2025 10:25
@LesnyRumcajs LesnyRumcajs changed the title [draft] feat: cacheless amt iteration feat: cacheless amt iteration Aug 1, 2025
Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice
I bet the HAMT could get exactlty the same treatment

@github-project-automation github-project-automation bot moved this from 📌 Triage to ✔️ Approved by reviewer in FilOz Aug 4, 2025
@LesnyRumcajs LesnyRumcajs merged commit a633547 into master Aug 4, 2025
16 checks passed
@LesnyRumcajs LesnyRumcajs deleted the amt-for-each-cacheless branch August 4, 2025 17:25
@github-project-automation github-project-automation bot moved this from ✔️ Approved by reviewer to 🎉 Done in FilOz Aug 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

3 participants