
FN/BN CPU BBQ #3630

Closed
Wondertan opened this issue Aug 6, 2024 · 2 comments · Fixed by #3634
Labels
bug (Something isn't working) · priority:high (High priority issue to be prioritized for the current/following sprint) · v0.17.0 (Intended for v0.17.0 release)

Comments

Wondertan (Member) commented Aug 6, 2024

Overview

Recent spikes in network activity have driven resource consumption for FN/BN node operators to unacceptable levels.

Problem Description

The flamegraph shown below is from a profile captured at the exact moment such a spike happened.
[flamegraph: CPU profile captured during the spike]

The profile suggests that most of the time is spent in the GetSize operation on the Blockstore. FN/BN uses GetSize to determine whether it has the data or not. This check is triggered by remote Bitswap requests from LNs. Before requesting the data itself, LNs check which peers have it by broadcasting these requests to all connected peers (in the worst case). Once they know who has the data, they request it from a particular peer.

As it turns out, checking for the existence of data is an expensive operation, about as expensive as getting the data itself. The cost comes from the inverted_index, the main bottleneck of celestia-node (at the moment of writing). This index is maintained within a KVStore that has grown enormous, and even simple lookup operations against it are the source of the CPU overhead we are seeing.

Overall, we have LNs broadcasting data-existence checks for every sampling operation, which in fact consists of multiple requests, and each of those requests gets duplicated to every connected peer in the worst case, creating massive load on the network.
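
For reference, the availability pre-check lands on the Blockstore interface from boxo (formerly go-ipfs-blockstore). Below is a minimal sketch of the relevant subset of that interface, with comments describing how it maps onto the flow above; treat the annotations as an illustration, not the exact celestia-node call path.

```go
package sketch

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// Relevant subset of the boxo Blockstore interface.
// Bitswap answers a remote "do you have this block?" want by calling
// GetSize (or Has) here, so every broadcast pre-check from an LN turns
// into one of these calls, which in celestia-node currently means a
// lookup in the huge inverted index.
type Blockstore interface {
	Has(ctx context.Context, c cid.Cid) (bool, error)
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)
	GetSize(ctx context.Context, c cid.Cid) (int, error)
}
```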

Potential Solutions

  • Wait for Shwap, which specifically targets this issue and many others.
    • This was the initial strategy, but the BBQ is getting hotter, so we have to come up with an interim solution.
  • Prune the inverted index. We don't need to index data outside the availability window, yet the index is never pruned and keeps growing.
    • Reducing the index size will speed up queries to the KVStore.
    • Complex, and pruning itself will put more pressure on the KVStore in use.
  • Remove the inverted index by making a quick change to the sampling protocol.
    • Every sampling request would then contain the DataRoot/Hash, routing the FN to the respective EDS CAR file.
    • A change that feels simple, but is still protocol-breaking and potentially full of unknown unknowns.
  • Adjust topology. Reducing the number of peers LNs connect to will reduce the amount of broadcasts a node does.
    • Only reduces the load, but does not solve the issue.
  • Dummy solution. For the described data-existence pre-checks, FN/BNs always report that they have the data, without doing lookups into the inverted index.
    • Extremely simple and non-breaking. The prevailing pre-check msgs won't cause any load, so it solves the issue.
    • A correctly running FN/BN should have that data anyway.
    • If an FN/BN falls behind or is still syncing, it will technically lie to the LN requesting data. However, the LN will use another FN after a timeout.
    • Similarly affects historical Blob requests over IPLDGetter, as pruned nodes will report that they have data while in fact they don't.

Decision

After internal discussion, we decided to go with the "Dummy Solution". It solves the problem in a simple and fast way, and the trade-offs it brings are negligible.

We may still additionally implement the solution that removes the inverted index entirely, to buy more time if Shwap gets delayed or if there is more pressure to achieve bigger blocks, which the inverted index is blocking as well.

Wondertan added the bug (Something isn't working) and priority:high (High priority issue to be prioritized for the current/following sprint) labels Aug 6, 2024
renaynay added the v0.17.0 (Intended for v0.17.0 release) label Aug 6, 2024
Wondertan (Member Author)

The implementation for the dummy solution is quite straightforward.

Here, instead of doing the expensive checks, we simply return the data size, which in our case is constant. Intuitively, returning a completely arbitrary or approximate size there should work, but to minimize risk we could still return the factual sizes.

There are two types of data we serve there: leaf and inner nodes, with their respective sizes defined here. Detecting which type is being requested is tricky and might not be possible. A quick look into the old protocol made me realize that we don't have different CID codecs for leaf and inner node requests; we unified them under the same codec.

If we can't find a way to determine the request type by simply looking at the CID, we should try returning the max size, which is the size of a leaf, i.e. the share.
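
A minimal sketch of that idea follows, reusing the Blockstore excerpt from the earlier sketch: a wrapper whose GetSize skips the inverted-index lookup and reports a constant upper-bound size. The wrapper name and the size constant are hypothetical, for illustration only; celestia-node defines the real share/NMT node sizes in its own constants.

```go
package sketch

import (
	"context"

	"github.com/ipfs/go-cid"
)

// maxNodeSize is an assumed upper bound on what we serve: the size of a
// leaf (a share plus its namespace prefix). Inner nodes are smaller, so
// over-reporting is harmless for the availability pre-check.
// The concrete value here is hypothetical.
const maxNodeSize = 512

// dummySizeBlockstore wraps the real Blockstore (interface from the
// sketch above) and answers size queries without touching the inverted
// index, i.e. it always reports that the data is present.
type dummySizeBlockstore struct {
	Blockstore
}

// GetSize short-circuits Bitswap's pre-check with a constant size.
// If the node is actually missing the data (e.g. it is still syncing),
// the requesting LN falls back to another peer after a timeout.
func (b *dummySizeBlockstore) GetSize(context.Context, cid.Cid) (int, error) {
	return maxNodeSize, nil
}
```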

Wondertan (Member Author)

Related issue that would simplify our life: ipfs/boxo#657

However, the proposed implementation above is not blocked on that, and we can adopt the change in boxo once it gets shipped.
