Investigate whether we can match Lotus' snapshots byte-for-byte #1884

Closed
5 tasks done
lemmih opened this issue Sep 7, 2022 · 10 comments · Fixed by #2540
Labels: Priority: 4 - Low, Ready, Status: Needs Triage, Type: Bug

Comments

@lemmih
Contributor

lemmih commented Sep 7, 2022

Issue summary

Both Lotus and Forest have the ability to generate snapshots. However, it has come to light that Forest snapshots fail after a day or two due to unexplained forks in the blockchain. Our snapshots must therefore differ from Lotus snapshots in some way, and we need to figure out why.

Tasks:

  • Start with a fairly recent calibnet snapshot from Lotus. (Bootstrap with a Lotus snapshot from our DO Space and generate a new snapshot.)
  • Initiate both Lotus and Forest with the calibnet snapshot.
  • Export a new snapshot with the same settings (epoch, recent stateroots, etc) from both Forest and Lotus.
  • Check if they are exactly the same.
  • If they are not, skim through the Lotus and Forest code to find differences. Add those differences to a new issue.
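
The "exactly the same" check in the task list can be sketched as a chunked byte comparison. This is only an illustration, not something taken from Forest or Lotus: the function names are hypothetical, and the returned offset is 0-based, whereas `cmp` reports byte numbers starting at 1.

```python
import io
from typing import BinaryIO, Optional

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large snapshots are never fully in memory


def first_difference(a: BinaryIO, b: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Optional[int]:
    """Return the 0-based offset of the first differing byte, or None if identical.

    If one stream is a strict prefix of the other, the result is the length
    of the shorter stream.
    """
    offset = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a == chunk_b:
            if not chunk_a:  # both streams ended at the same point
                return None
            offset += len(chunk_a)
            continue
        # The chunks differ: locate the exact byte within them.
        for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
            if x != y:
                return offset + i
        # No differing byte in the common part, so one chunk is shorter.
        return offset + min(len(chunk_a), len(chunk_b))


def first_difference_in_files(path_a: str, path_b: str) -> Optional[int]:
    """Convenience wrapper for two files on disk (hypothetical helper)."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return first_difference(a, b)


if __name__ == "__main__":
    # In-memory stand-ins for two snapshot files.
    print(first_difference(io.BytesIO(b"abcd"), io.BytesIO(b"abxd")))
```

A result of `None` means the two exports are byte-for-byte identical; any integer is the point from which the differences need investigating.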

Other information and links

Lotus snapshots for calibnet: https://cloud.digitalocean.com/spaces/forest-snapshots?i=88c522&path=lotus-calibnet%2F

lemmih added the labels Type: Bug, Priority: 1 - Critical, Status: Needs Triage, and Ready on Sep 7, 2022.
lemmih added this to the Calibration network support milestone on Sep 7, 2022.
lemmih added the label Priority: 3 - Medium and removed Priority: 1 - Critical on Sep 12, 2022.
@lemmih
Contributor Author

lemmih commented Sep 12, 2022

Apparently the unexplained forks also happen with snapshots from Lotus. However, this issue is still important as we would like to prove that our snapshots are valid (and equivalent to those from Lotus).

@lemmih
Contributor Author

lemmih commented Oct 11, 2022

The fork issue turned out to be unrelated to how we're generating snapshots. I will close this issue for now since it's not a big priority anymore. Byte-for-byte identical snapshots would be nice but it's definitely not necessary. May re-open this in the future if things change.

lemmih closed this as completed on Oct 11, 2022.
lemmih added the label Priority: 4 - Low and removed Priority: 3 - Medium on Dec 5, 2022.
lemmih reopened this on Dec 5, 2022.
@lemmih
Contributor Author

lemmih commented Dec 5, 2022

Re-opening with low priority.

@jdjaustin
Contributor

Steps to get result:

  • To get the latest Forest snapshot, run `forest-cli --chain calibnet snapshot fetch --snapshot-dir .`. If the download succeeds, the snapshot is saved as forest_snapshot_[network]_[date]_height_[epoch].car.
  • To get the latest Lotus snapshot, run `forest-cli --chain calibnet snapshot fetch --snapshot-dir . --provider filecoin`. If the download succeeds, the snapshot is saved as filecoin_snapshot_[network]_[date]_height_[epoch].car.
  • Import the forest... snapshot into Forest with `forest --chain calibnet --import-snapshot [file] --encrypt-keystore false`.
  • Let the Forest node run until `forest-cli sync wait` (in a separate terminal window) reports Done!.
  • Export the Forest snapshot with `forest-cli snapshot export`. When it finishes, shut down the Forest node before starting the Lotus node.
  • Import the filecoin... snapshot into Lotus with `lotus daemon --import-snapshot [file]` (remember to switch to the proper network with `make clean calibnet` first, if necessary).
  • Let Lotus run until `lotus sync wait` (in a separate terminal window) reports Done!.
  • Export the Lotus snapshot with `lotus chain export --recent-stateroots 2000 --skip-old-msgs [file]`.
  • Compare the bytes with `cmp [Lotus snapshot] [Forest snapshot]`.
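
Before running `cmp`, it may help to confirm that both files are for the same network and epoch. A minimal sketch, assuming the files follow the naming scheme quoted in the steps above: the regular expression, the helper names, and the example date format are assumptions, and a Lotus export takes a user-chosen file name, so this only applies when that name follows the same scheme.

```python
import re
from typing import NamedTuple

# Naming scheme from the steps above:
#   [provider]_snapshot_[network]_[date]_height_[epoch].car
# The date format is not stated there, so it is matched loosely.
SNAPSHOT_NAME = re.compile(
    r"^(?P<provider>[a-z]+)_snapshot_(?P<network>[a-z0-9]+)"
    r"_(?P<date>.+)_height_(?P<epoch>\d+)\.car$"
)


class SnapshotName(NamedTuple):
    provider: str
    network: str
    date: str
    epoch: int


def parse_snapshot_name(filename: str) -> SnapshotName:
    """Split a snapshot file name into its parts (hypothetical helper)."""
    match = SNAPSHOT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised snapshot file name: {filename!r}")
    return SnapshotName(
        provider=match["provider"],
        network=match["network"],
        date=match["date"],
        epoch=int(match["epoch"]),
    )


def same_network_and_epoch(name_a: str, name_b: str) -> bool:
    """True when both names refer to the same network and the same epoch."""
    a = parse_snapshot_name(name_a)
    b = parse_snapshot_name(name_b)
    return a.network == b.network and a.epoch == b.epoch
```

The file name is only a hint about the contents; it does not prove that two exports cover the same tipset.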

@jdjaustin
Contributor

The problem is that the files, although relatively similar in size, differ at byte 1, line 1, and cmp -l lists nearly every byte as differing. Perhaps each file has a different header, producing an offset that propagates through the rest of the file?
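
One way to test the header hypothesis is to pull out just the header of each file and compare those. The sketch below assumes the snapshots are CARv1 files (the .car extension suggests this), where the file starts with an unsigned LEB128 varint giving the header length, followed by the header bytes that list the root CIDs. The function names are hypothetical and the header is returned undecoded.

```python
import io
from typing import BinaryIO


def read_varint(stream: BinaryIO) -> int:
    """Decode one unsigned LEB128 varint from the stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if byte[0] & 0x80 == 0:  # high bit clear: this was the last byte
            return result
        shift += 7
        if shift > 63:
            raise ValueError("varint is longer than 64 bits")


def read_car_header(stream: BinaryIO) -> bytes:
    """Return the raw (still encoded) header bytes of a CARv1 stream."""
    length = read_varint(stream)
    header = stream.read(length)
    if len(header) != length:
        raise EOFError("truncated CAR header")
    return header


def headers_match(path_a: str, path_b: str) -> bool:
    """Compare only the headers of two CAR files on disk."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return read_car_header(a) == read_car_header(b)


if __name__ == "__main__":
    # Tiny synthetic stream: a 3-byte "header" followed by other data.
    demo = io.BytesIO(bytes([3]) + b"abc" + b"rest")
    print(read_car_header(demo))
```

If the headers differ, the two files were most likely exported with different roots, and every later byte offset would be expected to disagree as well.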

@lemmih
Contributor Author

lemmih commented Dec 15, 2022

They definitely won't match if the snapshots aren't for the same epoch.

@jdjaustin
Contributor

> They definitely won't match if the snapshots aren't for the same epoch.

Is there a way to ensure that the snapshots are exported at the same epoch?

@jdjaustin
Contributor

I was able to get snapshots from the same epoch. It appears that they start to differ at byte 970488820 and then differ for the rest of the file after that point (lines of output above cmp [Forest snapshot] [Lotus snapshot] are from using -l option flag to show all diffs).
[Image: Screenshot from 2023-01-12 11-56-27]

@LesnyRumcajs
Member

LesnyRumcajs commented Jan 13, 2023

We should not expect the snapshots to match.

Forest logic differs from Lotus in the walk_snapshot method. In particular, Lotus seems to cover more cases (e.g., filecoin-project/lotus#8691).

So the first step towards identical snapshots would be to match the logic in this method.
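
A byte comparison cannot tell whether the two walks emit different blocks or the same blocks in a different order. As a rough diagnostic, and not something done in this thread, the sections of each file could be hashed and compared as sets. The sketch assumes the CARv1 layout (a varint-prefixed header followed by varint-prefixed sections, each holding a CID and its block data); all function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Optional, Set, Tuple


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Decode one unsigned LEB128 varint; return None at a clean end of stream."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # no more sections
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if byte[0] & 0x80 == 0:
            return result
        shift += 7
        if shift > 63:
            raise ValueError("varint is longer than 64 bits")


def section_digests(stream: BinaryIO) -> Set[bytes]:
    """Return SHA-256 digests of every section (CID plus data) after the header."""
    header_len = read_varint(stream)
    if header_len is None:
        raise ValueError("empty stream")
    stream.read(header_len)  # skip the header; only the blocks matter here
    digests: Set[bytes] = set()
    while True:
        length = read_varint(stream)
        if length is None:
            return digests
        section = stream.read(length)
        if len(section) != length:
            raise EOFError("truncated section")
        digests.add(hashlib.sha256(section).digest())


def compare_block_sets(a: BinaryIO, b: BinaryIO) -> Tuple[int, int, int]:
    """Return (sections in both, sections only in a, sections only in b)."""
    digests_a = section_digests(a)
    digests_b = section_digests(b)
    return (
        len(digests_a & digests_b),
        len(digests_a - digests_b),
        len(digests_b - digests_a),
    )


if __name__ == "__main__":
    # Two synthetic streams holding the same sections in a different order.
    car_a = bytes([2]) + b"hd" + bytes([3]) + b"aaa" + bytes([3]) + b"bbb"
    car_b = bytes([2]) + b"hd" + bytes([3]) + b"bbb" + bytes([3]) + b"aaa"
    print(compare_block_sets(io.BytesIO(car_a), io.BytesIO(car_b)))
```

If both "only in" counts are zero, the snapshots contain the same blocks and only the traversal order differs. Non-zero counts would point at cases that one walk covers and the other does not.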

@lemmih
Contributor Author

lemmih commented Jan 13, 2023

> I was able to get snapshots from the same epoch. It appears that they start to differ at byte 970488820 and then differ for the rest of the file after that point (lines of output above cmp [Forest snapshot] [Lotus snapshot] are from using -l option flag to show all diffs).

Great! So they're like 99.9% identical? Have a chat with @LesnyRumcajs about the differences between our walk function and theirs. There might be a simple way to go from 99.9% identical to 100% identical.
