Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: backport changes to release/v1.30.0 branch #12635

Merged
merged 16 commits into from
Oct 28, 2024

Conversation

rjan90
Copy link
Contributor

@rjan90 rjan90 commented Oct 24, 2024

Related Issues

#12480

Proposed Changes

Backports PRs that has landed with a backport label on them to the release/v1.30.0: https://github.com/filecoin-project/lotus/pulls?q=is%3Apr+label%3Arelease%2Fbackport+is%3Aclosed, to the releases branch.

PRs in question:

Checklist

Before you mark the PR ready for review, please make sure that:

masih and others added 6 commits October 24, 2024 12:00
* Investigate intermittent F3 itest failures on CI

Repeat F3 itests on CI to investigate intermittent failures.

* Fix participation lease removal for wrong network

When manifest changes, depending on the timing it is possible for newly
generated valid leases to get removed if the sign message loop attempts
to sign messages that are as a result of progressing previous network.

Here is an example scenario in a specific order that was causing itests
to fail:
 * participants get a lease for network A up to instance 5
 * network A progresses to instance 6
 * manifest changes the network name to B
 * participants get a new lease for network B up to instance 5
 * sign loop receives a message from network A, instance 6
 * `getParticipantsByInstance` lazily removes leases since it only
   checks the instance.
 * the node ends up with no participants, and stuck.

To fix this:
 1) check if participants asked for are within the current network, and
    if not refuse to participate.
 2) check network name, as well as instance, to lazily remove expired
    leases.

* Add debug capability to F3 itests to print current progress

To aid debugging failing tests add option to print progress of all nodes
at every eventual assertion, disabled by default.

* Shorten GPBFT settings for a more responsive timing

Defaults are based on epoch of 30s and real RTT. Shorten Delta and
rebroadcast times.

* Remove F3 itest repetitions on CI now that saul goodman

See proof of the pudding:
 * https://github.com/filecoin-project/lotus/actions/runs/11369403828/job/31626763159?pr=12597

* Update the changelog

* Address review comments

* Remove the sanity check that all nodes use the same initial manifest
* Implement F3 list participants API

Implement Lotus API to list the current miner IDs that are
participating.

* Include lease information to list participants API

Also return start and validity terms of the list participants API.
* Support Forrest parity testing for F3 participation APIs

For parity testing, Forest nodes need to decode an F3 participation
ticket that is issued by Lotus. Lotus so happens to use CBOR to encode
tickets, which it then decodes to issue a lease.

The lease issuer is of type `peer.ID` which is an alias of type `string`
that may contain non UTF-8 characters. If this type is used directly in
CBOR encoding then it gets encoded as string, which works fine in Golang
but not Rust, which in turn results in decoding issues in Forest.

To avoid this use `[]byte` as the type to encode the issuer public key.
In Lotus, the value will be the binary marshalling of `peer.ID` that
issued the ticket.

While at it, also fix two issues that were brought up during discussion:

* A miner must not be able to ask for a ticket to participate in zero
  instances. Validate and return error if instances is set to zero.

* Use CBOR tuple encoding for a slightly better efficient wire encoding
  of ticket. `cborgen.Write*` APIs explicitly create a given file; hence
  the need to separate the file to which tuple kinds are generated since
  the existing encoding for types in `api/cbor_gen.go` use maps.
Accept any of "", "0", "false", "no" as negative value for
`LOTUS_DISABLE_F3` in which case F3 should remain enabled.
Upgrade to `go-f3` `v0.4.0` removed the option to disable checkpointing
by F3. As a result this build parameter no longer has any effect. Remove
it.

Relates to: #12547
Continuing with the function here will lead to a panic. This can happen,
e.g., if the lotus node stops and/or returns an error while the
participant is waiting for its lease to expire.
@rjan90 rjan90 added the skip/changelog This change does not require CHANGELOG.md update label Oct 24, 2024
@rjan90 rjan90 requested review from masih and Kubuxu October 24, 2024 10:12
@rjan90
Copy link
Contributor Author

rjan90 commented Oct 24, 2024

@masih / @Kubuxu let us discuss if we want to ship a Lotus v1.30.0-rc3 with these backports so we can run some testing with these changes before shipping a final release on 2024-10-28

@masih
Copy link
Member

masih commented Oct 24, 2024

Can we also get this in when merged please?

@masih
Copy link
Member

masih commented Oct 24, 2024

And this please

@masih
Copy link
Member

masih commented Oct 24, 2024

We also need the fix for this in lotus

@rjan90
Copy link
Contributor Author

rjan90 commented Oct 24, 2024

Can we also get #12637 in when merged please?
And #12627 please
We also need the fix for filecoin-project/go-f3#719 in lotus

Yeah, I will wait for those to land and adding those to this backport.

masih and others added 4 commits October 28, 2024 10:22
Implement `lotus f3` CLI sub commands to:

* Get a specific finality certificate, either latest or by instance ID.
* List a range of finality certificates

Part of #12607
Fix the issue by instantiating pointers to sentinel F3 error values and
assert that errors indeed pass through via an integration test.

Fixes #12630
…ng progress (#12640)

Previously, the flow was:

1. Get ticket.
2. Participate.
3. Repeatedly poll progress.

The new flow is:

1. Get ticket.
2. Repeatedly participate, using the returned lease as an indicator of progress.

That way, if the lotus node reboots we'll eventually re-tell them about
the lease.

fixes filecoin-project/go-f3#719
@rjan90
Copy link
Contributor Author

rjan90 commented Oct 28, 2024

@masih all PRs with a backport label has now been added to this PR. I´m failing to complete the make gen to make that check green, because it fails here:

go build -ldflags="-X=github.com/filecoin-project/lotus/build.CurrentCommit=+git.d3b014faf" -o lotus ./cmd/lotus

github.com/zondax/hid

vendor/github.com/zondax/hid/hid_enabled.go:39:11: fatal error: 'hidapi/mac/hid.c' file not found
#include "hidapi/mac/hid.c"
^~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [lotus] Error 1

And I can´t see that I´m touching anything with regards to that here. Could you maybe help me out with that?

@masih
Copy link
Member

masih commented Oct 28, 2024

@rjan90 I am not sure about the fatal error: 'hidapi/mac/hid.c' file not found you are observing. It runs on my machine. re-generation pushed.

Try: go mod tidy before running the docsgen-cli command?

@rjan90
Copy link
Contributor Author

rjan90 commented Oct 28, 2024

I am not sure about the fatal error: 'hidapi/mac/hid.c' file not found you are observing. It runs on my machine. re-generation pushed.

Hmmmm.. I will have to do some fixing on my end it seems, go mod tidy did not sort it, but good that it runs on your machine.

Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
From archioz:
```
> lotus f3 manifest
Manifest:
  Protocol Version:     4
  Paused:               false
  Initial Instance:     0
  Initial Power Table:  bafy2bzaceab236vmmb3n4q4tkvua2n4dphcbzzxerxuey3mot4g3cov5j3r2c
  Bootstrap Epoch:      2081674
  Network Name:         calibrationnet
  Ignore EC Power:      false
  Committee Lookback:   10
  Catch Up Alignment:   15s
```

Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
Copy link
Contributor

@Kubuxu Kubuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGWM we just need #12650 for rc3

* feat(chain): use the index when determining if A is an ancestor of B

That way we efficiently accept old and checkpoint finality certificates.

* feat(chain): allow checkpoints beyond finality but prevent forks

1. Allow setting checkpoints more than 900 epochs ago as long as said
checkpoint doesn't cause us to _revert_ more than 900 epochs.

2. Verify that we don't end up reverting more than 900 epochs
when switching to a chain at or beyond the current head.

3. Optimize all of this such that, e.g., setting checkpoints in the
distant past can use the chain index instead of having to walk back the
chain manually. We may still need to walk up to 900 epochs to make sure
we're not forking beyond finality, but that's it.

* chore(chain): remove ChainStore.NearestCommonAncestor

It's unused, inefficient, and a potential DoS vector if someone decides
to use it.
Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
@rjan90 rjan90 merged commit f4f4289 into release/v1.30.0 Oct 28, 2024
87 checks passed
@rjan90 rjan90 deleted the phi/backports-v1300 branch October 28, 2024 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip/changelog This change does not require CHANGELOG.md update
Projects
Status: ☑️ Done (Archive)
Development

Successfully merging this pull request may close these issues.

6 participants