payments channel lanes appear broken #2297

Closed
raulk opened this issue Jul 7, 2020 · 3 comments

Member

raulk commented Jul 7, 2020

Context

While running the payment channels stress test, I started seeing errors when sending a payment voucher on a newly-allocated lane. I was initially testing with 256 lanes, then dropped to 8, and finally to a single lane to isolate the fault.

The interesting test logic starts here: https://github.com/filecoin-project/oni/pull/109/files#diff-cd0cffda40d3b3034e745c5e679091f7R66

Testing against Lotus 99b5ec9.

Problem

We start seeing (recovered) nil pointer panics from specs-actors v0.6.1 (lotus master) when sending payment vouchers on newly-allocated lanes:

Jul  7 09:58:36.005027	INFO	20.0336s      ERROR << instance   3 >> 2020-07-07T10:58:36.004+0100	ERROR	vm	vm/runtime.go:149	spec actors failure: runtime error: invalid memory address or nil pointer dereference	{"req_id": "d2d63420"}
Jul  7 09:58:36.005078	INFO	20.0337s      ERROR << instance   3 >> 2020-07-07T10:58:36.005+0100	WARN	vm	vm/vm.go:405	Send actor error	{"from": "t3vq6m3rzi6wuq66endy6drayxivittb6n6i2dgckcb35smdcbvr3myu37wmdkjnuubltjxl3gduvy67wzmfzq", "to": "t2ycqb6tpybtsodvastxi6enyxcu3wxu33lrb423a", "nonce": 2, "method": "2", "height": "9", "error": "spec actors failure: runtime error: invalid memory address or nil pointer dereference (RetCode=1):\n    github.com/filecoin-project/lotus/chain/vm.(*Runtime).shimCall.func1\n        /Users/raul/go/pkg/mod/github.com/filecoin-project/lotus@v0.4.2-0.20200703133307-d3a1261f1e81/chain/vm/runtime.go:150"}	{"req_id": "d2d63420"}
Jul  7 09:58:36.005095	INFO	20.0337s      ERROR << instance   2 >> 2020-07-07T10:58:36.005+0100	ERROR	vm	vm/runtime.go:149	spec actors failure: runtime error: invalid memory address or nil pointer dereference	{"req_id": "d2d63420"}
Jul  7 09:58:36.005193	INFO	20.0338s      ERROR << instance   2 >> 2020-07-07T10:58:36.005+0100	WARN	vm	vm/vm.go:405	Send actor error	{"from": "t3vq6m3rzi6wuq66endy6drayxivittb6n6i2dgckcb35smdcbvr3myu37wmdkjnuubltjxl3gduvy67wzmfzq", "to": "t2ycqb6tpybtsodvastxi6enyxcu3wxu33lrb423a", "nonce": 2, "method": "2", "height": "9", "error": "spec actors failure: runtime error: invalid memory address or nil pointer dereference (RetCode=1):\n    github.com/filecoin-project/lotus/chain/vm.(*Runtime).shimCall.func1\n        /Users/raul/go/pkg/mod/github.com/filecoin-project/lotus@v0.4.2-0.20200703133307-d3a1261f1e81/chain/vm/runtime.go:150"}	{"req_id": "d2d63420"}
Jul  7 09:58:36.005211	INFO	20.0338s      ERROR << instance   3 >> 2020-07-07T10:58:36.005+0100	ERROR	vm	vm/runtime.go:149	spec actors failure: runtime error: invalid memory address or nil pointer dereference	{"req_id": "d2d63420"}
Jul  7 09:58:36.005239	INFO	20.0338s      ERROR << instance   2 >> 2020-07-07T10:58:36.005+0100	ERROR	vm	vm/runtime.go:149	spec actors failure: runtime error: invalid memory address or nil pointer dereference	{"req_id": "d2d63420"}
Jul  7 09:58:36.005313	INFO	20.0339s      ERROR << instance   3 >> 2020-07-07T10:58:36.005+0100	WARN	vm	vm/vm.go:405	Send actor error	{"from": "t3ru5aqkzkfmslgpq7orisp4il6uvnwc4qcqz7aoqekmz3bjcz4zajkvqcuqywqpabvwbdfvw4xiguh6l4tqbq", "to": "t25y2vorn2rsizzuy5y5aijheklh7iakgwebf3uoi", "nonce": 2, "method": "2", "height": "9", "error": "spec actors failure: runtime error: invalid memory address or nil pointer dereference (RetCode=1):\n    github.com/filecoin-project/lotus/chain/vm.(*Runtime).shimCall.func1\n        /Users/raul/go/pkg/mod/github.com/filecoin-project/lotus@v0.4.2-0.20200703133307-d3a1261f1e81/chain/vm/runtime.go:150"}	{"req_id": "d2d63420"}
Jul  7 09:58:36.005363	INFO	20.0339s      ERROR << instance   2 >> 2020-07-07T10:58:36.005+0100	WARN	vm	vm/vm.go:405	Send actor error	{"from": "t3ru5aqkzkfmslgpq7orisp4il6uvnwc4qcqz7aoqekmz3bjcz4zajkvqcuqywqpabvwbdfvw4xiguh6l4tqbq", "to": "t25y2vorn2rsizzuy5y5aijheklh7iakgwebf3uoi", "nonce": 2, "method": "2", "height": "9", "error": "spec actors failure: runtime error: invalid memory address or nil pointer dereference (RetCode=1):\n    github.com/filecoin-project/lotus/chain/vm.(*Runtime).shimCall.func1\n        /Users/raul/go/pkg/mod/github.com/filecoin-project/lotus@v0.4.2-0.20200703133307-d3a1261f1e81/chain/vm/runtime.go:150"}	{"req_id": "d2d63420"}

Taking a stroll down Debugging Lane

Debugging away, I found that the ChannelInfo is storing a suspicious NextLane value, which looks like max(int64) + an increment.

[image: screenshot of the stored ChannelInfo and its NextLane value]

The piece of code that generates this value seems to be maxLaneFromState():

lotus/paychmgr/paych.go

Lines 57 to 65 in 8233143

func maxLaneFromState(st *paych.State) (uint64, error) {
	maxLane := uint64(math.MaxInt64)
	for _, state := range st.LaneStates {
		if (state.ID)+1 > maxLane+1 {
			maxLane = state.ID
		}
	}
	return maxLane, nil
}

If there are no lanes (as is the case here), this function returns max int64, which specs-actors then doesn't appreciate.
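
To make the arithmetic concrete, here is a minimal, self-contained sketch. `LaneState` and `State` are simplified stand-ins for the paych actor types (assumption: only the lane ID matters here), `maxLaneFromState` mirrors the function quoted above, and `nextLane` is a hypothetical helper showing one way an empty channel could start at lane 0. It is not necessarily the fix Lotus should adopt.

```go
package main

import (
	"fmt"
	"math"
)

// LaneState and State are simplified stand-ins for the paych actor types;
// only the lane ID is modelled here (assumption for illustration).
type LaneState struct {
	ID uint64
}

type State struct {
	LaneStates []*LaneState
}

// maxLaneFromState mirrors the function quoted above.
func maxLaneFromState(st *State) (uint64, error) {
	maxLane := uint64(math.MaxInt64)
	for _, state := range st.LaneStates {
		if (state.ID)+1 > maxLane+1 {
			maxLane = state.ID
		}
	}
	return maxLane, nil
}

// nextLane is a hypothetical alternative: it returns 0 for a channel with no
// lanes, otherwise the highest lane ID plus one. Overflow at the very top of
// the uint64 range is deliberately ignored in this sketch.
func nextLane(st *State) uint64 {
	if len(st.LaneStates) == 0 {
		return 0
	}
	maxLane := st.LaneStates[0].ID
	for _, ls := range st.LaneStates[1:] {
		if ls.ID > maxLane {
			maxLane = ls.ID
		}
	}
	return maxLane + 1
}

func main() {
	empty := &State{}
	maxLane, _ := maxLaneFromState(empty)
	fmt.Println(maxLane == math.MaxInt64) // true
	fmt.Println(maxLane + 1)              // 9223372036854775808, i.e. max int64 + 1
	fmt.Println(nextLane(empty))          // 0

	// With small lane IDs the comparison in the quoted loop never succeeds,
	// so the quoted function still reports max int64.
	withLanes := &State{LaneStates: []*LaneState{{ID: 0}, {ID: 3}}}
	maxLane, _ = maxLaneFromState(withLanes)
	fmt.Println(maxLane == math.MaxInt64) // true
	fmt.Println(nextLane(withLanes))      // 4
}
```

The `maxLane + 1` line matches the "max(int64) + an increment" value seen in the stored ChannelInfo.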

maxLaneFromState() is called from loadOutboundChannelInfo(), which gets called from waitForPaychCreateMsg() (and other places, but this is the relevant one for us).

When we first create a channel, waitForPaychCreateMsg() waits until the message is posted on chain, followed by build.MessageConfidence confirmations, and "imports" the paych actor's state into our paych store (storing a ChannelInfo struct).

Findings/bugs

  1. The logic for lane management seems broken in the way stated above. I'm assuming that the lane index should start off from 0 or 1. It's also worth looking into specs-actors, which shouldn't panic in this way.
    • This is a blocker for further testing this scenario.
  2. I wasn't able to locate any code that keeps the channels being "tracked" in the paych store in sync with the actor state on chain.
    • This code assumes that no other node will be altering the payment state outside of this node, which is a false assumption IMO.
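
As a rough illustration of the second finding, the sketch below shows what a reconciliation step between the local store and chain state could look like. Everything here is hypothetical: `ChannelInfo` is reduced to two fields, and `ChainState`, `StateReader`, `LoadPaychState` and `reconcile` are invented names for illustration, not Lotus APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// ChannelInfo is a simplified stand-in for the record kept in the paych store.
type ChannelInfo struct {
	Channel  string
	NextLane uint64
}

// ChainState is a hypothetical view of the paych actor's on-chain state.
type ChainState struct {
	LaneIDs []uint64
}

// StateReader is a hypothetical interface for reading actor state from chain.
type StateReader interface {
	LoadPaychState(channel string) (*ChainState, error)
}

// reconcile advances ci.NextLane if the chain shows lanes the local record
// does not know about, and reports whether the record changed. It never moves
// NextLane backwards, since lanes allocated locally may not be on chain yet.
func reconcile(r StateReader, ci *ChannelInfo) (bool, error) {
	st, err := r.LoadPaychState(ci.Channel)
	if err != nil {
		return false, fmt.Errorf("loading state for %s: %w", ci.Channel, err)
	}
	next := uint64(0)
	for _, id := range st.LaneIDs {
		if id+1 > next {
			next = id + 1
		}
	}
	if next <= ci.NextLane {
		return false, nil
	}
	ci.NextLane = next
	return true, nil
}

// fakeChain is an in-memory StateReader used only for this demonstration.
type fakeChain map[string]*ChainState

func (f fakeChain) LoadPaychState(channel string) (*ChainState, error) {
	st, ok := f[channel]
	if !ok {
		return nil, errors.New("unknown channel")
	}
	return st, nil
}

func main() {
	// Another party has used lanes 0..2 on chain; the local record only knows lane 0.
	chain := fakeChain{"t2example": {LaneIDs: []uint64{0, 1, 2}}}
	ci := &ChannelInfo{Channel: "t2example", NextLane: 1}

	changed, err := reconcile(chain, ci)
	fmt.Println(changed, err, ci.NextLane) // true <nil> 3
}
```

Such a step would presumably run periodically or before each lane allocation; when and how often is left open here.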
raulk changed the title from "payments channel lanes features appears broken" to "payments channel lanes appear broken" on Jul 7, 2020
Member Author

raulk commented Jul 7, 2020

cc @hannahhoward @anorth @schomatis

Member Author

raulk commented Jul 7, 2020

BTW -- we really should have tests for this package. I'm pretty sure there are more dragons lurking inside. Maybe Oni can help with that?

Contributor

dirkmc commented Aug 25, 2020

Closing as this should now be fixed

dirkmc closed this as completed on Aug 25, 2020