[DNM] Do not merge: Subxt UnstableBackend: Timeout issue repro #1318
Conversation
This reverts commit 63aa514.
The issue has been reproduced at: https://github.com/paritytech/subxt/actions/runs/7184693579/job/19591409961. Potential repro factors:

The lightclient test might be failing because of smol-dot/smoldot#1442. From CI: https://github.com/paritytech/subxt/actions/runs/7193534907/job/19592205260?pr=1318
Logs reproducing the timeout issue:

```
MemLog: [
    (
        39.333µs,
        Validated,
    ),
    (
        2.845179181s,
        BestChainBlockIncluded {
            block: Some(
                TransactionBlockDetails {
                    hash: 0x59d21905fc5e145ac3c43d9629e978b65e5cfb9abd42d57d76cef1b87f77d152,
                    index: 1,
                },
            ),
        },
    ),
    (
        10.597509211s,
        Finalized {
            block: TransactionBlockDetails {
                hash: 0x59d21905fc5e145ac3c43d9629e978b65e5cfb9abd42d57d76cef1b87f77d152,
                index: 1,
            },
        },
    ),
]
SeenBlocksLog: {}
```
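For context on how a dump of this shape can be produced, here is a minimal sketch of recording timestamped transaction-status events. The `TxEvent` and `MemLog` types below are illustrative stand-ins for this write-up, not subxt's actual types.

```rust
use std::time::{Duration, Instant};

/// Illustrative stand-in for the transaction status events seen in the log above.
#[derive(Debug)]
enum TxEvent {
    Validated,
    BestChainBlockIncluded { hash: String, index: u32 },
    Finalized { hash: String, index: u32 },
}

/// Collects (elapsed time, event) pairs, mirroring the shape of the `MemLog` dump.
struct MemLog {
    started: Instant,
    entries: Vec<(Duration, TxEvent)>,
}

impl MemLog {
    fn new() -> Self {
        Self { started: Instant::now(), entries: Vec::new() }
    }

    fn push(&mut self, event: TxEvent) {
        self.entries.push((self.started.elapsed(), event));
    }
}

fn main() {
    let mut log = MemLog::new();
    log.push(TxEvent::Validated);
    log.push(TxEvent::BestChainBlockIncluded { hash: "0x59d2…".into(), index: 1 });
    log.push(TxEvent::Finalized { hash: "0x59d2…".into(), index: 1 });
    println!("{:#?}", log.entries);
}
```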
Extended logs: https://github.com/paritytech/subxt/actions/runs/7207617809/job/19641867036

From the logs, the block hash
Logs from https://github.com/paritytech/subxt/actions/runs/7450115302/job/20268303243?pr=1318:

```
MemLog: [
    (
        47.158µs,
        Validated,
    ),
    (
        1.23551272s,
        BestChainBlockIncluded {
            block: Some(
                TransactionBlockDetails {
                    hash: 0x5b035c49782c71635672af2dc8fcdbe0da80a56cc8a3b313359a22a3c16e6247,
                    index: 1,
                },
            ),
        },
    ),
    (
        8.134685163s,
        Finalized {
            block: TransactionBlockDetails {
                hash: 0x5b035c49782c71635672af2dc8fcdbe0da80a56cc8a3b313359a22a3c16e6247,
                index: 1,
            },
        },
    ),
]
SeenBlocksLog: {
    0xbcd3b1b6b9235f303e726ee85fd2ea6f887392025b2f054123b586e872b8c49d: (
        New,
        BlockRef {
            inner: BlockRefInner {
                hash: 0xbcd3b1b6b9235f303e726ee85fd2ea6f887392025b2f054123b586e872b8c49d,
                unpin_flags: Mutex {
                    data: {},
                    poisoned: false,
                    ..
                },
            },
        },
    ),
    0xd3da03a16da3dda362ec954a43681adc99864387fc0824dda2fe9fcdf70aaaa0: (
        Finalized,
        BlockRef {
            inner: BlockRefInner {
                hash: 0xd3da03a16da3dda362ec954a43681adc99864387fc0824dda2fe9fcdf70aaaa0,
                unpin_flags: Mutex {
                    data: {},
                    poisoned: false,
                    ..
                },
            },
        },
    ),
    0x5b035c49782c71635672af2dc8fcdbe0da80a56cc8a3b313359a22a3c16e6247: (
        New,
        BlockRef {
            inner: BlockRefInner {
                hash: 0x5b035c49782c71635672af2dc8fcdbe0da80a56cc8a3b313359a22a3c16e6247,
                unpin_flags: Mutex {
                    data: {},
                    poisoned: false,
                    ..
                },
            },
        },
    ),
}
```

The transaction class reports
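As a rough mental model of the `SeenBlocksLog` structure above (a map from block hash to its last reported state plus a pinned block reference whose unpin flags sit behind a mutex), here is a simplified sketch. `SeenState` and `BlockRef` here are illustrative, not the real subxt internals.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

/// Whether a block was last reported as new or as finalized (illustrative).
#[derive(Debug)]
enum SeenState {
    New,
    Finalized,
}

/// Simplified stand-in for the pinned block reference seen in the dump: the
/// hash plus a shared, mutex-protected set of blocks waiting to be unpinned.
#[derive(Debug, Clone)]
struct BlockRef {
    hash: String,
    unpin_flags: Arc<Mutex<HashSet<String>>>,
}

fn main() {
    let unpin_flags = Arc::new(Mutex::new(HashSet::new()));
    let mut seen_blocks: HashMap<String, (SeenState, BlockRef)> = HashMap::new();

    let hash = "0x5b035c49…".to_string();
    seen_blocks.insert(
        hash.clone(),
        (SeenState::New, BlockRef { hash, unpin_flags: unpin_flags.clone() }),
    );

    println!("{seen_blocks:#?}");
}
```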
Other logs from the failures above:

```
tx:
  2.8s:  0x223f8682299015d50dd00165ce89409da6025eec8257d3bb2b3120f12fe592f2
  10s:   0x223f8682299015d50dd00165ce89409da6025eec8257d3bb2b3120f12fe592f2

chainhead:
  init: 0xfd337c5cecaa60e83ab9ee8eec1cdbba64fd1495648f6e2889807886ded918a3
  18us
  new:  0x924bb93101b733a14dc142036cf053f4e95d616616909a7952836a34ba850f43
        parent: 0xfd337c5cecaa60e83ab9ee8eec1cdbba64fd1495648f6e2889807886ded918a3
  22.4us
  new:  0xcd73e9a9969d84ad32d9dbf3532bfe9d18e72481abdf4e61ad9584c685f42bbc
        parent: 0x924bb93101b733a14dc142036cf053f4e95d616616909a7952836a34ba850f43
  2.8s
  new:  0x223f8682299015d50dd00165ce89409da6025eec8257d3bb2b3120f12fe592f2
        parent: 0xcd73e9a9969d84ad32d9dbf3532bfe9d18e72481abdf4e61ad9584c685f42bbc
  4.7s
  fin:  0x924bb93101b733a14dc142036cf053f4e95d616616909a7952836a34ba850f43
  5.8s
  new:  0xd9fd435f5a764bc1b3e14993694b24780a1f16215c6b56822b0df9d8d4b1515d
        parent: 0x223f8682299015d50dd00165ce89409da6025eec8257d3bb2b3120f12fe592f2
  7.4s
  fin:  0xcd73e9a9969d84ad32d9dbf3532bfe9d18e72481abdf4e61ad9584c685f42bbc
  8.8s
  new:  0xe815e7d6a8a3eb9956969e731b3f8896f4c3b5593a083a52ddbdb291ded6cdd9
        parent: 0xd9fd435f5a764bc1b3e14993694b24780a1f16215c6b56822b0df9d8d4b1515d
```
Issues

Added a couple of tests for the expected order of blocks:

However, I could not reproduce the issue locally (the two tests mentioned above pass locally).
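To illustrate the kind of ordering property such tests can assert, here is a hedged sketch; the `Event` model and `check_finalized_order` helper are illustrative, not the tests actually added in this PR. The property checked is that every finalized block is reachable from the previously finalized block via the parent links reported by the `new` events, which is the shape of the chainhead timeline above.

```rust
use std::collections::HashMap;

/// Illustrative model of the chainHead-style events in the log above
/// (not subxt's real types).
enum Event {
    Init(&'static str),
    New { hash: &'static str, parent: &'static str },
    Fin(&'static str),
}

/// One ordering property a test can assert: each finalized block must descend
/// from the previously finalized block via the reported parent links.
fn check_finalized_order(events: &[Event]) {
    let mut parents: HashMap<&str, &str> = HashMap::new();
    let mut last_finalized: Option<&str> = None;

    for ev in events {
        match ev {
            Event::Init(hash) => last_finalized = Some(*hash),
            Event::New { hash, parent } => {
                parents.insert(*hash, *parent);
            }
            Event::Fin(hash) => {
                if let Some(prev) = last_finalized {
                    // Walk parent links from the newly finalized block back to
                    // the previously finalized one.
                    let mut cursor = *hash;
                    while cursor != prev {
                        cursor = parents
                            .get(cursor)
                            .copied()
                            .unwrap_or_else(|| panic!("{hash} does not descend from {prev}"));
                    }
                }
                last_finalized = Some(*hash);
            }
        }
    }
}

fn main() {
    // Shape of the chainhead timeline from the logs above (hashes abbreviated).
    let events = [
        Event::Init("0xfd33"),
        Event::New { hash: "0x924b", parent: "0xfd33" },
        Event::New { hash: "0xcd73", parent: "0x924b" },
        Event::New { hash: "0x223f", parent: "0xcd73" },
        Event::Fin("0x924b"),
        Event::Fin("0xcd73"),
    ];
    check_finalized_order(&events);
}
```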
Thank you for continuing to dig into this @lexnv! Do you have any ideas on what to try next to get to the bottom of the issue? It's weird that we can't reproduce it locally; I wonder what might be different about the CI nodes that leads to it. If the issue seems to be on the Smoldot side, then let's raise an issue there to discuss the problem and provide steps to reproduce (even if only in CI and not reliably; whatever the best we can do is). @tomaka might be able to help us get to the bottom of it faster :)
This timeout is related to our Unstable Backend.

The issue tracking the light-client failure is #1346. I'll use a git dependency for smoldot to see if the issue is solved on its main branch.
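For reference, a hedged sketch of what pointing the build at smoldot's main branch via a git dependency could look like in Cargo.toml; the exact crate names and override mechanism used in the PR may differ.

```toml
# Illustrative only: override the crates.io smoldot crates with their git main
# branch to check whether an upstream fix resolves the timeout.
[patch.crates-io]
smoldot = { git = "https://github.com/smol-dot/smoldot" }
smoldot-light = { git = "https://github.com/smol-dot/smoldot" }
```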
Considering the small number of memory logs collected, I expect the issue is related to a second subscription.

Tests Analysis

The flaky timeout issue happens for different tests:
However, with this patch (panic on out-of-order events) the issue reproduces constantly. With that patch, I was also able to reproduce the issue locally.

Issue Timeline

Timeline detected by panicking when receiving an out-of-order event (a sketch of such a check follows after the timeline):

```
      0x1 ------ 0x2 ------ 0x3 ------ 0x4 ------ 0x5 ------ 0x6 ------ 0x7 ------ 0x8

T0    init       new        new        new
T1    fin
T2    new
T3    fin
T4    new
T5    fin
T6    new
T5    fin

Second Subscription
T0    init
T1    new
T2    fin
```
Root Cause

For the second subscription: the blocks 0x6 and 0x7 are not reported because we only keep a window of events from the last finalized block to the present moment. However, this approach misses the descendant blocks of the finalized block that have already been encountered:

subxt/subxt/src/backend/unstable/follow_stream_driver.rs, lines 168 to 170 (at d9169e2)
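To make the windowing problem concrete, here is a hedged sketch assuming a simplified cached-event model (these are not the real `follow_stream_driver` types): pruning everything before the latest finalized event also drops the new-block events for not-yet-finalized descendants, which is why 0x6 and 0x7 never reach the second subscription.

```rust
/// Simplified cached-event model to illustrate the windowing problem
/// (not the real follow_stream_driver types).
#[derive(Debug, Clone)]
enum CachedEvent {
    Initialized { finalized: &'static str },
    NewBlock { hash: &'static str, parent: &'static str },
    Finalized { hash: &'static str },
}

/// Problematic pruning: keep only the events from the most recent finalized
/// event onwards. Any earlier `NewBlock` events are dropped, even if those
/// blocks descend from the finalized block and are themselves not yet
/// finalized, so a subscriber that joins later never hears about them.
fn prune_to_last_finalized(cache: &mut Vec<CachedEvent>) {
    if let Some(pos) = cache
        .iter()
        .rposition(|ev| matches!(ev, CachedEvent::Finalized { .. }))
    {
        cache.drain(..pos);
    }
}

fn main() {
    // Mirrors the timeline above: 0x6 and 0x7 were announced before 0x5 was
    // finalized, so pruning at the finalized event forgets them.
    let mut cache = vec![
        CachedEvent::Initialized { finalized: "0x1" },
        CachedEvent::NewBlock { hash: "0x5", parent: "0x4" },
        CachedEvent::NewBlock { hash: "0x6", parent: "0x5" },
        CachedEvent::NewBlock { hash: "0x7", parent: "0x6" },
        CachedEvent::Finalized { hash: "0x5" },
        CachedEvent::NewBlock { hash: "0x8", parent: "0x7" },
    ];
    prune_to_last_finalized(&mut cache);
    // A second subscription replaying this cache starts at `Finalized { 0x5 }`
    // and then sees `NewBlock { 0x8, parent: 0x7 }` without ever seeing 0x6/0x7.
    println!("{cache:#?}");
}
```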
Aah, I think I see now, thank you! So when you start a subscription, you're told about all of the current blocks first, and then you get the regular events. When you start a second follow subscription in subxt, it reuses the existing one and reports the cached current finalized block (and so on) before reporting the underlying subscription events. So it sounds like that logic has an issue!

Edit: maybe the issue is here: we clear all events when we see a new finalized block, but whoops, that may include some new-block events for blocks that have yet to be finalized! We should probably clear events only for blocks that have been mentioned in that finalized event, or something along those lines (taking care not to let old events that we aren't clearing, but that are no longer relevant, accumulate endlessly; so think about how all events need to be handled, which may end up changing the cached event structure a bit). Maybe you knew all of this already @lexnv and I'm just repeating things; sorry if so :) I'm on my phone so only skimming things.

Edit 2: ah yup, you'd already seen all of this :)
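A hedged sketch of the alternative direction suggested above (again with illustrative types, not the real cache): on a finalized event, drop only the cached new-block events for blocks that the event itself reports as finalized or pruned, so other already-seen descendants stay replayable for a later subscriber.

```rust
use std::collections::HashSet;

/// Illustrative cached-event model (not the real follow_stream_driver types).
#[derive(Debug, Clone)]
enum CachedEvent {
    NewBlock { hash: &'static str, parent: &'static str },
    Finalized { hashes: Vec<&'static str> },
}

/// Instead of dropping everything before the latest finalized event, drop only
/// the cached new-block events for blocks the finalized event reports as
/// finalized or pruned; not-yet-finalized descendants remain replayable.
fn prune_on_finalized(
    cache: &mut Vec<CachedEvent>,
    finalized: &[&'static str],
    pruned: &[&'static str],
) {
    let gone: HashSet<&str> = finalized.iter().chain(pruned).copied().collect();
    cache.retain(|ev| match ev {
        CachedEvent::NewBlock { hash, .. } => !gone.contains(*hash),
        _ => true,
    });
}

fn main() {
    let mut cache = vec![
        CachedEvent::NewBlock { hash: "0x5", parent: "0x4" },
        CachedEvent::NewBlock { hash: "0x6", parent: "0x5" },
        CachedEvent::NewBlock { hash: "0x7", parent: "0x6" },
        CachedEvent::Finalized { hashes: vec!["0x5"] },
    ];
    prune_on_finalized(&mut cache, &["0x5"], &[]);
    // 0x6 and 0x7 survive in the cache and can still be replayed to a
    // second subscription.
    println!("{cache:#?}");
}
```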
Continued at: #1358