-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make test_esplora_syncs
more robust
#2033
Make test_esplora_syncs
more robust
#2033
Conversation
ae0002c
to
6d8dff9
Compare
Just got another report of a
In this case the error source seems to be |
That shounds like an underlying socket implementation is buggy. |
Yeah, got similar reports before, but so far the hypothesis was that some process (virus scanner or similar) would have blocked/reset/messed with the RPC calls. While retrying might work for us in this instance (although not fully, as it still means we mine more blocks than necessary), this probably should be reported and fixed somewhere upstream. Will see if I can reliably reproduce the behavior on a really low-resource machine. |
Right, it should be fixed in whatever is doing the actual socket reads - that should retry on the socket. IIRC read calls are allowed to spuriously EAGAIN/EWOULDBLOCK (which are the same on Linux), so we always have to try again to be standards conformant. |
Good point. Seems like a pretty common issue in the space. Source is likely TLDR: might be a while until upstream is fixed. I wonder if I should add another simpler test case somehow, and enable the error prone one for now, if it keeps failing our CI? |
Let's wait and see what kind of failure rate we see. So far I've only seen it once so it may not be worth worrying about. |
Alright. As we don't use the result anyways and we still wait for enough blocks to arrive, I for now just removed the |
I'm getting this error at https://github.com/lightningdevkit/rust-lightning/actions/runs/4198096993/jobs/7281278685:
Should this be fixed by this PR? I think it's unrelated to the linked PR |
Oh wow, this is a new one. Looks like this is not coming from on our side, but while building the used |
Yeah, seems this is simply an error coming from Github failing to reach |
let cur_height = get_bitcoind().client.get_block_count().expect("failed to get current block height"); | ||
let address = get_bitcoind().client.get_new_address(Some("test"), Some(AddressType::Legacy)).expect("failed to get new address"); | ||
// TODO: expect this Result once the WouldBlock issue is resolved upstream. | ||
let _block_hashes_res = get_bitcoind().client.generate_to_address(num as u64, &address); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the blocks actually being mined here such that not unwrapping the error actually moves things forward? Otherwise, won't the test just fail in wait_for_block
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, from the user reports I got it indeed seems to be the case that bitcoind
keeps generating the blocks in the background. So we should pick them up eventually via wait_for_block
.
Feel free to squash. IMO it doesn't really hurt to merge this since it's just an extra second delay (though hard to tell yet if that's enough), even if we're not seeing failures often. |
The test generally works and the parts in question *should* never fail. However, they did, so we give the test server instances some leeway.
1058150
to
e878f75
Compare
Squashed without further changes. I think probably the biggest improvement in terms of robustness here is the retrying of failing calls. Also agree regarding the added delays: 1 second is really arbitrarily chosen and could go both ways: it might be to little in some cases, but also likely is an unnecessary delay in most cases. We could also leave them out for now and see if most issues are already addressed by the other fixes? |
`OnceCell` doesn't call `drop`, which makes the spawned `bitcoind`/`electrsd` instances linger around after our tests have finished. To fix this, we move them out of `OnceCell` and let every test that needs them spawn their own instances. This additional let us drop the `OnceCell` dev dependency. Additionally, we improve the test robustness by applying most of the changes from lightningdevkit/rust-lightning#2033.
`OnceCell` doesn't call `drop`, which makes the spawned `bitcoind`/`electrsd` instances linger around after our tests have finished. To fix this, we move them out of `OnceCell` and let every test that needs them spawn their own instances. This additional let us drop the `OnceCell` dev dependency. Additionally, we improve the test robustness by applying most of the changes from lightningdevkit/rust-lightning#2033.
`OnceCell` doesn't call `drop`, which makes the spawned `bitcoind`/`electrsd` instances linger around after our tests have finished. To fix this, we move them out of `OnceCell` and let every test that needs them spawn their own instances. This additional let us drop the `OnceCell` dev dependency. Additionally, we improve the test robustness by applying most of the changes from lightningdevkit/rust-lightning#2033.
* Add inital implementation of persisted payment store * Implement `KVStoreUnpersister` * Reorganize IO things into `io` submodules * Reorganize test folder for consistency * Move test `bitcoind`/`electrsd` out of `OnceCell` `OnceCell` doesn't call `drop`, which makes the spawned `bitcoind`/`electrsd` instances linger around after our tests have finished. To fix this, we move them out of `OnceCell` and let every test that needs them spawn their own instances. This additional let us drop the `OnceCell` dev dependency. Additionally, we improve the test robustness by applying most of the changes from lightningdevkit/rust-lightning#2033. * Introduce `KVStore` trait and `FilesystemStore` impl Rather than further relying on the upstream `KVStorePersister`/`KVStoreUnpersister`, we here implement a general `KVStore` trait that allows access to `Read`s/`TransactionalWrite`s which may be used to deserialize/serialize data via the `Readable`/`Writeable` implementations. Notably `TransactionalWrite` is a `Write` for which the written data needs to be explictly `commit`ed, asserting that we always persist either the whole new change or no change at all. Additionally, we avoid the `Info` umbrella term but opt to introduce `PaymentDetails` to align naming with upcoming `PeerDetails` and `ChannelDetails`. * Unpin BDK/esplora-client
Fixes #2032.
The test generally works fine and the calls in question should never fail. However, they did break eventually in CI.
I'm guessing that in the case of #2032 the
electrsd
for some reason hadn't finished launching yet when we tried to to access its interface. Therefore this PR adds some changes that should give thebitcoind
/electrsd
server instances some more time to fully get setup and generally some more leeway.