Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with flakes so far #5281

Merged
merged 32 commits into from
Jun 27, 2022

Commits on Jun 26, 2022

  1. lightningd: multiple log-file options allow more than one log output.

    I've wanted this for a while: the ability to log to multiple places
    at once.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Changelog-Changed: lightningd: `log-file` option specified multiple times opens multiple log files.
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    6ed5f31 View commit details
    Browse the repository at this point in the history
  2. pytest: Remove all trace of python's "flaky" module.

    Over time, it has cost us more developer cycles than it has gained.
    It has hidden intermittant bugs, and allowed cruft to accumulate:
    when we eventually tried to figure out what was going wrong, the
    actual change which caused it was now stale and forgotten.
    
    This was a particular bane during the connectd rewrite, and I
    worked through some issues which had occurred before, but were not
    more likely.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    ac7ad63 View commit details
    Browse the repository at this point in the history
  3. pytest: fix flake in test_gossip_timestamp_filter

    We didn't get the second channel_announcement in the filter:
    ```
            # Now choose range which will only give second update.
            msgs = l4.query_gossip('gossip_timestamp_filter',
                                   genesis_blockhash,
                                   after_12 - backdate,
                                   after_23 - after_12 + 1,
                                   filters=['0109', '0012'])
        
            # 0x0100 = channel_announcement
            # 0x0102 = channel_update
            # (Node announcement may have any timestamp)
            types = Counter([m[0:4] for m in msgs])
    >       assert types['0100'] == 1
    E       assert 0 == 1
    ```
    
    Looking at the gossip store:
    
    ```
    1089: t=1653008725 channel_announcement: 0100420c13f233d9b461b035257b77704e0003a0280efdbb8f8964bf4c5bd22b8f8b2e205a3c57d7d5f4dcd812cd41be3d8f84c60046c1fd31abefd0985c2263f8f5074bb7dff548ac106dc10ef4485f5921d9b252d094f9518b44a842c1a2c7f98759fa2f8803febeb59e50b65a8f258877e18f6224262e218de38f21438b2112ef0425d64174092bcd8d757f18982c3fb83c8ae2fb5ee88363248e51bdea247cef055a01a17580b60735b82b88fea87dccb724a743beaf4bf3cf578b0da445afcc786fe3b061429f3288e6970819236e97134898ce900b7b61159148d6785e26932e21f1c337cb6f86a16e967e27396b168a19b0d1478c4be2579a9b96aa0125aa000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e0000010000022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d02d595ae92b3544c3250fb772f214ad8d4c51425033740a5bcc357190add6d7e7a02d6063d022691b2490ab454dee73a57c6ff5d308352b461ece69f3c284f2c2412
    1533: channel_amount: 100000sat
    1555: t=1653008725 channel_update: 01022b98e7d680ffa8096d0a2bda62c0b8b3fc975a5f91c0cff0a42a76b37285276c0156581f85c0be2ee0ed7e7003cb67f5d476a7e3714960836b882bb3ad78097406226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e955010000060000000000000000000000010000000a0000000005e69ec0
    1705: t=1653008729 node_announcement: 010175f8ed9f48db511d43f1da96aa307ccc53a521dcc2f8fb732c6938650e6cc00f672e772134a29e16b723a1e9cf626cf942323bd3fa294befd25430a9395c1fa40007800000080269a26286e959035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d035d2b484f5050494e47464952452d34646636366661000000000000000000000000000000
    1866: t=1653008720 channel_update: 010266385083f1d6e554e1701f2456c24212c5aa937f7b090d32d243678b2fe78f7f52ed3626d6cdd52703bf78013bb9d1a621ae5b7d68fc965d29bb94574e340b5606226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f00006e00000100006286e950010100060000000000000000000000010000000a00000
    ```
    
    It's not clear what we times we were querying, but let's make it
    clearer by putting a full 10 seconds between them, and making our
    queries more exact.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    f29f877 View commit details
    Browse the repository at this point in the history
  4. pytest: fix flakes in test_gossip_query_channel_range

    Nodes now spray their own gossip at the start; this can race with
    our query, and they can give us gossip we didn't ask for.
    
    The fix is to always query an uninvolved node: in this case, we
    only connected l3 and l4 to test zlib, which is gone, so now we can
    remove that and simply always query l4.
    
    ```
    2022-05-23T01:32:53.4695809Z         # reply_channel_range == 264
    2022-05-23T01:32:53.4696416Z >       assert msgs == ['0108'
    2022-05-23T01:32:53.4696835Z                         # blockhash
    2022-05-23T01:32:53.4697328Z                         + genesis_blockhash
    2022-05-23T01:32:53.4697841Z                         # first_blocknum, number_of_blocks, complete
    2022-05-23T01:32:53.4698685Z                         + format(0, '08x') + format(65535, '08x') + '01'
    2022-05-23T01:32:53.4699245Z                         # encoded_short_ids
    2022-05-23T01:32:53.4699859Z                         + format(len(encoded) // 2, '04x')
    2022-05-23T01:32:53.4700255Z                         + encoded]
    2022-05-23T01:32:53.4700799Z E       AssertionError: assert ['0100194b419...700000010000'] == ['010806226e4...700000010000']
    2022-05-23T01:32:53.4710678Z E         At index 0 diff: '0100194b4190af98beedf231901aeda2dcee0e757dd5327219aa007ea6027e769c98665f49e0f99e03b91e80896ed01cd9c4b157c77e908852097d2ba753c6ca01ae41a90a81c9f492c770d9dd35010c702c7c564045cf85a4bf6efaf25cceb9b81b56d1526429d3c60bf1c7ed6a835720730fd7ad1851a937d18aecf8bc3400dd4e0a30d5c4d65999ac39ae6eb3b98d27e3157973598256d2dd4718cd39010a3612580e7f3bba9eda0fb595976f955e345b6affa62fd556f7257a30d54dd87743dc38585e7c31e1ca80b42cf803bd4c1aaa3127351c95de8a4198c46eb0bd7b4ce175a4b51fcaf9676b01806d237758494fcf3d1558279060a0c3b84806a3330c41000006226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f0000700000010000035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d0382ce59ebf18be7d84677c2e35f23294b9992ceca95491fcf8a56c6cb2d9de19903c47b1b2afcfd68c7b86c0976adb9a4f0835bc78242fced745d78433497a867d6021c29746e4136ddff456483df3980c2d0d5f31c93ef5ded564f7294a10d7414aa' != '010806226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f000000000000ffff01001900000068000001000100006900000100000000700000010000'
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    43284ca View commit details
    Browse the repository at this point in the history
  5. pytest: fix flake in test_htlc_too_dusty_outgoing

    We can sneak the HTLC add in before we do the update_fee which fails:
    
    ```
            # the channel should start warning -- too much dust
            inv = l2.rpc.invoice(htlc_val_msat, str(num_dusty_htlcs + 1), str(num_dusty_htlcs + 1))
            with pytest.raises(RpcError, match=r'WIRE_TEMPORARY_CHANNEL_FAILURE'):
    >           l1.rpc.sendpay(route, inv['payment_hash'], payment_secret=inv['payment_secret'])
    E           Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>
    
    tests/test_pay.py:2654: Failed
    ```
    
    From the logs:
    
    ```
    lightningd-1: 2022-05-20T02:16:40.008Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: NEW:: HTLC LOCAL 14 = SENT_ADD_HTLC/RCVD_ADD_HTLC
    lightningd-1: 2022-05-20T02:16:40.010Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Adding HTLC 14 amount=10000000msat cltv=113 gave CHANNEL_ERR_ADD_OK
    lightningd-1: 2022-05-20T02:16:40.012Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_UPDATE_ADD_HTLC
    lightningd-1: 2022-05-20T02:16:40.015Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: REPLY WIRE_CHANNELD_OFFER_HTLC_REPLY with 0 fds
    lightningd-1: 2022-05-20T02:16:40.026Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_out WIRE_WARNING
    lightningd-1: 2022-05-20T02:16:40.029Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: billboard perm: Too much dust to update fee (Desired feerate update 20000)
    lightningd-1: 2022-05-20T02:16:40.035Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld WARNING: Too much dust to update fee (Desired feerate update 20000)
    lightningd-1: 2022-05-20T02:16:40.039Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: Will try reconnect in 60 seconds
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    521e78b View commit details
    Browse the repository at this point in the history
  6. pyln-testing: try harder if cleaning directory fails.

    This happens under CI, but it's not informative.  Sleep and retry.
    
    Also, "except (OSError, Exception)" does not seem to do what you'd think:
    this clause never gets run.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    f0c0d4b View commit details
    Browse the repository at this point in the history
  7. connectd: fix debug message on bind fail.

    It doesn't get the right errno, and it says "create" not "bind".
    
    ```
    2022-05-20T03:04:46.498Z DEBUG   connectd: Failed to create 2 socket: Success
    2022-05-20T03:04:46.500Z DEBUG   connectd: REPLY WIRE_CONNECTD_INIT_REPLY with 0 fds
    2022-05-20T03:04:46.501Z DEBUG   connectd: connectd_init_done
    2022-05-20T03:04:46.503Z **BROKEN** connectd: Failed to bind socket for 127.0.0.1:37871: Address already in use
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    1ad24ac View commit details
    Browse the repository at this point in the history
  8. pytest: fix port allocation race when nodes restart.

    We use ephemeral_port_reserve to grab ports, but this can fail when we
    restart a node, since the port can be reallocated at that point.
    
    Attempt to overcome this using a global reserved list (is there a
    neater way?).
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    a48e1d7 View commit details
    Browse the repository at this point in the history
  9. pytest: try to fix bitcoind socket timeout in test_channel_lease_unil…

    …at_closes
    
    Split harder!
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    
    ```
     # This can timeout, so do it in four easy stages.
            for i in range(4):
    >           bitcoind.generate_block(4032 // 4)
    
    tests/test_closing.py:971: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    contrib/pyln-testing/pyln/testing/utils.py:475: in generate_block
        return self.rpc.generatetoaddress(numblocks, to_addr)
    contrib/pyln-testing/pyln/testing/utils.py:372: in f
        res = proxy._call(name, *args)
    /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:233: in _call
        response = self._get_response()
    /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/bitcoin/rpc.py:263: in _get_response
        http_response = self.__conn.getresponse()
    /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:1373: in getresponse
        response.begin()
    /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:319: in begin
        version, status, reason = self._read_status()
    /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/http/client.py:280: in _read_status
        line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <socket.SocketIO object at 0x7f1a4b3702d0>
    b = <memory at 0x7f1a4ac51600>
    
        def readinto(self, b):
            """Read up to len(b) bytes into the writable buffer *b* and return
            the number of bytes read.  If the socket is non-blocking and no bytes
            are available, None is returned.
        
            If *b* is non-empty, a 0 return value indicates that the connection
            was shutdown at the other end.
            """
            self._checkClosed()
            self._checkReadable()
            if self._timeout_occurred:
                raise OSError("cannot read from timed out object")
            while True:
                try:
    >               return self._sock.recv_into(b)
    E               socket.timeout: timed out
    ```
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    b81dc46 View commit details
    Browse the repository at this point in the history
  10. pytest: fix race in test_multichan

    We can fail to use larger channel if it's not ready yet:
    
    ```
    2022-05-23T01:20:05.5325600Z         # Check it used the larger channel!
    2022-05-23T01:20:05.5326376Z >       assert before[chan23a_idx]['to_us_msat'] == after[chan23a_idx]['to_us_msat']
    2022-05-23T01:20:05.5326961Z E       assert 1000000000msat == 900000000msat
    2022-05-23T01:20:05.5327240Z 
    2022-05-23T01:20:05.5327621Z tests/test_connection.py:3896: AssertionError
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    7b2f199 View commit details
    Browse the repository at this point in the history
  11. pytest: fix flake in test_wallet.py::test_hsm_secret_encryption

    Under valgrind, at least, this test fails regularly, and this sleep
    fixes it.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    714607e View commit details
    Browse the repository at this point in the history
  12. pytest: wait for stderr, rather than asserting.

    Two almost identical assertion fails under CI.  Try waiting.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    13bf228 View commit details
    Browse the repository at this point in the history
  13. gossmap: handle case where private channel turns into public.

    Usually we won't see this, since private is deleted.  But we could
    have already read the private channel before that.  Handle it properly.
    
    (Tested by removing the gossip_store deletion code and making sure
    this worked).
    
    We have to fix up the test, which announces a channel twice!
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    74f7847 View commit details
    Browse the repository at this point in the history
  14. gossipd: fix gossmap race.

    When upgrading a channel from private to public, we would delete the
    private channel then add the public one.  However, this is visible in
    the gossmap!  In particular, the node may be removed from the gossmap
    and then re-added, so it may temporarily vanish!
    
    I was seeing an occasional assertion inside node_factory.line_graph:
    
    ```
    @pytest.mark.developer
        def test_reconnect_remote_sends_no_sigs(node_factory):
            """We re-announce, even when remote node doesn't send its announcement_signatures on reconnect.
            """
    >       l1, l2 = node_factory.line_graph(2, wait_for_announce=True, opts={'may_reconnect': True})
    
    tests/test_connection.py:870: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    contrib/pyln-testing/pyln/testing/utils.py:1467: in line_graph
        self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
    contrib/pyln-testing/pyln/testing/utils.py:1460: in join_nodes
        wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes']))
    contrib/pyln-testing/pyln/testing/utils.py:88: in wait_for
        while not success():
    contrib/pyln-testing/pyln/testing/utils.py:1460: in <lambda>
        wait_for(lambda: 'alias' in only_one(end.rpc.listnodes(n.info['id'])['nodes']))
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    arr = []
    
        def only_one(arr):
            """Many JSON RPC calls return an array; often we only expect a single entry
            """
    >       assert len(arr) == 1
    E       AssertionError
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    935a9ba View commit details
    Browse the repository at this point in the history
  15. pytest: test to demonstrate BROKEN message on onchaind htlc close.

    We assumed that if the outgoing htlc has failed, we should have always
    failed the incoming one.  This is not true, as this testcase demonstrates:
    
    ```
    lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-onchaind-chan#2: Sending 1 missing htlc messages
    lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: onchain_failed_our_htlc
    lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 failonion = 0x55cb8bc045d8, failmsg = (nil), preimage = (nil)
    lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: HTLC id 0 already complete, but ->in not resolved! failonion = 206816a391d3a7666ddd5213914cbb68f5da1fc4a0937e729de5a1990c94d26312caa5f2778e8da0c6bdefc68dd1a3bc28b5b5650fc0bdb3c2247ecca94ed0bbb224c8448c2c71eb1656a8740cadb301bd1ee1c1e774a8fef817352f502e4217b11e93aa6877b88b37afab0e4d4e49ed0385be9ab9a1ab1ac0e3460e41cfafb30ed896cea96e346041919a6c524ce56c3e5f27c7cd78a36b6df221e90a1c6e048c72b4146a5a51885fb70649037fe7ace77a016ae3ec8aee97960d0e5f0582713f671df79d8dee11b82708b6d882ee5adbb328db1938e824110f57ead1b27410bc6f775c7bb4ae40c1768d77a166c9bfda8f634ba0ac4f8a9fe199894dd3754c5ce41c9694544c805ffc177517661f11221dd8dffac60ce1c8c5bf54cda8e5ea44d8ec6b, failmsg = (null), preimage = (null)
    lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#2: MISSING incoming fail for 0: failing incoming now
    lightningd-2: 2022-05-25T04:38:31.449Z **BROKEN** 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: MISSED incoming fail for 0: failing now
    lightningd-2: 2022-05-25T04:38:31.449Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    e291a2c View commit details
    Browse the repository at this point in the history
  16. lightningd: don't print nasty message when onchaind fails partially-f…

    …ailed HTLC
    
    1. We set an outgoing htlc's `failonion` when we get a commitment_signed.
    2. We don't transfer it to the corresponding incoming HTLC until we send
       commitment_signed and receive revoke_and_ack (meaning, outgoing htlc is
       completely dead).
    3. If between these steps we go onchain, onchaind (after 3 blocks) tells us
       to fail the HTLC.
    4. hout->failonion is set, but hout->hin has not been failed yet.  We
       do a sanity check and print a nasty message, and fail it with
       WIRE_PERMANENT_CHANNEL_FAILURE instead of relaying the error.
    
    So handle this case explicitly.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    e4dad9c View commit details
    Browse the repository at this point in the history
  17. pytest: fix test_ping_timeout flake.

    We can hang up due to ping while waiting for channel to become active.
    But we don't even need an active channel for this test, so simplify.
    
    ```
    ______________________________ test_ping_timeout _______________________________
    [gw1] linux -- Python 3.7.13 /opt/hostedtoolcache/Python/3.7.13/x64/bin/python3
    
    node_factory = <pyln.testing.utils.NodeFactory object at 0x7f132cca8a90>
    
        @pytest.mark.developer("dev-disconnect required")
        def test_ping_timeout(node_factory):
            # Disconnects after this, but doesn't know it.
            l1_disconnects = ['xWIRE_PING']
        
            l1, l2 = node_factory.line_graph(2, opts=[{'dev-no-reconnect': None,
                                                       'disconnect': l1_disconnects},
    >                                                 {}])
    
    tests/test_connection.py:3826: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    contrib/pyln-testing/pyln/testing/utils.py:1493: in line_graph
        self.join_nodes(nodes, fundchannel, fundamount, wait_for_announce, announce_channels)
    contrib/pyln-testing/pyln/testing/utils.py:1470: in join_nodes
        nodes[i].wait_channel_active(scids[i])
    contrib/pyln-testing/pyln/testing/utils.py:1003: in wait_channel_active
        wait_for(lambda: self.is_channel_active(chanid))
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    success = <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>
    timeout = 900
    
        def wait_for(success, timeout=TIMEOUT):
            start_time = time.time()
            interval = 0.25
            while not success():
                time_left = start_time + timeout - time.time()
                if time_left <= 0:
    >               raise ValueError("Timeout while waiting for {}", success)
    E               ValueError: ('Timeout while waiting for {}', <function LightningNode.wait_channel_active.<locals>.<lambda> at 0x7f132c3ef830>)
    ```
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    bba2414 View commit details
    Browse the repository at this point in the history
  18. pyln-testing: use files for stdout and stderr, not threads.

    Some flakes are caused by weird races in this code.  Plus, if we
    get things to write straight to files, we might see things in
    there on post-mortem which happen after the python runner exits.
    
    It's a bit less efficient, but much simpler.  Let's see if it helps!
    
    Some tests need a rework now, since we don't get a failure (except
    eventual timeout), but they're simpler.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    7fedfd1 View commit details
    Browse the repository at this point in the history
  19. pyln-testing: restore proper streaming behaviour for lightningd.

    Get it to log direct to stdout, so we see what's happening *as it
    happens* rather than as we read it.  We could restore the thread we
    were using before, but that added more problems than it solved.
    
    This means that we need the hsm password prompts in the log though.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    5b2862e View commit details
    Browse the repository at this point in the history
  20. pyln-testing: increase default daemon.wait() timeout.

    Valgrind in CI is slow:
    
    ```
        def test_bitcoin_failure(node_factory, bitcoind):
    ...
            # Ignore BROKEN log message about blocksonly mode.
            l2 = node_factory.get_node(start=False, expect_fail=True,
                                       allow_broken_log=True)
            l2.daemon.start(wait_for_initialized=False)
            # Will exit with failure code.
    >       assert l2.daemon.wait() == 1
    
    tests/test_misc.py:114: 
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    3b8233e View commit details
    Browse the repository at this point in the history
  21. pytest: fix timeout in test_option_types

    We were waiting for the start to timeout waiting for the "public key"
    message.  Instead, start manually.
    
    Before, this took (with TIMEOUT=30) 97 seconds, after 8 seconds.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    ae518f8 View commit details
    Browse the repository at this point in the history
  22. lightningd: don't run off end of buffer if db_hook returns nonsense.

    It shouldn't return nonsense, but it did, and we segfaulted.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    2819e96 View commit details
    Browse the repository at this point in the history
  23. pytest: allow more time for test_waitblockheight !DEVELOPER.

    It actually timed out with the default 60 seconds, just before it
    saw the block:
    
    ```
    2022-06-07T02:16:05.2557049Z         bitcoind.generate_block(1)
    2022-06-07T02:16:05.2557300Z         sync_blockheight(bitcoind, [node])
    2022-06-07T02:16:05.2557594Z         fut1.result(5)
    2022-06-07T02:16:05.2557912Z         assert not fut2.done()
    2022-06-07T02:16:05.2558121Z     
    2022-06-07T02:16:05.2558370Z         # Trigger two blocks.
    2022-06-07T02:16:05.2558689Z         bitcoind.generate_block(1)
    2022-06-07T02:16:05.2558941Z         sync_blockheight(bitcoind, [node])
    2022-06-07T02:16:05.2559219Z >       fut2.result(5)
    2022-06-07T02:16:05.2559350Z 
    2022-06-07T02:16:05.2559508Z tests/test_misc.py:2138: 
    ...
    2022-06-07T02:16:05.2586947Z         elif "error" in resp:
    2022-06-07T02:16:05.2587398Z >           raise RpcError(method, payload, resp['error'])
    2022-06-07T02:16:05.2588026Z E           pyln.client.lightning.RpcError: RPC call failed: method: waitblockheight, payload: {'blockheight': 103}, error: {'code': 2000, 'message': 'Timed out.'}
    2022-06-07T02:16:05.2588325Z 
    2022-06-07T02:16:05.2588563Z contrib/pyln-client/pyln/client/lightning.py:387: RpcError
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    131f3f8 View commit details
    Browse the repository at this point in the history
  24. lightningd: fix transient leak report when openingd shutting down.

    ```
    >           raiseValueError(str(errors))
    E           ValueError: 
    E           Node errors:
    E           Global errors:
    E            - Node /tmp/ltests-x5sfpiwp/test_openchannel_hook_chaining_1/lightning-2/ has memory leaks: [
    E               {
    E                   "backtrace": [
    E                       "ccan/ccan/tal/tal.c:442 (tal_alloc_)",
    E                       "ccan/ccan/io/io.c:91 (io_new_conn_)",
    E                       "lightningd/subd.c:773 (new_subd)",
    E                       "lightningd/subd.c:827 (new_channel_subd_)",
    E                       "lightningd/opening_control.c:870 (peer_start_openingd)",
    E                       "lightningd/peer_control.c:1307 (peer_active)",
    E                       "lightningd/connect_control.c:457 (connectd_msg)",
    E                       "lightningd/subd.c:556 (sd_msg_read)",
    E                       "lightningd/subd.c:357 (read_fds)",
    E                       "ccan/ccan/io/io.c:59 (next_plan)",
    E                       "ccan/ccan/io/io.c:407 (do_plan)",
    E                       "ccan/ccan/io/io.c:417 (io_ready)",
    E                       "ccan/ccan/io/poll.c:453 (io_loop)",
    E                       "lightningd/io_loop_with_timers.c:22 (io_loop_with_timers)",
    E                       "lightningd/lightningd.c:1181 (main)",
    E                       "../csu/libc-start.c:308 (__libc_start_main)"
    E                   ],
    E                   "label": "ccan/ccan/io/io.c:91:struct io_conn",
    E                   "parents": [
    E                       "lightningd/lightningd.c:107:struct lightningd"
    E                   ],
    E                   "value": "0x2b5a898"
    E               }
    E           ]
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    50f003d View commit details
    Browse the repository at this point in the history
  25. pytest: fix test_channel_lease_post_expiry flake.

    Make sure bitcoind gets tx before mining blocks!
    
    ```
    # l1<->l2 mutual close should work
            chan = l1.get_channel_scid(l2)
            l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
            l1.rpc.close(chan)
            l2.daemon.wait_for_log('State changed from CLOSINGD_SIGEXCHANGE to CLOSINGD_COMPLETE')
            bitcoind.generate_block(2)
            sync_blockheight(bitcoind, [l1, l2])
    >       l1.daemon.wait_for_log('Resolved FUNDING_TRANSACTION/FUNDING_OUTPUT by MUTUAL_CLOSE')
    tests/test_closing.py:851:
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    73858d6 View commit details
    Browse the repository at this point in the history
  26. pytest: don't assume that join_nodes needs only check ends.

    I saw another "only_one()" fail on alias checking: it's not
    entirely clear to me with the more aggressive sending of
    own gossip, that we necessarily process in order, so we might
    not have actually seen all channels just because we saw
    the farthest one.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    077da8d View commit details
    Browse the repository at this point in the history
  27. lightningd: track weird CI crash in test_important_plugin

    Looks like we woke one of the startup io_loops early, and thus
    we thought we'd finished connectd_activate and we hadn't.  This
    caused us to use an uninitialized ld->announceable array, and
    finally caused an assert fail in the main loop.
    
    Make *every* loop assert that it was exited for the correct reason,
    so if it happens again, we can maybe figure out what part of
    the code to look at.
    
    ```
    lightningd: lightningd/lightningd.c:1186: main: Assertion `io_loop_ret == ld' failed.
    lightningd: FATAL SIGNAL 6 (version 4df66fa)
    ...
    ------------------------------- Valgrind errors --------------------------------
    Valgrind error file: valgrind-errors.895509
    ==895509== Conditional jump or move depends on uninitialised value(s)
    ==895509==    at 0x22C58E: to_tal_hdr_or_null (tal.c:184)
    ==895509==    by 0x22D531: tal_bytelen (tal.c:637)
    ==895509==    by 0x1F10B6: towire_gossipd_init (gossipd_wiregen.c:100)
    ==895509==    by 0x13AC6E: gossip_init (gossip_control.c:254)
    ==895509==    by 0x1497EC: main (lightningd.c:1090)
    ==895509== 
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    4c97a07 View commit details
    Browse the repository at this point in the history
  28. lightningd: add more information should subd send wrong message.

    I saw this once, but could not track it down.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    37a1094 View commit details
    Browse the repository at this point in the history
  29. pytest: fix gossipwith flake harder.

    Even though we generally wait until a node has seen the gossip,
    that doesn't mean that connectd has processed it!  This means when
    we connect it may still send us "old" gossip.
    
    So we set the OPT_GOSSIP_QUERIES bit, which means don't send until we
    ask.  But now it sends us WIRE_QUERY_CHANNEL_RANGE, so everyone needs
    to filter that out.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    1a78804 View commit details
    Browse the repository at this point in the history
  30. pytest: fix flake in test_query_short_channel_id

    Again, our new behaviour of sending our own gossip even before
    they ask can confuse our gossip query tests.
    
    Create a new node, attach it, and perform queries on it.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    ba93c9b View commit details
    Browse the repository at this point in the history
  31. pytest: fix flake in test_node_reannounce

    Again, our new behaviour of sending our own gossip even before
    they ask can confuse our gossip query tests.
    
    In this case, simply eliminate duplicates.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    9489a47 View commit details
    Browse the repository at this point in the history
  32. pytest: fix test_multifunding_feerates

    There's a 1 in 256 chance that our signature on the transaction is 70,
    not 71 bytes long.  This changes the freerate.  So fix up the weight in
    this case, to be the expected weight.
    
    ```
    @unittest.skipIf(TEST_NETWORK == 'liquid-regtest', "Fees on elements are different")
    @pytest.mark.developer("uses dev-fail")
    @pytest.mark.openchannel('v1')  # v2 the weight calculation is off by 3
    deftest_multifunding_feerates(node_factory, bitcoind):
    '''
        Test feerate parameters for multifundchannel
        '''
            funding_tx_feerate = '10000perkw'
            commitment_tx_feerate_int = 2000
            commitment_tx_feerate = str(commitment_tx_feerate_int) + 'perkw'
            l1, l2, l3 = node_factory.get_nodes(3, opts={'log-level': 'debug'})
            l1.fundwallet(1 << 26)
    def_connect_str(node):
    return'{}@localhost:{}'.format(node.info['id'], node.port)
            destinations = [{"id": _connect_str(l2), 'amount': 50000}]
            res = l1.rpc.multifundchannel(destinations, feerate=funding_tx_feerate,
                                          commitment_feerate=commitment_tx_feerate)
            entry = bitcoind.rpc.getmempoolentry(res['txid'])
            weight = entry['weight']
            expected_fee = int(funding_tx_feerate[:-5]) * weight // 1000
    >       assert expected_fee == entry['fees']['base'] * 10 ** 8
    E       AssertionError: assert 7000 == (Decimal('0.00007010') * (10 ** 8))
    tests/test_connection.py:1982: AssertionError
    ```
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    rustyrussell committed Jun 26, 2022
    Configuration menu
    Copy the full SHA
    2996c27 View commit details
    Browse the repository at this point in the history