Macos tests failed due to timeout #6854

Closed
drew2a opened this issue Apr 11, 2022 · 7 comments · Fixed by #6855 or #7704
Comments

@drew2a
Contributor

drew2a commented Apr 11, 2022

The most common case is the following (but sometimes the channels tests also fail due to timeout):

FAILED src/tribler/core/components/tunnel/tests/test_full_session/test_tunnel_community.py::test_hidden_services

https://github.com/Tribler/tribler/runs/5969308803?check_suite_focus=true#step:7:16

src/tribler/core/components/tunnel/tests/test_full_session/test_tunnel_community.py . [ 50%]
F                                                                        [100%]

=================================== FAILURES ===================================
_____________________________ test_hidden_services _____________________________

kwargs = {'hidden_seeder_comm': <tribler.core.components.tunnel.community.tunnel_community.TriblerTunnelCommunity object at 0x1...object at 0x11213bfa0>, 'video_tdef': <tribler.core.components.libtorrent.torrentdef.TorrentDef object at 0x112124d30>}
coro = <coroutine object test_hidden_services at 0x1121870c0>
task = <Task pending name='Task-97' coro=<test_hidden_services() running at /Users/runner/work/tribler/tribler/src/tribler/co...l_session/test_tunnel_community.py:294> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x112eddd30>()]>>

    @functools.wraps(func)
    def inner(**kwargs):
        coro = func(**kwargs)
        if coro is not None:
            task = asyncio.ensure_future(coro, loop=_loop)
            try:
>               _loop.run_until_complete(task)

../../../hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/pytest_asyncio/plugin.py:195: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../hostedtoolcache/Python/3.8.12/x64/lib/python3.8/asyncio/base_events.py:603: in run_until_complete
    self.run_forever()
../../../hostedtoolcache/Python/3.8.12/x64/lib/python3.8/asyncio/base_events.py:570: in run_forever
    self._run_once()
../../../hostedtoolcache/Python/3.8.12/x64/lib/python3.8/asyncio/base_events.py:1823: in _run_once
    event_list = self._selector.select(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <selectors.KqueueSelector object at 0x1121832e0>
timeout = 0.441998801000004

    def select(self, timeout=None):
        timeout = None if timeout is None else max(timeout, 0)
        max_ev = len(self._fd_to_key)
        ready = []
        try:
>           kev_list = self._selector.control(None, max_ev, timeout)
E           Failed: Timeout >40.0s

../../../hostedtoolcache/Python/3.8.12/x64/lib/python3.8/selectors.py:558: Failed
@drew2a drew2a added this to the 7.13.0 April milestone Apr 11, 2022
@drew2a drew2a self-assigned this Apr 11, 2022
@drew2a drew2a mentioned this issue Apr 11, 2022
@qstokkink
Contributor

Not sure if this is the cause of the timeout or another unrelated issue, but this is a super weird part of the linked log:

Traceback (most recent call last):
  File "/Users/runner/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/ipv8/test/mocking/endpoint.py", line 16, in crash_event_loop
    raise forwarded_exception
  File "/Users/runner/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/ipv8/messaging/anonymization/community.py", line 909, in exit_data
    self.exit_sockets[circuit_id].sendto(data, destination)
  File "/Users/runner/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/ipv8/test/mocking/exit_socket.py", line 23, in sendto
    self.endpoint.send(destination, data)
  File "/Users/runner/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/ipv8/test/mocking/endpoint.py", line 58, in send
    raise e
AssertionError: Attempted to send data to unregistered address UDPv4Address(ip='127.0.0.1', port=52921)

The exit node is being instructed to exit data to its localhost address using the public/non-anonymous libtorrent port of the downloader?! The public port of libtorrent should never ever be forwarded through the tunnels, so this error is very much warranted. Perhaps the setting that causes libtorrent to only advertise its outgoing port (i.e., the outgoing port of the exit node, not the downloader) was removed or is no longer functional? This should be investigated.
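To make the AssertionError concrete: a minimal sketch (not the actual ipv8 code; all names here are hypothetical) of how a mocked network endpoint can flag traffic sent to an address that is not part of the test network. A real UDP socket would silently drop such packets; the mock asserts, so the test notices leaked non-anonymized traffic.

```python
# Hypothetical mock network endpoint, loosely modeled on the behavior
# described in the traceback above. Not Tribler/ipv8 production code.

class MockNetwork:
    def __init__(self):
        self.registered = {}  # address -> handler callback

    def register(self, address, handler):
        self.registered[address] = handler

    def send(self, address, data):
        # Assert instead of silently dropping, so tests catch traffic
        # that should never leave the mocked (anonymized) network.
        assert address in self.registered, \
            f"Attempted to send data to unregistered address {address}"
        self.registered[address](data)


net = MockNetwork()
net.register(("127.0.0.1", 5000), lambda data: None)
net.send(("127.0.0.1", 5000), b"ok")            # delivered to a registered peer

try:
    net.send(("127.0.0.1", 52921), b"leak")     # unregistered -> AssertionError
    leaked = True
except AssertionError:
    leaked = False
```

In this sketch, the seeder's public port (52921 in the log) is exactly the kind of address that is never registered in the mocked network, so any attempt to exit data to it trips the assertion.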

@drew2a
Contributor Author

drew2a commented Apr 11, 2022

@qstokkink

This setting comes from:

def start_anon_download(tunnel_community: TriblerTunnelCommunity,
                        seeder_port,
                        tdef: TorrentDef,
                        hops=1):
    """
    Start an anonymous download in the main Tribler session.
    """
    download_manager = tunnel_community.dlmgr
    dscfg = DownloadConfig()
    dscfg.set_dest_dir(download_manager.state_dir)
    dscfg.set_hops(hops)
    download = download_manager.start_download(tdef=tdef, config=dscfg)
    tunnel_community.bittorrent_peers[download] = [("127.0.0.1", seeder_port)]
    return download

And the most probable cause of the weird log messages is

6678acb#diff-b655336187e9d2c4b142d38ec44dadc44c8b92c91baab4d8f2b592bd1498183aR153

@qstokkink
Contributor

If it doesn't exist in the production code I guess it's fine. 🤷 Still strange, but seems to be unrelated to this issue.

@drew2a
Contributor Author

drew2a commented Apr 11, 2022

@qstokkink
Regarding the tests: I don't really understand the design of the tunnel community tests and cannot verify their correctness.

Regarding the production code: I agree it should not be there, but now we cannot be sure that hidden seeding even works, because of #6841.

@qstokkink
Contributor

@drew2a [Disclaimer: mostly trivia] This documentation is still reasonably up to date: https://github.com/Tribler/tribler/wiki/Hidden-Services-Specifications-for-anonymous-seeding If you look at the lowest image on that page (in this section), the Tribler test is forcing an external data flow from the introduction point to 127.0.0.1 with the seeder's port (a port which is not even used in this data flow). The AssertionError happens because the test notices that data is being sent in the "DHT/PEX" layer (colored green) to a peer that does not exist in the mocked network (("127.0.0.1", seeder_port) is not registered there).

@qstokkink
Contributor

Observed again in https://github.com/Tribler/tribler/actions/runs/6932482788/job/18856533640?pr=7703

Raw logs:
36.txt

@egbertbouman
Member

egbertbouman commented Nov 20, 2023

Helpful log for the hidden services failure. A couple of things that I'm seeing:

  • BalanceRequests time out, since there is no BandwidthAccountingCommunity. This takes 10 s per attempt. When we're lucky, we'll get a random slot; maybe we're just less lucky on macOS.
  • The download also tries to download from the seeder directly, since its peer is added to the swarm. The mock exit socket is blocking this, so while it might look confusing, it's not really a bug. It would be better to not actually add the peer, though.
  • It's joining a hidden swarm twice, leaving an old seeder key behind. This can randomly add 30 s to the test.

I'll have a go at fixing this.
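A quick back-of-the-envelope check (using the 10 s and 30 s figures from the comment above, and the 40 s pytest timeout from the failure log earlier in this thread) shows why these delays alone can push the test over its limit:

```python
# Numbers taken from the comment above and the test failure log;
# this is only an illustration of how the delays stack up.

BALANCE_REQUEST_TIMEOUT = 10   # seconds lost per timed-out BalanceRequest
STALE_KEY_DELAY = 30           # possible extra delay from the stale seeder key
TEST_TIMEOUT = 40              # pytest timeout seen in the failure above

# One unlucky BalanceRequest plus the stale-key delay already reaches the
# limit; two unlucky attempts exceed it.
worst_case = 2 * BALANCE_REQUEST_TIMEOUT + STALE_KEY_DELAY
print(worst_case, worst_case > TEST_TIMEOUT)  # 50 True
```

This lines up with the observation that the failures are intermittent: whether the test passes depends on how many of these worst-case delays happen to occur in one run.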
