Host name resolution error on MacOS #6147

Closed
elliotfontaine opened this issue Jun 17, 2024 · 3 comments · Fixed by #5411

@elliotfontaine

Versions

  • Cylc: 8.2.4
  • System: MacOS Monterey (12.7.4)

Description

On MacOS, local workflow schedulers sometimes stop responding because of a host name resolution error. The error usually clears on its own after some time, and the workflow run starts responding again.

cylc stop bioreactor-workflow/run20

Traceback (most recent call last):
  File "/Users/elliotfontaine/mambaforge/envs/cylc/bin/cylc", line 10, in <module>
    sys.exit(main())
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/scripts/cylc.py", line 660, in main
    execute_cmd(command, *cmd_args)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/scripts/cylc.py", line 286, in execute_cmd
    entry_point.resolve()(*args)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/terminal.py", line 232, in wrapper
    wrapped_function(*wrapped_args, **wrapped_kwargs)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/scripts/stop.py", line 275, in main
    rets = call_multi(
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/multi.py", line 29, in call_multi
    return asyncio.run(call_multi_async(*args, **kwargs))
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/multi.py", line 88, in call_multi_async
    async for (workflow_id, *args), result in unordered_map(
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/async_util.py", line 460, in unordered_map
    yield task._args, task.result()
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/scripts/stop.py", line 223, in run
    pclient = get_client(workflow_id, timeout=options.comms_timeout)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/client_factory.py", line 63, in get_client
    return get_runtime_client(get_comms_method(), workflow, timeout=timeout)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/client_factory.py", line 57, in get_runtime_client
    return WorkflowRuntimeClient(workflow, timeout=timeout)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/client.py", line 241, in __init__
    WorkflowRuntimeClientBase.__init__(self, workflow, host, port, timeout)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/client.py", line 71, in __init__
    host, port, _ = get_location(workflow)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/network/__init__.py", line 84, in get_location
    host = get_fqdn_by_host(host)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/hostuserutil.py", line 265, in get_fqdn_by_host
    return HostUtil.get_inst().get_fqdn_by_host(target)
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/hostuserutil.py", line 169, in get_fqdn_by_host
    if not self.is_remote_host(target):
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/hostuserutil.py", line 210, in is_remote_host
    host_info != self._get_host_info())
  File "/Users/elliotfontaine/mambaforge/envs/cylc/lib/python3.9/site-packages/cylc/flow/hostuserutil.py", line 135, in _get_host_info
    self._host_exs[target] = socket.gethostbyname_ex(target)
socket.gaierror: [Errno 8] nodename nor servname provided, or not known: '1.0.0.127.in-addr.arpa'

cylc tui bioreactor-workflow/run20

[Errno 8] nodename nor servname provided, or not known: '1.0.0.127.in-addr.arpa'

Reproducible Example

I haven't found a way to reliably reproduce the issue; from my point of view it seems to happen at random.

@elliotfontaine elliotfontaine added the bug Something is wrong :( label Jun 17, 2024
@oliver-sanders
Member

This issue is caused by a long-standing Python bug on Mac OS: python/cpython#49254

We've hacked around it, but our hack isn't holding up against your setup, sorry.
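
For anyone trying to confirm they are hitting the same failure mode, the following is a minimal diagnostic sketch, not Cylc code. It assumes the problem is that `socket.getfqdn()` returns a reverse-lookup name (such as the `1.0.0.127.in-addr.arpa` seen in the traceback) which `socket.gethostbyname_ex()` then cannot resolve. The function name `check_fqdn_resolves` and its `resolver` parameter are hypothetical and exist only for illustration.

```python
import socket


def check_fqdn_resolves(fqdn=None, resolver=socket.gethostbyname_ex):
    """Return (name, error) for the given name, or for socket.getfqdn().

    `error` is None when the name resolves, otherwise it is the
    OSError (e.g. socket.gaierror) raised by the resolver.
    The `resolver` argument is a hypothetical hook so the lookup can be
    replaced when experimenting; it defaults to the call that fails in
    the traceback above.
    """
    name = socket.getfqdn() if fqdn is None else fqdn
    try:
        resolver(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return name, exc
    return name, None
```

Calling `check_fqdn_resolves()` with no arguments in the affected environment while the scheduler is unresponsive should show which name Python reports for the local host and whether it resolves at that moment.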

@oliver-sanders
Member

oliver-sanders commented Jun 17, 2024

This (aborted) pull request would extend the hack to cover the address you have reported above: #5411

I aborted this pull request as I was unable to reproduce the setup / circumstances under which that address is returned.

If you have the ability to modify your installation, you can try out this patch (single file, couple of lines).
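
The contents of the patch are not reproduced in this thread. As a rough illustration of the general idea only, the sketch below shows one way a workaround of this kind could look: if `socket.getfqdn()` returns a loopback reverse-lookup name, fall back to `socket.gethostname()`. The names `LOOPBACK_ARPA_NAMES` and `safe_local_fqdn` are hypothetical, and this is an assumption about the approach, not the actual change in #5411.

```python
import socket

# Hypothetical set of reverse-lookup names for the loopback addresses.
# The IPv4 entry is the name reported in this issue; the IPv6 entry is the
# standard ip6.arpa form of ::1 (a "1" nibble followed by 31 "0" nibbles).
LOOPBACK_ARPA_NAMES = {
    "1.0.0.127.in-addr.arpa",
    ".".join(["1"] + ["0"] * 31) + ".ip6.arpa",
}


def safe_local_fqdn(getfqdn=socket.getfqdn, gethostname=socket.gethostname):
    """Return a name for the local host that avoids loopback .arpa names.

    Assumption: when getfqdn() yields one of the loopback reverse-lookup
    names, gethostname() is a more useful answer. The two callables are
    parameters only so the behaviour can be exercised without touching
    the real resolver.
    """
    name = getfqdn()
    if name in LOOPBACK_ARPA_NAMES:
        return gethostname()
    return name
```

Whether falling back to `socket.gethostname()` is appropriate depends on the local network configuration, so treat this as a starting point for experimentation rather than a fix.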

@oliver-sanders oliver-sanders linked a pull request Jun 17, 2024 that will close this issue
@oliver-sanders oliver-sanders self-assigned this Jul 15, 2024
@oliver-sanders oliver-sanders added this to the 8.3.1 milestone Jul 15, 2024
@oliver-sanders
Member

Closed by #5411
