[Dashboard] OSError: [Errno 99] error while attempting to bind on address ('::1', 8265, 0, 0): cannot assign requested address #7084

Closed
thavlik opened this issue Feb 7, 2020 · 19 comments
Labels
bug Something that is supposed to be working; but isn't

Comments

@thavlik

thavlik commented Feb 7, 2020

What is the problem?

I am building a Docker image with my branch and am unable to start the dashboard. Node 13.x is installed. The issue appears to be a port conflict. Perhaps there is something already listening on 8265?

$ docker logs -f rl-actor
[ray] Forcing OMP_NUM_THREADS=1 to avoid performance degradation with many workers (issue #6998). You can override this by explicitly setting OMP_NUM_THREADS.
/opt/conda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2020-02-07 15:50:54,893	WARNING services.py:592 -- setpgrp failed, processes may not be cleaned up properly: [Errno 1] Operation not permitted.
2020-02-07 15:50:54,894	INFO resource_spec.py:212 -- Starting Ray with 35.25 GiB memory available for workers and up to 17.64 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-02-07 15:50:55,481	INFO services.py:1093 -- View the Ray dashboard at localhost:8265
2020-02-07 15:50:58,493	WARNING worker.py:1071 -- The dashboard on node c9ba97c06401 failed with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/asyncio/base_events.py", line 1045, in create_server
    sock.bind(sa)
OSError: [Errno 99] Cannot assign requested address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ray/python/ray/dashboard/dashboard.py", line 760, in <module>
    dashboard.run()
  File "/ray/python/ray/dashboard/dashboard.py", line 335, in run
    aiohttp.web.run_app(self.app, host=self.host, port=self.port)
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/web.py", line 433, in run_app
    reuse_port=reuse_port))
  File "/opt/conda/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/web.py", line 359, in _run_app
    await site.start()
  File "/opt/conda/lib/python3.6/site-packages/aiohttp/web_runner.py", line 104, in start
    reuse_port=self._reuse_port)
  File "/opt/conda/lib/python3.6/asyncio/base_events.py", line 1049, in create_server
    % (sa, err.strerror.lower()))
OSError: [Errno 99] error while attempting to bind on address ('::1', 8265, 0, 0): cannot assign requested address

Reproduction (REQUIRED)

Here is the Dockerfile I'm using, which is based off base-deps:

FROM tensorflow/tensorflow:nightly-gpu-py3
# install ray dependencies
RUN apt-get update \
    && apt-get install -y \
        curl \
        tmux \
        screen \
        rsync \
        apt-transport-https \
        zlib1g-dev \
        libgl1-mesa-dev \
        git \
        wget \
        cmake \
        build-essential \
        curl \
        unzip \
    && apt-get clean \
    && echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh \
    && wget \
        --quiet 'https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh' \
        -O /tmp/anaconda.sh \
    && /bin/bash /tmp/anaconda.sh -b -p /opt/conda \
    && rm /tmp/anaconda.sh \
    && /opt/conda/bin/conda install -y \
        libgcc \
    && /opt/conda/bin/conda clean -y --all \
    && /opt/conda/bin/pip install \
        flatbuffers \
        cython==0.29.0 \
        numpy==1.15.4
ENV PATH "/opt/conda/bin:$PATH"
RUN conda remove -y --force wrapt
RUN pip install -U pip
# To avoid the following error on Jenkins:
# AttributeError: 'numpy.ufunc' object has no attribute '__module__'
RUN /opt/conda/bin/pip uninstall -y dask
ENV PATH "/opt/conda/bin:$PATH"
# For Click
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN pip install gym[atari]==0.10.11 opencv-python-headless lz4 pytest-timeout smart_open torch torchvision
RUN pip install --upgrade bayesian-optimization
RUN pip install --upgrade hyperopt==0.1.2
RUN pip install ConfigSpace==0.4.10
RUN pip install --upgrade sigopt nevergrad scikit-optimize hpbandster lightgbm xgboost tensorboardX
RUN pip install -U mlflow
RUN pip install -U 'pytest-remotedata>=0.3.1'

# install custom ray branch
RUN git clone --single-branch --branch warmstart2 https://github.com/thavlik/ray.git
RUN ray/ci/travis/install-bazel.sh
WORKDIR /ray/python
RUN pip install -U -e . --verbose
RUN python ray/setup-dev.py --yes

# install node and build dashboard
RUN curl -sL https://deb.nodesource.com/setup_13.x | bash -
RUN apt-get install -y nodejs
RUN cd ray/dashboard/client && npm ci && npm run build

# install dependencies for my python project
RUN pip install tqdm==4.41.1 \
    tensorflow-gpu==2.1.0 \
    tensorboard==2.1.0 \
    Keras==2.3.1 \
    absl-py==0.9.0 \
    boto3==1.11.1 \
    psutil==5.6.7 \
    gym==0.15.4 \
    GPUtil==1.4.0 \
    opencv-python==4.1.2.30 \
    lz4==3.0.2 \
    setproctitle==1.1.10 \
    tensorboardX==2.0

Running any tune experiment produces the warning.

@thavlik added the bug label Feb 7, 2020
@rkooo567
Contributor

rkooo567 commented Feb 8, 2020

I highly doubt this is a port conflict, because the dashboard increments the port number if the requested port is unavailable before it starts the dashboard process (for example, if 8265 is already in use, it tries 8266). Many factors can cause OSError: [Errno 99] Cannot assign requested address, but I assume it is related to Docker. @wuisawesome Any thoughts?
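
The port-increment behavior described above can be illustrated with a minimal sketch. This is not Ray's actual implementation; the function name `first_free_port` and its parameters are hypothetical. It also shows why incrementing the port would not help here: Errno 99 (EADDRNOTAVAIL) refers to the address, not the port, so every port on an unavailable address fails the same way.

```python
import socket


def first_free_port(start: int, host: str = "127.0.0.1", max_tries: int = 100) -> int:
    """Return the first port >= start that can be bound on host (hypothetical helper)."""
    for port in range(start, min(start + max_tries, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port in use, or the address itself cannot be bound: try the next port.
                continue
            return port
    raise RuntimeError(f"no bindable port found starting at {start} on {host}")
```

Calling `first_free_port(8265)` while another process holds 8265 on 127.0.0.1 would return a later port, whereas an address that cannot be assigned in the container would exhaust every attempt and raise.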

@wuisawesome
Contributor

wuisawesome commented Feb 9, 2020

Does adding --webui-host 0.0.0.0 to ray start work to mitigate this?

@thavlik
Author

thavlik commented Feb 10, 2020

> Does adding --webui-host 0.0.0.0 to ray start work to mitigate this?

I am not using ray start - this is with the tune.run API.

@semin-park

I had the exact same error. Solved it by adding

ray.init(webui_host='127.0.0.1') at the beginning of the python file.

It seems like hostname '::1' or 'localhost' are sometimes not recognized.
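
A quick way to check whether the IPv6 loopback address is usable inside a container is to try binding to it directly. The following is a minimal diagnostic sketch using only the standard library; the helper name `can_bind` is an assumption for illustration and is not part of Ray or aiohttp.

```python
import socket


def can_bind(host: str, family: socket.AddressFamily, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to (host, port) in this environment."""
    try:
        # Creating the socket is inside the try block because an environment
        # without IPv6 support can fail here, before bind() is reached.
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))  # port 0 lets the OS pick any free port
        return True
    except OSError as exc:
        # On Linux, EADDRNOTAVAIL is Errno 99, the error reported in this issue.
        print(f"bind to {host!r} failed: [Errno {exc.errno}] {exc.strerror}")
        return False


if __name__ == "__main__":
    print("IPv4 loopback:", can_bind("127.0.0.1", socket.AF_INET))
    print("IPv6 loopback:", can_bind("::1", socket.AF_INET6))
```

If the IPv4 check succeeds and the IPv6 check fails, passing '127.0.0.1' explicitly avoids the address that cannot be assigned.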

@thavlik
Author

thavlik commented Feb 11, 2020

> I had the exact same error. Solved it by adding
>
> ray.init(webui_host='127.0.0.1') at the beginning of the python file.
>
> It seems like hostname '::1' or 'localhost' are sometimes not recognized.

This fixes the issue for me as well. Thank you.

@thavlik closed this as completed Feb 11, 2020
@wuisawesome
Contributor

For reference, this issue appears to be described in more detail here: aio-libs/aiohttp#4554

@mjlbach

mjlbach commented Feb 12, 2020

Should this be the default?

@DrJimFan

I had the same issue. I'm using the ray command line API, and adding --webui-host 0.0.0.0 works for me!

@wuisawesome
Contributor

Could the next person who runs into this issue please post the output of cat /etc/hosts | grep localhost and share your OS as well? It would be very useful for understanding how widespread link local ipv6 addresses are for Ray users.
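
For anyone who prefers to gather this from Python, here is a rough equivalent of the grep, plus a look at what the resolver actually returns for 'localhost'. It is a sketch only: `localhost_entries` and `report` are hypothetical helper names, and unlike grep the parser matches the exact name 'localhost' rather than any substring.

```python
import socket
from typing import List


def localhost_entries(hosts_text: str) -> List[str]:
    """Return the addresses that a hosts file maps to the exact name 'localhost'."""
    addresses = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and "localhost" in fields[1:]:
            addresses.append(fields[0])
    return addresses


def report(path: str = "/etc/hosts", port: int = 8265) -> None:
    """Print the hosts-file entries and the resolver's answers for 'localhost'."""
    with open(path) as f:
        print("hosts file:", localhost_entries(f.read()))
    infos = socket.getaddrinfo("localhost", port, type=socket.SOCK_STREAM)
    # Each result's sockaddr is the fifth element; its first item is the address.
    print("resolver:", [info[4][0] for info in infos])
```

Calling `report()` inside the affected container shows whether '::1' is among the addresses that 'localhost' resolves to.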

@sumanthratna
Member

I'm in a Docker container.

  • host OS: macOS Big Sur Beta (20A5343i)
  • container OS: debian buster (3.8-slim-buster)
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback

@JarnoRFB
Contributor

I get the same error in an Ubuntu 18.04 container running in JupyterHub on Kubernetes. The output of cat /etc/hosts | grep localhost is also:

127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback

The error went away when I set ray.init(dashboard_host="127.0.0.1"). I believe the argument name has changed since @semin-park's answer.

@sukkiCat

sukkiCat commented Sep 7, 2020

> I had the exact same error. Solved it by adding
>
> ray.init(webui_host='127.0.0.1') at the beginning of the python file.
>
> It seems like hostname '::1' or 'localhost' are sometimes not recognized.

I tried to follow your steps but got the following error:

TypeError: init() got an unexpected keyword argument 'webui_host'

@wuisawesome
Contributor

The argument name was changed to ray.init(dashboard_host="127.0.0.1")

@sukkiCat

sukkiCat commented Sep 8, 2020

> The argument name was changed to ray.init(dashboard_host="127.0.0.1")

it works. thanks.

@Capitolhill

> The argument name was changed to ray.init(dashboard_host="127.0.0.1")

Has the argument name been changed again? I get the following error.
TypeError: init() got an unexpected keyword argument 'dashboard_host'

@wuisawesome
Contributor

@Capitolhill I don't think it changed, but can you file a new issue if this is still a problem?
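
Because the keyword has been reported in this thread as both webui_host and dashboard_host depending on the installed Ray version, one hedged workaround is to choose the keyword by inspecting the function's signature. This is a sketch, not an official Ray API: `loopback_host_kwargs` is a hypothetical helper, and it assumes ray.init exposes the option as a named parameter, which has not been verified here for every release.

```python
import inspect
from typing import Callable, Dict


def loopback_host_kwargs(init_fn: Callable, host: str = "127.0.0.1") -> Dict[str, str]:
    """Return the dashboard-host keyword that init_fn accepts, or {} if it has neither."""
    params = inspect.signature(init_fn).parameters
    # Prefer the newer name mentioned in this thread, then fall back to the older one.
    for name in ("dashboard_host", "webui_host"):
        if name in params:
            return {name: host}
    return {}


# Usage (requires Ray to be installed):
#   import ray
#   ray.init(**loopback_host_kwargs(ray.init))
```

If neither keyword is found, the helper returns an empty dict and ray.init is called with its defaults, which avoids the TypeError shown above.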

@lious68

lious68 commented Apr 24, 2021

> ray.init(webui_host='127.0.0.1')

Where should I add this, and in which file?

@lious68

lious68 commented Apr 24, 2021

> I had the exact same error. Solved it by adding
>
> ray.init(webui_host='127.0.0.1') at the beginning of the python file.
>
> It seems like hostname '::1' or 'localhost' are sometimes not recognized.

Which Python file?

@mwtian
Member

mwtian commented Nov 12, 2021

@lious68 are you running a Ray script or Ray Tune etc? Anyway this issue should have been fixed in the more recent Ray versions.

scv119 pushed a commit that referenced this issue Nov 13, 2021
Some Ray client users are likely seeing an issue similar to #7084. Inside a container, connecting to localhost: fails but connecting to 127.0.0.1: succeeds. Changing Ray client to use 127.0.0.1 for localhost connection / serving should fix the issue.
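
The idea in that commit message, using 127.0.0.1 wherever 'localhost' would otherwise be used, can be sketched as a small address rewrite. This is an illustration only and not the code from the referenced commit; the function name `prefer_ipv4_loopback` is hypothetical.

```python
def prefer_ipv4_loopback(address: str) -> str:
    """Rewrite 'localhost' or 'localhost:PORT' to 127.0.0.1; leave other addresses alone."""
    host, sep, port = address.partition(":")
    if host == "localhost":
        # Keep the separator and port (if any) exactly as given.
        return "127.0.0.1" + sep + port
    return address
```

Addresses that are already numeric, including IPv6 literals such as '::1', pass through unchanged.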
josura added a commit to codethazine/makemelaugh that referenced this issue Jan 27, 2024