
Cannot install runhouse on static cluster #1166

Open
ashah03 opened this issue Aug 17, 2024 · 6 comments

Comments

@ashah03

ashah03 commented Aug 17, 2024

I have an Ubuntu 24.04 LTS machine on my local network that I would like to use as a run.house cluster. Here's a minimal snippet to reproduce:

import runhouse as rh

def get_platform(a=0):
    import platform
    return platform.platform()

cluster = rh.cluster(
    name="rh-cluster",
    host="hostname.local",  # hostname or IP address
    ssh_creds={"ssh_user": "user", "ssh_private_key": "~/.ssh/id_rsa"},
)

if __name__ == "__main__":
    remote_get_platform = rh.function(get_platform).to(cluster)
    print(remote_get_platform())

However, I get the following error:

Logs + traceback
00:38:10.554378 | Cluster rh-cluster is up, but the Runhouse API server may not be up.
INFO | 2024-08-17 00:38:10.554551 | Restarting Runhouse API server on rh-cluster.
INFO | 2024-08-17 00:38:10.555490 | Running command on rh-cluster: runhouse --version
INFO | 2024-08-17 00:38:10.737680 | Running command on rh-cluster: ray --version
INFO | 2024-08-17 00:38:10.929170 | Running command on rh-cluster: python3 -m pip install ray==2.34.0
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.12/README.venv for more information.

...
...

Traceback (most recent call last):
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/urllib3/util/retry.py", line 474, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/urllib3/connection.py", line 464, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/sentry_sdk/integrations/stdlib.py", line 129, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/socket.py", line 720, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 631, in check_and_call
    self.client.check_server()
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/servers/http/http_client.py", line 250, in check_server
    resp = session.get(
           ^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 645, in call_client_method
    return check_and_call()
           ^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 642, in check_and_call
    raise ConnectionError(f"Check server failed: {e}.")
ConnectionError: Check server failed: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/adit/code/cradle/tech_testing/runhouse_test.py", line 17, in <module>
    remote_get_platform = rh.function(get_platform).to(cluster)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/functions/function.py", line 95, in to
    return super().to(
           ^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/module.py", line 485, in to
    env = env.to(system, force_install=force_install)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/envs/env.py", line 238, in to
    system.call(key, "_install_reqs", reqs=new_env.reqs)
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 1064, in call
    return self.call_client_method(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 660, in call_client_method
    self.restart_server()
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 879, in restart_server
    self._sync_runhouse_to_cluster(
  File "/Users/adit/code/cradle/tech_testing/.pixi/envs/default/lib/python3.12/site-packages/runhouse/resources/hardware/cluster.py", line 492, in _sync_runhouse_to_cluster
    raise ValueError(
ValueError: Error installing runhouse on cluster <rh-cluster> node <cradletr1.local>
Sentry is attempting to send 2 pending events

My understanding is that it's trying to use the system Python, but Ubuntu won't allow a pip install into the base Python environment. How can I tell it to use a virtual environment? I don't see any docs for this.
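The pip error above already names the standard way out: create a virtual environment and call its own interpreter, since the PEP 668 restriction does not apply inside a venv. A minimal standard-library sketch of that step, meant to run on the cluster machine; the helper name `create_venv` and the path `~/rh-venv` are illustrative assumptions, and Runhouse itself does not call anything like this.

```python
import os
import sys
import venv
from pathlib import Path


def create_venv(path: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` and return its Python interpreter.

    Hypothetical helper for illustration only; not part of Runhouse.
    """
    env_dir = Path(os.path.expanduser(path))
    # with_pip=True bootstraps pip via ensurepip; on Debian/Ubuntu this needs
    # the python3-venv (or python3-full) package to be installed.
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe
```

For example, `python = create_venv("~/rh-venv")`, then running `python -m pip install runhouse` with that interpreter installs Runhouse without touching the system Python.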

@ashah03
Author

ashah03 commented Aug 17, 2024

I have a temporary workaround: manually creating the virtual environment and starting the Runhouse server on the machine myself. Is that the way to go?
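That manual workaround can be scripted from the local machine. A sketch that only builds the `ssh` command for creating a venv on the remote host and installing `runhouse` into it; the helper name, the venv path, and driving it with `subprocess.run` are assumptions, and starting the Runhouse server from that venv is left out because the right command depends on the Runhouse version.

```python
import os
import shlex
from typing import List


def remote_bootstrap_cmd(host: str, ssh_user: str, key_path: str,
                         venv_dir: str = "~/rh-venv",
                         package: str = "runhouse") -> List[str]:
    """Build an ssh argv that creates a venv on `host` and installs `package` into it.

    Hypothetical helper: it only builds the command; nothing here is part of Runhouse.
    """
    # The remote shell runs this string, so "~" in venv_dir expands on the remote host.
    # venv_dir is deliberately left unquoted for that reason and must be a trusted path.
    remote = (
        f"python3 -m venv {venv_dir} && "
        f"{venv_dir}/bin/python -m pip install {shlex.quote(package)}"
    )
    # subprocess does not expand "~" locally, so expand the key path here.
    return ["ssh", "-i", os.path.expanduser(key_path), f"{ssh_user}@{host}", remote]
```

Usage would look like `subprocess.run(remote_bootstrap_cmd("hostname.local", "user", "~/.ssh/id_rsa"), check=True)`.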

Also: is it possible to have multiple ssh creds, one for each ip address in the cluster if I pass in a list?

@jlewitt1
Collaborator

Hey Adit, thanks for reaching out! You can create an env and send it to the cluster independently, or create it as part of sending the function to the cluster in the .to() call. When you send the env to the cluster, Runhouse constructs and caches it (along with its specified packages, env vars, and secrets). We also support conda envs.

We have some more general info on Envs here.

For the example provided, you could do something like:

cluster_env = rh.env(
    reqs=["numpy"],  # example only: list the packages to install here
    name="my_env",
)
remote_get_platform = rh.function(get_platform).to(cluster, env=cluster_env)

Let us know if this helps.

As for SSH creds, we don’t currently support multiple sets of SSH creds in the cluster factory. Just to confirm: in your use case, would each cluster IP have its own unique set of creds?

@ashah03
Author

ashah03 commented Aug 18, 2024

Hi @jlewitt1, thanks for the response!

Regarding the env, I'm not referring to the Runhouse concept of an env, but to the Python environment in which the runhouse package itself is installed. When I tried to initialize a local Ubuntu machine as a cluster, it tried to use the base pip (outside any virtual environment). Under PEP 668, newer versions of Ubuntu don't allow package installs with the base pip, which is why this error is happening (see my traceback as well as https://www.omgubuntu.co.uk/2023/04/pip-install-error-externally-managed-environment-fix).
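To check whether a given interpreter is affected: PEP 668 says installers should refuse to install when an `EXTERNALLY-MANAGED` marker file sits in the interpreter's stdlib directory, unless they are running inside a virtual environment. A rough standard-library approximation of that rule; the function name is made up for illustration, and pip's own check differs in detail.

```python
import os
import sys
import sysconfig
from typing import Optional


def is_externally_managed(stdlib_dir: Optional[str] = None,
                          in_venv: Optional[bool] = None) -> bool:
    """Approximate the PEP 668 check: would pip refuse to install into this interpreter?"""
    if in_venv is None:
        # A virtual environment has a prefix distinct from its base interpreter's.
        in_venv = sys.prefix != sys.base_prefix
    if in_venv:
        # PEP 668: the marker file is ignored inside a virtual environment.
        return False
    if stdlib_dir is None:
        stdlib_dir = sysconfig.get_path("stdlib")
    return os.path.isfile(os.path.join(stdlib_dir, "EXTERNALLY-MANAGED"))
```

On a stock Ubuntu 24.04 system `python3` this would be expected to return `True`, and `False` from inside any venv.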

Yes, the use case is mainly that different clusters might have different usernames. The credentials themselves are an easy fix since I can just use an SSH key. Would it be difficult to support a list of credentials the same length as the list of IPs, rather than just one?
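One possible shape for the requested feature, purely hypothetical since `ssh_creds` in the snippet above takes a single dict: accept either one dict shared by every IP, or a list of dicts matched to the IPs by position, and fail loudly when the lengths differ. The helper name and behaviour are assumptions, not a Runhouse API.

```python
from typing import Dict, List, Union

Creds = Dict[str, str]


def pair_hosts_with_creds(ips: List[str],
                          ssh_creds: Union[Creds, List[Creds]]) -> Dict[str, Creds]:
    """Map each IP to its SSH creds: one shared dict, or one dict per IP (same order).

    Hypothetical helper sketching the requested behaviour; not part of Runhouse.
    """
    if isinstance(ssh_creds, dict):
        # Single set of creds shared by every node.
        return {ip: ssh_creds for ip in ips}
    if len(ssh_creds) != len(ips):
        raise ValueError(
            f"Got {len(ssh_creds)} sets of SSH creds for {len(ips)} IPs; "
            "expected exactly one per IP."
        )
    # Positional pairing: the i-th creds dict belongs to the i-th IP.
    return dict(zip(ips, ssh_creds))
```

Keeping the single-dict form working unchanged means existing `rh.cluster(..., ssh_creds={...})` calls would not need to change under this design.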

Regarding the Env or CondaEnv, I ran into an install issue due to what seems like a bug in the code, see #1167.

Appreciate your help!

@jlewitt1
Collaborator

Gotcha, ok, thx for clarifying - we've actually talked about adding virtualenv or pyenv as additional Runhouse-supported env types, but until then, manually creating the env as you did sounds like the best way around it.

Supporting a list of creds is probably something we could do. Do you currently have multiple VMs that you're trying to string together? Curious to hear more about the use case.

Also, we're pushing a fix for the CondaEnv issue shortly, thx for flagging!

@ashah03
Author

ashah03 commented Aug 20, 2024

Appreciate it!

The use case for multiple credentials is that we want to build an ad-hoc cluster out of on-prem machines, and possibly scale up to cloud VMs as well, and these may have different creds.

@jlewitt1
Collaborator

Makes sense, that's also something we've talked about. Assuming all the clusters are in the same VPC, we could probably support that; otherwise it would require a bit of hoop jumping to allow each of the worker nodes to communicate with the head node. Happy to keep you posted on the timing of that feature!
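Same-VPC membership can't be verified from addresses alone, but as a rough first sanity check before mixing on-prem machines with cloud VMs, the standard `ipaddress` module can confirm that every node IP is private and falls inside one expected CIDR block. The helper names and the example block `10.0.0.0/16` are assumptions, and passing these checks does not guarantee that routing or firewall rules let the workers reach the head node.

```python
import ipaddress
from typing import List


def all_in_network(ips: List[str], cidr: str) -> bool:
    """Return True if every IP lies inside the CIDR block `cidr`."""
    network = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(ip) in network for ip in ips)


def all_private(ips: List[str]) -> bool:
    """Return True if every IP is a private (not publicly routable) address."""
    return all(ipaddress.ip_address(ip).is_private for ip in ips)
```

An on-prem `192.168.x.x` machine and a cloud VM in `10.0.0.0/16` would both pass `all_private` yet fail `all_in_network`, which is roughly the case that needs the extra networking work mentioned above.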

Also, the conda issue is now fixed; you can install from main if it's a blocker until the next release.
