-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support HTTPS calls to clusters #114
Conversation
7c1f6c9
to
c9f6780
Compare
324cb6d
to
54bdea7
Compare
d498520
to
99d7ef3
Compare
…based on their RH token
…ster, for function calls validate read or write
99d7ef3
to
145de09
Compare
docs/auth_and_data_collection.rst
Outdated
=========================== | ||
By default, Runhouse collects metadata from provisioned clusters and data relating to performance and error monitoring. | ||
This data will only be used by Runhouse to improve the product. | ||
No Personal Identifiable Information (PII) is collected and we will not sell or buy data about you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this line, it's more complex than it seems.
docs/auth_and_data_collection.rst
Outdated
Runhouse provides a couple of options to manage the connection to the Runhouse API server running on a cluster. | ||
|
||
|
||
API Server Connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me everything that follows this should be in the Cluster docs, and everything before we already cover in the data collection doc. What do you think of removing this file and just putting the info below into the Cluster docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, you added the auth into here. I don't think they're really the same, so I would revert it (but still remove the line above), and just add the new auth info into the Cluster docs.
class Cluster(Resource): | ||
RESOURCE_TYPE = "cluster" | ||
REQUEST_TIMEOUT = 5 # seconds | ||
DEFAULT_HOST = "127.0.0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that if the connection type is tls or none (i.e. the calls are over HTTP alone and not a tunnel, the server will need to start with a host of 0.0.0.0. So maybe we should just make that the default, and only use localhost when the user starts with connection of ssh.
class Cluster(Resource): | ||
RESOURCE_TYPE = "cluster" | ||
REQUEST_TIMEOUT = 5 # seconds | ||
DEFAULT_HOST = "127.0.0.1" | ||
DEFAULT_HTTP_PORT = 50052 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we take the opportunity to change this now?
runhouse/servers/http/http_server.py
Outdated
@wraps(func) | ||
async def wrapper(*args, **kwargs): | ||
request: Request = kwargs.get("request") | ||
is_https: bool = request.url.scheme == "https" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this mean the uesr can bypass auth if they just send a regular http request?
runhouse/servers/http/http_server.py
Outdated
DEN_AUTH = False | ||
memory_exporter = None | ||
|
||
# NOTE: This is a temp in-mem cache, we will move this out into the object store for future permissions support | ||
AUTH_CACHE = {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't love using the class methods this way... but I don't know why. I think globals may be a bit safer.
runhouse/servers/http/http_server.py
Outdated
@classmethod | ||
def get(cls, key) -> dict: | ||
"""Get resources associated with a particular user""" | ||
return cls.AUTH_CACHE.get(key, {}) | ||
|
||
@classmethod | ||
def put(cls, key, value): | ||
"""Update server cache with a user's resources and access type""" | ||
cls.AUTH_CACHE[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing for these to collide with getting and putting resources
runhouse/servers/http/http_server.py
Outdated
@staticmethod | ||
@app.get("/cert") | ||
@validate_user | ||
def get_cert(request: Request, message: Message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait... is it a problem to be sending the cert over HTTP? Also, is it a problem to be sending the user's token over HTTP before they have the cert?
docs/auth_and_data_collection.rst
Outdated
|
||
Removing Collected Data | ||
------------------------------------ | ||
If you would like us to remove your collected data, please contact us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we provide a link for contact (e.g. an email address or a contact form)?
docs/auth_and_data_collection.rst
Outdated
The below options can be specified with the ``server_connection_type`` parameter | ||
when :ref:`initializing a cluster <Cluster Factory Method>`: | ||
|
||
- ``ssh``: Connects to the cluster via port forwarding. The API server will be started with HTTP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to say "on port 80", given our conversation earlier today?
docs/auth_and_data_collection.rst
Outdated
|
||
- ``ssh``: Connects to the cluster via port forwarding. The API server will be started with HTTP. | ||
- ``tls``: Connects to the cluster via port forwarding and enforces verification via TLS certificates. The API server | ||
will be started with HTTPS. Only users with a valid cert will be able to make requests to the API server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... on port 443"
?
docs/auth_and_data_collection.rst
Outdated
- ``tls``: Connects to the cluster via port forwarding and enforces verification via TLS certificates. The API server | ||
will be started with HTTPS. Only users with a valid cert will be able to make requests to the API server. | ||
- ``none``: Does not use any port forwarding or enforce any authentication. The API server will be started | ||
with HTTP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
docs/auth_and_data_collection.rst
Outdated
to enable token authentication. Runhouse will handle adding the token to each subsequent request as an auth header with | ||
format: :code:`{"Authorization": "Bearer <token>"}` | ||
|
||
Enabling TLS and Den Auth for the API server makes it incredibly fast and easy to stand up a microservice with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explicitly mention "Runhouse Den Auth" for folks who are dropping in just for this section / Google searchability?
docs/index.rst
Outdated
@@ -82,7 +82,7 @@ Table of Contents | |||
|
|||
debugging_logging | |||
troubleshooting | |||
data_collection | |||
auth_and_data_collection | |||
Source Code <https://github.com/run-house/runhouse> | |||
REST API Guide <https://api.run.house/docs> | |||
Dashboard <https://www.run.house/dashboard> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runhouse Den Dashboard?
restart_ray, | ||
screen, | ||
create_logfile=True, | ||
host=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a "port" parameter here and elsewhere in this file, or would this be handled outside of this PR?
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
class ServerConnectionType(Enum): | ||
"""Enum to manage the type of connection Runhouse will make with the Runhouse API server started on the cluster. | ||
``ssh``: Use port forwarding to connect to the server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The API server will be started with HTTP."
?
@@ -24,6 +28,15 @@ def cluster( | |||
host (str or List[str], optional): Hostname, IP address, or list of IP addresses for the BYO cluster. | |||
ssh_creds (dict, optional): Dictionary mapping SSH credentials. | |||
Example: ``ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'}`` | |||
server_port (bool, optional): Port to use for the server. (Default: ``50052``). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagging both lines for a potential change given our conversation earlier today.
…ttp server, using nginx as a reverse proxy
…art command needed for non ssh server connections
cd875d6
to
d9bf096
Compare
94a0578
to
0255048
Compare
# Conflicts: # runhouse/resources/hardware/cluster.py # runhouse/servers/http/http_client.py # runhouse/servers/http/http_utils.py
68bc94d
to
8fbe7c5
Compare
8fbe7c5
to
d41b822
Compare
…put and waiting for successful Uvicorn start line. Much better visibility when server fails to start properly! - Print server logfile output to terminal on `runhouse restart`, even when screen is enabled. - Fix https function and cluster fixtures - Change check to not require auth or Ray status, just send a ping - Fix async bug with HTTP server
2a78242
to
fc2cd96
Compare
fc2cd96
to
7bf8f60
Compare
docs/api/python/cluster.rst
Outdated
- ``tls``: Connects to the cluster via HTTPS (by default on port :code:`443`) and enforces verification via TLS | ||
certificates. Only users with a valid cert will be able to make requests to the API server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is out of date. Should just say "Connects to the cluster via HTTPS (by default on port :code:443
) using either a provided certificate, or creating a new self-signed certificate just for this cluster. You must open the needed ports in the firewall, such as via the open_ports
argument in the OnDemandCluster
, or manually in the compute itself or cloud console.
docs/api/python/cluster.rst
Outdated
- ``none``: Does not use any port forwarding or enforce any authentication. Connects to the cluster with HTTP by | ||
default on port :code:`80`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add: "This is useful when connecting to a cluster within a VPC, or creating a tunnel manually on the side with custom settings."
docs/api/python/cluster.rst
Outdated
- ``paramiko``: Uses `Paramiko <https://www.paramiko.org/>`_ to create an SSH tunnel to the cluster. This | ||
is relevant if you are using a cluster which require existing credentials (e.g. a password). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should just say "using a cluster which requires a password to authenticate."
docs/api/python/cluster.rst
Outdated
The ``tls`` connection type is the most secure and is recommended for production use if you are not running inside | ||
of a VPC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ``tls`` connection type is the most secure and is recommended for production use if you are not running inside | |
of a VPC. | |
The ``tls`` connection type is the most secure and is recommended for production use if you are not running inside | |
of a VPC. However, be mindful that you must secure the cluster with authentication (see below) if you open it to the public internet. |
docs/api/python/cluster.rst
Outdated
Runhouse allows you to authenticate users via their Runhouse token (generated when | ||
:ref:`logging in <Login/Logout>`) and saved to local Runhouse configs in path: :code:`~/.rh/config.yaml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runhouse allows you to authenticate users via their Runhouse token (generated when | |
:ref:`logging in <Login/Logout>`) and saved to local Runhouse configs in path: :code:`~/.rh/config.yaml`. | |
If desired, Runhouse provides out-of-the-box authentication via users' Runhouse token (generated when | |
:ref:`logging in <Login/Logout>` and set locally at: :code:`~/.rh/config.yaml`). This is crucial if the cluster has ports open to the public internet, as would usually be the case when using the ``tls`` connection type. You may also set up your own authentication manually inside of your own code, but you should likely still enable Runhouse authentication to ensure that even your non-user-facing endpoints into the server are secured. |
runhouse/servers/http/http_server.py
Outdated
detail=f"Failed to validate Runhouse user: {load_resp_content(resp)}", | ||
) | ||
|
||
if use_den_auth or func_call: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just be if func_call
, if use_den_auth is false we return above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I don't see how we're validating auth into the cluster for the non-func_call case
runhouse/servers/http/http_server.py
Outdated
def _load_current_cluster(kwargs) -> Union[str, None]: | ||
current_cluster = _get_cluster_from(_current_cluster("config")) | ||
if current_cluster: | ||
return current_cluster.rns_address | ||
|
||
# If no cluster config saved yet on the cluster try getting the cluster uri from the message object | ||
# included in the request | ||
message: Message = kwargs.get("message") | ||
resource = json.loads(message.data) | ||
return resource.get("name") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this anymore?
@@ -225,7 +372,8 @@ def lookup_env_for_name(name, check_rns=False): | |||
|
|||
@staticmethod | |||
@app.post("/resource") | |||
def put_resource(message: Message): | |||
@validate_user | |||
def put_resource(request: Request, message: Message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding Request objects here?
tests/conftest.py
Outdated
.save() | ||
) | ||
|
||
c.restart_server() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we restart the server here?
|
||
configs.set("token", "abcd123") | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to try calling a bunch of other cluster methods with a bad token too, not just a function call
dc05a72
to
cdce735
Compare
cdce735
to
a586afd
Compare
…d new pytest fixture for testing sharing resources
ad5842c
to
261e1fb
Compare
…reate & share resources. update auth logic in object store
261e1fb
to
ce76d5e
Compare
# Conflicts: # runhouse/resources/hardware/cluster.py
Adds option for starting up the Runhouse API server on the cluster with HTTPS. This makes it incredibly fast and easy to stand up a microservice with standard bearer token authentication (using a Runhouse token), allowing users to share Runhouse resources with collaborators, teams, customers, etc.
Highlights:
New server connection types:
ssh
: Connects to the cluster via an SSH tunnel.tls
: Connects to the cluster via HTTPS (port 443) and enforces verification via TLS certificates. Only users with a valid cert will be able to make requests to the API server.none
: Does not use any port forwarding or enforce any authentication. Connects to the cluster via HTTP (port 80).aws_ssm
: Uses the AWS Systems Manager tocreate an SSH tunnel to the cluster. Note: this is currently only relevant for SageMaker Clusters
paramiko
: Uses Paramiko to create an SSH tunnel to the cluster. This is relevant if you are using a cluster which require existing credentials (e.g. a password)The API server will be started by default on port 32300.
Validating resource access on the cluster:
NGINX (optional):
The Runhouse API server (a Fast API app) will by default run on a higher, non-privileged port (32300). Nginx will run in front of uvicorn and serve as a reverse proxy to forward requests from port 80 (default for HTTP) or port 443 (default for HTTPS) to the API server's port.
TODOs: