-
Notifications
You must be signed in to change notification settings - Fork 88
[FEATURE] uv provider
#164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Wauplin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @burtenshaw , thanks for the ping! Happy to review a PR :) I've left quite a lot of comments on how I would implement things if starting the project right now. Feel free to ignore when not applicable or if I did not take into account other relevant parts of the library. In general my main comment is that I always tend to reduce the number of available parameters ("if no-one asked for it, let's not implement it") and isolate logic ("let's not have Hub-logic or docker-logic in uv provider").
Given that it's the second provider implemented, I think it's good to take some time to think about which API makes the most sense for the future of the library.
Let me know if you have any questions / comments 🤗
|
|
||
| import requests | ||
|
|
||
| from .providers import ContainerProvider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since uv is not per-se a container, I would rename the abstract class RuntimeProvider (or simply Runtime?) and the abstract methods start, stop, and wait_for_ready (instead of start_container, etc.). I feel that semantically it would be more accurate.
(just my 2c, feel free to ignore 🤗 )
| response = requests.get(health_url, timeout=2.0) | ||
| if response.status_code == 200: | ||
| return | ||
| except requests.RequestException: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| response = requests.get(health_url, timeout=2.0) | |
| if response.status_code == 200: | |
| return | |
| except requests.RequestException: | |
| pass | |
| timeout = max(0.0001, min(deadline - time.time(), 2.0)) | |
| response = requests.get(health_url, timeout=timeout) | |
| if response.status_code == 200: | |
| return | |
| except requests.RequestException: | |
| continue |
(nit) to avoid waiting 2s + 0.5s if timeout_s is less than 2s.
|
|
||
|
|
||
| @dataclass | ||
| class UVProvider(ContainerProvider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class UVProvider(ContainerProvider): | |
| class UVRuntime(RuntimeProvider): |
(naming suggestion to be aligned with the "runtime/" folder)
| host: str = "0.0.0.0" | ||
| port: Optional[int] = None | ||
| reload: bool = False | ||
| project_url: Optional[str] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is project_url for compared to repo_id? I find this redundancy a bit confusing. What about having a single project: str argument which will then be forwarded to uv? This project can be either a local path or a git+... url.
If we do so, UVProvider becomes independent from HF Hub and all the Hub-related logic is delegated to HTTPEnvClient.fromHub (i.e. we build the project url based on repo_id in fromHub and that's it).
| return command | ||
|
|
||
|
|
||
| @dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal opinion but I feel that using a dataclass for a subprocess runtime is not very Pythonic (and not aligned with ContainerProvider / LocalDockerProvider). Maybe better to remove it and define a proper __init__ method?
| client_host = self.connect_host or ( | ||
| "127.0.0.1" if self.host in {"0.0.0.0", "::"} else self.host | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a big difference between host and a connect_host? Like in other comments, I would suggest exposing a single argument to avoid bloated signature.
| connect_host: Optional[str] = None, | ||
| extra_env: Optional[Dict[str, str]] = None, | ||
| **provider_kwargs: Any, | ||
| ) -> EnvClientT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this signature very hard to understand without context since it's mixing kwargs for docker and for uv. Also I don't think one needs use_docker, provider, and runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be to remove provider and runner. Also remove project_url and connect_host. Have a single env (instead of env for docker and extra_env for uv). And use some typed dict + typing.overload so that IDEs can have correct autocompletion. Here is a simplified example:
from typing import Any, Dict, NotRequired, TypedDict, Unpack, overload
class DockerKwargs(TypedDict, total=False):
tag: NotRequired[str]
env: NotRequired[Dict[str, str]]
class UVProvider(TypedDict):
host: NotRequired[str]
port: NotRequired[int]
reload: NotRequired[bool]
timeout_s: NotRequired[float]
env: NotRequired[Dict[str, str]]
@overload
def from_hub(repo_id: str, *, use_docker: bool = True, **kwargs: Unpack[DockerKwargs]) -> str: ...
@overload
def from_hub(repo_id: str, *, use_docker: bool = False, **kwargs: Unpack[UVProvider]) -> str: ...
def from_hub(repo_id: str, *, use_docker: bool = False, **kwargs: Any) -> str:
raise NotImplementedError()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I'm not a fan of overloads and typed dict but it's the only solution I see to correctly document the signature while keeping a single method.
Another solution is to have from_hub_docker and from_hub_uv (more explicit but less elegant)
|
|
||
| tag = provider_kwargs.pop("tag", "latest") | ||
| image = provider_kwargs.pop( | ||
| "image", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not have an image option in from_hub (if user wants to define their own image, they should use from_docker_image)
| base_url = uv_runner.start_container( | ||
| repo_id, | ||
| port=port, | ||
| env_vars=env_vars, | ||
| **non_docker_kwargs, | ||
| ) | ||
|
|
||
| try: | ||
| uv_runner.wait_for_ready(base_url, timeout_s=timeout_s) | ||
| except Exception: | ||
| uv_runner.stop_container() | ||
| raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the runner is defined and instantiated, I feel that the start and wait calls should be made in the __init__ instead. This way you get the same behavior both in from_hub and from_docker. With current implementation, one stops the runner if failing to start, the other don't (which is not consistent).
| uv_runner = runner or UVProvider( | ||
| repo_id=repo_id, | ||
| host=host, | ||
| port=port, | ||
| reload=reload, | ||
| project_url=project_url, | ||
| connect_host=connect_host, | ||
| extra_env=extra_env, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| uv_runner = runner or UVProvider( | |
| repo_id=repo_id, | |
| host=host, | |
| port=port, | |
| reload=reload, | |
| project_url=project_url, | |
| connect_host=connect_host, | |
| extra_env=extra_env, | |
| ) | |
| runner = UVProvider( | |
| project=f"git+https://huggingface.co/spaces/{repo_id}", | |
| host=host, | |
| port=port, | |
| reload=reload, | |
| env=env, | |
| ) |
With all the suggestions above, that would be the runner call (i.e. provide a project url, remove connect_host, remove project_url, rename extra_env to env, and remove runner). I don't think final user loose much in this simplification.
This PR adds a uv provider to open env core which can be used to run envs on spaces without docker. The main intention here is to make a pure python option for notebook environments like colab.
The above example, depends on this specific env, which is the result of #160 .