Support Local Execution of Training Jobs #2231
Thank you for creating this @franciscojavierarceo. Can you please explain how KFP runs pipelines locally? Do I need to have a Docker runtime in my local environment to run it, and do I need a local Kind cluster running?

/area sdk
/remove-label lifecycle/needs-triage
So they allow for a local subprocess runner and a docker runner. The docker container approach is pretty straightforward (code below), but I actually like the subprocess approach, even though the KFP docs recommend the DockerRunner. I understand why they recommend the Docker-based approach, but the subprocess one is just easier for data scientists: you can pass in a list of packages for the virtual environment that will be created to run the pipeline locally. I think that's probably the lowest-friction approach for data scientists to get started with Training on Kubeflow (especially those unfamiliar with k8s).

I think the docker approach or the venv approach is probably all we would need as a start. Pipelines has to deal with complex DAG orchestration, whereas Training only needs to worry about executing the training job itself.

Glad to hear you're supportive of this! I'll talk with folks on the team to investigate creating a spec on the implementation. 👍

Kubeflow Pipelines' docker runner implementation (from https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/local/docker_task_handler.py; a short sketch of the subprocess approach follows the code):
from typing import Any, Dict, List

import docker


def run_docker_container(
    client: 'docker.DockerClient',
    image: str,
    command: List[str],
    volumes: Dict[str, Any],
) -> int:
    # add_latest_tag_if_not_present is a helper defined alongside this
    # function in kfp/local/docker_task_handler.py.
    image = add_latest_tag_if_not_present(image=image)
    image_exists = any(
        image in existing_image.tags for existing_image in client.images.list())
    if image_exists:
        print(f'Found image {image!r}\n')
    else:
        print(f'Pulling image {image!r}')
        repository, tag = image.split(':')
        client.images.pull(repository=repository, tag=tag)
        print('Image pull complete\n')

    container = client.containers.run(
        image=image,
        command=command,
        detach=True,
        stdout=True,
        stderr=True,
        volumes=volumes,
    )
    for line in container.logs(stream=True):
        # the inner logs should already have a trailing \n;
        # we do not need to add another
        print(line.decode(), end='')
    return container.wait()['StatusCode']
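And for reference, this is roughly what the subprocess/venv flow looks like from the user's side. A minimal sketch based on the KFP local-execution docs (assumes kfp>=2.5 is installed; the component and the packages it installs are illustrative):

from kfp import dsl, local

# Run components in a fresh virtual environment instead of a container;
# packages_to_install are installed into that venv before execution.
local.init(runner=local.SubprocessRunner(use_venv=True))

@dsl.component(packages_to_install=['numpy'])
def add(a: float, b: float) -> float:
    import numpy as np
    return float(np.add(a, b))

# Calling the component executes it immediately on the local machine.
task = add(a=1.0, b=2.0)
assert task.output == 3.0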
My root question is: how will we orchestrate multiple Nodes (machines) and multiple Roles (networking and storage) without Kubernetes?
Throw an error?

Yes.
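To make that concrete, a hypothetical guard along these lines could reject unsupported topologies up front (validate_local_job and its message are illustrative, not an existing API):

def validate_local_job(num_nodes: int) -> None:
    # Hypothetical check for a local runner: multi-node jobs are rejected
    # rather than emulated, since there is no Kubernetes to orchestrate them.
    if num_nodes > 1:
        raise ValueError(
            f'Local execution supports a single node, got num_nodes={num_nodes}; '
            'use a Kubernetes cluster for multi-node training.')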
Uhm, the next question is: will you support a single Docker container on a single machine, or multiple Docker containers on a single machine?
We'd probably outline the details about this in a tech spec that we would share with the community before doing the implementation.
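For illustration only, emulating multiple workers with multiple containers on one machine might look something like this with docker-py (the image, container names, entrypoint, and environment values are assumptions, not a committed design):

import docker

client = docker.from_env()

# Hypothetical multi-worker emulation on a single machine: containers share a
# user-defined Docker network and the usual torch.distributed env vars.
client.networks.create('local-trainjob')
world_size = 2
containers = [
    client.containers.run(
        image='pytorch/pytorch:latest',              # assumed image
        command=['python', '/workspace/train.py'],   # assumed entrypoint
        environment={
            'MASTER_ADDR': 'worker-0',
            'MASTER_PORT': '29500',
            'WORLD_SIZE': str(world_size),
            'RANK': str(rank),
        },
        name=f'worker-{rank}',
        network='local-trainjob',
        detach=True,
    )
    for rank in range(world_size)
]
for c in containers:
    c.wait()
    print(c.logs().decode())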
That makes sense. I would recommend sharing the outline of this feature, such as the scope of support, in the community meeting or in this issue. The actual design often does not align with existing specifications, so by sharing the outline before the detailed design, we can avoid situations where the design turns out not to be implementable due to existing specifications.
Agreed! |
What would you like to be added?
The Kubeflow Pipelines v2 API supports running and testing pipelines locally without the need for Kubernetes. Ideally, the TrainingClient could also be extended to run locally for both the v1 and the forthcoming v2 APIs.
This is particularly appealing to Data Scientists who may not be as familiar with Kubernetes, or who aim to develop and test their training jobs locally for a faster feedback loop.
As a point of comparison, this is what makes Ray's library so easy for data scientists to get started with; i.e., their code just works without their having to think too much about Kubernetes.
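Purely as a sketch of the desired UX (the backend argument below is hypothetical and does not exist in the current SDK; the create_job parameters mirror the existing client):

from kubeflow.training import TrainingClient

def train_func():
    # Ordinary user training code; torch is an illustrative dependency.
    import torch
    print(f'training with torch {torch.__version__}')

# Hypothetical: a constructor flag selecting local execution (subprocess or
# container) instead of submitting a job to a Kubernetes cluster.
client = TrainingClient(backend='local')
client.create_job(
    name='local-test-job',
    train_func=train_func,
    num_workers=1,
)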
Why is this needed?
Providing a great developer experience for Data Scientists is extremely valuable for growing adoption and catering to our end users.
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.