Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add envd-server runtime proposal #303

Merged
merged 20 commits into from
Oct 11, 2022
Merged

Conversation

gaocegege
Copy link
Member

@gaocegege gaocegege commented Jun 13, 2022

@gaocegege gaocegege changed the title feat: Add Kubernetes runtime proposal WIP feat: Add Kubernetes runtime proposal Jun 29, 2022
@gaocegege gaocegege requested a review from VoVAllen June 29, 2022 08:27
@gaocegege
Copy link
Member Author

/cc @Xuanwo @hezhizhen @knight42

docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
}
```

## Design Details
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will envd play with charts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean helm charts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some users expect this, but I am not sure how to integrate it with charts.

Copy link
Member

@VoVAllen VoVAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. One thing I'm concerning is that whether syncing is needed at MVP. Code should be on a PV-like thing I think and usually people works with git. Might not need syncing when working on cluster.

The problem of syncing is about the dataset related stuffs. User may need fine-grained control of what to sync, which increases the complexity

@VoVAllen
Copy link
Member

Other options for file syncing:

@knight42
Copy link
Member

One thing I'm concerning is that whether syncing is needed at MVP. Code should be on a PV-like thing I think and usually people works with git

My 2 cents is syncing might be needed ultimately, but might in some different ways. Let's say we are working with git, and push some new commits to a branch, I think it would be better if envd could pull the new commits automatically to simulate the local development experience.

@gaocegege
Copy link
Member Author

My 2 cents is syncing might be needed ultimately, but might in some different ways. Let's say we are working with git, and push some new commits to a branch, I think it would be better if envd could pull the new commits automatically to simulate the local development experience.

Thanks for the advice! Automatic push/pull looks magic to me. And it is complex. If the container is crashed, we also may lost the commits if we do not run push.

@gaocegege
Copy link
Member Author

As discussed with some infra engineers interested in envd, port-forwarding may consume many API server CPUs. And tools like virtual kubelet does not support port forwarding.

@knight42
Copy link
Member

port-forwarding may consume many API server CPUs.

Would you mind elaborating? AFAIK if there is no much traffic, port-forwarding should not consume too much cpu resources, as it is simply a SPDY connection under the hood.

virtual kubelet does not support port forwarding

Indeed. But what are we going to do to access the services inside the container without port-forwarding?

@gaocegege
Copy link
Member Author

gaocegege commented Jun 29, 2022

Thanks for the advice! Automatic push/pull looks magic to me. And it is complex. If the container is crashed, we also may lost the commits if we do not run push.

Or, we can forget the sync things. We request users to develop on the remote container, instead of the host. The build.envd may look like this:

def build():
    base(os="ubuntu20.04", language="python3")
    install.vscode_extensions([
        "ms-python.python",
    ])
    #config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
    install.python_packages([
        "tensorflow",
        "numpy",
    ])
    shell("zsh")
    config.jupyter(password="", port=8888)
+    config.working_dir(local=".", remote="https://github.com/tensorchord/envd.git")

envd mounts the local dir with docker runner and downloads the repo with Kubernetes runner.

@gaocegege
Copy link
Member Author

gaocegege commented Jun 29, 2022

Would you mind elaborating? AFAIK if there is no much traffic, port-forwarding should not consume too much cpu resources, as it is simply a SPDY connection under the hood.

There should not be huge traffic by design. But algorithm engineers may use it to copy data:

scp <10G-file> container:~

@gaocegege
Copy link
Member Author

But what are we going to do to access the services inside the container without port-forwarding?

They may use service and ingress to achieve this. Thus we may need a mechanism to support customization here.

Maybe just like the design of the device plugin, we provide an interface to communicate between envd and a CLI shim. The shim does the critical logic like port forwarding. The envd just communicate with the shim and show information to users.

Port forwarding can be used in our default shim, while users can write their own shim to customize, e.g. using service and ingress.

@gaocegege
Copy link
Member Author

But what are we going to do to access the services inside the container without port-forwarding?

They may use service and ingress to achieve this. Thus we may need a mechanism to support customization here.

Maybe just like the design of the device plugin, we provide an interface to communicate between envd and a CLI shim. The shim does the critical logic like port forwarding. The envd just communicate with the shim and show information to users.

Port forwarding can be used in our default shim, while users can write their own shim to customize, e.g. using service and ingress.

Device plugin is a not-so-good comparison. Let's say kubectl plugin mechanism. The shim maintained by users can be integrated into the envd.

@knight42
Copy link
Member

My concern is that working dir seems to be a command line argument to me, otherwise it might prevent the reuse of build.envd in different working dir, just like we don't specify the build context in Dockerfile. Besides if we need to specify the repo address, should we need to specify the branch as well?

algorithm engineers may use it to copy data:

Got it 👌 If we need to transfer such huge file via port-forwarding without rate limiting, the functionality of apiserver might be affected.

while users can write their own shim to customize, e.g. using service and ingress.

I think it make sense 👍

@gaocegege
Copy link
Member Author

My concern is that working dir seems to be a command line argument to me, otherwise it might prevent the reuse of build.envd in different working dir, just like we don't specify the build context in Dockerfile. Besides if we need to specify the repo address, should we need to specify the branch as well?

Sounds reasonable. It should be a runtime argument instead of build time.

@VoVAllen Do you have opinion on it?

@VoVAllen
Copy link
Member

Agree it better to be a runtime option. If using git, user should handle the sync related thing by himself (clone git repo and git pull/push). Therefore config.working_dir might not be needed.

@gaocegege

This comment was marked as outdated.

@VoVAllen
Copy link
Member

VoVAllen commented Jul 6, 2022

Found one new syncing tool:
https://github.com/mutagen-io/mutagen

@aseaday
Copy link
Member

aseaday commented Jul 11, 2022

How is it going now?

gaocegege and others added 2 commits September 29, 2022 18:25
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
@gaocegege
Copy link
Member Author

PTAL

Copy link
Contributor

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo cleanup

docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
gaocegege and others added 13 commits September 30, 2022 10:08
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
@gaocegege
Copy link
Member Author

@Xuanwo Thanks for your fix!

@kemingy
Copy link
Member

kemingy commented Sep 30, 2022

@Xuanwo Thanks for your fix!

Probably you should try Grammarly. There are still some syntax errors.


#### Data and code integration

`runtime.docker.mount` and other docker-specific funcs will be ignored in kubernetes context. `runtime.kubernetes.mount()` should create/use the corresponding PV/PVC in the backend pod.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we expose the backend implementation to the starlark frontend?
Do we need to rename the current Docker-specific implementations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about it. I opened an issue to keep track #530

docs/proposals/20220603-kubernetes-vendor.md Outdated Show resolved Hide resolved
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
@gaocegege
Copy link
Member Author

@Xuanwo Thanks for your fix!

Probably you should try Grammarly. There are still some syntax errors.

It should be fixed. PTAL.

@gaocegege gaocegege changed the title feat: Add Kubernetes runtime proposal feat: Add envd-server runtime proposal Oct 11, 2022
@gaocegege
Copy link
Member Author

I am merging it since we are already starting the development of envd-server.

@gaocegege gaocegege merged commit d87afda into tensorchord:main Oct 11, 2022
@gaocegege gaocegege deleted the proposal branch October 11, 2022 04:34
enjoyliu pushed a commit to enjoyliu/envd that referenced this pull request Oct 12, 2022
* feat: Add proposal

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

* WIP

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

* feat: Update

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

* fix: Update

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

* fix: Update

Signed-off-by: Ce Gao <ce.gao@outlook.com>

* fix: Update

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* Update docs/proposals/20220603-kubernetes-vendor.md

Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Ce Gao <ce.gao@outlook.com>

* chore: Fix formats

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Signed-off-by: Ce Gao <ce.gao@outlook.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Signed-off-by: luisyjliu <luisyjliu@tencent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

10 participants