
Docker Image #173 (Open)

wants to merge 32 commits into main
Conversation

Scot-Survivor

Added some commits, picking back up from DanCodes' original pull request, which was closed.

@Scot-Survivor
Author

This will likely need further cleanup from people who know how to write Dockerfiles better than I do.

But I believe this is a good starting point.

@AlexCheema
Contributor

AlexCheema commented Aug 24, 2024

This looks great! Thanks for the contribution (also fixes #119)

Some small things:

  • (EDIT - just saw you already have this) EXPOSE in the Dockerfile for the ports that are exposed (it's documentation-only and has no effect at runtime)
  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)
  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?
  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly
  • Related to above: if we can run a multi-node test in CI that would be super
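A minimal sketch of what such a docker-compose.yml could look like, assuming the image ends up published under the exolabs namespace discussed in this thread; the service names, port, and network are illustrative placeholders, not the actual PR contents:

```yaml
# Sketch of a two-node compose file -- image name, port, and network
# are placeholder assumptions; adjust to whatever the final Dockerfile exposes.
services:
  exo-node-1:
    image: exolabs/exo:latest   # hypothetical image name
    networks: [exo-net]
    ports:
      - "8000:8000"             # placeholder API port
  exo-node-2:
    image: exolabs/exo:latest
    networks: [exo-net]

networks:
  exo-net:
    driver: bridge              # both nodes on one network so peer discovery can work
```

Putting both services on a shared bridge network is the key detail: exo's node discovery needs the containers to reach each other directly.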

@Axodouble

It may be wise to provide two separate Dockerfiles, as not all devices run NVIDIA GPUs. However, I have not looked much at the source code; I assume CUDA isn't a hard requirement?

@dan-online

Heya, I'll be checking this out today. Some context on the original PR: I was merging it to main in our fork late at night, so I messed up the target, howwweeeever I'm glad to see it would be helpful here. First I'll rebase this to resolve the conflicts I'm seeing. As for your comments:

  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)

I agree here, that's probably the best way to move forward. Would you prefer it in, say, a docker/ folder or just at the root? Personally I try to limit files at the root, but if you have a preference I'll follow it.

  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?

Yep, I can add a CD GitHub Action to this PR; it's just up to you guys to create an org and add the token to the repo's Action secrets.
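A rough sketch of what that CD workflow could look like; the secret names and image tag are placeholder assumptions that the org would need to configure:

```yaml
# Sketch of .github/workflows/docker.yml -- secret names and the
# exolabs/exo tag are placeholders, not confirmed values.
name: docker-publish
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}  # placeholder secret names
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: exolabs/exo:latest  # hypothetical image name
```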

  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly

Great idea, this could also go in the aforementioned docker/ folder.

  • Related to above: if we can run a multi-node test in CI that would be super

Up to you if you think this is in scope for this PR; I think it's a nice-to-have, so maybe one for a future feature.

@AlexCheema
Copy link
Contributor

Heya, I'll be checking this out today. Some context on the original PR: I was merging it to main in our fork late at night, so I messed up the target, howwweeeever I'm glad to see it would be helpful here. First I'll rebase this to resolve the conflicts I'm seeing. As for your comments:

  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)

I agree here, that's probably the best way to move forward. Would you prefer it in, say, a docker/ folder or just at the root? Personally I try to limit files at the root, but if you have a preference I'll follow it.

At the root is fine.

  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?

Yep, I can add a CD GitHub Action to this PR; it's just up to you guys to create an org and add the token to the repo's Action secrets.

We can create an org. Someone has already taken exolabs unfortunately, so I've requested to claim that name.

  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly

Great idea, this could also go in the aforementioned docker folder

:)

  • Related to above: if we can run a multi-node test in CI that would be super

Up to you if you think this is in scope for this PR; I think it's a nice-to-have, so maybe one for a future feature.

Let's leave it to a future PR then. For now, the docker-compose.yml can serve as documentation / quick test locally.

@Scot-Survivor
Author

Does exo not use all GPUs available to the PC by default?

Why would someone want multiple workers in a compose file? Compose only works with one host; it's not multi-node orchestrated like Kubernetes.

@AlexCheema
Contributor

Does exo not use all GPUs available to the PC by default?

Why would someone want multiple workers in a compose file? Compose only works with one host; it's not multi-node orchestrated like Kubernetes.

exo does not use multi-GPU by default. If you have a single device with multiple GPUs, you can (e.g. with the tinygrad backend) set VISIBLE_DEVICES={index}, where {index} starts from 0, e.g. VISIBLE_DEVICES=1 for index 1. Specifically for CUDA, this would be CUDA_VISIBLE_DEVICES={index}.
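In a compose setup, that per-node GPU pinning could be expressed through the environment; a sketch, where the service and image names are placeholders:

```yaml
# Sketch: one containerised exo node per GPU, pinned via CUDA_VISIBLE_DEVICES
# as described above. Service and image names are hypothetical.
services:
  exo-gpu-0:
    image: exolabs/exo:latest
    environment:
      - CUDA_VISIBLE_DEVICES=0   # first GPU
  exo-gpu-1:
    image: exolabs/exo:latest
    environment:
      - CUDA_VISIBLE_DEVICES=1   # second GPU
```

This is also one answer to the multi-worker question: running two single-GPU nodes on one host lets a compose file exercise the multi-node networking path locally.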

@dan-online

@AlexCheema Feel free to review!

@Scot-Survivor
Author

@dan-online, at least one other Dockerfile for non-GPU-accelerated computers would be useful (to us as well)
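A CPU-only image could be sketched roughly as below; the base image, port, install step, and entrypoint are assumptions for illustration, not the actual PR contents:

```dockerfile
# Sketch of a CPU-only Dockerfile -- base image, port, and entrypoint
# are placeholders; adjust to match the repo's actual setup.
FROM python:3.12-slim

WORKDIR /app
COPY . .
RUN pip install --no-cache-dir .

# EXPOSE is documentation-only, as noted earlier in the thread
EXPOSE 8000

CMD ["exo"]
```

A slim Debian-based image sidesteps the musl/wheel issues that tend to bite Alpine with heavy Python dependencies.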

@Scot-Survivor
Author

Did Alpine work? Ubuntu is massive.
Shouldn't a Python Alpine base image work?

@dan-online

Alpine was... tricky, so I pushed an Ubuntu image first just to check it would work before trying to tackle Alpine again

@dan-online

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour haha

@AlexCheema
Contributor

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour haha

We shouldn't have a TensorFlow dependency. When I run pip list, tensorflow does not come up. Why do we need TensorFlow?

@AlexCheema
Contributor

Secured the exolabs Docker Hub namespace now!

@Scot-Survivor
Author

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour haha

We shouldn't have a TensorFlow dependency. When I run pip list, tensorflow does not come up. Why do we need TensorFlow?

@dan-online you got a chance to follow up today?

@dan-online

Heya @AlexCheema, it seems TensorFlow (or similar) is requested upon boot:

[screenshot: tensorflow request]

@AlexCheema
Contributor

Heya @AlexCheema, it seems TensorFlow (or similar) is requested upon boot:

[screenshot: tensorflow request]

This looks fine. That warning can be ignored; it comes from the transformers library, but we don't use models from there.

@dan-online

Heya @AlexCheema, it seems TensorFlow (or similar) is requested upon boot:
[screenshot: tensorflow request]

This looks fine. That warning can be ignored; it comes from the transformers library, but we don't use models from there.

Weirdly, it didn't actually boot anything without TensorFlow installed; it would just stop at that warning

4 participants