
Docker Image #173 (Open)

wants to merge 32 commits into main
Conversation

Scot-Survivor

Added some commits, picking back up from DanCodes' original pull request, which was closed.

@Scot-Survivor
Author

This will likely need further cleanup from people who know how to write Dockerfiles better than I do.

But I believe this is a good starting point.

@AlexCheema
Contributor

AlexCheema commented Aug 24, 2024

This looks great! Thanks for the contribution (also fixes #119)

Some small things:

  • (EDIT - just saw you already have this) EXPOSE in the Dockerfile for the ports that are exposed (it's documentation-only and has no effect at runtime)
  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)
  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?
  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly
  • Related to above: if we can run a multi-node test in CI that would be super
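A minimal sketch of what such a docker-compose.yml could look like, assuming the image ends up published under the exolabs namespace discussed in this thread; the service names, port, and network are illustrative placeholders, not the actual PR contents:

```yaml
# Sketch of a two-node compose file -- image name, port, and network
# are placeholder assumptions; adjust to whatever the final Dockerfile exposes.
services:
  exo-node-1:
    image: exolabs/exo:latest   # hypothetical image name
    networks: [exo-net]
    ports:
      - "8000:8000"             # placeholder API port
  exo-node-2:
    image: exolabs/exo:latest
    networks: [exo-net]

networks:
  exo-net:
    driver: bridge              # both nodes on one network so peer discovery can work
```

Putting both services on a shared bridge network is the key detail: exo's node discovery needs the containers to reach each other directly.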

@Axodouble

It may be wise to provide two separate Dockerfiles, as not all devices run NVIDIA GPUs. However, I have not looked much at the source code; I assume CUDA isn't a hard requirement?

@dan-online

Heya, I'll be checking this out today. Some context on the original PR: I was merging it to main in our fork late at night, so I messed up the target, howwweeeever I'm glad to see it would be helpful here. First I'll rebase this to resolve the conflicts I'm seeing. As for your comments:

  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)

I agree here, that's probably the best way to move forward. Would you prefer it in, say, a docker/ folder or just at the root? Personally I try to limit files at the root, but if you have a preference I'll follow it.

  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?

Yep, I can add a CD GitHub Action to this PR; it's just up to you guys to create an org and add the token to the repo's Action secrets.
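A rough sketch of what that CD workflow could look like; the secret names and image tag are placeholder assumptions that the org would need to configure:

```yaml
# Sketch of .github/workflows/docker.yml -- secret names and the
# exolabs/exo tag are placeholders, not confirmed values.
name: docker-publish
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}  # placeholder secret names
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: exolabs/exo:latest  # hypothetical image name
```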

  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly

Great idea, this could also go in the aforementioned docker/ folder.

  • Related to above: if we can run a multi-node test in CI that would be super

Up to you if you think this is in scope for this PR; I think it's a nice-to-have, so maybe one for a future feature.

@AlexCheema
Copy link
Contributor

Heya, I'll be checking this out today. Some context on the original PR: I was merging it to main in our fork late at night, so I messed up the target, howwweeeever I'm glad to see it would be helpful here. First I'll rebase this to resolve the conflicts I'm seeing. As for your comments:

  • one Dockerfile for each target (Dockerfile-Mac, Dockerfile-NVIDIA, etc…)

I agree here, that's probably the best way to move forward. Would you prefer it in, say, a docker/ folder or just at the root? Personally I try to limit files at the root, but if you have a preference I'll follow it.

At the root is fine.

  • What’s the thinking with continuous delivery? Official exo docker images on dockerhub?

Yep, I can add a CD GitHub Action to this PR; it's just up to you guys to create an org and add the token to the repo's Action secrets.

We can create an org. Someone has already taken exolabs unfortunately, so I've requested to claim that name.

  • It would be cool to have an example docker-compose.yml that can run a multi-node setup with networking set up properly

Great idea, this could also go in the aforementioned docker folder

:)

  • Related to above: if we can run a multi-node test in CI that would be super

Up to you if you think this is in scope for this PR; I think it's a nice-to-have, so maybe one for a future feature.

Let's leave it to a future PR then. For now, the docker-compose.yml can serve as documentation / quick test locally.

@Scot-Survivor
Author

Does exo not use all GPUs available to the PC by default?

Why would someone want multiple workers in a compose file? Compose only works with one host; it's not multi-node orchestrated like Kubernetes.

@AlexCheema
Contributor

Does exo not use all GPUs available to the PC by default?

Why would someone want multiple workers in a compose file? Compose only works with one host; it's not multi-node orchestrated like Kubernetes.

exo does not use multi-GPU by default. If you have a single device with multiple GPUs, you can (e.g. with the tinygrad backend) set VISIBLE_DEVICES={index}, where {index} starts from 0, e.g. VISIBLE_DEVICES=1 for index 1. Specifically for CUDA, this would be CUDA_VISIBLE_DEVICES={index}.
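In a compose setup, that per-node GPU pinning could be expressed through the environment; a sketch, where the service and image names are placeholders:

```yaml
# Sketch: one containerised exo node per GPU, pinned via CUDA_VISIBLE_DEVICES
# as described above. Service and image names are hypothetical.
services:
  exo-gpu-0:
    image: exolabs/exo:latest
    environment:
      - CUDA_VISIBLE_DEVICES=0   # first GPU
  exo-gpu-1:
    image: exolabs/exo:latest
    environment:
      - CUDA_VISIBLE_DEVICES=1   # second GPU
```

This is also one answer to the multi-worker question: running two single-GPU nodes on one host lets a compose file exercise the multi-node networking path locally.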

@dan-online

@AlexCheema Feel free to review!

@Scot-Survivor
Author

@dan-online, at least one other Dockerfile for non-GPU-accelerated computers would be useful (to us as well)
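A CPU-only image could be sketched roughly as below; the base image, port, install step, and entrypoint are assumptions for illustration, not the actual PR contents:

```dockerfile
# Sketch of a CPU-only Dockerfile -- base image, port, and entrypoint
# are placeholders; adjust to match the repo's actual setup.
FROM python:3.12-slim

WORKDIR /app
COPY . .
RUN pip install --no-cache-dir .

# EXPOSE is documentation-only, as noted earlier in the thread
EXPOSE 8000

CMD ["exo"]
```

A slim Debian-based image sidesteps the musl/wheel issues that tend to bite Alpine with heavy Python dependencies.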

@Scot-Survivor
Author

Did Alpine work? Ubuntu is massive.
Shouldn't a Python Alpine base image work?

@dan-online

Alpine was... tricky, so I pushed an Ubuntu image first just to check it would work before trying to tackle Alpine again

@dan-online

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour haha

@AlexCheema
Contributor

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour haha

We shouldn't have a TensorFlow dependency. When I run pip list, tensorflow does not come up. Why do we need TensorFlow?

@AlexCheema
Contributor

Secured the exolabs Docker Hub namespace now!

@Scot-Survivor
Author

It seems that TensorFlow hates Alpine, so at least for today I'm giving up on this endeavour haha

We shouldn't have a TensorFlow dependency. When I run pip list, tensorflow does not come up. Why do we need TensorFlow?

@dan-online you got a chance to follow up today?

@dan-online

Heya @AlexCheema, it seems TensorFlow (or similar) is requested upon boot:

[screenshot: tensorflow request]

@AlexCheema
Contributor

Heya @AlexCheema, it seems TensorFlow (or similar) is requested upon boot:

[screenshot: tensorflow request]

This looks fine. That warning can be ignored; it comes from the transformers library, but we don't use models from there.

@dan-online

Heya @AlexCheema, it seems TensorFlow (or similar) is requested upon boot:
[screenshot: tensorflow request]

This looks fine. That warning can be ignored; it comes from the transformers library, but we don't use models from there.

Weirdly, it didn't actually boot anything without TensorFlow installed; it would just stop at that warning

4 participants