FEAT: add a docker-compose-distributed example with multiple workers #1064
Here's the log of the
UI is accessible http://0.0.0.0:9997/ui/ but connecting via another |
Hi, @bufferoverflow . Thanks for contributing! Could you please modify your PR in these respects:
|
Thanks @ChengjieLi28, I changed it accordingly. The main problem is that it does not work on my end with multiple workers. Am I missing some parameters? |
I tried your docker-compose.yml on my machine. It seems the supervisor must already be running before the workers are started. From the log, the worker tries to connect to the supervisor's RESTful API endpoint before the supervisor has started. |
c58d8d5 to 2cb7d90
@ChengjieLi28 The issue was the missing
I'm not sure about this, as docker compose is quite static. What did you have in mind? By the way, the supervisor currently requires a GPU as well, but from my perspective it should not. |
2cb7d90 to c009b8a
|
Also, please keep the volume-related comments in this new file. Mounting a directory lets users reuse it instead of downloading the model repeatedly. |
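For readers following along, the volume suggestion above could look roughly like the fragment below. This is only a sketch: the image name, the host paths, and the container paths (`/root/.xinference`, `/root/.cache/huggingface`) are assumptions about where the image stores downloaded models, not something confirmed in this thread.

```yaml
# Sketch only. Image name and all paths are assumptions for illustration.
services:
  xinference-worker:
    image: xprobe/xinference:latest   # assumed image name
    volumes:
      # Keep downloaded models on the host so that recreating the
      # container does not trigger another download.
      - ./data/xinference:/root/.xinference
      - ./data/huggingface:/root/.cache/huggingface
```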
xinference-supervisor without GPU:
I agree the supervisor can work without a GPU, but I was unable to find a parameter that makes it run without one. I guess that's because the docker image is built for GPU. |
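As a rough illustration of the idea discussed here, a GPU reservation in docker compose can be attached to the worker only, leaving the supervisor without one. The service and image names below are assumptions, and the thread does not confirm whether the image itself starts cleanly without a GPU.

```yaml
# Sketch only. Service names and image name are assumptions.
services:
  xinference-supervisor:
    image: xprobe/xinference:latest   # assumed image name
    # No deploy.resources section, so compose requests no GPU here.

  xinference-worker:
    image: xprobe/xinference:latest
    deploy:
      resources:
        reservations:
          devices:
            # Standard compose syntax for reserving one NVIDIA GPU.
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```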
3ccca40 to ac14c9e
@ChengjieLi28 added the volume and worker comments. Please let me know if there is anything else I can do. |
ac14c9e to 2f1bb4e
Great! Thank you. I will test this file on my machine. If everything works, I will approve this PR, and it will be included in the next release. |
2f1bb4e to 1801e05
Hi, @bufferoverflow. When I tested this PR on my machine, some errors occurred and I couldn't open the web UI on port 9997.
|
It seems that the worker and supervisor are started together and only come up successfully because of the restart policy. This can confuse users. Also, port 9997 of the supervisor needs to be published to the host; otherwise it is not accessible externally. |
I just added a healthcheck and depends_on condition with 4ccd3a0 |
Thanks. I have already tested your PR on my machine. Everything works fine. Please reduce the interval of failure checks and I will merge this PR. |
@ChengjieLi28 thanks for the feedback, I just made 20221c0 to set interval and start_period to 5s |
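The startup ordering discussed above (a supervisor healthcheck, a `depends_on` condition on the worker, a published port 9997, and 5s for `interval` and `start_period`) might be sketched as follows. This is not the file from the PR: the image name, service names, `timeout` and `retries` values, the use of `curl` against the `/ui/` path as the probe, and the exact command lines are assumptions made for illustration.

```yaml
# Sketch only, not the file from this PR. Names, commands, and the
# healthcheck probe are assumptions; adjust them to the real image.
services:
  xinference-supervisor:
    image: xprobe/xinference:latest   # assumed image name
    # Assumed CLI usage: bind the supervisor to its compose hostname.
    command: xinference-supervisor --host xinference-supervisor --port 9997
    ports:
      - "9997:9997"                   # publish the UI/API port to the host
    healthcheck:
      # Assumes curl is available inside the image.
      test: ["CMD", "curl", "-f", "http://xinference-supervisor:9997/ui/"]
      interval: 5s
      start_period: 5s
      timeout: 5s
      retries: 12

  xinference-worker:
    image: xprobe/xinference:latest
    # Assumed CLI usage: point the worker at the supervisor endpoint and
    # advertise the worker's own compose hostname.
    command: xinference-worker --endpoint http://xinference-supervisor:9997 --host xinference-worker
    depends_on:
      xinference-supervisor:
        condition: service_healthy    # start only after the probe passes
```

With `condition: service_healthy`, compose holds the worker back until the supervisor's probe succeeds, so startup no longer depends on the restart policy. A second worker would be another service entry with its own name and `--host` value.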
I was unable to make a distributed setup work using this docker compose file. I guess some back-connection from the supervisor to the worker is missing. It would be great if someone could guide me on how to make this work.