[Core] Wrong detection of the space available. #46868

Closed
Bye-legumes opened this issue Jul 30, 2024 · 4 comments
Labels
bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), P1 (Issue that should be fixed within a few weeks)

Comments

@Bye-legumes
Contributor

Bye-legumes commented Jul 30, 2024

What happened + What you expected to happen

When Ray runs inside a container, it emits a log message like the one in this screenshot:
image
This suggests that the capacity is detected from the host machine (1000 GB), while the available space is read from the container's Docker configuration. So although I have 32 GB available, Ray believes the disk is almost full (95%) and prints a misleading log message.

Versions / Dependencies

latest.

Reproduction script

    build:
      context: .
      network: host
      dockerfile: Dockerfile
    volumes:
      # With this you can use docker outside (bmc host docker engine) of docker (devcontainer)
      # Forwards the local Docker socket to the container.
      # - /var/run/docker.sock:/var/run/docker-host.sock

      # Update this to wherever you want VS Code to mount the folder of your project and change in devcontainer.json to auto open it
      - ../:/workspace:consistent
      - type: tmpfs
        target: /workspace/tmp/
        tmpfs:
          size: 16G  # tmpfs size limit (16 GB)
    # allow usage of gpus
    deploy:
      resources:
        #reservations:
        #  devices:
        #    - driver: nvidia
        #      count: all
        #      capabilities: [gpu]
        limits:
          cpus: "10.0"
          memory: 32G

    # Shared memory used by Ray; otherwise it falls back to /tmp, which reduces performance
    shm_size: '15gb' # default 96/3

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@Bye-legumes Bye-legumes added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 30, 2024
@ruisearch42 ruisearch42 added P1 Issue that should be fixed within a few weeks core Issues that should be addressed in Ray Core P0.5 and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) P1 Issue that should be fixed within a few weeks labels Aug 5, 2024
@rkooo567 rkooo567 added P1 Issue that should be fixed within a few weeks and removed P0.5 labels Aug 5, 2024
@rkooo567
Contributor

rkooo567 commented Aug 5, 2024

Based on my search, it seems that Docker doesn't isolate disk space: https://www.google.com/search?q=docker+isolate+disk+space%3F&oq=docker+isolate+disk+space%3F&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigATIHCAMQIRifBTIHCAQQIRifBTIHCAUQIRifBTIHCAYQIRifBTIHCAcQIRifBTIHCAgQIRifBTIHCAkQIRifBdIBCDMzNzBqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8

How did you set a 32 GB disk space limit on your container? Also, what command did you use to see 32 GB of disk space? (Is it df -h?)
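
One way to answer the question above from inside the container is to ask the filesystem directly, per path, roughly what df -h does. The following is a minimal Python sketch, not Ray's own code: the helper name disk_report and the example paths are assumptions chosen for illustration, and shutil.disk_usage is the standard-library call used here. It reports capacity and available space for the filesystem that actually backs each path, which helps spot a session directory sitting on a different mount than expected.

```python
import shutil


def disk_report(path):
    """Return (capacity_bytes, available_bytes, used_fraction) for the
    filesystem that holds `path`. Hypothetical helper for illustration."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    if usage.total == 0:
        # Some pseudo filesystems report zero capacity; avoid dividing by zero.
        return 0, usage.free, 0.0
    used_fraction = 1.0 - usage.free / usage.total
    return usage.total, usage.free, used_fraction


if __name__ == "__main__":
    # Example paths are assumptions; replace them with the mounts of interest,
    # for instance the directory that holds the Ray session files.
    for path in ("/tmp", "/dev/shm", "."):
        try:
            total, free, used = disk_report(path)
        except OSError as exc:
            print(f"{path}: not available ({exc})")
            continue
        print(f"{path}: capacity={total} available={free} used={used:.1%}")
```

If two paths print different capacities, they live on different filesystems, so a limit applied to one mount says nothing about the other.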

@Bye-legumes
Contributor Author

Maybe I mounted the wrong disk. Sorry for the confusion.

@Bye-legumes
Contributor Author

close

@Liquidmasl

Liquidmasl commented Aug 21, 2024

I believe I am running into this issue as well.

As capacity it shows my 4 TB, but the available space is... a lot less, even though I have a lot of space left, so Ray stops when it should not.

I don't set any memory limit in the Docker Compose file, and I set ray _memory to 800 GB.
shm is set to 200 GB in the Docker Compose file.

I don't understand what's happening:
(raylet) [2024-08-21 13:11:24,500 E 254 284] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-08-21_12-57-11_901253_74 is over 95% full, available space: 147223207936; capacity: 3936290357248. Object creation will fail if spilling is required.

So where does it get the available space from? And where does it get the total memory from?
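
For reference, the two numbers in the log line above can be checked by hand: roughly 147 GB available out of roughly 3.9 TB of capacity is about 3.7% free, so the path is about 96.3% full and the warning is consistent with its own figures. The sketch below only redoes that arithmetic in Python. The 95% threshold is taken from the wording of the log message, the helper name parse_space is hypothetical, and nothing here shows how Ray obtains the two values.

```python
import re

# Log line copied from the comment above.
LOG_LINE = (
    "(raylet) [2024-08-21 13:11:24,500 E 254 284] (raylet) "
    "file_system_monitor.cc:111: /tmp/ray/session_2024-08-21_12-57-11_901253_74 "
    "is over 95% full, available space: 147223207936; capacity: 3936290357248. "
    "Object creation will fail if spilling is required."
)

THRESHOLD = 0.95  # assumption: read off the "over 95% full" text in the message


def parse_space(line):
    """Extract (available_bytes, capacity_bytes) from a log line of this shape."""
    match = re.search(r"available space: (\d+); capacity: (\d+)", line)
    if match is None:
        raise ValueError("line does not contain available space and capacity")
    return int(match.group(1)), int(match.group(2))


available, capacity = parse_space(LOG_LINE)
used_fraction = 1.0 - available / capacity

print(f"available: {available / 1e9:.1f} GB")
print(f"capacity:  {capacity / 1e12:.2f} TB")
print(f"used:      {used_fraction:.2%}")
print(f"over threshold: {used_fraction > THRESHOLD}")
```

So the open question in this comment is less about the percentage and more about why the filesystem behind /tmp/ray reports only about 147 GB available.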
