Problem with usage of juicedata/mount docker image #5206

Closed
halsbox opened this issue Oct 5, 2024 · 6 comments · Fixed by #5238
Assignees: zhijian-pro
Labels: kind/bug (Something isn't working), missed (missed bug)

Comments


halsbox commented Oct 5, 2024

What happened:
JuiceFS is not properly unmounted when stopping a Docker container created as documented in juicefs_on_docker#method-2-use-the-officially-maintained-image.

What you expected to happen:
JuiceFS is unmounted when the container stops.

How to reproduce it (as minimally and precisely as possible):
Follow the juicefs_on_docker#method-2-use-the-officially-maintained-image documentation to create and start a container, then stop the container. Run stat or ls on the directory that was bind-mounted into the container to get a Transport endpoint is not connected error. Unmount the directory manually.

Anything else we need to know?
The juicefs mount command starts two processes, of which only the second one listens for signals (SIGTERM, etc.), and only that second process performs a graceful stop with unmount when it receives a signal from the Docker engine (or anything else). When run in a Docker container as described in the docs, the signal is sent to the first process, which has PID 1, and the container is simply killed without a graceful unmount. The bug can be reproduced without containers: run juicefs mount in the foreground, then go to another terminal and run ps auxww | grep juicefs; you will get two processes with their PIDs. Run kill -SIGTERM with the first PID as the argument.
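
For illustration, here is a minimal, hypothetical Go model of this launcher/child split (this is not the actual juicefs code; the JFS_MODEL_CHILD variable and the printed messages are made up). Only the child installs a signal handler while the parent just waits, so a SIGTERM delivered to the parent's PID (PID 1 in a container) never reaches the child:

// model.go — hypothetical model of the behavior described above, not juicefs code
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	if os.Getenv("JFS_MODEL_CHILD") == "1" {
		// Child: the only process that listens for signals
		// (the counterpart of installHandler).
		ch := make(chan os.Signal, 1)
		signal.Notify(ch, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)
		fmt.Println("child", os.Getpid(), "waiting for a signal")
		<-ch
		fmt.Println("child: got signal, this is where the graceful unmount would happen")
		return
	}

	// Parent (PID 1 in a container): re-exec itself as the child and wait,
	// without installing any signal handler (the counterpart of launchMount).
	cmd := exec.Command(os.Args[0])
	cmd.Env = append(os.Environ(), "JFS_MODEL_CHILD=1")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	fmt.Println("parent", os.Getpid(), "child", cmd.Process.Pid)
	// A SIGTERM sent to the parent is handled by the default action (terminate);
	// the child never sees it and never gets a chance to clean up.
	_ = cmd.Wait()
}

In this model, killing the parent with SIGTERM terminates only the parent; the child's handler (where juicefs would do the unmount) never runs.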

Environment:

  • JuiceFS version (use juicefs --version) or Hadoop Java SDK version: 1.2.1+2024-08-30.cd871d19
  • OS (e.g. cat /etc/os-release): Debian GNU/Linux 12 (bookworm)
  • Kernel (e.g. uname -a): 6.1.0-26-amd64 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
halsbox added the kind/bug (Something isn't working) label on Oct 5, 2024

halsbox commented Oct 5, 2024

My ugly workaround is running the juicefs mount command in a shell with a signal trap, e.g. in docker compose:

command: sh -c "trap 'test -e /fs/.config && umount /fs' TERM INT HUP; juicefs mount redis://redis:6379/1 /fs & p=$$!; wait $$p"

instead of:

command: ["juicefs", "mount", "redis://redis:6379/1", "/fs"]


halsbox commented Oct 7, 2024

I looked into the source code and made a little patch against the main branch to solve the problem:

diff --git a/cmd/mount_unix.go b/cmd/mount_unix.go
index 2e60e8d1..f43844c6 100644
--- a/cmd/mount_unix.go
+++ b/cmd/mount_unix.go
@@ -780,6 +780,16 @@ func launchMount(mp string, conf *vfs.Config) error {
                os.Unsetenv("_FUSE_STATE_PATH")
                mountPid = cmd.Process.Pid
 
+               signalChan := make(chan os.Signal, 10)
+               signal.Notify(signalChan, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)
+               go func() {
+                       for {
+                               sig := <-signalChan
+                               logger.Infof("Received signal %s, propagating to child process %d...", sig.String(), mountPid)
+                               cmd.Process.Signal(sig)
+                       }
+               }()
+
                ctx, cancel := context.WithCancel(context.TODO())
                go watchdog(ctx, mp)
                err = cmd.Wait()

Should I make it a PR?

zhijian-pro (Contributor) commented:

@halsbox I could not reproduce the issue. I was able to stop the container normally using docker stop, and no lingering processes were observed. When I mounted a JuiceFS mount point in the foreground and then killed the daemon process with -SIGTERM, no lingering processes were observed either.


halsbox commented Oct 8, 2024

@zhijian-pro
That's so strange. I can reproduce the bug with ease, and after debugging with strace and studying the source code I also understand why it's happening. I've recorded a short screencast to show the bug, please take a look:
https://github.com/user-attachments/assets/259cfdcb-ffef-4f4b-a785-8898ea7e4c9b
In the video I'm using the following self-contained docker compose file:

services:
  redis:
    image: redis:alpine
    restart: always
    volumes:
      - "${PWD}/data/redis:/data"
    command: ["redis-server", "--appendonly", "yes"]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      timeout: 10s
      retries: 3
      start_period: 10s
  minio:
    image: minio/minio
    volumes:
      - "${PWD}/data/s3:/data"
    environment:
      MINIO_ROOT_USER: secret
      MINIO_ROOT_PASSWORD: secretkey
    command: server --console-address ":9001" /data
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://127.0.0.1:9000/minio/health/live']
      timeout: 10s
      retries: 3
      start_period: 10s
  create_bucket:
    image: minio/mc
    volumes:
      - "${PWD}/data/state:/state"
    entrypoint: >
      sh -c "test -e /state/bucket_created || \
      (mc config host add s3 http://minio:9000 secret secretkey && \
      mc mb s3/juicefs && \
      touch /state/bucket_created && \
      exit 0)"
    depends_on:
      minio:
        condition: service_healthy
  format_juicefs:
    image: juicedata/mount:latest
    volumes:
      - "${PWD}/data/state:/state"
    environment:
      ACCESS_KEY: secret
      SECRET_KEY: secretkey
    command: >
      sh -c "test -e /state/juicefs_created || \
      (juicefs format --storage minio --bucket http://minio:9000/juicefs redis://redis:6379/1 testfs && \
      touch /state/juicefs_created)"
    depends_on:
      redis:
        condition: service_healthy
      create_bucket:
        condition: service_completed_successfully
  juicefs:
    image: juicedata/mount:latest
    volumes:
      - "${PWD}/data/mnt:/mnt:rw,rshared"
    cap_add:
      - SYS_ADMIN
    devices:
      - /dev/fuse
    security_opt:
      - apparmor:unconfined
    command: ["juicefs", "mount", "redis://redis:6379/1", "/mnt"]
    restart: unless-stopped
    depends_on:
      redis:
        condition: service_healthy
      format_juicefs:
        condition: service_completed_successfully

Commands to reproduce:

ls -la && \
docker compose pull && \
docker compose up -d && \
sleep 3 && \
ls -la data/mnt && \
docker compose stop juicefs && \
ls -la data/mnt

data/mnt is now a broken mount point.


halsbox commented Oct 8, 2024

The workflow is as follows:

  1. juicefs mount is started as PID 1 in the container.
  2. It forks a new child process with the same executable and arguments and some PID NN in cmd/mount_unix.go:func launchMount.
  3. This child process installs a listener for OS signals in cmd/mount_unix.go:func installHandler.
  4. The Docker engine sends SIGTERM to PID 1 on container stop.
  5. The process with PID 1 does not propagate the signal to the child process with PID NN, so there is no graceful shutdown; the container is just killed, leaving a broken mount point (see the sketch after this list for the forwarding pattern).
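
For completeness, here is a distilled, standalone sketch of the signal-forwarding pattern from my patch above (again not juicefs code; sleep 300 just stands in for the forked mount process):

// forward.go — standalone sketch of the PID 1 signal-forwarding pattern
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	// Stand-in for the forked juicefs mount child process.
	cmd := exec.Command("sleep", "300")
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Forward termination signals to the child instead of letting the default
	// action kill only this (PID 1) process.
	sigCh := make(chan os.Signal, 10)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)
	go func() {
		for sig := range sigCh {
			fmt.Printf("forwarding %s to child %d\n", sig, cmd.Process.Pid)
			_ = cmd.Process.Signal(sig)
		}
	}()

	// The child exits after handling the forwarded signal (for juicefs this is
	// where the graceful unmount happens), and the parent exits with it.
	_ = cmd.Wait()
}

With this pattern, a docker stop delivered to PID 1 is relayed to the child, which can then shut down cleanly before the container is torn down.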

zhijian-pro added and removed the kind/bug (Something isn't working) label on Oct 9, 2024
zhijian-pro self-assigned this on Oct 9, 2024

zhijian-pro commented Oct 9, 2024

@halsbox You are right, a PR is welcome, thank you very much!
