Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition while shutting down process-compose #197

Closed
3 tasks done
zmitchell opened this issue Jul 11, 2024 · 2 comments
Closed
3 tasks done

Race condition while shutting down process-compose #197

zmitchell opened this issue Jul 11, 2024 · 2 comments
Labels
bug Something isn't working done Done, awaiting release

Comments

@zmitchell
Copy link

Defect

Make sure that these boxes are checked before submitting your issue -- thank you!

  • Included the relevant configuration snippet
  • Included the relevant process-compose log (log location: process-compose info)
  • Included a [Minimal, Complete, and Verifiable example] (https://stackoverflow.com/help/mcve)

Version of process-compose:

Verified on main branch and v1.6.1

OS environment:

macOS, Linux

Steps or code to reproduce the issue:

Every now and then (seemingly at random) I would see this error when calling process-compose down:

24-07-11 14:59:45.839 FTL failed to stop project error="Post \"http://unix/project/stop\": EOF"

I thought it was probably me misconfiguring something, but I've come to realize it's a race condition in this function in pc_api.go:

func (api *PcApi) ShutDownProject(c *gin.Context) {
	api.project.ShutDownProject()
	c.JSON(http.StatusOK, gin.H{"status": "stopped"})
}

I suspect there is a race between shutting down the processes and actually sending the response.

Since this is a race condition I suspect you run into it more often when you have processes that are very quick to shut down. This is the case in the scenarios I've encountered the issue:

# service-config.yaml
processes:
  sleeping1:
    command: "sleep 999999"
  sleeping2:
    command: "sleep 999999"

In order to reproduce this error you can run process-compose under a debugger in one terminal and make calls from the client in another terminal.

# term1
$ dlv debug github.com/f1bonacc1/process-compose/src -- -f service-config.yaml -u pc.sock --tui=false
...
> break pc_api.go:270
> c

The debugger will now pause while process-compose waits for API calls.

# term2
$ process-compose down -u pc.sock

term2 will now hang (I suppose it's waiting for a response) and term1 will proceed until it hits the breakpoint. If you go back to term1 and kill process-compose by leaving the debugger, you'll see the exact error I mentioned above.

# term1
> quit
# term2
24-07-11 14:59:45.839 FTL failed to stop project error="Post \"http://unix/project/stop\": EOF"

Expected result:

process-compose should reliably and gracefully shutdown when requested.

Actual result:

In some cases, process-compose does not shut down cleanly.

dcarley added a commit to dcarley/process-compose that referenced this issue Jul 12, 2024
To prevent a race condition seen in F1bonacc1#197:

1. On the client:
  a. User runs `process-compose down`
  b. Client makes request to `/project/stop`
2. On the server:
  a. `ProjectRunner.ShutDownProject()` stops all processes
  b. `cmd.runProject()` stops blocking
  c. Exits before `PcApi.ShutDownProject` has written the response
3. On the client:
  a. Gets `EOF` reading the response because the server has gone away

I can't think of a way to reproduce this in a test, except for something
convoluted like creating a custom client that introduces a delay in
reading the response.

It can be "manually" simulated by adding a small delay here though:

    diff --git a/src/api/pc_api.go b/src/api/pc_api.go
    index d123257..d069399 100644
    --- a/src/api/pc_api.go
    +++ b/src/api/pc_api.go
    @@ -3,6 +3,7 @@ package api
     import (
 	    "net/http"
 	    "strconv"
    +	"time"

 	    "github.com/f1bonacc1/process-compose/src/app"
 	    "github.com/gin-gonic/gin"
    @@ -268,6 +269,7 @@ func (api *PcApi) GetProcessPorts(c *gin.Context) {
     // @router /project/stop [post]
     func (api *PcApi) ShutDownProject(c *gin.Context) {
 	    api.project.ShutDownProject()
    +	time.Sleep(10 * time.Millisecond)
 	    c.JSON(http.StatusOK, gin.H{"status": "stopped"})
     }

And using this config:

    processes:
      one:
        command: sleep infinity
      two:
        command: sleep infinity

Before this change:

    bash-5.2$ go run src/main.go up --config services.yaml --tui=false &
    [1] 8171
    bash-5.2$ go run src/main.go down
    24-07-12 15:38:40.332 FTL failed to stop project error="Post \"http://localhost:8080/project/stop\": EOF"
    exit status 1
    [1]+  Done                    go run src/main.go up --config services.yaml --tui=false

After this change:

    bash-5.2$ go run src/main.go up --config services.yaml --tui=false &
    [1] 8432
    bash-5.2$ go run src/main.go down
F1bonacc1 pushed a commit that referenced this issue Jul 12, 2024
To prevent a race condition seen in #197:

1. On the client:
  a. User runs `process-compose down`
  b. Client makes request to `/project/stop`
2. On the server:
  a. `ProjectRunner.ShutDownProject()` stops all processes
  b. `cmd.runProject()` stops blocking
  c. Exits before `PcApi.ShutDownProject` has written the response
3. On the client:
  a. Gets `EOF` reading the response because the server has gone away

I can't think of a way to reproduce this in a test, except for something
convoluted like creating a custom client that introduces a delay in
reading the response.

It can be "manually" simulated by adding a small delay here though:

    diff --git a/src/api/pc_api.go b/src/api/pc_api.go
    index d123257..d069399 100644
    --- a/src/api/pc_api.go
    +++ b/src/api/pc_api.go
    @@ -3,6 +3,7 @@ package api
     import (
 	    "net/http"
 	    "strconv"
    +	"time"

 	    "github.com/f1bonacc1/process-compose/src/app"
 	    "github.com/gin-gonic/gin"
    @@ -268,6 +269,7 @@ func (api *PcApi) GetProcessPorts(c *gin.Context) {
     // @router /project/stop [post]
     func (api *PcApi) ShutDownProject(c *gin.Context) {
 	    api.project.ShutDownProject()
    +	time.Sleep(10 * time.Millisecond)
 	    c.JSON(http.StatusOK, gin.H{"status": "stopped"})
     }

And using this config:

    processes:
      one:
        command: sleep infinity
      two:
        command: sleep infinity

Before this change:

    bash-5.2$ go run src/main.go up --config services.yaml --tui=false &
    [1] 8171
    bash-5.2$ go run src/main.go down
    24-07-12 15:38:40.332 FTL failed to stop project error="Post \"http://localhost:8080/project/stop\": EOF"
    exit status 1
    [1]+  Done                    go run src/main.go up --config services.yaml --tui=false

After this change:

    bash-5.2$ go run src/main.go up --config services.yaml --tui=false &
    [1] 8432
    bash-5.2$ go run src/main.go down
@F1bonacc1
Copy link
Owner

Fixed in v1.9.0

@F1bonacc1 F1bonacc1 added bug Something isn't working done Done, awaiting release labels Jul 25, 2024
@zmitchell
Copy link
Author

zmitchell commented Aug 1, 2024

I'm not sure this actually fixed the problem. It appears that there was also a race condition on start:

  • process-compose up -f my-config.yaml -u /insufficient_perms.sock
  • which one wins?
    • Starting processes
    • Attempting to connect to that socket

If starting processes wins, then you spawn a process immediately before running into an error trying to create the socket, and the spawned process isn't cleaned up before shutting down.

@dcarley and I were seeing that issue before #201, and not after, but @dcarley also mentioned that we may still be seeing the original shutdown race condition. We'll wait to confirm until @dcarley has a reproducer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working done Done, awaiting release
Projects
None yet
Development

No branches or pull requests

2 participants