
Add detached mode to daemon #7352

Open
ibnesayeed opened this issue May 22, 2020 · 12 comments
Labels
exp/intermediate (Prior experience is likely helpful) · help wanted (Seeking public contribution on this issue) · kind/feature (A new feature) · P2 (Medium: Good to have, but can wait until someone steps up) · status/ready (Ready to be worked) · topic/daemon + init

Comments


ibnesayeed commented May 22, 2020

I propose a -d/--detach flag for the ipfs daemon command to automatically fork and detach the process once it is ready to serve. The parent process should take care of initialization and any reporting on STDOUT, detach the child if startup succeeds, and report any issues if it does not. This has been discussed briefly in #5983.

It is often desirable to wait for the daemon to be ready before executing any dependent processes. The current workaround is to run the process in the background (e.g., using the & suffix in *nix environments) and observe it with an external utility, which is neither ideal nor robust, and requires adding the same boilerplate code to many applications.

For example, in IPWB we do the following:

    # Run the IPFS daemon in background, initialize configs if necessary
    ipfs daemon --init &

    # Wait for IPFS daemon to be ready
    while ! curl -s localhost:5001 > /dev/null
    do
        sleep 1
    done

This mostly works, but has some quirks:

  • It relies on an external tool, in this case curl.
  • It hard-codes the port number (making it configurable/dynamic would make the script more complex).
  • It sleeps in quantized steps (multiples of one second here) rather than moving on as soon as the daemon is up; shrinking the interval means probing with the external tool more often.
  • Its biggest flaw is the assumption that the daemon will eventually be ready; if the daemon fails, the loop keeps probing forever.

This works for us mostly because we run the whole thing in an isolated container, where external factors causing failures or collisions are minimal, but others may not have the same luxury.

@Stebalien suggests a workaround in #5983:

So... the most correct way to do this is to wait for the string "Daemon is ready" on stdout.

However, this approach has drawbacks. Such string matching is a dangerous proposition because it is brittle: if a future release of the IPFS daemon decides the phrase can be improved, which is not an unlikely scenario, applications relying on it will start to break. The Unixy way of checking the status of a process is exit codes, not output messages.

Here is another situation I encountered recently when creating the IPFS Setup Action for use in GitHub Actions CI workflows:

        // Initialize the IPFS repository before starting the daemon
        await exec.exec('ipfs', ['init'], opts)

        if (runDaemon) {
            // Spawn the daemon detached; stdio must be 'ignore', otherwise the
            // parent keeps waiting on the inherited pipes even after unref()
            const daemon = cp.spawn('ipfs', ['daemon'], {detached: true, stdio: 'ignore'})
            daemon.unref()
            // Probe the API endpoint until it responds or attempts run out
            let attemptsLeft = MAXATTEMPTS
            while (--attemptsLeft) {
                try {
                    await exec.exec('curl', ['-s', '-X', 'POST', IPFSAPI])
                    break
                } catch (error) {
                    await sleep(1000)
                }
            }
            if (!attemptsLeft) {
                throw new Error('IPFS API service unreachable')
            }
        }

This time I did not want to wait forever, so I added a hard limit on the number of attempts to probe the IPFS daemon's status. However, when spawning the process in NodeJS, I had to set stdio to ignore, otherwise the parent process would continue to wait even after the child was detached. As a side-effect, I cannot read the daemon's STDOUT; if I could, I would have set some output variables (such as the reported connection information and URIs) that could be useful in subsequent steps of the workflow pipeline.

@Stebalien suggested --wait or --fork flags for this, which would be equally appropriate, but I have seen -d/--detach used more commonly for this purpose (e.g., in docker container run).

ibnesayeed added the kind/feature and need/triage labels on May 22, 2020
Stebalien added the exp/intermediate, help wanted, P2, and topic/daemon + init labels and removed the need/triage label on May 22, 2020
@Stebalien (Member)

Thank you for the well-thought-out proposal. I completely agree this would be an awesome feature.

Go, unfortunately, makes this a bit difficult because forking can't be done safely. docker container can do this because it's actually just firing off an HTTP request. To do this in ipfs, we'd need to (a rough sketch follows the list):

  1. Determine the correct location of the daemon (error prone but doable).
  2. Re-exec the daemon.
  3. Poll the API until ready (or grep stdout?).
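
For illustration, here is a rough, non-authoritative sketch of those three steps: it assumes os.Executable() is good enough for step 1, a plain re-exec for step 2, and HTTP polling for step 3. The API address, endpoint, and retry budget are placeholders, not go-ipfs defaults.

    package main

    import (
        "fmt"
        "net/http"
        "os"
        "os/exec"
        "time"
    )

    func detach() error {
        // 1. Determine the location of the daemon binary (os.Args[0] is not
        //    reliable; os.Executable resolves the actual path).
        self, err := os.Executable()
        if err != nil {
            return err
        }

        // 2. Re-exec the daemon. On Unix, a full detach would also set
        //    cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true} so the
        //    child leaves the parent's session.
        cmd := exec.Command(self, "daemon")
        if err := cmd.Start(); err != nil {
            return err
        }

        // 3. Poll the API until it is ready, or give up.
        apiURL := "http://127.0.0.1:5001/api/v0/version" // placeholder
        for i := 0; i < 60; i++ {
            resp, err := http.Post(apiURL, "", nil)
            if err == nil {
                resp.Body.Close()
                return nil // daemon is ready; the parent may now exit
            }
            time.Sleep(time.Second)
        }
        return fmt.Errorf("daemon did not become ready in time")
    }

    func main() {
        if err := detach(); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }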

@ibnesayeed (Author)

I must confess that I do not have much experience with system calls in Go. After your comment I looked at the literature around this issue and found that the situation is not very encouraging. There were proposals to add this to the language itself, but people seemed reluctant to bake it in due to issues with multi-threading. I have seen an external package for this, but it does not seem to be cross-platform. That said, allow me to think out loud below:

  1. Determine the correct location of the daemon (error prone but doable).

Can't we rely on os.Args[0] for this?

  2. Re-exec the daemon.

Alternatively, a companion wrapper binary (e.g., ipfs-daemon-runner) can be shipped for this particular use-case, but this will be no better than having an execution branch in the main binary to handle detached mode. While a process daemonizing itself is tricky, daemonizing another instance of itself should be alright, especially if it is done before acquiring any resources (such as a port) that may cause a conflict in the subsequent call.

  3. Poll the API until ready (or grep stdout?).

I think inspecting STDOUT would be good enough in this case, as it would be behavior local to the process, which can adapt to any changes in the string (a luxury an external tool will not have).
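
For illustration, a minimal sketch (assumed code, not go-ipfs's actual implementation) of a parent process scanning the child's STDOUT for the current ready message; since both ends would live in the same binary, the string could change in lockstep:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    func main() {
        // Spawn the daemon and block until its stdout reports readiness.
        cmd := exec.Command("ipfs", "daemon")
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        if err := cmd.Start(); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        scanner := bufio.NewScanner(stdout)
        for scanner.Scan() {
            line := scanner.Text()
            fmt.Println(line) // mirror output so callers still see it
            if strings.Contains(line, "Daemon is ready") {
                break // initialization reported; safe to detach
            }
        }
    }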

@Stebalien (Member)

Can't we rely on os.Args[0] for this?

You can look at os.Args[0] and the PATH to try to guess which binary was invoked, but it's not foolproof. On Linux, it's best to look at /proc/self/exe.

Take a look at https://godoc.org/github.com/docker/docker/pkg/reexec.
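
For what it's worth, a tiny illustration of the difference, assuming Go 1.8+ where os.Executable() resolves the real binary path (via /proc/self/exe on Linux):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        fmt.Println("argv[0]:", os.Args[0]) // whatever the caller typed, e.g. "ipfs"
        exe, err := os.Executable()         // resolved path, e.g. "/usr/local/bin/ipfs"
        if err != nil {
            panic(err)
        }
        fmt.Println("executable:", exe)
    }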

Alternatively, a companion wrapper binary (e.g., ipfs-daemon-runner) can be shipped for this particular use-case, but this will be no better than having an execution branch in the main binary to handle detached mode.

Yeah, building this into the main ipfs binary is probably simpler.

I think inspecting STDOUT would be good enough in this case, as it would be behavior local to the process, which can adapt to any changes in the string (a luxury an external tool will not have).

👍

@RubenKelevra (Contributor)

You could run ipfs as a user service with systemd. Since ipfs gained notify support in 0.5.0, a systemctl --user start ipfs.service call will return once the daemon has successfully started (or failed to start).
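
For reference, a minimal sketch of such a user unit relying on that notify support; the ExecStart path is illustrative, and go-ipfs ships its own, more hardened, service files:

    # ~/.config/systemd/user/ipfs.service (path and options are illustrative)
    [Unit]
    Description=IPFS daemon

    [Service]
    Type=notify
    ExecStart=/usr/local/bin/ipfs daemon
    Restart=on-failure

    [Install]
    WantedBy=default.target

With Type=notify, the start command blocks until the daemon signals readiness (or failure) instead of returning as soon as the process is forked.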

@ibnesayeed (Author)

@RubenKelevra thanks for this info. However, integrating services with an init system is not always an option or preference. For example, when running in a container, setting up systemd inside would be a big hammer and a repeated overhead.

@RubenKelevra (Contributor)

@ibnesayeed There's no need to set up a container. You can set up ipfs as a system daemon; this way systemd offers a lot of additional hardening options to encapsulate the process from the rest of the system.

Take a look at the hardened service file.

@ibnesayeed (Author)

@RubenKelevra containers are not always used out of necessity; they offer their own advantages in complex deployments where IPFS is just one of many components. If I have a cluster of nodes where I deploy all my services (web servers, reverse proxies, databases, caches, and whatnot) using containers, I would have a hard time setting up the IPFS daemon directly on the host, outside the containers' overlay network. While IPFS may run fabulously when set up on the host with a powerful init system in place, saying there is no need to run it in containers is perhaps a little too limiting. Testing is another use-case where running instances in throwaway containers is wonderful, as it ensures that no state is shared across consecutive runs unless explicitly intended.

@RubenKelevra (Contributor)

Sure, containers are nice. I was just referring to the encapsulation a container provides to protect the rest of the system. There's no need to use a container and set up systemd inside it to get these advantages.

Which container solution is affected by the inability to detach? Docker seems to be fine, if I understand correctly.

@ibnesayeed (Author)

@RubenKelevra: Which container solution is affected by the inability to detach...

This is a feature request, not a bug report. I did not say that running IPFS in containers is causing any issues as such. As described in the first post above, with two real examples, there is a need for (or an advantage in) having this ability. It is possible that what I was trying to achieve in those two examples could have been done more elegantly, but building this ability into the daemon itself would save everyone from adding boilerplate code to many applications in different languages for the same objective.

@RubenKelevra (Contributor)

I see.

If detaching isn't easy to achieve in Go, we might want to avoid all that boilerplate by implementing a command that simply returns when the API is reachable, with a configurable timeout.

So maybe something like this would be easier and less error prone to implement:

ipfs daemon &
ipfs check-api --timeout "2m"

@Stebalien (Member)

We should have a --detach flag. It's not particularly hard and has been requested several times at this point.

@ibnesayeed (Author)

@RubenKelevra what you suggested would solve the repetition of the boilerplate code, but the situation I described in the second example, where I had to give up STDIO access to the subprocess from NodeJS, would continue to exist. In addition, the check-api sub-command would need to accept more arguments, such as host and port, if those were customized in the original daemon command. Moreover, someone has to pick a timeout, which will be yet another arbitrary number, and there is no way to clean up after a timeout occurs. On the other hand, if this ability is built into the daemon itself, a timeout configuration could tell the daemon to kill itself if it fails to boot within that limit.

That said, from what I understood of the earlier conversation between @Stebalien and me, and from the literature online, this is doable and not too difficult. Situations where forking and detaching could be an issue in Go may not apply to the IPFS daemon, and platforms where locating the binary in question might be unreliable are not the primary ones where we expect many IPFS servers to run. If we do come across a situation where it is unreliable, we can address it by improving the documentation and asking the user to invoke the binary by its absolute path instead.

aschmahmann added the status/ready label on Apr 5, 2021