Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipfs commands such as id fail when daemon has been started and stopped. #5784

Closed
cboddy opened this issue Nov 19, 2018 · 21 comments
Closed

ipfs commands such as id fail when daemon has been started and stopped. #5784

cboddy opened this issue Nov 19, 2018 · 21 comments
Labels
kind/bug A bug in existing code (including security flaws) topic/docs-ipfs Topic docs-ipfs

Comments

@cboddy
Copy link
Member

cboddy commented Nov 19, 2018

Version information:

Output From ipfs version --all

go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

Type:

Bug

Description:

When running an ipfs command that is not ipfs daemon, the existence of the file $IPFS_PATH/api decides whether to use the HTTP API to complete the command or not (eg. ipfs id does this). This happens in fsrepo.go

The file $IPFS_PATH/api is created when the ipfs daemon command is started. the problem is that when the daemon process is shut down via a signal it does not remove $IPFS_PATH/api.

This leads to behaviour that after an ipfs daemon process has been started and terminated, other ipfs commands that can be fulfilled with or without the HTTP api (eg. ipfs id) will always fail with the message

Error: api not running

A minimal example that reproduces the failure is shown below:

#!/bin/bash
set -x
set -e

IPFS=/path/to/your/ipfs
function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    $IPFS id # will always fail
}

run_it

I don't think this is the intended behaviour (?).

If there isn't already a signal handler to remove $IPFS_PATH/api (via FSRepo.Close when the daemon is interrupted then could we add one for this?

Alternatively, if the command fails in this manner but it can be fulfilled without the ipfs HTTP api why not do it without HTTP and remove $IPFS_PATH/api at that point?

@cboddy cboddy changed the title ipfs commands such as id fail when daemon has been started and stopped. ipfs commands such as id fail when daemon has been started and stopped. Nov 19, 2018
cboddy added a commit to Peergos/Peergos that referenced this issue Nov 19, 2018
@eingenito
Copy link
Contributor

Thanks for the bug report, and the script to reproduce. My api directory does get cleaned out on macos but I see the same behavior you experienced on linux. That's definitely messed up.

@eingenito eingenito added the kind/bug A bug in existing code (including security flaws) label Nov 29, 2018
@kevina
Copy link
Contributor

kevina commented Nov 29, 2018

@cboddy

The ipfs daemon may take a little while to shutdown after you kill it with a signal, so if you try and use a command while it is shutting down you will get this error. Try adding a sleep after the kill and it should work. A better error message could be useful though.

@cboddy
Copy link
Member Author

cboddy commented Nov 29, 2018

@kevina thanks for the suggestion but the file $IPFS_PATH/api is never removed on linux after being shutdown via a signal so

  • no amount of sleeping will help
  • this is a problems since there is no "clean" shutdown command AFAIK.

A simple work-around in Peergos (where we manage an ipfs runtime programmatically) is to remove $IPFS_PATH/api when shutting down the daemon.

However, it is surprising behaviour that requires time to understand, hence the bug-report.

@schomatis
Copy link
Contributor

@cboddy Thanks so much for reporting this in such a clear and descriptive way. I'm personally assigning myself here because this was the error I encountered when trying IPFS for the very first time and drove me (out of my frustration as a newbie who wants it all and wants it now) pretty close to leaving the project altogether.

@schomatis schomatis self-assigned this Nov 29, 2018
@kevina
Copy link
Contributor

kevina commented Nov 29, 2018

go-ipfs version: 0.4.19-dev-af73c50
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

This works for me:

#!/bin/bash
set -x
set -e

IPFS=ipfs
function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    #$IPFS id # will fail
    sleep 10
    $IPFS id # ok
}

run_it

@eingenito
Copy link
Contributor

@cboddy; @kevina is right in my case. My api directory is consistently deleted, and ipfs id succeeds some brief time after the signal is sent. @schomatis are you able to reproduce the behavior in this bug?

@cboddy - if you run the daemon (in you script for instance) with the -D option and look at the last dozen lines of output do you see logging from the cleanup code like?

Received interrupt signal, shutting down...
10:57:25.463 DEBUG    bitswap: bitswap task worker shutting down... workers.go:88
(Hit ctrl-c again to force-shutdown the daemon.)
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464  INFO core/serve: server at /ip4/127.0.0.1/tcp/8080 terminating... corehttp.go:107
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:85
10:57:25.464 DEBUG    bitswap: bitswap task worker shutting down... workers.go:88
10:57:25.464 WARNI     swarm2: swarm listener accept error: process closing swarm_listen.go:77
10:57:25.464  INFO core/serve: server at /ip4/127.0.0.1/tcp/5001 terminating... corehttp.go:107
10:57:25.464 DEBUG    bitswap: provideKeys channel closed workers.go:126
10:57:25.464  INFO core/serve: server at /ip4/127.0.0.1/tcp/8080 terminated corehttp.go:127
10:57:25.464 DEBUG       core: core is shutting down... core.go:632
10:57:25.464  INFO core/serve: server at /ip4/127.0.0.1/tcp/5001 terminated corehttp.go:127
10:57:25.464 DEBUG blockservi: blockservice is shutting down... blockservice.go:324
10:57:25.464 WARNI     swarm2: swarm listener accept error: accept tcp [::]:4001: use of closed network connection swarm_listen.go:77
10:57:25.465 WARNI     swarm2: swarm listener accept error: accept tcp 0.0.0.0:4001: use of closed network connection swarm_listen.go:77
10:57:25.507 DEBUG       mdns: mdns query complete mdns.go:142
10:57:25.507 DEBUG       mdns: mdns service halting mdns.go:148
10:57:25.654  INFO   cmd/ipfs: Gracefully shut down daemon daemon.go:351```

@schomatis
Copy link
Contributor

@schomatis are you able to reproduce the behavior in this bug?

Yes, if we really kill the process it won't cleanup the api (whatever that means). Granted, this can be fixed by running ipfs daemon again but the user has no way of knowing that and any other command will fail with that cryptic message. At the very least this is a documentation issue but we should find a more elegant way of handling kills/crashes like this one.

#!/bin/bash
set -x
set -e

IPFS=ipfs
function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id # completes fine
    $IPFS daemon &
    sleep 10

    kill -SIGKILL $! # Die fast, don't wait on any cleanup.

    ! $IPFS id # Allow this command to bypass `-e` check.
    while [ $? -eq 0 ]; do
        sleep 3
        ! $IPFS id
    done # This never ends.
}

run_it

@schomatis schomatis added the topic/docs-ipfs Topic docs-ipfs label Nov 29, 2018
@eingenito
Copy link
Contributor

I think I’d say if SIGTERM (or like SIGINT) doesn’t clean the api directory that’s a bug and if SIGKILL doesn’t clean it then we could do better but it’s debatable. What’s the right thing? Should we treat api like a pidfile and exclusive lock it to indicate the presence of the daemon instead of create/delete?

@kevina kevina added kind/enhancement A net-new feature or improvement to an existing feature and removed kind/bug A bug in existing code (including security flaws) labels Nov 30, 2018
@kevina
Copy link
Contributor

kevina commented Nov 30, 2018

I removed the bug label and changed this it to enhancement. It is a bug if killing SIGTERM/SIGINT doesn't (eventually) remove the $IPFS_PATH/api file. When killed with SIGKILL there is no way to clean this up. We should be more intelligent about what to do afterwards. As previously mentioned if ipfs daemon is run again it will clean up the stale state, we should perhaps do that all for any command that requires the repo or at least return a better error message.

@eingenito
Copy link
Contributor

Actually @cboddy if you can confirm again that your api directory isn't removed on SIGTERM then it's a bug of some sort. We just can't reproduce it ourselves. I mean at least I can't. Did you check the debug logging output to see if cleanup operations are taking place?

@kevina
Copy link
Contributor

kevina commented Dec 3, 2018

This is basically a duplicate of #2395.

@schomatis schomatis added the need/author-input Needs input from the original author label Dec 4, 2018
@momack2 momack2 added kind/bug A bug in existing code (including security flaws) and removed kind/enhancement A net-new feature or improvement to an existing feature labels Dec 6, 2018
@momack2
Copy link
Contributor

momack2 commented Dec 6, 2018

I'm personally assigning myself here because this was the error I encountered when trying IPFS for the very first time and drove me (out of my frustration as a newbie who wants it all and wants it now) pretty close to leaving the project altogether.

If this issue (which is labeled as a bug in the dupe'd issue) is so painful that some of our dedicated community members almost dropped the project over it - imagine how many people ran into this issue and didn't have the perseverance/faith to push through that frustration to become a long term contributor. I think that is sufficient reason to prioritize resolving this issue (aka clean this file up consistently) or at least add much better error handling to enable easy debugging and not burn contributor time on this rough edge.

@Stebalien
Copy link
Member

So, the bug here isn't that the API file isn't getting cleaned up on KILL (@kevina's right, there's nothing we can do about that). The bug is that we're not cleaning it up later. This came up here as well:

ipfs/ipfs-desktop#722 (comment)

The tricky part is that having an API file without running a daemon is actually a feature: This is how users tell IPFS to use a remote IPFS instance.

However, while fixing this in ipfs-desktop is a bit tricky, it's actually pretty easy to fix here:

  1. Try connecting to the remote daemon.
  2. If that fails, just run the command locally (ignoring the API file).

Step 2 will fail if the local repo is either locked (a daemon is running) or the local repo hasn't been initialized (we're using a remote daemon).

The downside is that this prevents users from switching back and forth between a local and a remote daemon. I'm not sure if that's something we really need to support.

@schomatis
Copy link
Contributor

So, the bug here isn't that the API file isn't getting cleaned up on KILL (@kevina's right, there's nothing we can do about that). The bug is that we're not cleaning it up later.

Yes, the KILL was just an example, the system can also crash and we would be in the same scenario.

The tricky part is that having an API file without running a daemon is actually a feature: This is how users tell IPFS to use a remote IPFS instance.

That may be so, but the first steps any user will perform don't acknowlegde that, nor the importance of having a daemon running, nor the basic client-server architecture we implement here:

From my own personal interpretation of reading the docs, the idea we're normally selling here (and that's something we may need to revise) is that you can just run ipfs <any-command> without caring about the daemon; and we also have the feature to share your personal IPFS piece of land (repo) with the rest of the world by doing all of the commands you were already doing while the daemon is running.

  1. Try connecting to the remote daemon.
  2. If that fails, just run the command locally (ignoring the API file).

Step 2 will fail if the local repo is either locked (a daemon is running) or the local repo hasn't been initialized (we're using a remote daemon).

If we all agree on this I'll try to apply this fix.

@Stebalien
Copy link
Member

From my own personal interpretation of reading the docs

In all cases where we have a valid local repo, yes.

@magik6k
Copy link
Member

magik6k commented Dec 9, 2018

The downside is that this prevents users from switching back and forth between a local and a remote daemon. I'm not sure if that's something we really need to support.

We can instruct users to create a separate IPFS_PATH with api file referencing remote daemon end use that.

@magik6k
Copy link
Member

magik6k commented Dec 9, 2018

We should also prevent ipfs init from working if api file is preset

@cboddy
Copy link
Member Author

cboddy commented Dec 12, 2018

go-ipfs version: 0.4.19-dev-af73c50
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

This works for me:

#!/bin/bash
set -x
set -e

IPFS=ipfs
function run_it() {
    export IPFS_PATH=$(mktemp -d)
    $IPFS init
    $IPFS id # completes fine
    $IPFS daemon &
    sleep 5
    kill $!
    #$IPFS id # will fail
    sleep 10
    $IPFS id # ok
}

run_it

@kevina thanks for that; yes that works for me as well.

I'd also suggest pinging the remote-endpoint defined in the api file to ensure it is available (as first suggested by @Stebalien) since:

  • that should not be a large overhead for each command
  • it would eliminate this source of ambiguity/confusion.

@schomatis schomatis removed the need/author-input Needs input from the original author label Dec 12, 2018
@Stebalien
Copy link
Member

that should not be a large overhead for each command

It shouldn't really add any overhead. If the api file exists, we have to try the API anyways. Today, if that fails, we abort. Instead, we could just fallback on running the command locally.

@schomatis
Copy link
Contributor

Unfortunately, I don't understand enough about the go-ipfs-cmds.http.Client interface to code @Stebalien's proposal, that is, I understand we should try to connect to the daemon to check if it's running but I'm not even sure how to use the Send method, I don't think I can just send the original Request in commandShouldRunOnDaemon because that would result in the same command being executed twice, I suppose we should craft a special "ping" Request if that even exists, but I'm just speculating here.

Hopefully, ipfs/go-ipfs-cmds#138 should mitigate this issue providing enough information to the user to bypass this (blocking) problem.

@schomatis schomatis removed their assignment Dec 24, 2018
@Stebalien
Copy link
Member

@schomatis the only way you're going to better understand how all this works is by reading the code and trying to fix bugs like this.

lanzafame pushed a commit that referenced this issue Jul 30, 2019
ralendor pushed a commit to ralendor/go-ipfs that referenced this issue Jun 6, 2020
ralendor pushed a commit to ralendor/go-ipfs that referenced this issue Jun 6, 2020
ralendor pushed a commit to ralendor/go-ipfs that referenced this issue Jun 8, 2020
ralendor pushed a commit to ralendor/go-ipfs that referenced this issue Jun 8, 2020
ralendor pushed a commit to ralendor/go-ipfs that referenced this issue Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) topic/docs-ipfs Topic docs-ipfs
Projects
None yet
Development

No branches or pull requests

7 participants