ERR_ENCRYPTION_FAILED when reconnecting to previously known peer #3702

ukstv · 2021-05-27T13:06:58Z

With DEBUG=libp2p:* one could see an interesting (and alarming) messages when a node connects to another node:

libp2p:upgrader:err Failed to upgrade outbound connection Error: Error occurred during XX handshake: The operation was aborted // on connection initiator side
libp2p:websockets:listener:error inbound connection failed to upgrade Error: Error occurred during XX handshake: Value is null // on connection receiver side

Repo to reproduce: https://github.com/ukstv/ipfs-encryption-repro

Sometimes this error on reconnection persists: one could see repeated messages about XX handshake in a log, but this one I can not reproduce on request.

This worries me, as it might be related to instability of IPFS-to-IPFS connection. Sometimes messages get lost, or one could not retrieve a record, if it exists on another seemingly connected node.

The text was updated successfully, but these errors were encountered:

ukstv · 2021-05-27T13:48:08Z

One surprising thing here. When you restart node:a and node:b they do not try to connect each other. The expectation was that peer list survives a node restart.

ukstv · 2021-05-27T14:54:15Z

Might be related: when node:a connects to node:b (single port and WebSocket provided in multiaddr), there are two TCP connections from node:a or node:b.

yusefnapora · 2021-06-07T18:39:52Z

@ukstv Thanks for reporting this and making the repo to reproduce the issue. Unfortunately, it "works on my machine," and I can't get it to reproduce the bug 😞

Here's what I see in the log for "node b":

  libp2p:websockets:listener new inbound connection /ip4/127.0.0.1/tcp/58473 +0ms
  libp2p:upgrader Starting the inbound connection upgrade +0ms
  libp2p:upgrader handling inbound crypto protocol selection [ '/noise' ] +0ms
  libp2p:upgrader encrypting inbound connection... +4ms
  libp2p:noise Stage 0 - Responder waiting to receive first message... +0ms
  libp2p:noise Stage 0 - Responder received first message. +14ms
  libp2p:noise Stage 1 - Responder sending out first message with signed payload and static key. +0ms
  libp2p:noise Stage 1 - Responder sent the second handshake message with signed payload. +25ms
  libp2p:noise Stage 2 - Responder waiting for third handshake message... +0ms
  libp2p:noise Stage 2 - Responder received the message, finished handshake. +54ms
  libp2p:upgrader inbound handling muxers [ '/mplex/6.7.0' ] +105ms
  libp2p:upgrader Successfully upgraded inbound connection +3ms

Do you always see the errors, or is it an intermittent thing?

yusefnapora · 2021-06-07T18:41:29Z

Also, if you could give some info about your environment (operating system, architecture, nodejs version, etc), that would be great. I have some other machines lying around I could test on, just in case the issue only shows up on some platforms.

ukstv · 2021-06-10T14:54:09Z

That's unfortunate @yusefnapora . Actually, I tried to create a more reproducible case in Vagrant, and Docker environments, but that magically works with no errors reported :((

I can still reproduce the issue on my machine. It is Darwin feather 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64. I can reproduce the issue on both (Node v16, npm v7.13.0) and (Node v14, npm v6.14.13).

Logs I have, which you might find useful: https://gist.github.com/ukstv/ac830536e678a38875b4b21f91f4705c

stbrody · 2021-06-10T20:01:47Z

The logs from @ukstv's reproduction look similar, but slightly different, than the logs we're seeing on the nodes running in our production environment. Those logs look like:

2021-06-10T15:55:20.904-04:00	2021-06-10T19:55:20.904Z libp2p:websockets:error connection error: WebSocket was closed before the connection was established
2021-06-10T15:55:20.904-04:00	2021-06-10T19:55:20.904Z libp2p:websockets:error connection error: connect ECONNREFUSED 127.0.0.1:4003
2021-06-10T15:55:32.932-04:00	2021-06-10T19:55:32.932Z libp2p:websockets:listener:error inbound connection failed to upgrade Error: Error occurred during XX handshake: Value is null
2021-06-10T15:55:32.932-04:00	at Noise.performXXHandshake (/js-ceramic/node_modules/libp2p-noise/dist/src/noise.js:166:19)
2021-06-10T15:55:32.932-04:00	at async Noise.performHandshake (/js-ceramic/node_modules/libp2p-noise/dist/src/noise.js:135:20)
2021-06-10T15:55:32.932-04:00	at async Noise.secureInbound (/js-ceramic/node_modules/libp2p-noise/dist/src/noise.js:87:27)
2021-06-10T15:55:32.932-04:00	at async Upgrader._encryptInbound (/js-ceramic/node_modules/libp2p/src/upgrader.js:371:12)
2021-06-10T15:55:32.932-04:00	at async Upgrader.upgradeInbound (/js-ceramic/node_modules/libp2p/src/upgrader.js:101:11)
2021-06-10T15:55:32.932-04:00	at async EventEmitter.<anonymous> (/js-ceramic/node_modules/libp2p-websockets/src/listener.js:22:14) {
2021-06-10T15:55:32.932-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-10T15:55:32.932-04:00	}
2021-06-10T15:55:40.908-04:00	2021-06-10T19:55:40.908Z libp2p:websockets:error connection error: WebSocket was closed before the connection was established
2021-06-10T15:55:40.909-04:00	2021-06-10T19:55:40.908Z libp2p:tcp:listener:error inbound connection failed Error: missing newline
2021-06-10T15:55:40.909-04:00	at Object.exports.read (/js-ceramic/node_modules/multistream-select/src/multistream.js:34:19)
2021-06-10T15:55:40.909-04:00	at runMicrotasks (<anonymous>)
2021-06-10T15:55:40.909-04:00	at processTicksAndRejections (internal/process/task_queues.js:95:5)
2021-06-10T15:55:40.909-04:00	at async module.exports (/js-ceramic/node_modules/multistream-select/src/handle.js:14:23)
2021-06-10T15:55:40.909-04:00	at async Upgrader._encryptInbound (/js-ceramic/node_modules/libp2p/src/upgrader.js:362:36)
2021-06-10T15:55:40.909-04:00	at async Upgrader.upgradeInbound (/js-ceramic/node_modules/libp2p/src/upgrader.js:101:11)
2021-06-10T15:55:40.909-04:00	at async Server.<anonymous> (/js-ceramic/node_modules/libp2p-tcp/src/listener.js:42:14) {
2021-06-10T15:55:40.909-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-10T15:55:40.909-04:00	}
2021-06-10T15:55:40.909-04:00	2021-06-10T19:55:40.909Z libp2p:tcp:listener:error an error occurred closing the connection Error [ERR_STREAM_ALREADY_FINISHED]: Cannot call end after a stream was finished
2021-06-10T15:55:40.909-04:00	at Socket.Writable.end (internal/streams/writable.js:585:11)
2021-06-10T15:55:40.909-04:00	at Socket.end (net.js:591:31)
2021-06-10T15:55:40.909-04:00	at /js-ceramic/node_modules/libp2p-tcp/src/socket-to-conn.js:85:16
2021-06-10T15:55:40.909-04:00	at new Promise (<anonymous>)
2021-06-10T15:55:40.909-04:00	at Object.close (/js-ceramic/node_modules/libp2p-tcp/src/socket-to-conn.js:62:14)
2021-06-10T15:55:40.909-04:00	at attemptClose (/js-ceramic/node_modules/libp2p-tcp/src/listener.js:24:28)
2021-06-10T15:55:40.909-04:00	at Server.<anonymous> (/js-ceramic/node_modules/libp2p-tcp/src/listener.js:45:14) {
2021-06-10T15:55:40.909-04:00	code: 'ERR_STREAM_ALREADY_FINISHED'
2021-06-10T15:55:40.909-04:00	}
2021-06-10T15:55:40.909-04:00	2021-06-10T19:55:40.909Z libp2p:websockets:error connection error: socket hang up
2021-06-10T15:55:50.910-04:00

As far as we can tell these nodes are still behaving fine, but those logs are definitely worrisome. It's also weird because this issue shows up on 2 of the nodes in our infra, but not on any of our other nodes, which are all configured it the same was as far as we can tell.

achingbrain · 2021-06-14T17:03:38Z

@vasco-santos have you got any thoughts on this?

vasco-santos · 2021-06-14T19:54:14Z

On a peer restart, it will try to dial previously known peers stored in the PeerStore (having a new setup will cause to have an empty peerStore).
This can be related to both peers trying to dial at the same time, which results in a connection from one side and the other on going one is aborted. Can you confirm the connection exists after these logs?

stbrody · 2021-06-14T20:02:09Z

These logs are happening extremely consistently on an ongoing basis, not just at node startup. Seems to be happening about 2-4 times a minute. Here are some timestamps from the logs filtered to log lines containing the error message "ERR_ENCRYPTION_FAILED"

On node 1:

2021-06-14T15:29:35.847-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:37.849-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:54.993-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:57.828-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:30:05.803-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:30:11.283-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:30:54.495-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:15.867-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:17.869-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:37.849-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:45.825-04:00	code: 'ERR_ENCRYPTION_FAILED'

Same time frame on node 2:

2021-06-14T15:29:21.211-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:27.328-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:29.331-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:34.936-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:49.310-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:29:57.282-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:30:47.291-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:17.352-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:19.355-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:39.328-04:00	code: 'ERR_ENCRYPTION_FAILED'
2021-06-14T15:31:47.302-04:00	code: 'ERR_ENCRYPTION_FAILED'

vasco-santos · 2021-06-22T08:13:01Z

Hey @stbrody

I got some time to look into your test case. Thanks for providing it. I have a couple of questions.

What happens when you run run-a.js? Does it work?
Where do you get Here note IPFS peer id?

I think that by mistake I got exactly your issue. I got the wrong PeerId. So once I started both nodes, nodeB prints:

generating 2048-bit (rsa only) RSA keypair...
  libp2p:connection-manager options: {"maxConnections":500,"minConnections":200,"maxData":null,"maxSentData":null,"maxReceivedData":null,"maxEventLoopDelay":null,"pollInterval":2000,"autoDialInterval":10000,"movingAverageInterval":60000,"defaultPeerValue":1,"autoDial":true} +0ms
  libp2p:transports adding TCP +0ms
  libp2p:transports adding WebSockets +0ms
  libp2p:transports adding Circuit +0ms
  libp2p:peer-store:proto-book added provided protocols for QmcrGvLkwpfM8fctWwCc7hv3sexSLWet4JHq46ba8NiDf3 +0ms
  libp2p:peer-store:proto-book added provided protocols for QmcrGvLkwpfM8fctWwCc7hv3sexSLWet4JHq46ba8NiDf3 +1ms
  libp2p:peer-store:proto-book added provided protocols for QmcrGvLkwpfM8fctWwCc7hv3sexSLWet4JHq46ba8NiDf3 +4ms
to get started, enter:

	jsipfs cat /ipfs/QmfGBRT6BbWJd7yUc2uYdaUZJBbnEFvTqehPFoSMQ6wgdr/readme

  libp2p:connection-manager options:

I got QmfGBRT6BbWJd7yUc2uYdaUZJBbnEFvTqehPFoSMQ6wgdr as the PeerId and it is not the PeerId of this peer, but the CID of a folder js-ipfs adds:

js-ipfs/packages/ipfs-core/src/runtime/init-assets-nodejs.js

Line 25 in 6870873

print(`\tjsipfs cat /ipfs/${dir.cid}/readme\n`)

The correct peerId is QmcrGvLkwpfM8fctWwCc7hv3sexSLWet4JHq46ba8NiDf3.

So, going ahead with the incorrect peerId and running run-a.js. It fails with:

let error = new HTTP.HTTPError(response)
              ^

HTTPError:
    Error: Error occurred during XX handshake: Error occurred while verifying signed payload: Peer ID doesn't match libp2p public key.

and nodes will still try to connect and fail. NodeB will keep trying and getting:

let error = new HTTP.HTTPError(response)
              ^

HTTPError:
    Error: Error occurred during XX handshake: Error occurred while verifying signed payload: Peer ID doesn't match libp2p public key.

while the output of nodeA when receives this connect is

libp2p:upgrader:err Failed to upgrade outbound connection Error: Error occurred during XX handshake: Error occurred while verifying signed payload: Peer ID doesn't match libp2p public key.

Which is expected, as we are trying to connect a peer with a wrong peerId. So, it will be reachable via the multiaddr but then when they do the encryption, it will fail because peerA understands that peerB is not who it thought so.

Regarding the continuous attempts, when obviously this peer should be discarded, we still need to make these improvements in js-libp2p as part of libp2p/js-libp2p#744 and libp2p/js-libp2p#593

Let me know if this is the problem, as if I use the correct PeerId everything works as expected. And please, remove the repos before starting as they will be polluted with this peer in the PeerStore.

stbrody · 2021-06-22T20:23:37Z

@ukstv @v-stickykeys: see above. Could this be what's happening on our ELP nodes?

ukstv · 2021-06-23T07:41:41Z

IPFS node b reports its peer id on console. So Here note IPFS peer id means grabbing that reported peer and pasting it to run-a.js.

It might be the case with wrong peer ids. I guess, if what we see is straight "invalid peer id" situation, the swarm.connect would throw an error. We do not see this though. We see a connection being established all right (CIDs are exchanged, pubsub works), and this XX handshake error at the same time.

Maybe somehow IPFS and/or libp2p try to reuse peer id from the past? Maybe it is not a peer id, but some other data structure related to peer that is reused while establishing connection? And, this happens while valid peer id is present and the connection overall is established. Like, I do not know, there is an orphan data structure, maybe?

vasco-santos · 2021-06-23T07:50:07Z

Maybe somehow IPFS and/or libp2p try to reuse peer id from the past?

That seems a high possibility, and is why I mention removing the repo. When you dial other peers, the addr is stored in the addressBook and on startup, the node tries to dial the peers previously stored until have the minimum number of peers connected.

If removing the repo is not a possibility for you at the moment, you can add logs for the peerId that is trying to dial and see if it is one of your nodes. If not, you can programatically remove it with https://github.com/libp2p/js-libp2p/blob/master/doc/API.md#peerstoredelete

ukstv · 2021-06-23T08:55:44Z

Something is bothering me about the hypothesis. Two of Ceramic hosted IPFS nodes just wait for an external connection. I run an IPFS node on my machine, and command it to connect to one of the hosted node. Then I see the same XX handshake error on my node's log. I guess, the relationship between stale peer and XX handshake error is not that straightforward. Anyway, we'll try removing peers.

CC @stbrody

stbrody · 2021-06-23T18:10:58Z

@v-stickykeys @zachferland - I didn't quite follow the suggestion of removing the peers from the repo - does this make sense to you? Is this something we could try on our ELP nodes to see if it fixes the issue?

vasco-santos · 2021-06-24T07:16:01Z

I didn't quite follow the suggestion of removing the peers from the repo

If you have problematic addresses in your node's PeerStore (addresses with wrong peerIds), this error will happen as we do not have multiaddr confidence in place yet that would make libp2p intelligent to get rid of them. This is because when you start a node, it will try to dial over time the peers it previously knew (peerStore) in order to get a bigger connection pool.

To be clear, when you try to dial address/peerId, it will be stored in the PeerStore, even if the dial fails

The question I still have in your test case is that I did what is expected #3702 (comment) and you were dialling the incorrect PeerId there.

v-stickykeys · 2021-06-24T15:10:54Z

Could this be what's happening on our ELP nodes?

We are definitely connecting to the correct peerids but there is a chance that the peerstore is polluted. Will look into it now and report back

v-stickykeys · 2021-12-28T20:59:32Z

I just looked into this again and my ability to repro supports what you are saying @vasco-santos

There's two issues that remain:

To be clear, when you try to dial address/peerId, it will be stored in the PeerStore, even if the dial fails

Must have: There seems to be no way to remove the problem addresses from the PeerStore. I see https://docs.ipfs.io/reference/http/api/#api-v0-swarm-peering-rm but it is not supported in js-ipfs.
Should have: The encryption error does not tell us which addresses are having the problem which made it hard for us to understand what was going on.

tinytb · 2023-01-03T16:15:49Z

If you use the PeerStore directly in libp2p, you can delete the addresses
We shouldn't add addresses that fail

SgtPooki · 2023-05-26T20:27:38Z

js-ipfs is being deprecated in favor of Helia. You can #4336 and read the migration guide.

Please feel to reopen with any comments by 2023-06-02. We will do a final pass on reopened issues afterward (see #4336).

Assigning to @achingbrain to answer whether this issue is already resolved in Helia, or if this issue needs to be migrated to that repo!

achingbrain · 2023-05-30T16:22:08Z

This appears to have been fixed - at least, I run the repro example and the nodes reconnect to each other.

If you're still seeing problems, please open an issue against https://github.com/ChainSafe/js-libp2p-noise as that is likely where any fix will land.

ukstv added the need/triage Needs initial labeling and prioritization label May 27, 2021

ukstv mentioned this issue May 27, 2021

Investigate ipfs encryption errors 3box/3box#1404

Closed

lidel added need/analysis Needs further analysis before proceeding P1 High: Likely tackled by core team if no one steps up and removed need/triage Needs initial labeling and prioritization labels May 31, 2021

BigLep assigned vasco-santos Jun 18, 2021

ukstv mentioned this issue Oct 13, 2021

Fix IPFS encryption errors on ELP nodes ceramicnetwork/js-ceramic#1473

Closed

stallingerl mentioned this issue Mar 7, 2022

Messages zwischen Peers über PubSub werden nicht empfangen webanizer/BlockPro#13

Closed

tinytb moved this to Weekly Candidates in IP JS (PL EngRes) v2 Oct 14, 2022

tinytb added this to IP JS (PL EngRes) v2 Oct 14, 2022

BigLep added this to js-ipfs deprecation May 17, 2023

SgtPooki self-assigned this May 17, 2023

SgtPooki moved this to 🥞 Todo in js-ipfs deprecation May 17, 2023

BigLep unassigned vasco-santos May 17, 2023

SgtPooki added the kind/maybe-in-helia label May 17, 2023

SgtPooki moved this from 🥞 Todo to 🛑 Blocked in js-ipfs deprecation May 17, 2023

SgtPooki assigned achingbrain and unassigned SgtPooki May 23, 2023

achingbrain closed this as completed May 30, 2023

github-project-automation bot moved this from Next to Done in IP JS (PL EngRes) v2 May 30, 2023

github-project-automation bot moved this from 🛑 Blocked to ✅ Done in js-ipfs deprecation May 30, 2023

holmesworcester mentioned this issue Nov 1, 2023

Increase connection-level security TryQuiet/quiet#1897

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ERR_ENCRYPTION_FAILED when reconnecting to previously known peer #3702

ERR_ENCRYPTION_FAILED when reconnecting to previously known peer #3702

ukstv commented May 27, 2021

ukstv commented May 27, 2021 •

edited

Loading

ukstv commented May 27, 2021

yusefnapora commented Jun 7, 2021

yusefnapora commented Jun 7, 2021

ukstv commented Jun 10, 2021

stbrody commented Jun 10, 2021

achingbrain commented Jun 14, 2021

vasco-santos commented Jun 14, 2021

stbrody commented Jun 14, 2021 •

edited

Loading

vasco-santos commented Jun 22, 2021

stbrody commented Jun 22, 2021

ukstv commented Jun 23, 2021

vasco-santos commented Jun 23, 2021

ukstv commented Jun 23, 2021

stbrody commented Jun 23, 2021

vasco-santos commented Jun 24, 2021 •

edited

Loading

v-stickykeys commented Jun 24, 2021

v-stickykeys commented Dec 28, 2021 •

edited

Loading

tinytb commented Jan 3, 2023

SgtPooki commented May 26, 2023

achingbrain commented May 30, 2023

ERR_ENCRYPTION_FAILED when reconnecting to previously known peer #3702

ERR_ENCRYPTION_FAILED when reconnecting to previously known peer #3702

Comments

ukstv commented May 27, 2021

ukstv commented May 27, 2021 • edited Loading

ukstv commented May 27, 2021

yusefnapora commented Jun 7, 2021

yusefnapora commented Jun 7, 2021

ukstv commented Jun 10, 2021

stbrody commented Jun 10, 2021

achingbrain commented Jun 14, 2021

vasco-santos commented Jun 14, 2021

stbrody commented Jun 14, 2021 • edited Loading

vasco-santos commented Jun 22, 2021

stbrody commented Jun 22, 2021

ukstv commented Jun 23, 2021

vasco-santos commented Jun 23, 2021

ukstv commented Jun 23, 2021

stbrody commented Jun 23, 2021

vasco-santos commented Jun 24, 2021 • edited Loading

v-stickykeys commented Jun 24, 2021

v-stickykeys commented Dec 28, 2021 • edited Loading

tinytb commented Jan 3, 2023

SgtPooki commented May 26, 2023

achingbrain commented May 30, 2023

ukstv commented May 27, 2021 •

edited

Loading

stbrody commented Jun 14, 2021 •

edited

Loading

vasco-santos commented Jun 24, 2021 •

edited

Loading

v-stickykeys commented Dec 28, 2021 •

edited

Loading