This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Memory leak (dht?) #3469

Closed

v-stickykeys opened this issue Jan 4, 2021 · 7 comments
Labels
kind/bug: A bug in existing code (including security flaws)
need/analysis: Needs further analysis before proceeding
P1: High: Likely tackled by core team if no one steps up
status/blocked: Unable to be worked further until needs are met
status/ready: Ready to be worked

Comments

@v-stickykeys

v-stickykeys commented Jan 4, 2021

  • Version:
    "ipfs": "0.52.2",
    "ipfs-http-gateway": "0.1.3",
    "ipfs-http-server": "0.1.3",
  • Platform:

AWS ECS Fargate Docker container with base image node:14.10.1

  • Subsystem:

ipfs-http-server

Severity:

High

Description:

We are running a dockerized ipfs API server in Node.js 14.10 on AWS ECS. We are seeing memory usage increase linearly until it reaches our capacity limit of 75% of the 8192 MiB total memory, at which point the instance crashes (CPU usage stays at around 30% of 4096 units). Overall usage of the node is minimal (we are probably making fewer than 100 requests daily in total).

So far this is observed to occur after about 5 days; the drop in the graph is where the instance restarts.
[screenshot: ECS memory utilization graph]

(Note the dates... will post a more recent image if we see this happen again)
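
For reference, this kind of growth can also be tracked from inside the container with a minimal heap logger; the following is an illustrative sketch, not part of the deployment above (the 60-second interval and console destination are assumptions):

// Illustrative sketch only: log RSS and V8 heap statistics every 60 seconds
// so the in-process numbers can be compared against the ECS memory graph.
const MB = 1024 * 1024

setInterval(() => {
    const { rss, heapTotal, heapUsed, external } = process.memoryUsage()
    console.log(
        `rss=${(rss / MB).toFixed(1)}MiB ` +
        `heapTotal=${(heapTotal / MB).toFixed(1)}MiB ` +
        `heapUsed=${(heapUsed / MB).toFixed(1)}MiB ` +
        `external=${(external / MB).toFixed(1)}MiB`
    )
}, 60_000)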

Steps to reproduce the error:

We are running both the HTTP API server and the gateway like so:

import IPFS from 'ipfs'
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
import type { IPFSAPI as IpfsApi } from 'ipfs-core/dist/src/components'

import HttpApi from 'ipfs-http-server'
import HttpGateway from 'ipfs-http-gateway'

import dagJose from 'dag-jose'
// @ts-ignore
import multiformats from 'multiformats/basics'
// @ts-ignore
import legacy from 'multiformats/legacy'

import { createRepo } from 'datastore-s3'

const TCP_HOST = process.env.TCP_HOST || '0.0.0.0'

const IPFS_PATH = 'ipfs'
const IPFS_S3_REPO_ENABLED = true

const { AWS_BUCKET_NAME } = process.env
const { AWS_ACCESS_KEY_ID } = process.env
const { AWS_SECRET_ACCESS_KEY } = process.env
const { ANNOUNCE_ADDRESS_LIST } = process.env

const IPFS_SWARM_TCP_PORT = 4011
const IPFS_SWARM_WS_PORT = 4012

const IPFS_API_PORT = 5011
const IPFS_ENABLE_API = true

const IPFS_GATEWAY_PORT = 9011
const IPFS_ENABLE_GATEWAY = true

const IPFS_DHT_SERVER_MODE = true

const IPFS_ENABLE_PUBSUB = true
const IPFS_PUBSUB_TOPICS: string[] = []

export default class IPFSServer {

    /**
     * Start js-ipfs instance with dag-jose enabled
     */
    static async start(): Promise<void> {
        const repo = IPFS_S3_REPO_ENABLED ? createRepo({
            path: IPFS_PATH,
        }, {
            bucket: AWS_BUCKET_NAME,
            accessKeyId: AWS_ACCESS_KEY_ID,
            secretAccessKey: AWS_SECRET_ACCESS_KEY,
        }) : null

        // setup dag-jose codec
        multiformats.multicodec.add(dagJose)
        const format = legacy(multiformats, dagJose.name)

        const announceAddresses = ANNOUNCE_ADDRESS_LIST != null ? ANNOUNCE_ADDRESS_LIST.split(',') : []
        const ipfs: IpfsApi = await IPFS.create({
            repo,
            ipld: { formats: [format] },
            libp2p: {
                config: {
                    dht: {
                        enabled: true,
                        clientMode: !IPFS_DHT_SERVER_MODE,
                        randomWalk: false,
                    },
                    pubsub: {
                        enabled: IPFS_ENABLE_PUBSUB
                    },
                },
                addresses: {
                    announce: announceAddresses,
                }
            },
            config: {
                Addresses: {
                    Swarm: [
                        `/ip4/${TCP_HOST}/tcp/${IPFS_SWARM_TCP_PORT}`,
                        `/ip4/${TCP_HOST}/tcp/${IPFS_SWARM_WS_PORT}/ws`,
                    ],
                    ...IPFS_ENABLE_API && { API: `/ip4/${TCP_HOST}/tcp/${IPFS_API_PORT}` },
                    ...IPFS_ENABLE_GATEWAY && { Gateway: `/ip4/${TCP_HOST}/tcp/${IPFS_GATEWAY_PORT}` },
                },
                API: {
                    HTTPHeaders: {
                        "Access-Control-Allow-Origin": [
                            "*"
                        ],
                        "Access-Control-Allow-Methods": [
                            "GET",
                            "POST"
                        ],
                        "Access-Control-Allow-Headers": [
                            "Authorization"
                        ],
                        "Access-Control-Expose-Headers": [
                            "Location"
                        ],
                        "Access-Control-Allow-Credentials": [
                            "true"
                        ]
                    }
                },
                Routing: {
                    Type: IPFS_DHT_SERVER_MODE ? 'dhtserver' : 'dhtclient',
                },
            },
        })

        if (IPFS_ENABLE_API) {
            await new HttpApi(ipfs).start()
            console.log('IPFS API server listening on ' + IPFS_API_PORT)
        }
        if (IPFS_ENABLE_GATEWAY) {
            await new HttpGateway(ipfs).start()
            console.log('IPFS Gateway server listening on ' + IPFS_GATEWAY_PORT)
        }

        IPFS_PUBSUB_TOPICS.forEach((topic: string) => {
            ipfs.pubsub.subscribe(topic)
        })
    }
}
v-stickykeys added the need/triage (Needs initial labeling and prioritization) label Jan 4, 2021
@v-stickykeys
Author

Again, the big dropoffs are where we restart the node.
[screenshot: Screen Shot 2021-01-13 at 11 56 08 AM]

hugomrdias self-assigned this Jan 14, 2021
@hugomrdias
Member

hugomrdias commented Jan 14, 2021

Hello @valmack, can you tell me a little bit more about those 100-ish connections you are making?

Also, if you don't need the preload feature, can you turn it off and report back on whether memory still grows?

this.ipfs = await IPFS.create({
    repo,
    preload: {
        enabled: false
    }
})

hugomrdias added the need/author-input (Needs input from the original author) label and removed the need/triage label Jan 21, 2021
@oed
Contributor

oed commented Feb 9, 2021

Just an FYI here: we are no longer seeing this problem since we disabled the DHT.
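
For context, here is a minimal sketch of what disabling the DHT looks like against the config posted above, reusing the same `repo`, `format`, `IpfsApi`, and `IPFS_ENABLE_PUBSUB` names from that snippet; only the libp2p `dht` block changes, and everything else is assumed to stay as in the original:

// Sketch: same IPFS.create call as in the original snippet, with the libp2p
// DHT turned off. Only the options relevant to the change are shown.
const ipfs: IpfsApi = await IPFS.create({
    repo,
    ipld: { formats: [format] },
    libp2p: {
        config: {
            dht: {
                enabled: false, // was `true` (with clientMode/randomWalk) in the original config
            },
            pubsub: {
                enabled: IPFS_ENABLE_PUBSUB
            },
        },
    },
    // Addresses, API headers, etc. are unchanged; with the DHT off, the
    // Routing.Type: 'dhtserver' setting from the original config is no longer relevant.
})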

aschmahmann added the kind/bug and status/ready labels Mar 8, 2021
lidel added the need/analysis and P1 labels and removed the need/author-input label May 24, 2021
@lidel
Member

lidel commented May 24, 2021

This is ready to be worked on; it is a matter of prioritization and resourcing.
Related project proposal: protocol/web3-dev-team#30

@vasco-santos
Member

We need to put in place a simulation to gather more information on what is leaking and make it easily reproducible. This is the type of simulation where https://github.com/testground/sdk-js would be extremely helpful.

Without an analysis, I would say this is likely related to leaked DHT queries that were not aborted/stopped, together with logic bugs in the DHT query logic. I have already seen both problems in the wild: we lack abort support, and sometimes a DHT query will not go straight to the less distant (closer) peers.

The solution, as part of protocol/web3-dev-team#30, is probably to rewrite all the query logic from scratch.
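
As a reference point for the abort discussion, this is roughly what caller-side cancellation looks like with the `signal`/`timeout` options the ipfs-core API already exposes (the `ipfs` instance and `cid` are assumed to exist, and the loop must run in an async context; whether the underlying libp2p DHT query is actually torn down when the caller aborts is exactly the gap described above):

// Sketch: bound a DHT provider lookup from the caller side with an AbortController.
// `ipfs` is a running js-ipfs instance and `cid` any CID of interest (assumed).
const controller = new AbortController()
const timer = setTimeout(() => controller.abort(), 30_000) // give up after 30s

try {
    for await (const provider of ipfs.dht.findProvs(cid, { signal: controller.signal })) {
        console.log('found provider:', provider.id.toString())
        break // one provider is enough for this sketch
    }
} catch (err: any) {
    if (err.name !== 'AbortError') throw err
    console.log('DHT query aborted after timeout')
} finally {
    clearTimeout(timer)
}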

lidel changed the title from "Memory leak in http server" to "Memory leak (dht?)" Jun 7, 2021
lidel added the status/blocked label Jun 7, 2021
@lidel
Member

lidel commented Jun 7, 2021

Sounds like protocol/web3-dev-team#30 / libp2p/js-libp2p-kad-dht#183 needs to happen first (updating the spec + overhauling the codebase).

@tinytb

tinytb commented Nov 22, 2022

2022-11-22: we think this is fixed, but feel free to let us know if not.

tinytb closed this as completed Nov 22, 2022