
Resource Constraints + Limits #1482

Closed
1 of 11 tasks
jbenet opened this issue Jul 15, 2015 · 58 comments · Fixed by #8680
Labels
need/community-input Needs input from the wider community

Comments

@jbenet
Member

jbenet commented Jul 15, 2015

We need a number of configurable resource limits. This issue will serve as a meta-issue to track them all and discuss a consistent way to configure/handle them.

I'm going to use a notation like thingA.subthingB.subthingC. We don't have to keep this at all; it just helps us bind scoped names to things. (Using . instead of / since the . could reflect JSON hierarchy in the config, but it doesn't have to; e.g. repo.storage_max and repo.datastore.storage_gc_watermark could appear in the config as Repo.StorageMax and Repo.StorageGC, or something.)

Possible Limits

This is a list of possible limits. I don't think we need all of them, as other tools could limit these further, particularly in server scenarios. But please keep in mind that some users/use cases of ipfs demand that we have some limits in place ourselves, as many end users cannot be expected to even know what a terminal is (e.g. if they run ipfs as an Electron app or as a browser extension).

  • node.repo.storage_max: this affects the physical storage a repo takes up. It must include all of the storage, datastore + config file size (OK to pre-allocate more if needed), so that people can set a maximum. (MUST be user configurable) Repo Size Constraints #972
    • node.repo.datastore.storage_max: hard limit on datastore storage size. Could be computed as repo.storage_max - configsize, where configsize could be live or a reasonable bound. Repo Size Constraints #972
    • node.repo.datastore.storage_gc_watermark: soft limit on datastore storage size. After passing this threshold, automatically run GC. Could be computed as node.repo.datastore.storage_max - 1MB or something. Repo Size Constraints #972
  • node.network_bandwidth_max: limit on network bandwidth used.
    • node.gateway.bandwidth_max: limit on bandwidth allocated to running the gateway. Could be calculated as node.network_bandwidth_max - all other bandwidth use. gateway limitations #1070
    • node.swarm.bandwidth_max: limit on network bandwidth allocated to running the ipfs protocol. Could be calculated as node.network_bandwidth_max - all other bandwidth use.
    • node.dht.bandwidth_max: limit on network bandwidth allocated to running the DHT protocol. Could be calculated as node.network_bandwidth_max - all other bandwidth use.
    • node.bitswap.bandwidth_max: limit on network bandwidth allocated to running the bitswap protocol. Could be calculated as node.network_bandwidth_max - all other bandwidth use.
  • node.swarm.connections: soft limit on the number of ipfs protocol network connections to keep open. The reason for this limit is that every connection kept alive has overhead; the node could try to stay within this limit.
  • node.gateway.ratelimit: a number of requests per second. With this limit, the user could cap the load the gateway accepts. gateway limitations #1070
  • node.memlimit: a limit on the memory allocated to ipfs. We could try to use smaller buffers when under tighter constraints. This is hard to do, probably won't be used end-user-side, and is likely easier to achieve with tools around it sysadmin-side (docker, etc).

note on config: the above keys need not be the config keys, but we should figure out some keys that make sense hierarchically.
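
To make the hierarchy concrete, here is a minimal sketch of how these scoped names could map onto config structs (in Go, since that is what go-ipfs is written in). Every type, field name, and unit below is a hypothetical placeholder, not the actual go-ipfs config schema.

```go
// Hypothetical sketch only: these structs illustrate how the scoped names
// above could map onto a JSON config hierarchy. Field names and units are
// placeholders, not the actual go-ipfs config schema.
package config

type Limits struct {
	Repo    RepoLimits    `json:"Repo"`
	Network NetworkLimits `json:"Network"`
}

type RepoLimits struct {
	StorageMax         string `json:"StorageMax"`         // node.repo.storage_max, e.g. "10GB"
	StorageGCWatermark string `json:"StorageGCWatermark"` // node.repo.datastore.storage_gc_watermark, e.g. "9GB"
}

type NetworkLimits struct {
	BandwidthMax     string `json:"BandwidthMax"`     // node.network_bandwidth_max, e.g. "1MB/s"
	SwarmConnections int    `json:"SwarmConnections"` // node.swarm.connections (soft cap)
	GatewayRateLimit int    `json:"GatewayRateLimit"` // node.gateway.ratelimit, requests per second
	MemLimit         string `json:"MemLimit"`         // node.memlimit, e.g. "512MB"
}
```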

What other things are we interested in limiting?

@jbenet
Member Author

jbenet commented Jul 15, 2015

The most pressing are:

  • node.repo.storage_max
  • node.network_bandwidth_max

@jbenet
Member Author

jbenet commented Jul 15, 2015

@rht would this be an issue you could work on? It's needed sooner rather than later, particularly node.repo.storage_max (+ running GC if we get close to it) and node.network_bandwidth_max.

@whyrusleeping your help will be needed no matter who implements this.

@whyrusleeping
Member

@jbenet yeap. My concern is that before we even think about configurable limits and such, we need to determine how the system behaves when you are out of a certain resource, whether that's open connections, disk space, or memory. Once we determine how a limit will manifest itself in the application, we can start setting those limits.

@jbenet
Member Author

jbenet commented Jul 15, 2015

We already know how some of those would behave; for example, disk: trigger GC after a threshold, and stop accepting blocks after the limit.
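
A minimal sketch of that policy, assuming a hypothetical blockstore wrapper that tracks the repo's current disk usage (none of these names are real go-ipfs APIs):

```go
// Illustrative sketch only: a blockstore wrapper that triggers GC at a soft
// watermark and refuses writes at the hard limit. None of these types exist
// in go-ipfs under these names.
package limits

import (
	"errors"
	"sync/atomic"
)

// ErrDiskFull is returned when the hard storage limit would be exceeded.
var ErrDiskFull = errors.New("datastore: storage limit reached, rejecting block")

// LimitedStore wraps a block-writing function with a soft GC watermark and a
// hard storage limit.
type LimitedStore struct {
	used        int64  // bytes currently used (maintained elsewhere)
	gcWatermark int64  // soft limit: trigger GC above this
	storageMax  int64  // hard limit: reject writes above this
	triggerGC   func() // kicks off a GC run (a real version would debounce this)
	putBlock    func([]byte) error
}

func (s *LimitedStore) Put(block []byte) error {
	used := atomic.LoadInt64(&s.used)
	if used+int64(len(block)) > s.storageMax {
		// Hard limit: surface a write error, just as a full OS disk would.
		return ErrDiskFull
	}
	if used > s.gcWatermark {
		// Soft limit: keep accepting blocks, but start reclaiming space.
		go s.triggerGC()
	}
	if err := s.putBlock(block); err != nil {
		return err
	}
	atomic.AddInt64(&s.used, int64(len(block)))
	return nil
}
```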



@whyrusleeping
Member

okay, when we stop accepting blocks, how does that affect the user? Do we just start returning 'error disk full' up the stack everywhere? (probably)

@jbenet
Member Author

jbenet commented Jul 16, 2015

Yeah, it's a write error. The same would happen if the OS's disk got full.


@rht rht self-assigned this Jul 16, 2015
@jbenet jbenet mentioned this issue Jul 27, 2015
@davidar
Member

davidar commented Sep 14, 2015

👍 the daemon keeps consuming my meager ADSL upload bandwidth

@jbenet
Member Author

jbenet commented Sep 14, 2015

These are a big deal, we should get back on these.

@slothbag

slothbag commented Nov 8, 2015

My VPS runs out of RAM pretty quickly with IPFS consuming 80% of it (this is not while adding anything, just idling). Other daemons start to shut down due to running out of memory.

Granted, my VPS has only 128 or 256 MB (can't remember which), but still, I would think it's possible to seed some content with minimal resources.

@jbenet
Member Author

jbenet commented Nov 10, 2015

Agreed. We should start adding memory-constraint tests for long-running nodes to ipfs.

@rht
Contributor

rht commented Nov 24, 2015

Update here:

  • Datastore.StorageMax and Datastore.StorageGCWatermark have been implemented. However, I'd say it would consume far fewer resources to simply keep track of the number of hashes stored in the datastore.
  • For network bandwidth, I haven't found a battle-scarred rate-limiting lib to use (there are plenty, but I haven't reviewed them). Meanwhile, I propose that a unit-less constraint can be implemented with golang.org/x/net/netutil, to limit the number of simultaneous connections to the http api/gateway (see the sketch after this list).
  • Swarm bandwidth has been indirectly constrained through the fd limit https://github.com/ipfs/go-ipfs/blob/20b06a4cbce8884f5b194da6e98cb11f2c77f166/p2p/net/swarm/swarm_dial.go#L44 -- if this fd constraint doesn't exist, does limiting the number of swarms indirectly limit the number of fd dials, @whyrusleeping? If so, it is more intuitive to just limit the swarms, and expose this in the config.
  • Memory: I don't need to run the ipfs node long before it requires a double C-c to kill it (is this evidence of zombie goroutines?). More systematic memory-leak reports would open a path here.
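
For reference, a minimal sketch of the netutil approach proposed above, applied to an HTTP gateway/API listener; the address and connection limit are arbitrary example values:

```go
// Minimal sketch of capping simultaneous HTTP gateway/API connections with
// golang.org/x/net/netutil, as proposed above. Address and limit are
// arbitrary example values.
package main

import (
	"log"
	"net"
	"net/http"

	"golang.org/x/net/netutil"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	// Allow at most 64 concurrent connections; further accepts block until an
	// existing connection is closed.
	limited := netutil.LimitListener(ln, 64)

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	log.Fatal(http.Serve(limited, mux))
}
```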

@jbenet
Member Author

jbenet commented Nov 30, 2015

Thanks for the update @rht

Re limits, I think people will mostly want to set hard BW caps in explicit KB/s.
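
A minimal sketch of what a hard cap in KB/s could look like on the sending side, using a token bucket from golang.org/x/time/rate; the wrapper type is hypothetical, not an existing go-ipfs component:

```go
// Illustrative sketch: an io.Writer wrapper that enforces a hard outbound cap
// in bytes per second using a token bucket (golang.org/x/time/rate). This is
// a hypothetical helper, not an existing go-ipfs component.
package bwlimit

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

type rateLimitedWriter struct {
	w   io.Writer
	lim *rate.Limiter
}

// NewRateLimitedWriter caps writes to bytesPerSec, e.g. 50*1024 for 50 KB/s.
func NewRateLimitedWriter(w io.Writer, bytesPerSec int) io.Writer {
	// A burst of one second's worth of traffic keeps short writes cheap.
	return &rateLimitedWriter{w: w, lim: rate.NewLimiter(rate.Limit(bytesPerSec), bytesPerSec)}
}

func (rw *rateLimitedWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		chunk := p
		if burst := rw.lim.Burst(); len(chunk) > burst {
			chunk = chunk[:burst]
		}
		// Block until the token bucket allows this many bytes out.
		if err := rw.lim.WaitN(context.Background(), len(chunk)); err != nil {
			return written, err
		}
		n, err := rw.w.Write(chunk)
		written += n
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}
```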

@SCBuergel

What other things are we interested in limiting?

I just randomly found this discussion while trying to limit the overall output traffic (per day / month). I think limiting output traffic could be an interesting thing (especially with respect to Filecoin one day), as egress traffic is typically limited in cloud settings like AWS or Azure. There I am fine with temporary spikes of high bandwidth as long as my output traffic stays within some bounds per unit of time. Setting a limit per hour / day / month might make sense, to prevent blowing a month's volume in a day / hour.
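
To illustrate how such a volume cap differs from a rate cap, here is a hedged sketch of a per-period byte budget: short spikes are allowed, but sends are refused once the period's volume is spent. The helper is hypothetical, not an existing go-ipfs feature.

```go
// Illustrative sketch of a per-period egress budget (bytes per day or month),
// as opposed to a bytes-per-second rate cap. Hypothetical helper only, not an
// existing go-ipfs feature.
package egress

import (
	"errors"
	"sync"
	"time"
)

// ErrBudgetExhausted means no more bytes may be sent until the period rolls over.
var ErrBudgetExhausted = errors.New("egress budget for this period is exhausted")

type Budget struct {
	mu          sync.Mutex
	limit       int64         // bytes allowed per period, e.g. 100 << 30 for 100 GiB
	period      time.Duration // e.g. 24 * time.Hour, or ~30 days for a monthly cap
	used        int64
	periodStart time.Time
}

func NewBudget(limit int64, period time.Duration) *Budget {
	return &Budget{limit: limit, period: period, periodStart: time.Now()}
}

// Reserve claims n bytes of the budget, rolling the window over when it expires.
// Callers would check this before sending and back off on ErrBudgetExhausted,
// which allows bandwidth spikes while bounding total volume per period.
func (b *Budget) Reserve(n int64) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if time.Since(b.periodStart) > b.period {
		b.used = 0
		b.periodStart = time.Now()
	}
	if b.used+n > b.limit {
		return ErrBudgetExhausted
	}
	b.used += n
	return nil
}
```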

@PlanetPlan

Hi, thanks very much for IPFS.

I did not carefully read the above, so some of the following may be duplicates. These are all long-term things to think about, nothing that is a headache for me right now. The following are some usage models that may suggest features for controlling resources:

  • My normal network connection is slow by many standards. When I am using the network interactively, I'd like IPFS to avoid/reduce background traffic, though still serve my foreground file requests at full bandwidth. When I am idle (not interacting), I'd like IPFS to ramp up network usage so my system can be a friendly member of the caching/serving community.
  • A similar comment applies to IPFS disk bandwidth and CPU usage: back off when I am interactive, use freely when I am idle.
  • I want actual files to be cached someplace other than ~/.ipfs so they are not part of my backup state.
  • On a laptop, I have some network connections that are pay-per-byte. I'd like to leave IPFS enabled so I can use it, but I'd like to be an "unfriendly" member of the community because network traffic costs are quite high. Conversely, when I am on a fast/cheap network, I'd like to build up "credit" so I get good service when I am on a high-price network and being "momentarily unfriendly".
  • A similar comment applies to removable media: I have limited built-in storage on a laptop and so often plug in a removable drive when relatively "stationary". It would be useful to have both a "for sure" area for IPFS on the built-in drive plus an "optional" area on removable drives.

@clownfeces

For VPN users, being able to limit the maximum number of connections is a very important feature, since many VPNs automatically disconnect you if you have too many open connections (it's probably some sort of protection against spammers and DDoSers). IPFS by default creates hundreds of connections, so it's barely usable unless you don't care about regularly getting disconnected.

@davidak

davidak commented Aug 6, 2016

I want to report some resource usage stats:

I have an ipfs node version 0.4.2 running on a VM with 1 core and 1 GB RAM. No files added or pinned!

[screenshots, 2016-08-06: memory and network usage graphs]
It uses 465 MB of RAM just to keep connections to 214 peers open. (Are those all the running nodes?)

@Kubuxu
Member

Kubuxu commented Aug 7, 2016

It means that it is directly connected to 214 peers; those are live nodes in the network. We might want to start limiting that. Deluge (a torrent client) by default allows 200 connections and only 50 active at a time, but it uses uTP, which we were unable to adopt successfully because the uTP lib for Go kept hanging.

@davidak is that a netdata collector for IPFS? Looks nice, have you published it somewhere?

@davidak

davidak commented Aug 7, 2016

@Kubuxu the IPFS netdata plugin just got merged some minutes ago ;)

netdata/netdata#761

@fiatjaf

fiatjaf commented Aug 8, 2016

What bothers me is the network usage:

It makes even ssh'ing to my VPS horribly slow.

@slothbag

slothbag commented Aug 8, 2016

I've had some luck using the Linux "tc" command to throttle IPFS down to about 10 KB/s outbound. This has the side effect of dropping incoming traffic to about 15-20 KB/s.

I can see IPFS is using 100% of its allocated 10 KB/s all day, every day, but at least I can calculate how much bandwidth that is per month to ensure I don't go over my quotas.

And a nice bonus is that it significantly reduces memory usage, which is now hovering around 50-100 MB.

@jbenet
Member Author

jbenet commented Aug 8, 2016

@slothbag does it work in that condition?

@pataquets
Contributor

Where applicable, different bandwidth limits for pinned items would be a nice feature to have. Users might be more inclined to provide bandwidth for files they find important enough to pin.

@ajbouh

ajbouh commented Aug 28, 2017 via email

@dokterbob
Contributor

+1 for node.memlimit

Although @jbenet suggests we can have this done at a higher level, a long-running, actively used IPFS daemon will currently eat all the memory available on a system, which basically means that without memory constraints it will not be stable.

Obviously, the memory footprint (#3318) could be reduced, but given that the project moves forward very fast feature-wise, new kinds of memory waste will keep popping up.

@haasn

haasn commented Nov 1, 2017

ipfs for me has several hundred open connections, which triggers a number of warning mechanisms, including many dozens of TCP resets per second, and makes it look like a network scan.

Connecting to this many peers seems insane for a p2p network. Being able to limit this would be a high priority for me.

@whyrusleeping
Member

whyrusleeping commented Nov 1, 2017 via email

@gwpl

gwpl commented Jan 17, 2018

I also need a limit on the maximum number of open files! (causes: #4589)

@KrzysiekJ

@whyrusleeping: go-ipfs v0.4.13 still maintains several hundreds of open connections.

@whyrusleeping
Member

@KrzysiekJ Yeah, DHTs need to maintain a decent number of open connections to function properly. You can tweak it lower in your configuration file; look for Swarm.ConnMgr.
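
For readers looking for that section, a sketch of its shape is below: the field names mirror the Swarm.ConnMgr settings (Type, LowWater, HighWater, GracePeriod), while the concrete values are only examples for lowering the connection count.

```go
// Sketch of the Swarm.ConnMgr section of the go-ipfs config, expressed as a Go
// struct that marshals to the same JSON shape. The field names follow the
// Swarm.ConnMgr settings; the values below are examples only.
package main

import (
	"encoding/json"
	"fmt"
)

type ConnMgr struct {
	Type        string `json:"Type"`        // "basic" enables the basic connection manager
	LowWater    int    `json:"LowWater"`    // trim connections down toward this count
	HighWater   int    `json:"HighWater"`   // start trimming once the count exceeds this
	GracePeriod string `json:"GracePeriod"` // new connections are exempt from trimming for this long
}

func main() {
	cfg := map[string]map[string]ConnMgr{
		"Swarm": {"ConnMgr": {Type: "basic", LowWater: 50, HighWater: 200, GracePeriod: "30s"}},
	}
	out, _ := json.MarshalIndent(cfg, "", "  ")
	fmt.Println(string(out)) // merge the result into the Swarm section of ~/.ipfs/config
}
```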

@EternityForest

Does the DHT actually need to maintain large numbers of connections to work? It seems like you need to know the locations of a good number of DHT peers, but why actually connect to them?

Can't we just keep a list of a few thousand peers, and figure out if they're still up if/when they're needed?

Connectionless DHT queries should only take 1 UDP round trip per hop if you don't use a handshake or encryption, and it's not like you can't monitor someone pretty easily as is (connect to them and watch their wantlist broadcasts).

Congestion doesn't seem like it should be that much of an issue, especially if you limit retries. If they aren't there after 3 or 4 attempts, you just assume they aren't online anymore and try a different path.

An advantage of connectionless is that you can potentially store the last known IP of millions of nodes, meaning most of the network can be within 2 or 3 hops.

That has the issue of concentrating traffic on a few nodes for popular content, but I suspect there's ways of managing that.

@Stebalien
Member

Stebalien commented Feb 7, 2018

Does the DHT actually need to maintain large numbers of connections to work? It seems like you need to know the locations of a good number of DHT peers, but why actually connect to them?

Correct. Unfortunately, we don't have any working UDP-based protocols at the moment anyway. However, we're working on supporting QUIC. While this wouldn't be a connectionless protocol, connections won't take up file descriptors and we can save memory/bandwidth by "suspending" unused connections (remember the connection's session information but otherwise go silent).

In the future, we'd like a real packet transport system, but we aren't there yet. The tricky part is that getting the abstractions right will take a bit of work, because we try to make all parts of the IPFS/libp2p stack pluggable.

Connectionless DHT queries should only take 1 UDP round trip per hop if you don't use a handshake or encryption, and it's not like you can't monitor someone pretty easily as is (connect to them and watch their wantlist broadcasts).

The encryption isn't just about monitoring, it also prevents middle boxes from being "smart". However, as we generally don't care about replay or perfect forward secrecy for DHT messages, we may be able to encrypt these requests without creating a connection (although that gets expensive if we send more than one message). Again, the tricky part will be getting the abstractions correct (and, in this case, not creating a security footgun).

An advantage of connectionless is that you can potentially store the last known IP of millions of nodes, meaning most of the network can be within 2 or 3 hops.

Unfortunately, IPFS nodes tend to go offline/online all the time. Having connections open helps us keep track of which ones are online. However, the solution here is to just not have flaky nodes act as DHT nodes.

@andrewchambers

FWIW: many operating systems provide facilities for limiting all of those things; e.g. consider using Linux containers and separate disk partitions. It is then up to ipfs to just handle the error conditions returned by the OS properly.

@dokterbob
Contributor

dokterbob commented Apr 9, 2018 via email

@Macil

Macil commented Apr 9, 2018

If you make the OS / docker limit the memory that ipfs uses, then will ipfs be careful to use less than that amount? If not, ipfs might just keep charging headfirst into the limit and get regularly killed/restarted by the system.

@dokterbob
Contributor

dokterbob commented Apr 9, 2018 via email

@Kubuxu
Member

Kubuxu commented Apr 10, 2018

We would hard-limit the amount of memory used if Golang allowed for it, but it does not.
This means we can only chase bugs and try to fix them to limit memory usage.

@Macil

Macil commented Apr 10, 2018

I don't want limits in order to limit the impact of bugs; I'm worried about limiting the amount of memory that ipfs uses under arbitrarily high load. I want to do things like set ipfs to refuse or queue new connections if it's processing too many right now, etc.
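
A minimal sketch of that idea, using a buffered channel as a semaphore so new connections are refused (or could be queued) once the node is at capacity; the wrapper is hypothetical, not an existing go-ipfs mechanism:

```go
// Illustrative sketch: gate incoming connections on a semaphore so that work
// beyond a fixed concurrency is refused (or, with a small change, queued)
// instead of growing memory without bound. Hypothetical wrapper only.
package connlimit

import (
	"errors"
	"net"
)

// ErrOverloaded signals that the node is already at its concurrency cap.
var ErrOverloaded = errors.New("too many active connections, refusing new one")

type GatedHandler struct {
	slots  chan struct{}
	handle func(net.Conn)
}

func NewGatedHandler(maxActive int, handle func(net.Conn)) *GatedHandler {
	return &GatedHandler{slots: make(chan struct{}, maxActive), handle: handle}
}

// Serve refuses the connection immediately if all slots are busy. Replacing
// the default branch with a blocking `g.slots <- struct{}{}` would queue
// new connections instead of refusing them.
func (g *GatedHandler) Serve(c net.Conn) error {
	select {
	case g.slots <- struct{}{}:
		go func() {
			defer func() { <-g.slots }()
			g.handle(c)
		}()
		return nil
	default:
		c.Close()
		return ErrOverloaded
	}
}
```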

@Kubuxu
Member

Kubuxu commented Apr 11, 2018

@agentme this isn't a problem right now. Currently, AFAIK, most memory issues are due to bugs.

@CocoonCrash

Bugs happen, and no one should rely solely on the assumption that no problems will occur once the known ones are corrected. I think most of the limitations mentioned by @jbenet are as necessary as a seatbelt while driving.

Golang can't have resource-consumption limits set, but a "breathing sleep" of some milliseconds could be coded in so that end users don't "lose control" of their device, for example. And/or the number of effective TCP connections and the bandwidth used could also be limited, as those are part of the software design.

My personal understanding is that one of the numerous goals of IPFS is efficiency, so consuming a lot of resources (CPU, memory, bandwidth) on edge devices while in idle mode is not an option, as it could be seen as "uncontrolled" software. Would you want a computer that, whenever connected to the internet, couldn't be used because it's busy ensuring everything is working well? It reminds me of antivirus software running on Windows years ago.

I'm far from an IPFS/libp2p expert, but maybe each node could implement a pub/sub-like scheme, opening only one connection to listen for heartbeats sent from other nodes referencing it. And when a node's heartbeat has been missing for too long, it could trigger the DHT routing table to be renewed the regular TCP way. That would be a compromise between UDP and TCP as discussed by @loadletter and @whyrusleeping earlier.

This could also be used to optimise/adapt routing, as it could offer pseudo-latency or workload/availability monitoring shared between nodes, even if I think libp2p already implements many similar things, such as nodes auto-discovering on a common network, or IPFS being intended to work even if part of the network gets split into subnetworks, etc.

I really hope this will get improved, as I think it currently is an adoption barrier. IPFS is a really great and promising thing, and I really thank every designer/contributor for all the work done, but I would also really love to see it spread to the whole universe ;)

@theduke

theduke commented Oct 5, 2018

As a note of reference, I had problems with the ipfs daemon consistently killing my WiFi connection after a few minutes; I had to disconnect and reconnect manually. (OS: Arch Linux + NetworkManager.)

After limiting the maximum connections to 300 (with Swarm.ConnMgr.HighWater), it works fine now. But this is really bad for the average user, who might just not understand why their internet is suddenly so slow or not working correctly.

The default setup should be very conservative with the resources it uses.

@priom

priom commented Jul 13, 2021

Any new update on this?

@guseggert
Contributor

Libp2p has recently added a "resource manager", which we are working to integrate with go-ipfs; we are planning to release it in v0.13 (there is a chance it could be delayed to v0.14).

More info: https://github.com/libp2p/go-libp2p-resource-manager
