-
This is for my personal miner, not our PiKNiK system: Total number of hosts - 1 miner/lotus node; 1 boost node; and 1 seal worker.
-
Current onboarding: 1-3 TiB per 24 hours of legacy deals via Slingshot Evergreen; fewer than 6 v2 Boost deals per 24 hours.
Ideal: able to max out my 24-hour sealing capacity, 6 TiB per 24 hours or 90% of external bandwidth.
Bottlenecks: No. I do not run a distributed key/value store for other services.
Miner: f01611097
Power: 2.14 Pi / 17.1 Ei (0.0122%)
Workers: Seal(13) WdPoSt(1) WinPoSt(1)
Storage Deals: 7769, 210.2 TiB
Retrieval Deals (complete): 54948, 17.11 TiB
lotus version 1.15.2+mainnet+git.518dc962e
Hardware: 3-node mining cluster:
— Node 1 details
— Node 2 details
— Node 3 details
4-node Ceph cluster for storage
-
Storage providers’ questionnaire
Storage providers, please reply to this thread with answers to the questionnaire below.
Could you describe your current infrastructure:
Total number of hosts - sealer worker hosts for PC1/2, C1/2, etc.
Hosts’ configuration (CPU / RAM / disks (HDD, SSD) / GPU)?
Number of sealer nodes?
Current raw byte power?
Planned raw byte power?
How do you store your unsealed and sealed data? Do you have NFS?
What is your public/external network bandwidth?
What is your internal network bandwidth?
How many deals and how much data (GiB or TiB) do you onboard per day today?
How many deals and how much data (GiB or TiB) do you have capacity for and want to onboard if possible in an ideal world?
What bottlenecks do you see with Lotus and Boost today? If you are not running Boost, describe lotus-markets.
Does your organization already run a distributed key/value store for other services? Do you have a preferred key/value store?
Background
At the moment we are in a pre-release stage with the Boost project. Boost is an in-place replacement for the Lotus markets sub-system. Storage providers included in the alpha testing cohorts have already reported accepting deals with speed increases of up to 20x.
Now that deal acceptance and data transfer rates have improved, we are seeing a few bottlenecks with single-process Boost deployments backed by tens or hundreds of sealing workers.
This document describes these bottlenecks and a way forward to address them, based on recent conversations within the Boost team, and aims to start a conversation with the storage provider community so that we can better understand our users’ needs.
Problem definition / Requirements
Storage providers would like to be able to onboard data at petabyte scale.
During conversations with the PL Product team, as well as first-hand interviews with SPs, one of the requirements we need to address in the coming months is that SPs are waiting on the sidelines and would like to onboard deals at rates from 200 TiB per day up to multiple PiBs per day.
Ease of upgradeability / reliability / no single points of failure
At the moment it is possible to run boostd only as a single process. It should be possible to run a boostd upgrade with zero downtime and high reliability.
Support for multi-size storage providers
We must avoid pushing complexity to small storage providers. Storage providers should be able to continue to run Lotus / Boost easily, without too many dependencies, on single hosts if vertical scaling allows for that.
Bottlenecks
CommP calculation
CommP calculation is necessary in order to confirm that data for a deal sent by a client matches the data on the provider side, before that deal is published on-chain.
CommP calculation is rather compute-intensive and, depending on hardware, runs at speeds between 300 and 700 MB/s (put differently, a 32 GiB piece takes anywhere from 50 to 90 seconds).
If we assume a CommP calculation speed of 500 MB/s (476 MiB/s), we arrive at an upper bound of about 39 TiB worth of deals that a single process can onboard per day.
Storage providers would like to be able to onboard data at petabyte scale, about 100x that.
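To make that gap concrete, here is a minimal back-of-the-envelope sketch in Go that derives the ~39 TiB/day figure and the number of parallel CommP processes needed at the onboarding rates mentioned above. The 500 MB/s per-process throughput is the assumption from the previous paragraph, and the target rates (200 TiB/day and ~1 PiB/day) are illustrative.

```go
// Back-of-the-envelope check of the CommP bottleneck: assumes the ~500 MB/s
// single-process throughput quoted above; the target rates are illustrative.
package main

import (
	"fmt"
	"math"
)

const (
	commPBytesPerSec = 500e6                       // assumed single-process CommP throughput (~500 MB/s)
	secondsPerDay    = 86400.0
	bytesPerTiB      = 1024.0 * 1024 * 1024 * 1024 // 1 TiB in bytes
)

func main() {
	perProcessTiBPerDay := commPBytesPerSec * secondsPerDay / bytesPerTiB
	fmt.Printf("one process: ~%.0f TiB of deals per day\n", perProcessTiBPerDay) // ~39 TiB/day

	// How many CommP processes would be needed for the onboarding targets above?
	for _, targetTiBPerDay := range []float64{200, 1024} { // 200 TiB/day and ~1 PiB/day
		n := int(math.Ceil(targetTiBPerDay / perProcessTiBPerDay))
		fmt.Printf("%v TiB/day needs ~%d parallel CommP processes\n", targetTiBPerDay, n)
	}
}
```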
DAGStore indices
At the moment the DAGStore is backed by an embedded database on a single machine. The index grows at a rate of roughly 100 GB per PiB of deal data.
If storage providers are onboarding petabytes of deals per day, we need to provide a solution for a sharded DAGStore spread across multiple machines. Furthermore, given the CommP calculation bottleneck above, we need to be able to read from and write to it from multiple processes. At the required scale, we cannot be limited to a single user-land process, both for redundancy and for throughput reasons.
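As a rough, hedged illustration of the direction this points in, the sketch below estimates index growth from the ~100 GB-per-PiB figure above and shows one naive way a piece CID could be mapped to an index shard on another machine. The function names and the modulo-hash scheme are hypothetical and are not part of the DAGStore today.

```go
// Illustrative only, not existing DAGStore behaviour: estimates index growth
// from the ~100 GB of index per PiB of deals figure above, and maps a piece
// CID to one of N index shards with a simple hash.
package main

import (
	"fmt"
	"hash/fnv"
)

const indexBytesPerPiB = 100e9 // ~100 GB of index data per PiB of deal data (from this post)

// indexGrowthGBPerDay estimates index growth (in GB) for an onboarding rate in PiB/day.
func indexGrowthGBPerDay(pibPerDay float64) float64 {
	return pibPerDay * indexBytesPerPiB / 1e9
}

// shardFor picks one of n index hosts for a piece CID. A real deployment would
// more likely use consistent hashing so hosts can be added without a full reshuffle.
func shardFor(pieceCID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(pieceCID))
	return h.Sum32() % n
}

func main() {
	fmt.Printf("at 1 PiB/day the index grows by ~%.0f GB/day\n", indexGrowthGBPerDay(1))
	fmt.Printf("piece maps to shard %d of 4\n", shardFor("baga6ea4seaq...", 4)) // placeholder piece CID
}
```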
Proposals
Make boostd a stateless service
In order to scale Boost, we are proposing to refactor boostd into a stateless service, with all global data stored separately and accessible via a sidecar data service, named boostd-data. Small SPs would run the data service with an embedded store, such as LevelDB or Badger, in order to avoid the complexity of maintaining a separate database while their data fits on a single host. For larger SPs, we’ll provide an implementation of boostd-data targeting a distributed key/value store.
For large SPs: introduce a distributed key/value store for all global state
If Boost state doesn’t fit on a single host, SPs should be able to easily transition to a distributed, sharded datastore and use it as a backend for all boostd instances.
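To make the stateless-service idea more concrete, here is a minimal sketch of the kind of key/value surface a boostd-data style sidecar could expose, with a trivial in-memory backend standing in for an embedded store. The interface and names are hypothetical illustrations, not the actual boostd-data API; a large SP would swap in an implementation backed by a distributed, sharded key/value cluster behind the same interface.

```go
// Hypothetical sketch of a pluggable backend for a boostd-data style sidecar:
// boostd itself would stay stateless and talk to whichever implementation is
// configured. Interface and names are illustrative, not the real boostd-data API.
package data

import (
	"context"
	"errors"
	"sync"
)

// Store is the minimal key/value surface a stateless boostd could depend on.
type Store interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Put(ctx context.Context, key string, value []byte) error
	Close() error
}

// ErrNotFound is returned when a key is not present in the store.
var ErrNotFound = errors.New("key not found")

// memStore stands in for an embedded backend (LevelDB, Badger) that a small SP
// would run on a single host; a large SP would use an implementation backed by
// a distributed, sharded key/value cluster behind the same Store interface.
type memStore struct {
	mu sync.RWMutex
	m  map[string][]byte
}

// NewMemStore returns an in-memory Store for illustration purposes.
func NewMemStore() Store { return &memStore{m: make(map[string][]byte)} }

func (s *memStore) Get(_ context.Context, key string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

func (s *memStore) Put(_ context.Context, key string, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
	return nil
}

func (s *memStore) Close() error { return nil }
```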