This repository has been archived by the owner on Dec 20, 2024. It is now read-only.
Proposal: Dragonfly supports P2P based on Streaming to reduce disk IO #1164
Labels
kind/performance
any issue related to software performance
kind/proposal
any proposals for project
SoC2020
2020 Summer of Code
Comments
Great job! I am willing to help.
Hello @lowzj! I'd like to pick this up as part of GSoC.
Hi @lowzj, I would like to work on this as part of ASoC2020. Could you tell me the current state of development, since it appears others have worked on it as well?
It has multiple tasks. We provide one of them for ASoC2020: optimizing the scheduling algorithm of the supernode for p2p-streaming. You can work on it.
When will this feature (p2p-streaming) be ready? Thanks!
sungjunyoung pushed a commit to sungjunyoung/Dragonfly that referenced this issue on May 8, 2022
Signed-off-by: Gaius <gaius.qi@gmail.com>
Backgrounds
Currently, the Dragonfly client performs random disk reads and writes multiple times during the download process.
When `dfget` is used directly to download a file:
- `dfget` performs a random write of each piece to disk after downloading it
- `dfget server` performs a random read of the piece from disk to share it
- `dfget` reads the whole file sequentially from disk after the download to compute the checksum
When `dfdaemon` is used to pull images, Dragonfly generates extra disk IO:
- `dfdaemon` reads the file sequentially from disk to send it to `dockerd`
This is not a problem when the host has a local disk. But it can become a bottleneck when the Dragonfly client runs on a virtual machine with a cloud disk: all disk IO becomes network IO, which performs poorly when reads and writes happen at the same time.
So a solution is needed to reduce the amount of disk IO generated by Dragonfly.
Idea
P2P Streaming sends the data downloaded through the p2p network directly to the user, so that the client reads from and writes to disk as little as possible.
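As a rough illustration of the idea, the sketch below keeps out-of-order pieces in memory and hands them to the caller in ascending piece number, without touching disk. The class name `StreamIO` is borrowed from this proposal; its `put` method, the `sink` parameter and piece numbers starting at 0 are assumptions made for the example, not Dragonfly's actual API.

```python
import io


class StreamIO:
    """Hypothetical in-memory reorder buffer: accepts pieces in any order
    and writes them to `sink` in ascending piece number."""

    def __init__(self, sink):
        self.sink = sink        # any object with a write(bytes) method
        self.next_piece = 0     # next piece number the caller expects
        self.pending = {}       # piece number -> bytes waiting for their turn

    def put(self, number, data):
        """Accept a downloaded piece and flush every piece that is now in order."""
        if number < self.next_piece or number in self.pending:
            return              # duplicate piece, ignore it
        self.pending[number] = data
        while self.next_piece in self.pending:
            self.sink.write(self.pending.pop(self.next_piece))
            self.next_piece += 1


# Pieces arrive out of order from the p2p network (assumed order for the demo).
sink = io.BytesIO()
stream = StreamIO(sink)
for number, data in [(2, b"cc"), (0, b"aa"), (1, b"bb"), (3, b"dd")]:
    stream.put(number, data)
print(sink.getvalue())  # b'aabbccdd'
```

In this sketch `pending` can grow without bound if an early piece is slow to arrive, so some limit on how far ahead pieces may be downloaded is needed to keep memory usage bounded.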
P2P Streaming Data Flow
This diagram describes the p2p streaming data flow.
- `Piece Data Cache` stores pieces' data in memory so that it can be shared with other peers. A piece's data should be put into this cache after downloading, and evicted according to the `LRU` strategy when the cache is full.
- `StreamIO` sends pieces' data to callers in ascending order of `piece's number`.
- `dfdaemon`: for pulling images and other uses, `dfdaemon` and `dfget` should be merged into one process, which saves the time of starting a `dfget` process.
- `dfget` can still run as an individual process to download files directly.
P2P Streaming Sliding Window
The `P2P Streaming Sliding Window` is designed to control the number of pieces of a file that can be scheduled and downloaded, to avoid unlimited memory usage. The idea comes from the TCP sliding window, but its minimal transmission unit is a `piece`, not a `byte`.
- `Memory Cache` is the `Piece Data Cache` used to share pieces in the p2p network. The larger the cache, the higher the p2p transmission efficiency.
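A minimal sketch of a piece-level sliding window, under stated assumptions: the class `SlidingWindow`, its method names, and the rule that the window advances once its lowest piece has been downloaded and released to the caller are all hypothetical and chosen only to illustrate the description above. The proposal text here does not specify these details.

```python
class SlidingWindow:
    """Hypothetical piece-level window: only pieces in [start, start + size)
    may be scheduled, so at most `size` pieces are held in memory."""

    def __init__(self, size, total_pieces):
        self.size = size            # window size, counted in pieces (not bytes)
        self.total = total_pieces   # number of pieces in the file
        self.start = 0              # lowest piece not yet released to the caller
        self.done = set()           # downloaded pieces still inside the window

    def _end(self):
        return min(self.start + self.size, self.total)

    def schedulable(self):
        """Piece numbers that may be downloaded right now."""
        return [n for n in range(self.start, self._end()) if n not in self.done]

    def on_downloaded(self, number):
        """Record a piece, then slide past every leading finished piece.
        Returns the piece numbers released to the caller, in order."""
        if not (self.start <= number < self._end()):
            raise ValueError(f"piece {number} is outside the window")
        self.done.add(number)
        released = []
        while self.start in self.done:
            self.done.remove(self.start)
            released.append(self.start)
            self.start += 1
        return released


w = SlidingWindow(size=3, total_pieces=5)
print(w.schedulable())      # [0, 1, 2]
print(w.on_downloaded(1))   # [] : piece 0 is still missing, the window cannot move
print(w.on_downloaded(0))   # [0, 1] : the window now starts at piece 2
print(w.schedulable())      # [2, 3, 4]
```

A larger `size` lets more pieces be in flight and shared at once, at the cost of more memory, which matches the trade-off described for the memory cache.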