This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Proposal: Dragonfly supports P2P based on Streaming to reduce disk IO #1164

Open
lowzj opened this issue Jan 9, 2020 · 6 comments
Labels
kind/performance (any issue related to software performance), kind/proposal (any proposals for project), SoC2020 (2020 Summer of Code)

Comments

@lowzj
Member

lowzj commented Jan 9, 2020

Background

Currently, the Dragonfly client reads from and writes to disk multiple times, often randomly, during the download process.
When dfget is used directly to download a file:

  • dfget writes each piece to disk (random write) after downloading it
  • the dfget server reads the piece from disk (random read) to share it with other peers
  • dfget reads the whole file from disk (sequential read) after the download completes, to compute its checksum

When dfdaemon is used to pull images, Dragonfly adds extra disk IO:

  • dfdaemon reads the file from disk (sequential read) to send it to dockerd

This is not a problem when the host has a local disk. However, it becomes a potential
bottleneck when the Dragonfly client runs on a virtual machine backed by a cloud disk: all disk IO turns into network IO, which performs poorly when reads and writes happen at the same time.

A solution is therefore needed to reduce the number of disk IO operations that Dragonfly generates.

Idea

P2P Streaming is a streaming-based P2P mode: data downloaded from the P2P network is sent directly to the caller, so that reads and writes to disk are kept to a minimum.

P2P Streaming Data Flow

This diagram describes the P2P Streaming data flow.

[Diagram: P2P Streaming data flow]

  • Piece Data Cache stores piece data in memory so that it can be shared with other peers. A piece should be put into this cache after it is downloaded, and evicted according to an LRU policy when the cache is full.
  • StreamIO sends piece data to callers in ascending order of piece number.
  • When dfdaemon is used to pull images (and in similar scenarios), dfdaemon and dfget should be merged into a single process, which removes the cost of starting a separate dfget process.
  • dfget can still run as an individual process to download files directly.
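A minimal sketch of the Piece Data Cache described above, assuming the cache is keyed by piece number and bounded by a total byte budget. The names `PieceCache`, `NewPieceCache`, `Put`, and `Get` are hypothetical and do not come from the Dragonfly code base; LRU ordering uses Go's standard `container/list`.

```go
package main

import (
	"container/list"
	"sync"
)

// pieceEntry is one cached piece: its number and its data.
type pieceEntry struct {
	num  int
	data []byte
}

// PieceCache is a hypothetical in-memory Piece Data Cache bounded by a
// total byte budget; the least recently used pieces are evicted first.
type PieceCache struct {
	mu       sync.Mutex
	maxBytes int
	curBytes int
	order    *list.List            // front = most recently used
	items    map[int]*list.Element // piece number -> list element
}

// NewPieceCache creates a cache that holds at most maxBytes of piece data.
func NewPieceCache(maxBytes int) *PieceCache {
	return &PieceCache{maxBytes: maxBytes, order: list.New(), items: make(map[int]*list.Element)}
}

// Put stores a piece after it has been downloaded, then evicts the least
// recently used pieces until the cache fits within maxBytes.
func (c *PieceCache) Put(num int, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[num]; ok {
		e := el.Value.(*pieceEntry)
		c.curBytes += len(data) - len(e.data)
		e.data = data
		c.order.MoveToFront(el)
	} else {
		c.items[num] = c.order.PushFront(&pieceEntry{num: num, data: data})
		c.curBytes += len(data)
	}
	for c.curBytes > c.maxBytes && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*pieceEntry)
		c.order.Remove(oldest)
		delete(c.items, e.num)
		c.curBytes -= len(e.data)
	}
}

// Get returns a cached piece (e.g. to serve it to another peer) and marks
// it as recently used.
func (c *PieceCache) Get(num int) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[num]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*pieceEntry).data, true
}
```

In this sketch a single piece larger than the whole budget is evicted immediately after insertion; a real cache might reject such a piece instead.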

P2P Streaming Sliding Window

The P2P Streaming Sliding Window limits the number of pieces of a file that can be scheduled and downloaded at the same time, so that memory usage stays bounded. The idea comes from the TCP sliding window, but its minimal transmission unit is a piece rather than a byte.

[Diagram: P2P Streaming Sliding Window]

  • Memory Cache is the Piece Data Cache used to share pieces in the P2P network. The larger the cache, the higher the P2P transmission efficiency.
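The sliding window and StreamIO's in-order delivery can be sketched together: pieces are accepted only inside `[base, base+size)`, buffered until every lower-numbered piece has arrived, and then emitted in ascending order, which slides the window forward much as TCP does with bytes. This is a hypothetical sketch; `SlidingWindow`, `Schedulable`, `OnPiece`, `Done`, and the `emit` callback are illustrative names rather than Dragonfly's API, and scheduling, retries, and concurrency are left out.

```go
package main

import "fmt"

// SlidingWindow is a hypothetical sketch of the P2P Streaming sliding
// window: only pieces in [base, base+size) may be scheduled, and the
// window slides forward as pieces are streamed out in ascending order.
type SlidingWindow struct {
	base    int                         // lowest piece number not yet streamed
	size    int                         // window size, in pieces
	total   int                         // total number of pieces in the file
	pending map[int][]byte              // downloaded but not yet streamed
	emit    func(p []byte) (int, error) // sends data to the caller, e.g. an io.Writer's Write
}

// NewSlidingWindow creates a window of size pieces for a file of total pieces.
func NewSlidingWindow(size, total int, emit func(p []byte) (int, error)) *SlidingWindow {
	return &SlidingWindow{size: size, total: total, pending: make(map[int][]byte), emit: emit}
}

// Schedulable lists the piece numbers inside the window that have not been
// downloaded yet, i.e. the pieces the scheduler may request now.
func (w *SlidingWindow) Schedulable() []int {
	var nums []int
	for n := w.base; n < w.base+w.size && n < w.total; n++ {
		if _, ok := w.pending[n]; !ok {
			nums = append(nums, n)
		}
	}
	return nums
}

// OnPiece records a downloaded piece, then streams every contiguous piece
// starting at base, which slides the window forward.
func (w *SlidingWindow) OnPiece(num int, data []byte) error {
	if num < w.base || num >= w.base+w.size || num >= w.total {
		return fmt.Errorf("piece %d is outside the current window", num)
	}
	w.pending[num] = data
	for {
		d, ok := w.pending[w.base]
		if !ok {
			return nil // a lower-numbered piece is still missing
		}
		if _, err := w.emit(d); err != nil {
			return err
		}
		delete(w.pending, w.base)
		w.base++
	}
}

// Done reports whether every piece has been streamed to the caller.
func (w *SlidingWindow) Done() bool { return w.base >= w.total }
```

The `emit` parameter has the same signature as any `Write` method, so a caller could pass, for example, the `Write` method of an `http.ResponseWriter` to stream pieces to dockerd. Because pieces are dropped from `pending` once emitted, this structure never holds more than `size` pieces; keeping them available for other peers would be the job of the separate Piece Data Cache.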
@pouchrobot added the kind/proposal (any proposals for project) label on Jan 9, 2020
@SataQiu
Member

SataQiu commented Jan 9, 2020

Great job! I am willing to help.

@xujihui1985

  • start a long-lived peerserver in dfdaemon
  • move the download logic into DownloadContext and change its result type to http.Response
  • use a library to download pieces instead of the dfget binary
  • supernode supports generating the taskID with the HTTP Range header
  • extract a new Writer interface for ClientWriter
  • add a new ClientStreamWriter that implements Writer to write downloaded pieces to a stream instead of a file
  • ClientStreamWriter implements both io.ReadCloser and io.Writer
  • when registering a task with the supernode, return the file metadata (e.g. file length); dfdaemon should request the source directly

@allencloud added the kind/performance (any issue related to software performance) label on Feb 4, 2020
@govardhangdg

Hello @lowzj! I'd like to pick this up as part of GSoC.

@lowzj added the SoC2020 (2020 Summer of Code) label on May 21, 2020
@RonnieGandhi

Hi @lowzj, I would like to work on this as part of ASoC2020. Could you tell me the current state of development? It appears that others have worked on it as well.

@lowzj
Member Author

lowzj commented Jun 12, 2020


This proposal consists of multiple tasks. We offer one of them for ASoC2020: optimizing the supernode's scheduling algorithm for p2p-streaming. You can work on that.

@liudechi7

@lowzj When will this feature (p2p-streaming) be ready? Thanks!


8 participants