diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 466656af7cc..d3628919736 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -37,3 +37,4 @@
   - [0413-presign](rfcs/0413-presign.md)
   - [0423-command-line-interface](rfcs/0423-command-line-interface.md)
   - [0429-init-from-iter](rfcs/0429-init-from-iter.md)
+  - [0438-multipart](rfcs/0438-multipart.md)
diff --git a/docs/rfcs/0438-multipart.md b/docs/rfcs/0438-multipart.md
new file mode 100644
index 00000000000..f98ebc21785
--- /dev/null
+++ b/docs/rfcs/0438-multipart.md
@@ -0,0 +1,163 @@
+- Proposal Name: `multipart`
+- Start Date: 2022-07-11
+- RFC PR: [datafuselabs/opendal#438](https://github.com/datafuselabs/opendal/pull/438)
+- Tracking Issue: [datafuselabs/opendal#439](https://github.com/datafuselabs/opendal/issues/439)
+
+# Summary
+
+Add multipart upload support to OpenDAL.
+
+# Motivation
+
+[Multipart Upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) APIs are widely used in object storage services to upload large files concurrently and resumably.
+
+A successful multipart upload includes the following steps:
+
+- `CreateMultipartUpload`: Start a new multipart upload and obtain its upload id.
+- `UploadPart`: Upload a single part using the upload id returned by `CreateMultipartUpload`.
+- `CompleteMultipartUpload`: Complete the multipart upload, assembling the parts into a regular object.
+
+To cancel a multipart upload, users need to call `AbortMultipartUpload`.
+
+Apart from those APIs, most object storage services also provide list APIs to inspect the status of multipart uploads:
+
+- `ListMultipartUploads`: List ongoing multipart uploads.
+- `ListParts`: List the parts already uploaded for an upload.
+
+Before `CompleteMultipartUpload` has been called, users can't read the parts that have already been uploaded.
+
+After `CompleteMultipartUpload` or `AbortMultipartUpload` has been called, the uploaded parts no longer exist as separate parts: completing assembles them into the final object, while aborting discards them.
+
+Object storage services commonly allow up to 10,000 parts per upload, with each part up to 5 GiB. In theory, this allows a single object of up to 10,000 × 5 GiB ≈ 48.8 TiB, although a service may impose a lower overall cap (for example, Amazon S3 limits a single object to 5 TiB).
+
+By supporting multipart uploads, OpenDAL will let users upload objects larger than 5 GiB.
+
+# Guide-level explanation
+
+Users can start a multipart upload via:
+
+```rust
+let mp = op.object("path/to/file").create_multipart().await?;
+```
+
+Or build a `Multipart` from an already known upload id:
+
+```rust
+let mp = op.object("path/to/file").into_multipart("");
+```
+
+With `Multipart`, we can upload a new part:
+
+```rust
+let part = mp.write(part_number, content).await?;
+```
+
+After all parts have been uploaded, we can complete the upload:
+
+```rust
+let _ = mp.complete(&parts).await?;
+```
+
+Or we can abort the upload, discarding the parts uploaded so far:
+
+```rust
+let _ = mp.abort().await?;
+```
+
+# Reference-level explanation
+
+`Accessor` will add the following APIs:
+
+```rust
+pub trait Accessor: Send + Sync + Debug {
+    async fn create_multipart(&self, args: &OpCreateMultipart) -> Result<String> {
+        let _ = args;
+        unimplemented!()
+    }
+
+    async fn write_multipart(&self, args: &OpWriteMultipart) -> Result<PartWriter> {
+        let _ = args;
+        unimplemented!()
+    }
+
+    async fn complete_multipart(&self, args: &OpCompleteMultipart) -> Result<()> {
+        let _ = args;
+        unimplemented!()
+    }
+
+    async fn abort_multipart(&self, args: &OpAbortMultipart) -> Result<()> {
+        let _ = args;
+        unimplemented!()
+    }
+}
+```
+
+When a `PartWriter` is closed, a `Part` will be generated.
+
+`Operator` will build APIs on top of `Accessor`:
+
+```rust
+impl Object {
+    async fn create_multipart(&self) -> Result<Multipart> {}
+    fn into_multipart(&self, upload_id: &str) -> Multipart {}
+}
+
+impl Multipart {
+    async fn write(&self, part_number: usize, bs: impl AsRef<[u8]>) -> Result<Part> {}
+    async fn writer(&self, part_number: usize, size: u64) -> Result<PartWriter> {}
+    async fn complete(&self, ps: &[Part]) -> Result<()> {}
+    async fn abort(&self) -> Result<()> {}
+}
+```
+
+# Drawbacks
+
+None.
+
+# Rationale and alternatives
+
+## Why not add new object modes?
+
+It seems natural to add a new object mode like `multipart`.
+
+```rust
+pub enum ObjectMode {
+    FILE,
+    DIR,
+    MULTIPART,
+    Unknown,
+}
+```
+
+However, this would require a big API break: `Object` would need to carry a `mode`, and every API call would need to accept `mode` as an argument.
+
+For example:
+
+```rust
+let _ = op.object("path/to/dir/").list(ObjectMode::MULTIPART);
+let _ = op.object("path/to/file").stat(ObjectMode::MULTIPART);
+```
+
+## Why not split Object into File and Dir?
+
+We could split `Object` into `File` and `Dir` to avoid requiring `mode` in the API, but that is also a vast API break.
+
+# Prior art
+
+None.
+
+# Unresolved questions
+
+None.
+
+# Future possibilities
+
+## Support list multipart uploads
+
+We can support listing ongoing multipart uploads so that users can resume or abort them.
+
+## Support list parts
+
+We can support listing the parts already uploaded for an upload.
diff --git a/src/operator.rs b/src/operator.rs
index 1fca71de229..7f7ab516aae 100644
--- a/src/operator.rs
+++ b/src/operator.rs
@@ -350,7 +350,7 @@ impl BatchOperator {
         BatchOperator { src: op }
     }
 
-    /// Walk a dir in top down way: list current dir first and than list nested dir.
+    /// Walk a dir in top down way: list current dir first and then list nested dir.
     ///
     /// Refer to [`TopDownWalker`] for more about the behavior details.
     pub fn walk_top_down(&self, path: &str) -> Result {
@@ -360,7 +360,7 @@ impl BatchOperator {
             ))))
     }
 
-    /// Walk a dir in bottom up way: list nested dir first and than current dir.
+    /// Walk a dir in bottom up way: list nested dir first and then current dir.
     ///
     /// Refer to [`BottomUpWalker`] for more about the behavior details.
     pub fn walk_bottom_up(&self, path: &str) -> Result {
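The multipart flow proposed in RFC 0438 (create an upload, write numbered parts, then complete or abort) can be modeled with a small in-memory sketch. This is hypothetical and is not OpenDAL's implementation: `MockMultipart`, its methods, the `String` errors, and the limit constants are assumptions chosen to mirror the RFC text, and no object storage service is contacted. The `max_object_size` helper also checks the RFC's arithmetic: 10,000 parts × 5 GiB = 50,000 GiB ≈ 48.8 TiB.

```rust
use std::collections::BTreeMap;

// Assumed limits, mirroring "10000 parts, up to 5 GiB each" from the RFC.
const MAX_PARTS: usize = 10_000;
const MAX_PART_SIZE: u64 = 5 * 1024 * 1024 * 1024; // 5 GiB

/// Hypothetical record for one uploaded part (stands in for the RFC's `Part`).
#[derive(Debug, Clone, PartialEq)]
pub struct Part {
    pub part_number: usize,
    pub size: u64,
}

/// Hypothetical in-memory model of a single multipart upload.
#[derive(Debug, Default)]
pub struct MockMultipart {
    upload_id: String,
    parts: BTreeMap<usize, Vec<u8>>, // not readable as an object until `complete`
    finished: bool,
}

impl MockMultipart {
    /// Models `CreateMultipartUpload`: start an upload identified by `upload_id`.
    pub fn create(upload_id: &str) -> Self {
        MockMultipart {
            upload_id: upload_id.to_string(),
            ..Default::default()
        }
    }

    /// Models `UploadPart`: store one part; re-uploading a part number replaces it.
    pub fn write(&mut self, part_number: usize, bs: &[u8]) -> Result<Part, String> {
        if self.finished {
            return Err(format!("upload {} is already finished", self.upload_id));
        }
        if part_number == 0 || part_number > MAX_PARTS {
            return Err(format!("part number {part_number} is out of range"));
        }
        if bs.len() as u64 > MAX_PART_SIZE {
            return Err(format!("part {part_number} exceeds the part size limit"));
        }
        self.parts.insert(part_number, bs.to_vec());
        Ok(Part { part_number, size: bs.len() as u64 })
    }

    /// Models `CompleteMultipartUpload`: join the listed parts into one object.
    pub fn complete(&mut self, ps: &[Part]) -> Result<Vec<u8>, String> {
        if self.finished {
            return Err(format!("upload {} is already finished", self.upload_id));
        }
        let mut object = Vec::new();
        for p in ps {
            let data = self
                .parts
                .get(&p.part_number)
                .ok_or_else(|| format!("part {} was never uploaded", p.part_number))?;
            if data.len() as u64 != p.size {
                return Err(format!("part {} size mismatch", p.part_number));
            }
            object.extend_from_slice(data);
        }
        // The parts no longer exist separately once the object is assembled.
        self.parts.clear();
        self.finished = true;
        Ok(object)
    }

    /// Models `AbortMultipartUpload`: discard every uploaded part.
    pub fn abort(&mut self) {
        self.parts.clear();
        self.finished = true;
    }
}

/// Largest object reachable with `MAX_PARTS` parts of `MAX_PART_SIZE` bytes.
pub fn max_object_size() -> u64 {
    MAX_PARTS as u64 * MAX_PART_SIZE
}

fn main() {
    let mut mp = MockMultipart::create("hypothetical-upload-id");
    let p1 = mp.write(1, b"hello, ").unwrap();
    let p2 = mp.write(2, b"multipart").unwrap();
    let object = mp.complete(&[p1, p2]).unwrap();
    assert_eq!(object, b"hello, multipart".to_vec());

    let tib = max_object_size() as f64 / 1024f64.powi(4);
    println!("max object size: {tib:.1} TiB"); // prints "max object size: 48.8 TiB"
}
```

Keeping parts in a map keyed by part number lets a single part be re-uploaded without restarting the whole upload, which is what makes multipart uploads resumable. `complete` only reads the parts the caller lists, in the same spirit as the RFC's `Multipart::complete(&self, ps: &[Part])`.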