RFC-0438: Multipart (#438)
* rfc: Multipart

Signed-off-by: Xuanwo <github@xuanwo.io>

* Assign number

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix typo

Signed-off-by: Xuanwo <github@xuanwo.io>

* Add guide-level explanation

Signed-off-by: Xuanwo <github@xuanwo.io>

* Fix other parts

Signed-off-by: Xuanwo <github@xuanwo.io>
Xuanwo committed Jul 12, 2022
1 parent fb4ba4b commit 1c9f9d7
Showing 3 changed files with 166 additions and 2 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -37,3 +37,4 @@
- [0413-presign](rfcs/0413-presign.md)
- [0423-command-line-interface](rfcs/0423-command-line-interface.md)
- [0429-init-from-iter](rfcs/0429-init-from-iter.md)
- [0438-multipart](rfcs/0438-multipart.md)
163 changes: 163 additions & 0 deletions docs/rfcs/0438-multipart.md
@@ -0,0 +1,163 @@
- Proposal Name: `multipart`
- Start Date: 2022-07-11
- RFC PR: [datafuselabs/opendal#438](https://github.com/datafuselabs/opendal/pull/438)
- Tracking Issue: [datafuselabs/opendal#439](https://github.com/datafuselabs/opendal/issues/439)

# Summary

Add multipart support in OpenDAL.

# Motivation

[Multipart Upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) APIs are widely used in object storage services to upload large files concurrently and resumably.

A successful multipart upload includes the following steps:

- `CreateMultipartUpload`: Start a new multipart upload.
- `UploadPart`: Upload a single part using the previously created upload id.
- `CompleteMultipartUpload`: Complete a multipart upload to get a regular object.

To cancel a multipart upload, users need to call `AbortMultipartUpload`.

Apart from those APIs, most object storage services also provide list APIs to get the status of current multipart uploads:

- `ListMultipartUploads`: List current ongoing multipart uploads
- `ListParts`: List already uploaded parts.

Before `CompleteMultipartUpload` has been called, users can't read already uploaded parts.

After `CompleteMultipartUpload` or `AbortMultipartUpload` has been called, the individual uploaded parts no longer exist: completing merges them into the final object, and aborting discards them.
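
The lifecycle described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `MockStore` and its methods are hypothetical names invented here, not part of OpenDAL or of any storage service SDK, and a real service would also return ETags and enforce size limits.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory model of a multipart-capable object store.
#[derive(Default)]
struct MockStore {
    /// Completed, readable objects keyed by path.
    objects: HashMap<String, Vec<u8>>,
    /// Ongoing uploads keyed by upload id: (target path, parts by part number).
    uploads: HashMap<String, (String, BTreeMap<usize, Vec<u8>>)>,
    next_id: u64,
}

impl MockStore {
    /// CreateMultipartUpload: start a new upload and return its upload id.
    fn create_multipart_upload(&mut self, path: &str) -> String {
        self.next_id += 1;
        let id = format!("upload-{}", self.next_id);
        self.uploads
            .insert(id.clone(), (path.to_string(), BTreeMap::new()));
        id
    }

    /// UploadPart: store one part under an existing upload id.
    fn upload_part(&mut self, id: &str, part_number: usize, data: &[u8]) -> Result<(), String> {
        let (_, parts) = self.uploads.get_mut(id).ok_or("unknown upload id")?;
        parts.insert(part_number, data.to_vec());
        Ok(())
    }

    /// CompleteMultipartUpload: join the parts in part-number order into a
    /// regular object. The upload and its parts are removed afterwards.
    fn complete_multipart_upload(&mut self, id: &str) -> Result<(), String> {
        let (path, parts) = self.uploads.remove(id).ok_or("unknown upload id")?;
        let body: Vec<u8> = parts.into_values().flatten().collect();
        self.objects.insert(path, body);
        Ok(())
    }

    /// AbortMultipartUpload: drop the upload together with all of its parts.
    fn abort_multipart_upload(&mut self, id: &str) -> Result<(), String> {
        self.uploads
            .remove(id)
            .map(|_| ())
            .ok_or_else(|| "unknown upload id".to_string())
    }

    /// Read a completed object. Parts of ongoing uploads are never visible here.
    fn read(&self, path: &str) -> Option<&[u8]> {
        self.objects.get(path).map(|v| v.as_slice())
    }
}
```

In this model, parts may arrive in any order, `read` returns `None` until the upload is completed, and the upload id becomes invalid after either completion or abort.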

Object storage services commonly allow up to 10,000 parts per upload, with each part up to 5 GiB. In principle, this lets users upload a file of up to about 48.8 TiB (10,000 × 5 GiB).
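
The size bound follows from simple arithmetic, which the sketch below spells out. The helper names `max_upload_bytes` and `parts_needed` are made up for this illustration, and the limits are the commonly quoted ones from the paragraph above rather than a guarantee for any particular service.

```rust
/// Bytes in one GiB and one TiB (binary units).
const GIB: u64 = 1 << 30;
const TIB: u64 = 1 << 40;

/// Upper bound on the size of a single multipart upload, in bytes.
fn max_upload_bytes(max_parts: u64, max_part_bytes: u64) -> u64 {
    max_parts * max_part_bytes
}

/// Smallest number of parts needed to upload `total` bytes when every part
/// holds at most `part_size` bytes (ceiling division).
fn parts_needed(total: u64, part_size: u64) -> u64 {
    (total + part_size - 1) / part_size
}

/// Convert a byte count to TiB for display.
fn as_tib(bytes: u64) -> f64 {
    bytes as f64 / TIB as f64
}
```

With 10,000 parts of 5 GiB each, `as_tib(max_upload_bytes(10_000, 5 * GIB))` evaluates to 48.828125, which rounds to the 48.8 TiB quoted above.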

By supporting multipart uploads, OpenDAL lets users upload objects larger than 5 GiB.

# Guide-level explanation

Users can start a multipart upload via:

```rust
let mp = op.object("path/to/file").create_multipart().await?;
```

Or build a `Multipart` from an already known upload id:

```rust
let mp = op.object("path/to/file").into_multipart("<upload_id>");
```

With `Multipart`, we can upload a new part:

```rust
let part = mp.write(part_number, content).await?;
```
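
Before calling `write`, callers need to decide how to cut their content into parts and which `part_number` each one gets. The RFC does not prescribe this, so the following is a minimal sketch under assumptions: `split_into_parts` is a hypothetical helper, and the 1-based numbering follows the S3 convention rather than anything stated in this proposal.

```rust
/// Split `content` into consecutive chunks of at most `part_size` bytes and
/// pair each chunk with a 1-based part number (assumed convention).
fn split_into_parts(content: &[u8], part_size: usize) -> Vec<(usize, &[u8])> {
    assert!(part_size > 0, "part_size must be non-zero");
    content
        .chunks(part_size)
        .enumerate()
        .map(|(i, chunk)| (i + 1, chunk))
        .collect()
}
```

Each `(part_number, chunk)` pair could then be passed to `mp.write(part_number, chunk)`, and the returned `Part` values collected for the final `complete` call. Only the last chunk may be shorter than `part_size`.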

After all parts have been uploaded, we can finish this upload:

```rust
let _ = mp.complete(parts).await?;
```

Or we can abort the upload, discarding the parts uploaded so far:

```rust
let _ = mp.abort().await?;
```

# Reference-level explanation

`Accessor` will add the following APIs:

```rust
pub trait Accessor: Send + Sync + Debug {
    async fn create_multipart(&self, args: &OpCreateMultipart) -> Result<String> {
        let _ = args;
        unimplemented!()
    }

    async fn write_multipart(&self, args: &OpWriteMultipart) -> Result<PartWriter> {
        let _ = args;
        unimplemented!()
    }

    async fn complete_multipart(&self, args: &OpCompleteMultipart) -> Result<()> {
        let _ = args;
        unimplemented!()
    }

    async fn abort_multipart(&self, args: &OpAbortMultipart) -> Result<()> {
        let _ = args;
        unimplemented!()
    }
}
```

Closing a `PartWriter` produces a `Part`.
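
The RFC does not define the fields of `Part` or the interface of `PartWriter`, so the following is only a synchronous, in-memory sketch of the close-produces-a-part idea. The struct layouts, the `new` and `close` methods, and the hash-based `etag` are all assumptions made for illustration; a real backend would use the ETag returned by the storage service.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::io::{self, Write};

/// Hypothetical description of an uploaded part.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Part {
    part_number: usize,
    /// Stand-in for the service-provided ETag (here: a local content hash).
    etag: String,
}

/// Hypothetical writer that buffers the content of one part in memory.
struct PartWriter {
    part_number: usize,
    buf: Vec<u8>,
}

impl PartWriter {
    fn new(part_number: usize) -> Self {
        PartWriter {
            part_number,
            buf: Vec::new(),
        }
    }

    /// Consume the writer and produce the `Part` describing what was written.
    fn close(self) -> Part {
        let mut hasher = DefaultHasher::new();
        self.buf.hash(&mut hasher);
        Part {
            part_number: self.part_number,
            etag: format!("{:016x}", hasher.finish()),
        }
    }
}

impl Write for PartWriter {
    /// Append bytes to the buffered part.
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    /// Nothing to flush for an in-memory buffer.
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Because `close` takes `self` by value, a writer cannot be used again after its `Part` has been produced, which mirrors the one-part-per-writer relationship described above.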

`Operator` will build APIs based on `Accessor`:

```rust
impl Object {
    async fn create_multipart(&self) -> Result<Multipart> {}
    fn into_multipart(&self, upload_id: &str) -> Multipart {}
}

impl Multipart {
    async fn write(&self, part_number: usize, bs: impl AsRef<[u8]>) -> Result<Part> {}
    async fn writer(&self, part_number: usize, size: u64) -> Result<impl PartWrite> {}
    async fn complete(&self, ps: &[Part]) -> Result<()> {}
    async fn abort(&self) -> Result<()> {}
}
```

# Drawbacks

None.

# Rationale and alternatives

## Why not add new object modes?

It seems natural to add a new object mode like `multipart`.

```rust
pub enum ObjectMode {
    FILE,
    DIR,
    MULTIPART,
    Unknown,
}
```

However, making this work would require a large breaking change that introduces `mode` into `Object`.

We would also need to change every API call to accept `mode` as an argument.

For example:

```rust
let _ = op.object("path/to/dir/").list(ObjectMode::MULTIPART);
let _ = op.object("path/to/file").stat(ObjectMode::MULTIPART);
```

## Why not split Object into File and Dir?

We could split `Object` into `File` and `Dir` to avoid requiring `mode` in the API. However, this would also be a large breaking API change.

# Prior art

None.

# Unresolved questions

None.

# Future possibilities

## Support list multipart uploads

We can support listing ongoing multipart uploads so that users can resume or abort them.

## Support list part

We can support listing parts to list already uploaded parts for an upload.
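
One way such a listing could support resuming is by computing which parts still need to be uploaded. The sketch below assumes a hypothetical helper `missing_parts` and 1-based part numbers; the list API itself is not specified by this RFC, so the `uploaded` slice simply stands in for whatever part numbers a `ListParts`-style call would report.

```rust
use std::collections::BTreeSet;

/// Given the total number of parts and the part numbers already reported as
/// uploaded, return the 1-based part numbers that are still missing, in order.
fn missing_parts(total_parts: usize, uploaded: &[usize]) -> Vec<usize> {
    let done: BTreeSet<usize> = uploaded.iter().copied().collect();
    (1..=total_parts).filter(|n| !done.contains(n)).collect()
}
```

A resumed upload would then write only the returned part numbers before calling `complete`.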
4 changes: 2 additions & 2 deletions src/operator.rs
@@ -350,7 +350,7 @@ impl BatchOperator {
BatchOperator { src: op }
}

/// Walk a dir in top down way: list current dir first and than list nested dir.
/// Walk a dir in top down way: list current dir first and then list nested dir.
///
/// Refer to [`TopDownWalker`] for more about the behavior details.
pub fn walk_top_down(&self, path: &str) -> Result<DirStreamer> {
@@ -360,7 +360,7 @@ impl BatchOperator {
))))
}

/// Walk a dir in bottom up way: list nested dir first and than current dir.
/// Walk a dir in bottom up way: list nested dir first and then current dir.
///
/// Refer to [`BottomUpWalker`] for more about the behavior details.
pub fn walk_bottom_up(&self, path: &str) -> Result<DirStreamer> {

1 comment on commit 1c9f9d7

@github-actions

Deploy preview for opendal ready!

✅ Preview
https://opendal-1y2c4tvhw-databend.vercel.app

Built with commit 1c9f9d7.
This pull request is being automatically deployed with vercel-action
