Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(writer): configurable buffer size of unsized write #2143

Merged
merged 12 commits into from
May 6, 2023

Conversation

wcy-fdu
Copy link
Contributor

@wcy-fdu wcy-fdu commented Apr 27, 2023

Currently only support configurable write buffer size for s3, gcs, and oss.
close #2087

@Xuanwo
Copy link
Member

Xuanwo commented Apr 27, 2023

I don't like current API:

  • Users should be able to specify the buffer size during write operation. (maybe not true?)
  • There are different buffer size/limit here, we should make the behavior more clear.
    • S3's buffer size requires every write size >= size
    • GCS's buffer size requires every write size == size to align with 256 KiB

For my current understanding, there are at least the following buffer related sizes:

  • the at_least_size for write operation: every write will have at least n bytes
  • the exact_size for write operation: every write will have exactly n times of bytes
  • the at_most_size for write operation: storage can buffer at most n bytes before write

I'm also seeking for better names. Any ideas?


Seems we do need to add value in config, the best value cloud be different for services.

@Xuanwo Xuanwo marked this pull request as draft April 27, 2023 11:20
@xyjixyjixyji
Copy link
Contributor

I'm also seeking for better names. Any ideas?

I think it is ok to specify the buffer size for each write, since one writer is constructed per-write operation.

Then maybe this write pattern could be a field in OpWrite like

enum Buffersize {
   AtLeast(usize),
   Exact(usize),
   AtMost(usize),
}

@Xuanwo
Copy link
Member

Xuanwo commented Apr 27, 2023

Then maybe this write pattern could be a field in OpWrite like

No, too much. And they could be set at the same time, for example, at least 4MiB, at most 16 MiB.

@suyanhanx
Copy link
Member

suyanhanx commented Apr 27, 2023

Why not just upper/lower? 👀

@wcy-fdu
Copy link
Contributor Author

wcy-fdu commented Apr 28, 2023

I don't like current API:

  • Users should be able to specify the buffer size during write operation. (maybe not true?)

  • There are different buffer size/limit here, we should make the behavior more clear.

    • S3's buffer size requires every write size >= size
    • GCS's buffer size requires every write size == size to align with 256 KiB

For my current understanding, there are at least the following buffer related sizes:

  • the at_least_size for write operation: every write will have at least n bytes
  • the exact_size for write operation: every write will have exactly n times of bytes
  • the at_most_size for write operation: storage can buffer at most n bytes before write

I'm also seeking for better names. Any ideas?

Seems we do need to add value in config, the best value cloud be different for services.

Seem that different services need different semantics, but maybe maintaining three related items is a little redundant. Waiting for an exact solution(naming scheme).

@wcy-fdu
Copy link
Contributor Author

wcy-fdu commented May 5, 2023

Any suggestion?
If there is no good solution for the time being, we can close this pr as well.

@xyjixyjixyji
Copy link
Contributor

Any suggestion? If there is no good solution for the time being, we can close this pr as well.

I think this need some further discussion before we implement this.

core/src/services/gcs/backend.rs Outdated Show resolved Hide resolved
core/src/services/gcs/backend.rs Outdated Show resolved Hide resolved
core/src/services/gcs/backend.rs Outdated Show resolved Hide resolved
core/src/services/gcs/backend.rs Outdated Show resolved Hide resolved
core/src/services/oss/backend.rs Outdated Show resolved Hide resolved
core/src/services/s3/backend.rs Outdated Show resolved Hide resolved
core/src/services/gcs/backend.rs Outdated Show resolved Hide resolved
core/src/services/oss/backend.rs Outdated Show resolved Hide resolved
core/src/services/oss/backend.rs Outdated Show resolved Hide resolved
core/src/services/oss/backend.rs Outdated Show resolved Hide resolved
core/src/services/oss/writer.rs Outdated Show resolved Hide resolved
core/src/services/gcs/backend.rs Outdated Show resolved Hide resolved
core/src/services/gcs/writer.rs Outdated Show resolved Hide resolved
core/src/services/oss/writer.rs Outdated Show resolved Hide resolved
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks a lot!

@Xuanwo
Copy link
Member

Xuanwo commented May 6, 2023

Mostly LGTM, the only thing left is to resolve the conflicts.

@wcy-fdu wcy-fdu marked this pull request as ready for review May 6, 2023 16:18
@Xuanwo Xuanwo merged commit dfd286c into apache:main May 6, 2023
suyanhanx pushed a commit to suyanhanx/opendal that referenced this pull request May 8, 2023
* config buffer size for gcs

* config buffer size for oss s3

* refactor

* refactor buffer size limitation

* resolve comments

* minor

* minor

* resolve comments

* rebase main and resolve conflict

* minor

* typo
@Xuanwo Xuanwo mentioned this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the buffer size of streaming write configurable
4 participants