Conversation

@zuston (Member) commented Dec 16, 2022

What changes were proposed in this pull request?

To support multiple disk selection for one partition when using local storage.

  1. Introduce the pluggable StorageSelector to support different selection strategies, such as a multiple-disk selection strategy (ChainableLocalStorageSelector) or a concurrent-write strategy (a rough interface sketch follows below).
  2. Introduce the LocalFileClientReadMultiFileHandler to manage multiple handlers, where each handler is bound to one disk's data file and index file.
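
For illustration, here is a minimal sketch of what such a pluggable selector interface could look like. The method names and the ShuffleDataFlushEvent parameter are assumptions for illustration, not necessarily the exact signatures in this PR; LocalStorage and ShuffleDataFlushEvent stand for the server's existing classes.

```java
// Hypothetical sketch of a pluggable StorageSelector; method names and
// parameters are illustrative assumptions, not the PR's exact interface.
public interface StorageSelector {

    // Choose the local storage (disk) that should receive the next flush
    // of a partition's data.
    LocalStorage select(ShuffleDataFlushEvent event);

    // React to the selected disk becoming unwritable (e.g. reaching its
    // high watermark) by switching the partition to another disk.
    void updateOnStorageUnwritable(LocalStorage unwritableStorage);
}
```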

Why are the changes needed?

  1. To make full use of the local disks' capacity.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

  1. UTs

@zuston (Member Author) commented Dec 16, 2022

This is a draft to support multiple disk selection for one partition. @jerqi @advancedxy @xianjingfeng

If the design is OK, I will go ahead.

@advancedxy (Contributor)

This is a draft to support multiple disk selection for one partition. @jerqi @advancedxy @xianjingfeng

If the design is OK, I will go ahead.

I would take a look this weekend.

@xianjingfeng (Member) commented Dec 17, 2022

  1. Can we write to each disk evenly?
  2. Can we let the client decide whether to write to multiple disks? I think only large partitions need multiple disks.

@zuston (Member Author) commented Dec 17, 2022

  • Can we write to each disk evenly?

No. Currently, a partition's storage is only switched when its disk reaches the high watermark.

  • Can we let the client decide whether to write to multiple disks?

Yes, this could be supported in the future.

I think only large partitions need multiple disks.

No. If disk space is small and there are many partitions, this feature can make full use of the local disks instead of falling back to HDFS.
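
As a rough illustration of the switch-on-high-watermark behavior described above (a sketch only; LocalStorage stands for the server's storage class, and all other names are hypothetical, not this PR's code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a partition sticks to its current disk and is chained to the next
// disk only when the high watermark is reached, so writes are NOT spread
// evenly across disks. All names besides LocalStorage are hypothetical.
abstract class ChainedSelectionSketch {
    private final Map<String, LocalStorage> partitionToStorage = new ConcurrentHashMap<>();

    LocalStorage selectForPartition(String partitionKey) {
        LocalStorage current = partitionToStorage.get(partitionKey);
        if (current != null && !reachHighWatermark(current)) {
            return current; // stay on the same disk
        }
        // Only on high watermark do we chain the partition to another disk.
        LocalStorage next = pickNextWritableStorage();
        partitionToStorage.put(partitionKey, next);
        return next;
    }

    abstract boolean reachHighWatermark(LocalStorage storage);
    abstract LocalStorage pickNextWritableStorage();
}
```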

@xianjingfeng (Member) commented Dec 19, 2022

  • Can we write to each disk evenly?

No. Currently, a partition's storage is only switched when its disk reaches the high watermark.

  • Can we let the client decide whether to write to multiple disks?

Yes, this could be supported in the future.

I think only large partitions need multiple disks.

No. If disk space is small and there are many partitions, this feature can make full use of the local disks instead of falling back to HDFS.

If we just support switching to another disk when a disk reaches its high watermark, I think it is unnecessary to let the client decide whether to write to multiple disks. I just want to make large partitions write faster, so I hope to support concurrent disk writing for a partition.

@zuston (Member Author) commented Dec 19, 2022

I just want to make large partitions write faster, so I hope to support concurrent disk writing for a partition.

I think the feature you mentioned is also meaningful; I have had similar thoughts.

Concurrent writing could be supported later. In the current PR, I introduce the ChainableLocalStorage, which is a view over multiple local storages. If you want to support concurrent writing, you could implement a PooledLocalStorage that acts as a view controlling write concurrency, like #396.
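
As a rough illustration of the "view" idea (ChainableLocalStorage is this PR's name, but the internals below are assumptions, with LocalStorage standing for the server's storage class):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of a view over multiple local storages: the chain records every disk
// a partition has written to, and the tail receives new writes. A concurrent
// variant (e.g. a PooledLocalStorage) could instead dispatch writes across a
// pool. These internals are assumptions, not the PR's actual code.
class ChainableLocalStorageSketch {
    private final List<LocalStorage> chain = new CopyOnWriteArrayList<>();

    // The tail of the chain is the disk currently receiving writes.
    LocalStorage currentWritableStorage() {
        return chain.get(chain.size() - 1);
    }

    // Called when the current disk reaches its high watermark.
    void chainNewStorage(LocalStorage next) {
        chain.add(next);
    }

    // Readers need the whole chain, since a partition's data may span disks.
    List<LocalStorage> allStorages() {
        return chain;
    }
}
```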

@jerqi (Contributor) commented Dec 21, 2022

Is this PR ready for review?

@zuston (Member Author) commented Dec 21, 2022

Is this PR ready for review?

No, I will make some changes.

@zuston zuston marked this pull request as draft December 21, 2022 03:23
int64 offset = 6;
int32 length = 7;
int64 timestamp = 8;
int32 storageId = 9;
Contributor:

I don't think this is a good design choice. It leaks too many details to the client.
And there are other storage types that don't need local storage, such as memory and memory_hdfs.

Contributor:

Couldn't we just reuse the metadata in
private final Map<PartitionUnionKey, ChainableLocalStorage> partitionsOfLocalStorage;?

Or, alternatively, look up all the disk dirs to find the correct storage path?

Member Author:

Let me think this over.

Member Author:

The original shuffle-data reading follows this rule:

  1. Read the whole remote index file, then split and filter it to get the required segments.
  2. Read the shuffle data according to each segment's offset and length, one by one.

If we expose a unified abstraction for the client that obeys the above reading sequence, it means we have to compose multiple files into one abstract file and re-calculate the offset and length for every request to map to the real file, as sketched below.
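
A minimal sketch of that sequence, assuming hypothetical segment and helper names rather than the real handler API:

```java
import java.util.List;

// Sketch of the original single-file read path; ShuffleSegment and the
// abstract helpers are hypothetical stand-ins for the real handler API.
abstract class SingleFileReadSketch {
    static class ShuffleSegment {
        final long offset;
        final int length;
        ShuffleSegment(long offset, int length) { this.offset = offset; this.length = length; }
    }

    abstract byte[] readWholeIndexFile();                        // step 1: fetch the full index
    abstract List<ShuffleSegment> splitAndFilter(byte[] index);  // step 1: keep required segments
    abstract byte[] readDataFile(long offset, int length);       // step 2: ranged data read
    abstract void process(byte[] data);

    void readPartition() {
        // Every (offset, length) pair maps onto ONE data file. With multiple
        // files per partition, this mapping breaks unless the files are
        // composed into one virtual file and offsets are re-computed.
        for (ShuffleSegment segment : splitAndFilter(readWholeIndexFile())) {
            process(readDataFile(segment.offset, segment.length));
        }
    }
}
```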

Member Author:

I don't think this is a good design choice. It leaks too many details to the client.

Yes. I also prefer giving a unified abstraction that hides the details of the multiple underlying storages from the client, but currently I have no good ideas on this.

And there are other storage types that don't need local storage, such as memory and memory_hdfs.

Emmm... Only localfile storage will use this proto. Memory reading uses an independent API, and HDFS reading fetches data directly from the HDFS datanodes instead of fetching remote data from the shuffle server.

int64 offset = 6;
int32 length = 7;
int64 timestamp = 8;
int32 storageId = 9;
Contributor:

We should avoid using storageId. It exposes too many details.

Member Author:

Do you have any ideas?

Contributor:

Not yet.

@zuston zuston force-pushed the LocalStorageSwitch branch from 69b8a64 to dcde67e Compare December 21, 2022 12:32
@zuston zuston marked this pull request as ready for review December 21, 2022 12:36
@zuston (Member Author) commented Dec 21, 2022

I have updated the code; it is a little different from the previous commit in the LocalStorageManager part.

I introduce the pluggable StorageSelector to support different selection strategies, such as a multiple-disk selection strategy (ChainableLocalStorageSelector) or a concurrent-write strategy.

@zuston zuston force-pushed the LocalStorageSwitch branch from dcde67e to 6955490 Compare December 22, 2022 02:44
@codecov-commenter commented Dec 22, 2022

Codecov Report

Merging #435 (5f7c22d) into master (5321292) will decrease coverage by 0.07%.
The diff coverage is 58.38%.

@@             Coverage Diff              @@
##             master     #435      +/-   ##
============================================
- Coverage     58.62%   58.55%   -0.08%     
- Complexity     1642     1658      +16     
============================================
  Files           199      203       +4     
  Lines         11173    11274     +101     
  Branches        989      997       +8     
============================================
+ Hits           6550     6601      +51     
- Misses         4231     4282      +51     
+ Partials        392      391       -1     
Impacted Files Coverage Δ
...pache/uniffle/server/ShuffleServerGrpcService.java 0.80% <0.00%> (-0.01%) ⬇️
.../org/apache/uniffle/server/ShuffleTaskManager.java 74.16% <0.00%> (-0.51%) ⬇️
...uniffle/storage/factory/ShuffleHandlerFactory.java 0.00% <0.00%> (ø)
...dler/impl/LocalFileClientReadMultiFileHandler.java 0.00% <0.00%> (ø)
...orage/request/CreateShuffleReadHandlerRequest.java 0.00% <ø> (ø)
...orage/handler/impl/LocalFileClientReadHandler.java 52.63% <25.00%> (-8.66%) ⬇️
...r/storage/local/ChainableLocalStorageSelector.java 88.57% <88.57%> (ø)
...rg/apache/uniffle/server/ShuffleDataReadEvent.java 94.73% <100.00%> (+4.73%) ⬆️
...a/org/apache/uniffle/server/ShuffleServerConf.java 99.25% <100.00%> (+0.01%) ⬆️
...he/uniffle/server/storage/LocalStorageManager.java 90.00% <100.00%> (+1.64%) ⬆️
... and 5 more


@zuston zuston changed the title [WIP][Feature] Support multiple disk selection for one partition when using local storage [Feature] Support multiple disk selection for one partition when using local storage Dec 22, 2022
@zuston zuston requested a review from jerqi December 22, 2022 07:36
@zuston zuston requested a review from advancedxy December 22, 2022 07:36
@advancedxy (Contributor)

Is this ready for review?

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

@zuston (Member Author) commented Dec 22, 2022

Is this ready for review?

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

Yes, it's ready for review.

@zuston (Member Author) commented Dec 23, 2022

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

Looking forward to a better design.

@zuston zuston linked an issue Dec 23, 2022 that may be closed by this pull request
@advancedxy (Contributor)

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

Looking forward to a better design.

I went through this PR today. It may require some id to indicate which shuffle_data/shuffle_index file the current read points to, but I don't think it's a good design to have the client passing around the storageIndex/storageId.

One possible solution would be similar to how HDFS handles multiple IndexFile/DataFile pairs.
For the local shuffle client:

  1. Get all index data once, resulting in a list of index segments.
  2. Create a list of LocalFileClientReaderHandler, each corresponding to a shuffle data file.
  3. Iterate through them sequentially or in parallel to fetch the data.

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.
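
A sketch of that proposed client flow, keying handlers by a data-file id instead of a server-side storageId (all names below are hypothetical, not from this PR):

```java
import java.util.List;
import java.util.Map;

// Sketch: one logical handler per shuffle data file, identified by a file id.
// The client never sees a storageId; it only iterates over file ids returned
// with the index data. All names here are hypothetical.
abstract class MultiFileReadSketch {
    // fileId -> list of [offset, length] segments belonging to that data file
    abstract Map<String, List<long[]>> fetchAllIndexDataOnce();
    abstract byte[] fetchData(String dataFileId, long offset, int length);
    abstract void process(byte[] data);

    void readPartition() {
        // 1. Get all index data once, grouped by data file.
        Map<String, List<long[]>> segmentsByFile = fetchAllIndexDataOnce();
        // 2 & 3. One handler per data file; iterate sequentially here,
        // though the outer loop could also run in parallel.
        for (Map.Entry<String, List<long[]>> entry : segmentsByFile.entrySet()) {
            for (long[] seg : entry.getValue()) {
                process(fetchData(entry.getKey(), seg[0], (int) seg[1]));
            }
        }
    }
}
```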

@zuston (Member Author) commented Dec 29, 2022

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

zuston added a commit to zuston/incubator-uniffle that referenced this pull request Dec 30, 2022
@advancedxy (Contributor)

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

It could be, but mostly it would be the base path of the shuffle data file, since the ShuffleDataFileName can be generated by rule.

@zuston (Member Author) commented Dec 30, 2022

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

It could be, but mostly it would be the base path of the shuffle data file, since the ShuffleDataFileName can be generated by rule.

The concrete data file name is necessary and cannot be generated by the client, because there are multiple files for one partition.

@advancedxy (Contributor)

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

It could be, but mostly it would be the base path of the shuffle data file, since the ShuffleDataFileName can be generated by rule.

The concrete data file name is necessary and cannot be generated by the client, because there are multiple files for one partition.

I see, that makes sense.

@kaijchen kaijchen changed the title [Feature] Support multiple disk selection for one partition when using local storage feat: support multiple disk selection for one partition when using local storage Feb 10, 2023
@zuston zuston closed this Nov 9, 2023
Linked issue: [FEATURE] Support concurrent write to multiple local disks for one partition