Conversation

@zuston (Member) commented Dec 16, 2022

What changes were proposed in this pull request?

To support multiple disk selection for one partition when using local storage.

  1. Introduce the pluggable StorageSelector to support different selection strategies, such as a multiple-disk selection strategy (ChainableLocalStorageSelector) or a concurrent-write strategy (a rough interface sketch follows below).
  2. Introduce the LocalFileClientReadMultiFileHandler to manage multiple handlers, where each handler is bound to one disk's data file and index file.
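
For illustration, here is a minimal sketch of what such a pluggable selector interface could look like. The method names and the ShuffleDataFlushEvent parameter are assumptions for illustration, not necessarily the exact signatures in this PR; LocalStorage and ShuffleDataFlushEvent stand for the server's existing classes.

```java
// Hypothetical sketch of a pluggable StorageSelector; method names and
// parameters are illustrative assumptions, not the PR's exact interface.
public interface StorageSelector {

    // Choose the local storage (disk) that should receive the next flush
    // of a partition's data.
    LocalStorage select(ShuffleDataFlushEvent event);

    // React to the selected disk becoming unwritable (e.g. reaching its
    // high watermark) by switching the partition to another disk.
    void updateOnStorageUnwritable(LocalStorage unwritableStorage);
}
```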

Why are the changes needed?

  1. To make full use of the local disks' capacity.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

  1. UTs

@zuston (Member Author) commented Dec 16, 2022

This is a draft to support multiple disk selection for one partition. @jerqi @advancedxy @xianjingfeng

If the design is OK, I will go ahead.

@advancedxy (Contributor)

This is a draft to support multiple disk selection for one partition. @jerqi @advancedxy @xianjingfeng

If the design is OK, I will go ahead.

I would take a look this weekend.

@xianjingfeng (Member) commented Dec 17, 2022

  1. Can we write to each disk evenly?
  2. Can we let the client decide whether to write to multiple disks? I think only large partitions need multiple disks.

@zuston (Member Author) commented Dec 17, 2022

  • Can we write to each disk evenly?

No. Currently, a partition's storage is only switched when its disk reaches the high watermark.

  • Can we let the client decide whether to write to multiple disks?

Yes, this could be supported in the future.

I think only large partitions need multiple disks.

No. If disk space is small and there are many partitions, this feature can make full use of the local disks instead of falling back to HDFS.
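
As a rough illustration of the switch-on-high-watermark behavior described above (a sketch only; LocalStorage stands for the server's storage class, and all other names are hypothetical, not this PR's code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a partition sticks to its current disk and is chained to the next
// disk only when the high watermark is reached, so writes are NOT spread
// evenly across disks. All names besides LocalStorage are hypothetical.
abstract class ChainedSelectionSketch {
    private final Map<String, LocalStorage> partitionToStorage = new ConcurrentHashMap<>();

    LocalStorage selectForPartition(String partitionKey) {
        LocalStorage current = partitionToStorage.get(partitionKey);
        if (current != null && !reachHighWatermark(current)) {
            return current; // stay on the same disk
        }
        // Only on high watermark do we chain the partition to another disk.
        LocalStorage next = pickNextWritableStorage();
        partitionToStorage.put(partitionKey, next);
        return next;
    }

    abstract boolean reachHighWatermark(LocalStorage storage);
    abstract LocalStorage pickNextWritableStorage();
}
```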

@xianjingfeng (Member) commented Dec 19, 2022

  • Can we write to each disk evenly?

No. Currently, a partition's storage is only switched when its disk reaches the high watermark.

  • Can we let the client decide whether to write to multiple disks?

Yes, this could be supported in the future.

I think only large partitions need multiple disks.

No. If disk space is small and there are many partitions, this feature can make full use of the local disks instead of falling back to HDFS.

If we just support switching to another disk when a disk reaches its high watermark, I think it is unnecessary to let the client decide whether to write to multiple disks. I just want to make large partitions write faster, so I hope to support concurrent disk writing for a partition.

@zuston (Member Author) commented Dec 19, 2022

I just want to make large partitions write faster, so I hope to support concurrent disk writing for a partition.

I think the feature you mentioned is also meaningful; I have had similar thoughts.

Concurrent writing could be supported later. In the current PR, I introduce the ChainableLocalStorage, which is a view over multiple local storages. If you want to support concurrent writing, you could implement a PooledLocalStorage that acts as a view controlling write concurrency, like #396.
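
As a rough illustration of the "view" idea (ChainableLocalStorage is this PR's name, but the internals below are assumptions, with LocalStorage standing for the server's storage class):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of a view over multiple local storages: the chain records every disk
// a partition has written to, and the tail receives new writes. A concurrent
// variant (e.g. a PooledLocalStorage) could instead dispatch writes across a
// pool. These internals are assumptions, not the PR's actual code.
class ChainableLocalStorageSketch {
    private final List<LocalStorage> chain = new CopyOnWriteArrayList<>();

    // The tail of the chain is the disk currently receiving writes.
    LocalStorage currentWritableStorage() {
        return chain.get(chain.size() - 1);
    }

    // Called when the current disk reaches its high watermark.
    void chainNewStorage(LocalStorage next) {
        chain.add(next);
    }

    // Readers need the whole chain, since a partition's data may span disks.
    List<LocalStorage> allStorages() {
        return chain;
    }
}
```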

@jerqi (Contributor) commented Dec 21, 2022

Is this PR ready for review?

@zuston (Member Author) commented Dec 21, 2022

Is this PR ready for review?

No, I will make some changes.

@zuston zuston marked this pull request as draft December 21, 2022 03:23
int64 offset = 6;
int32 length = 7;
int64 timestamp = 8;
int32 storageId = 9;
Contributor:

I don't think this is a good design choice. It leaks too many details to the client.
And there are other storage types that don't need local storage, such as memory and memory_hdfs.

Contributor:

Couldn't we just reuse the metadata in
private final Map<PartitionUnionKey, ChainableLocalStorage> partitionsOfLocalStorage;?

Or, alternatively, look up all the disk dirs to find the correct storage path?

Member Author:

Let me think this over.

Member Author:

The original shuffle-data reading follows this rule:

  1. Read the whole remote index file, then split and filter it to get the required segments.
  2. Read the shuffle data according to each segment's offset and length, one by one.

If we expose a unified abstraction for the client that obeys the above reading sequence, it means we have to compose multiple files into one abstract file and re-calculate the offset and length for every request to map to the real file, as sketched below.
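
A minimal sketch of that sequence, assuming hypothetical segment and helper names rather than the real handler API:

```java
import java.util.List;

// Sketch of the original single-file read path; ShuffleSegment and the
// abstract helpers are hypothetical stand-ins for the real handler API.
abstract class SingleFileReadSketch {
    static class ShuffleSegment {
        final long offset;
        final int length;
        ShuffleSegment(long offset, int length) { this.offset = offset; this.length = length; }
    }

    abstract byte[] readWholeIndexFile();                        // step 1: fetch the full index
    abstract List<ShuffleSegment> splitAndFilter(byte[] index);  // step 1: keep required segments
    abstract byte[] readDataFile(long offset, int length);       // step 2: ranged data read
    abstract void process(byte[] data);

    void readPartition() {
        // Every (offset, length) pair maps onto ONE data file. With multiple
        // files per partition, this mapping breaks unless the files are
        // composed into one virtual file and offsets are re-computed.
        for (ShuffleSegment segment : splitAndFilter(readWholeIndexFile())) {
            process(readDataFile(segment.offset, segment.length));
        }
    }
}
```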

Member Author:

I don't think this is a good design choice. It leaks too many details to the client.

Yes. I also prefer giving a unified abstraction that hides the details of the multiple underlying storages from the client, but currently I have no good ideas on this.

And there are other storage types that don't need local storage, such as memory and memory_hdfs.

Emmm... Only localfile storage will use this proto. Memory reading uses an independent API, and HDFS reading fetches data directly from the HDFS datanodes instead of fetching remote data from the shuffle server.

int64 offset = 6;
int32 length = 7;
int64 timestamp = 8;
int32 storageId = 9;
Contributor:

We should avoid using storageId. It exposes too many details.

Member Author:

Do you have any ideas?

Contributor:

Not yet.

@zuston zuston force-pushed the LocalStorageSwitch branch from 69b8a64 to dcde67e Compare December 21, 2022 12:32
@zuston zuston marked this pull request as ready for review December 21, 2022 12:36
@zuston (Member Author) commented Dec 21, 2022

I have updated the code; it is a little different from the previous commit in the LocalStorageManager part.

I introduce the pluggable StorageSelector to support different selection strategies, such as a multiple-disk selection strategy (ChainableLocalStorageSelector) or a concurrent-write strategy.

@zuston zuston force-pushed the LocalStorageSwitch branch from dcde67e to 6955490 Compare December 22, 2022 02:44
@codecov-commenter commented Dec 22, 2022

Codecov Report

Merging #435 (5f7c22d) into master (5321292) will decrease coverage by 0.07%.
The diff coverage is 58.38%.

@@             Coverage Diff              @@
##             master     #435      +/-   ##
============================================
- Coverage     58.62%   58.55%   -0.08%     
- Complexity     1642     1658      +16     
============================================
  Files           199      203       +4     
  Lines         11173    11274     +101     
  Branches        989      997       +8     
============================================
+ Hits           6550     6601      +51     
- Misses         4231     4282      +51     
+ Partials        392      391       -1     
Impacted Files Coverage Δ
...pache/uniffle/server/ShuffleServerGrpcService.java 0.80% <0.00%> (-0.01%) ⬇️
.../org/apache/uniffle/server/ShuffleTaskManager.java 74.16% <0.00%> (-0.51%) ⬇️
...uniffle/storage/factory/ShuffleHandlerFactory.java 0.00% <0.00%> (ø)
...dler/impl/LocalFileClientReadMultiFileHandler.java 0.00% <0.00%> (ø)
...orage/request/CreateShuffleReadHandlerRequest.java 0.00% <ø> (ø)
...orage/handler/impl/LocalFileClientReadHandler.java 52.63% <25.00%> (-8.66%) ⬇️
...r/storage/local/ChainableLocalStorageSelector.java 88.57% <88.57%> (ø)
...rg/apache/uniffle/server/ShuffleDataReadEvent.java 94.73% <100.00%> (+4.73%) ⬆️
...a/org/apache/uniffle/server/ShuffleServerConf.java 99.25% <100.00%> (+0.01%) ⬆️
...he/uniffle/server/storage/LocalStorageManager.java 90.00% <100.00%> (+1.64%) ⬆️
... and 5 more


@zuston zuston changed the title [WIP][Feature] Support multiple disk selection for one partition when using local storage [Feature] Support multiple disk selection for one partition when using local storage Dec 22, 2022
@zuston zuston requested a review from jerqi December 22, 2022 07:36
@zuston zuston requested a review from advancedxy December 22, 2022 07:36
@advancedxy (Contributor)

Is this ready for review?

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

@zuston (Member Author) commented Dec 22, 2022

Is this ready for review?

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

Yes, it's ready for review.

@zuston (Member Author) commented Dec 23, 2022

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

Looking forward to a better design.

@zuston zuston linked an issue Dec 23, 2022 that may be closed by this pull request
@advancedxy (Contributor)

It seems we haven't gotten rid of storageId in the proto.

I haven't thought it through thoroughly, but it should be possible to eliminate the need for storageId.

Looking forward to a better design.

I went through this PR today. It may require some id to indicate which shuffle_data/shuffle_index file the current read points to, but I don't think it's a good design to have the client passing around the storageIndex/storageId.

One possible solution would be similar to how HDFS handles multiple IndexFile/DataFile pairs.
For the local shuffle client:

  1. Get all index data once, resulting in a list of index segments.
  2. Create a list of LocalFileClientReaderHandler, each corresponding to a shuffle data file.
  3. Iterate through them sequentially or in parallel to fetch the data.

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.
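
A sketch of that proposed client flow, keying handlers by a data-file id instead of a server-side storageId (all names below are hypothetical, not from this PR):

```java
import java.util.List;
import java.util.Map;

// Sketch: one logical handler per shuffle data file, identified by a file id.
// The client never sees a storageId; it only iterates over file ids returned
// with the index data. All names here are hypothetical.
abstract class MultiFileReadSketch {
    // fileId -> list of [offset, length] segments belonging to that data file
    abstract Map<String, List<long[]>> fetchAllIndexDataOnce();
    abstract byte[] fetchData(String dataFileId, long offset, int length);
    abstract void process(byte[] data);

    void readPartition() {
        // 1. Get all index data once, grouped by data file.
        Map<String, List<long[]>> segmentsByFile = fetchAllIndexDataOnce();
        // 2 & 3. One handler per data file; iterate sequentially here,
        // though the outer loop could also run in parallel.
        for (Map.Entry<String, List<long[]>> entry : segmentsByFile.entrySet()) {
            for (long[] seg : entry.getValue()) {
                process(fetchData(entry.getKey(), seg[0], (int) seg[1]));
            }
        }
    }
}
```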

@zuston (Member Author) commented Dec 29, 2022

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

zuston added a commit to zuston/incubator-uniffle that referenced this pull request Dec 30, 2022
@advancedxy (Contributor)

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

It could be, but mostly it would be the base path of the shuffle data file, since the ShuffleDataFileName can be generated by rule.

@zuston (Member Author) commented Dec 30, 2022

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

It could be, but mostly it would be the base path of the shuffle data file, since the ShuffleDataFileName can be generated by rule.

The concrete data file name is necessary and cannot be generated by the client, because there are multiple files for one partition.

@advancedxy (Contributor)

In steps 2 and 3, you may pass a ShuffleDataFileId/ShuffleIndexId to indicate which shuffle file is fetched. It may be effectively the same as StorageId, but I believe it's more natural to use ShuffleDataFileId here.

Is ShuffleDataFileId the ShuffleDataFileName?

It could be, but mostly it would be the base path of the shuffle data file, since the ShuffleDataFileName can be generated by rule.

The concrete data file name is necessary and cannot be generated by the client, because there are multiple files for one partition.

I see, that makes sense.

@kaijchen kaijchen changed the title [Feature] Support multiple disk selection for one partition when using local storage feat: support multiple disk selection for one partition when using local storage Feb 10, 2023
@zuston zuston closed this Nov 9, 2023
Linked issue: [FEATURE] Support concurrent write to multiple local disks for one partition