
optimize maintenance rebalance/re-replicate with direct asd-to-asd communication #716

Open
domsj opened this issue May 10, 2017 · 5 comments

domsj commented May 10, 2017

Rebalance can be optimized by having the too-full ASD send the fragment data directly to the not-yet-full-enough ASD.

Similarly, for repair under a replication policy, it should be possible to send the fragment data directly between the ASDs.
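A minimal sketch of the difference between the two approaches, assuming a toy `Asd` model (all names here are illustrative, not the real alba maintenance or ASD API):

```python
class Asd:
    """Toy ASD holding fragments as id -> bytes (illustrative only)."""
    def __init__(self):
        self.fragments = {}

    def fetch_fragment(self, frag_id):
        return self.fragments[frag_id]

    def store_fragment(self, frag_id, data):
        self.fragments[frag_id] = data

    def push_fragment(self, frag_id, target):
        # Direct asd-to-asd push: the fragment data never passes through
        # the maintenance node, so it crosses the network only once.
        target.store_fragment(frag_id, self.fragments[frag_id])
        del self.fragments[frag_id]


def rebalance_via_maintenance(src, dst, frag_id):
    # Current behaviour: maintenance downloads the fragment from the
    # too-full ASD and re-uploads it to the target (two network hops).
    data = src.fetch_fragment(frag_id)
    dst.store_fragment(frag_id, data)
    del src.fragments[frag_id]


def rebalance_direct(src, dst, frag_id):
    # Proposed behaviour: maintenance only orchestrates the move;
    # the source ASD pushes the fragment straight to the destination.
    src.push_fragment(frag_id, dst)
```

The same direct-push primitive would cover the replication-repair case, since repair under a replication policy is also just copying a fragment from one ASD to another.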

dejonghb commented May 11, 2017

Quite some impact from maintenance trying to rebalance an asymmetric backend (1 extra disk in 1 node in this setup, but extra empty disks/nodes would show similar behaviour).

Throughput from dd in a VM via the edge:

maintenance off (columns: unix timestamp, run #, dd elapsed time, throughput):

...
1494507344 5 16.3308 s 131 MB/s
1494507361 6 17.5935 s 122 MB/s
1494507380 7 18.4305 s 117 MB/s

maintenance on:

1494507400 8 27.6425 s 77.7 MB/s
1494507428 9 50.8574 s 42.2 MB/s
1494507480 10 32.9535 s 65.2 MB/s
1494507514 11 44.0469 s 48.8 MB/s

maintenance off:

1494507561 12 17.793 s 121 MB/s
1494507580 13 17.5795 s 122 MB/s
1494507599 14 18.3454 s 117 MB/s
1494507618 15 17.8269 s 120 MB/s
1494507637 16 19.3551 s 111 MB/s

maintenance on:

1494507657 17 49.1189 s 43.7 MB/s
1494507709 18 42.4096 s 50.6 MB/s
1494507753 19 41.4204 s 51.8 MB/s
1494507797 20 33.4441 s 64.2 MB/s

Network without maintenance:

       eth2       
 KB/s in  KB/s out
147599.3  54549.87
189334.6  54740.97
199151.6  54746.13
167426.1  16094.95
222470.0  39147.36
219685.5  54783.22
206448.5  54979.32
330253.6  55099.64
185136.3  37519.75

Network with maintenance:

       eth2       
 KB/s in  KB/s out
409364.7  505545.1
433639.6  513500.1
418442.1  570177.3
454566.2  498419.2
453942.3  513534.5
420513.6  435244.4
438833.4  466755.9
473411.2  526030.5
518031.7  369577.4
489400.4  430726.2

dejonghb commented:
Maybe rebalancing should not be enabled by default, given the network (and disk) bandwidth it takes away from ingest?

Is the time/work spent moving old data around really worth the effort? This probably also depends on the use case: a constant ingest might behave differently from a bursty one...

Maybe the decision of when to move data around, plus from where to where, also needs more thoughtful insight (policies used / capacity planning / ...) than the maintenance process itself has?

PS: rebalancing can be turned off via

alba update-maintenance-config --disable-rebalance --config <abm-configurl>
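As a toy version of the "from where to where" decision raised above (purely illustrative; real alba placement also has to weigh policies, safety, and capacity planning):

```python
def pick_rebalance_move(asds, threshold=0.05):
    """Pick a (source, destination) ASD pair for rebalancing.

    asds: dict mapping asd name -> (used_bytes, capacity_bytes).
    Returns (src, dst) if the fullness spread exceeds `threshold`,
    else None (i.e. the backend is balanced enough to leave alone).
    """
    fullness = {name: used / cap for name, (used, cap) in asds.items()}
    src = max(fullness, key=fullness.get)   # most-full ASD
    dst = min(fullness, key=fullness.get)   # least-full ASD
    if fullness[src] - fullness[dst] > threshold:
        return src, dst
    return None
```

With a tunable `threshold`, an asymmetric backend (one extra empty disk) would only trigger moves while the spread is large, instead of churning indefinitely.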

wimpers commented May 19, 2017

Isn't there a way to limit the impact of rebalancing (e.g. by lowering its priority) so that some rebalancing still goes on?

@wimpers wimpers added this to the H milestone May 29, 2017
dejonghb commented:
rebalance

@domsj domsj self-assigned this Jun 2, 2017
@wimpers wimpers modified the milestones: I, H Aug 29, 2017
@wimpers wimpers modified the milestones: I, J Nov 28, 2017
@wimpers wimpers modified the milestones: J, M Mar 6, 2018
@wimpers wimpers modified the milestones: M, Roadmap Sep 14, 2018
toolslive commented:
waiting on QA effort.

Projects: none yet
Development: no branches or pull requests
5 participants