HDDS-11949. Update Recon OM Sync default configs #7600

devmadhuu · 2024-12-19T09:28:51Z

What changes were proposed in this pull request?

This PR updates the default values of the Recon OM sync configs.

The new defaults are recommended based on a recent performance test and an evaluation of the Recon OM sync process and the execution speed of its underlying tasks.

Recommended configs default values:

ozone.recon.om.snapshot.task.interval.delay -> 5s
recon.om.delta.update.limit -> 50000
recon.om.delta.update.loop.limit -> 50

The values above are the recommended defaults for a high-write workload of approx 5k TPS, and are intended to achieve near-real-time sync between Recon and OM data.
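As a rough illustration of how the two delta-update settings might combine, the sketch below computes an upper bound on updates pulled per sync cycle. It assumes that one cycle fetches at most `recon.om.delta.update.limit` updates per request and repeats at most `recon.om.delta.update.loop.limit` times; that reading and the helper name `max_updates_per_sync` are assumptions, not something stated in this PR.

```python
# Hypothetical helper (not part of Ozone): upper bound on OM DB updates that
# Recon could pull in one sync cycle, ASSUMING each request returns at most
# `delta_update_limit` updates and the cycle loops at most `loop_limit` times.
def max_updates_per_sync(delta_update_limit: int, loop_limit: int) -> int:
    return delta_update_limit * loop_limit


# Proposed defaults from this PR.
ceiling = max_updates_per_sync(delta_update_limit=50_000, loop_limit=50)
print(f"{ceiling:,}")  # 2,500,000
```

Under that assumption, the proposed defaults allow up to 2.5M updates per cycle.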

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11949

How was this patch tested?

Tested manually and with the existing JUnit test cases.


errose28 · 2024-12-19T17:34:23Z

Thanks for adding this @devmadhuu. Could you share the benchmarks that were used to arrive at these numbers with the community?

devmadhuu · 2024-12-20T10:02:16Z

> Thanks for adding this @devmadhuu. Could you share the benchmarks that were used to arrive at these numbers with the community?

Yes, sure, thanks @errose28. We ran the following performance benchmark for the Recon OM sync process flow.

Workload test ran for 5K TPS (create/commit operations) on cluster:

ozone freon ockrw -n 10000000 -t 100 --percentage-read 0 --size 0 -r 1000000 -v voltest -b buckettest -p performanceTest

Following configs:

Recon heap allocation - 31 GB
ozone.recon.om.snapshot.task.interval.delay - 5s
recon.om.delta.update.limit - 50k
recon.om.delta.update.loop.limit - 50

The test ran for 39 mins, and approx 10M OM DB events were generated. The 5K create/commit key operations per sec produced a mix of the following events:

create/commit

- insert into open key table (PUT)
- update bucket info (UPDATE)
- delete from open key table (DELETE)
- update bucket info (UPDATE)
- insert into key table (PUT)
- insert into deleted key table (PUT)
- removal from deleted key table (DELETE)

Further observations:

The workload generated approx 2.1M OM DB events per min throughout the test run.
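The 2.1M figure is consistent with the event list above. A minimal check, assuming every create/commit operation emits all seven listed events (an assumption made here for illustration, not a claim from the test logs):

```python
# Assumption: each create/commit operation produces all seven OM DB events
# listed above (PUT/UPDATE/DELETE on the open key, bucket, key and deleted
# key tables).
EVENTS_PER_CREATE_COMMIT = 7
OPS_PER_SECOND = 5_000  # workload rate reported for the test

events_per_minute = EVENTS_PER_CREATE_COMMIT * OPS_PER_SECOND * 60
print(f"{events_per_minute:,}")  # 2,100,000
```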

No JVM pauses or GC pauses were detected.

Recon OM data lagged by approx 330k OM DB events per sync interval, so the sync stayed near real time while the test workload was in progress.

Recon OM sync is divided into the following sub-tasks:

- Get updates from OM
- Perform DB update in batch
- Prepare events based on the DB update in batch. These 3 tasks together took 16 secs.
- Process those DB events in each of the 4 background tasks concurrently: 30 secs.

So, based on the above perf stats, Recon was processing approx 1.4M events per min end to end, while OM was generating them at a pace of 2.1M per min. This data is also confirmed by Grafana metrics. Based on this, increasing the delta update limit further will not help much, because processing time will increase and there is a delay of 1 min after each run. So we need to reduce the delay to 5s, so that the lag between Recon and OM is kept to a minimum, in the range of just 330k events (one sync cycle will also catch up on this once the test workload finishes).
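To make the effect of the delay concrete, here is a simplified model of one sync cycle. It assumes the 16 sec and 30 sec phases measured above run back to back and keep the same duration, and that the configured delay is pure idle time between runs; the function name `idle_fraction` is illustrative only.

```python
# Simplified, hypothetical model: one cycle = processing time + configured delay.
# ASSUMPTION: the 16 s (fetch/apply/prepare) and 30 s (background tasks) phases
# are sequential and their duration does not change with the delay setting.
def idle_fraction(processing_secs: float, delay_secs: float) -> float:
    """Share of each cycle that Recon spends waiting instead of syncing."""
    return delay_secs / (processing_secs + delay_secs)


PROCESSING_SECS = 16 + 30

for delay in (60, 5):  # 1 min delay vs. the proposed 5 s delay
    print(delay, round(idle_fraction(PROCESSING_SECS, delay), 3))
```

Under these assumptions, Recon is idle for more than half of each cycle with a 1 min delay and for about a tenth of it with a 5s delay, which is why shortening the delay reduces lag more than raising the delta update limit would.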

Our next task is to work out how to optimize the processing speed of the background tasks. The scope is limited by the nature of the data: Recon's background tasks must process all events in sequence and cannot process them concurrently. We still need to look for optimizations, including the possibility of concurrency within each background task's processing logic for a single event.
Raised HDDS-11688 and HDDS-11953 to handle further optimization.

HDDS-11949. Ozone Recon - Update Recon OM Sync default configs and do…

df2734f

…cker configs.

devmadhuu marked this pull request as draft December 19, 2024 09:28

devmadhuu marked this pull request as ready for review December 19, 2024 09:29

devmadhuu requested review from sumitagrawl and ArafatKhan2198 December 19, 2024 09:32

adoroszlai added the recon label Dec 19, 2024

adoroszlai changed the title ~~HDDS-11949. Ozone Recon - Update Recon OM Sync default configs and docker configs~~ HDDS-11949. Update Recon OM Sync default configs Dec 19, 2024
