Limit CPU and memory consumption in resource-limited environments #3685

v0y4g3r · 2024-04-10T03:17:42Z

What problem does the new feature solve?

GreptimeDB is designed to scale from embedded devices up to mega-scale cloud services. However, when it runs on resource-limited devices, such as industrial controllers based on Android or Windows, it has no framework for limiting its resource consumption, namely CPU and memory usage.

What does the feature do?

This issue calls for a resource-limiting framework, similar to cgroups in the Linux kernel, to limit the CPU and memory usage of dedicated spawned tasks such as flush and compaction.

Implementation challenges

Tokio does not provide instrumentation to probe the CPU and memory usage of submitted tasks. We can only wrap the tasks with our own metrics and use rate-limiting strategies to limit in-flight tasks.
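One way to wrap tasks with our own metrics is a future wrapper that times every poll. The sketch below is a minimal, std-only illustration under assumptions: TaskStats, Instrumented, YieldTimes and block_on are hypothetical names, and wall-clock time spent inside poll() is only an approximation of CPU time (it also counts preemption and blocking calls).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::Instant;

/// Counters shared between a task and whoever reads its metrics (hypothetical).
#[derive(Default)]
pub struct TaskStats {
    pub polls: AtomicU64,
    pub busy_nanos: AtomicU64, // wall-clock time spent inside poll()
}

/// Wraps a future and records how often, and for how long, it is polled.
pub struct Instrumented<F> {
    inner: Pin<Box<F>>,
    stats: Arc<TaskStats>,
}

impl<F: Future> Instrumented<F> {
    pub fn new(inner: F, stats: Arc<TaskStats>) -> Self {
        Self { inner: Box::pin(inner), stats }
    }
}

impl<F: Future> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let start = Instant::now();
        let result = self.inner.as_mut().poll(cx);
        let elapsed = start.elapsed().as_nanos() as u64;
        self.stats.polls.fetch_add(1, Ordering::Relaxed);
        self.stats.busy_nanos.fetch_add(elapsed, Ordering::Relaxed);
        result
    }
}

/// Toy future that returns Pending `n` times before completing.
pub struct YieldTimes(pub u32);

impl Future for YieldTimes {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            return Poll::Ready("done");
        }
        self.0 -= 1;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// Waker that does nothing; block_on below simply re-polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor, only meant to drive this demo.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```

Driving Instrumented::new(YieldTimes(3), stats) to completion records four polls. Inside a real Tokio runtime the wrapped future would be spawned as usual, and the number of in-flight tasks could be capped with tokio::sync::Semaphore by acquiring a permit before spawning and dropping it when the task finishes.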

sunng87 · 2024-04-11T03:52:45Z

This reminds me of a discussion with @zyy17 about a general abstraction for task spawning:

- an implementation-independent spawn function as the entry point
- options to spawn onto the current thread, a thread pool, a local subprocess, or a remote process (managed by various resource managers, such as k8s)
- an optional setting to specify compute resources for the task
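A possible shape for this abstraction, sketched in Rust under assumptions: Spawner, SpawnTarget, ResourceSpec and LocalSpawner are hypothetical names, ResourceSpec is only carried through here (a real backend would enforce it, for example through cgroups or a k8s resource request), and the toy backend supports only in-process targets.

```rust
use std::thread;

/// Where a task should run (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpawnTarget {
    CurrentThread,
    ThreadPool,
    LocalSubprocess,
    RemoteProcess,
}

/// Optional compute budget; `None` means "no limit requested".
#[derive(Debug, Clone, Copy, Default)]
pub struct ResourceSpec {
    pub cpu_millicores: Option<u32>,
    pub memory_bytes: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SpawnError {
    Unsupported(SpawnTarget),
}

pub type Task = Box<dyn FnOnce() + Send + 'static>;

/// Implementation-independent entry point for spawning work.
pub trait Spawner {
    fn spawn(&self, target: SpawnTarget, resources: ResourceSpec, task: Task)
        -> Result<(), SpawnError>;
}

/// Toy backend: runs tasks in-process and ignores the resource spec.
pub struct LocalSpawner;

impl Spawner for LocalSpawner {
    fn spawn(&self, target: SpawnTarget, _resources: ResourceSpec, task: Task)
        -> Result<(), SpawnError> {
        match target {
            SpawnTarget::CurrentThread => {
                task(); // run synchronously on the caller's thread
                Ok(())
            }
            SpawnTarget::ThreadPool => {
                // A fresh OS thread stands in for a pool; the handle is detached.
                drop(thread::spawn(task));
                Ok(())
            }
            // Subprocess and remote backends would plug in behind the same trait.
            other => Err(SpawnError::Unsupported(other)),
        }
    }
}
```

Callers depend only on the Spawner trait, so a backend that forks a subprocess or submits a job to a cluster manager could be swapped in without touching call sites.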

ActivePeter · 2024-08-05T09:08:39Z

I have applied for OSPP 2024 with this subproject.
Here's the reference: https://summer-ospp.ac.cn/org/prodetail/2432c0077?lang=en&list=pro
Here are the real-time records: https://fvd360f8oos.feishu.cn/docx/QTb0d75gpoCoHGxL6Q3cx9KBnig

Task Goals and Technical Solutions

I. Task Basic Goals

Encapsulate and modify common/runtime to support task priorities, and submit a set of runtime libraries
- Modify common/runtime in GreptimeDB (a wrapper layer over Tokio with three thread pools for different kinds of tasks) to support task priorities.
Implement the corresponding integrity and performance tests, with thorough analysis, records, and quickly reproducible source code
- Test the modified runtime library comprehensively, covering both functional integrity and performance. Keep detailed analysis and records so that the testing process and results can be reproduced quickly.
Try further ideas (such as resource limits and scheduling algorithms), with summaries, records, and quickly reproducible source code
- Explore further optimizations such as resource limits and scheduling algorithms; summarize and record the process and results of each attempt, and keep source code that reproduces them quickly.

II. Technical Solutions

(I) Choosing When to Pend

Probability mode
- Decide at each poll directly from a probability (flexible to configure, but it can produce too many consecutive pends).
Fixed interval mode
- In each pattern, 1 means the poll proceeds and 0 means the task pends; the probability is the share of polls that proceed.
- Priority 5, probability 0.5: 1 0 1 0
- Priority 4, probability 0.66 (2/3): 11 0 11 0
- Priority 3, probability 0.75: 111 0 111 0
- Priority 2, probability 0.9 (10/11): 1111111111 0 1111111111 0
- Normal (no special priority): 11111111111111
Time accumulation and division mode
- Track the accumulated time adding_up_time and a period pend_t. The whole run is divided into slices of length pend_t, and each division point is a moment to pend.
- At each scheduling point, count how many pend_t periods fit in adding_up_time and pend that many times (or only once), then keep only the remainder in adding_up_time. This controls the pend frequency (the number of pends per unit time) more precisely, with a period of pend_t.
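The two deterministic modes can be sketched in Rust as follows. This is a minimal illustration under assumptions: a hypothetical throttling wrapper would call should_pend on every poll (fixed interval mode) or pends_due with the time elapsed since the last check (time accumulation mode). Priority, FixedInterval, TimeAccumulation and their methods are illustrative names, not GreptimeDB's API.

```rust
use std::time::Duration;

/// Priority levels from the fixed-interval table (names are illustrative).
#[derive(Clone, Copy, Debug)]
pub enum Priority {
    P5,
    P4,
    P3,
    P2,
    Normal,
}

/// Fixed interval mode: let `run_len` polls proceed, then pend once, repeat.
pub struct FixedInterval {
    run_len: Option<u32>, // None = never pend
    counter: u32,
}

impl FixedInterval {
    pub fn new(priority: Priority) -> Self {
        let run_len = match priority {
            Priority::P5 => Some(1),  // 1 0 1 0
            Priority::P4 => Some(2),  // 11 0 11 0
            Priority::P3 => Some(3),  // 111 0 111 0
            Priority::P2 => Some(10), // 1111111111 0
            Priority::Normal => None, // 1111...
        };
        Self { run_len, counter: 0 }
    }

    /// Returns true if this poll should pend instead of running.
    pub fn should_pend(&mut self) -> bool {
        match self.run_len {
            None => false,
            Some(n) => {
                self.counter += 1;
                if self.counter > n {
                    self.counter = 0;
                    true
                } else {
                    false
                }
            }
        }
    }
}

/// Time accumulation mode: one pend is due per `pend_t` of accumulated time.
pub struct TimeAccumulation {
    adding_up_time: Duration,
    pend_t: Duration,
}

impl TimeAccumulation {
    pub fn new(pend_t: Duration) -> Self {
        assert!(!pend_t.is_zero(), "pend_t must be non-zero");
        Self { adding_up_time: Duration::ZERO, pend_t }
    }

    /// Adds `elapsed` and returns how many pend_t boundaries were crossed.
    pub fn pends_due(&mut self, elapsed: Duration) -> u32 {
        self.adding_up_time += elapsed;
        let acc = self.adding_up_time.as_nanos();
        let period = self.pend_t.as_nanos();
        // Keep only the remainder; it is smaller than pend_t, so it fits in u64.
        self.adding_up_time = Duration::from_nanos((acc % period) as u64);
        (acc / period) as u32
    }
}
```

For example, FixedInterval::new(Priority::P4) yields the pattern 110110..., and with pend_t = 10 ms, feeding 4 ms, 4 ms, 4 ms and then 25 ms reports 0, 0, 1 and 2 pends. A caller that prefers "only pend once" can simply treat any non-zero result as a single pend.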

(II) Pend Recovery Mechanism

Preventing tasks from never being scheduled again
- Returning Pending directly, without arranging a wake-up, may leave the task never scheduled again.
- The adopted approach is to hold on to the waker before returning Pending and wake it again later, which puts the task back on the scheduling queue.
- Concretely, a separate delayed wake-up task performs the wake-up.
- tokio::task::yield_now() offers an example of this pattern.
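A std-only sketch of the recovery mechanism, under assumptions: the Throttled wrapper pends on every `every`-th poll (a hypothetical policy standing in for any of the modes above), and an OS thread that sleeps and then calls wake() plays the role of the separate delayed wake-up task. The names Throttled, YieldTimes, ThreadWaker and block_on are illustrative. tokio::task::yield_now() follows the same principle of making sure the task will be woken before it returns Pending.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// Pends the inner future on every `every`-th poll and schedules a delayed wake.
pub struct Throttled<F> {
    inner: Pin<Box<F>>,
    every: u32,
    polls: u32,
    delay: Duration,
}

impl<F: Future> Throttled<F> {
    pub fn new(inner: F, every: u32, delay: Duration) -> Self {
        assert!(every >= 2, "every = 1 would pend forever");
        Self { inner: Box::pin(inner), every, polls: 0, delay }
    }
}

impl<F: Future> Future for Throttled<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        self.polls += 1;
        if self.polls % self.every == 0 {
            // Returning Pending with no wake-up arranged could strand the task,
            // so hand a clone of the waker to a separate delayed wake-up task.
            let waker = cx.waker().clone();
            let delay = self.delay;
            thread::spawn(move || {
                thread::sleep(delay);
                waker.wake();
            });
            return Poll::Pending;
        }
        self.inner.as_mut().poll(cx)
    }
}

/// Toy inner future: returns Pending `n` times (waking itself) before finishing.
pub struct YieldTimes(pub u32);

impl Future for YieldTimes {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            return Poll::Ready("done");
        }
        self.0 -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Minimal executor: parks the thread until a waker unparks it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(), // spurious wake-ups just re-poll
        }
    }
}
```

Because this executor really sleeps until woken, removing the delayed wake-up from Throttled would make block_on hang on the first throttled poll, which is exactly the failure described above. Spawning one OS thread per pend is only for illustration; a runtime would more likely use a timer wheel or a single delay queue.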

(III) Closed-Loop CPU Control: Dynamically Adjusting the Pending Probability

Core idea
- When the thread pool's CPU usage exceeds the configured threshold, the deviation from the threshold, scaled by a sensitivity coefficient, is fed back into the pending probability.
Observing the computational load
- https://docs.rs/tokio-metrics/latest/tokio_metrics/struct.TaskMetrics.html: this crate offers fairly complete observation of task-related latencies.
- https://docs.rs/sysinfo/latest/sysinfo/: this crate can observe system resources across platforms.
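A minimal sketch of the closed loop, assuming the pool's CPU usage is sampled periodically (for example with sysinfo) and passed in as a fraction between 0.0 and 1.0. The update rule pend_prob += sensitivity * (usage - threshold), clamped to [0, max_pend_prob], is one simple interpretation of the core idea chosen for illustration; it also lowers the probability again once usage drops below the threshold. CpuFeedback and its methods are hypothetical names, and the tiny xorshift64 generator only keeps the sketch free of external crates.

```rust
/// Closed-loop adjustment of a pend probability from observed CPU usage.
pub struct CpuFeedback {
    threshold: f64,     // target CPU usage, e.g. 0.6 = 60% of the pool
    sensitivity: f64,   // how strongly the deviation moves the probability
    max_pend_prob: f64, // upper bound so throttled tasks still make progress
    pend_prob: f64,
    rng: u64, // xorshift64 state, must be non-zero
}

impl CpuFeedback {
    pub fn new(threshold: f64, sensitivity: f64, max_pend_prob: f64) -> Self {
        Self { threshold, sensitivity, max_pend_prob, pend_prob: 0.0, rng: 0x9E37_79B9_7F4A_7C15 }
    }

    /// Feeds one CPU-usage sample and returns the updated pend probability.
    pub fn update(&mut self, cpu_usage: f64) -> f64 {
        let deviation = cpu_usage - self.threshold; // > 0 means "too busy"
        self.pend_prob =
            (self.pend_prob + self.sensitivity * deviation).clamp(0.0, self.max_pend_prob);
        self.pend_prob
    }

    /// Per-poll decision: pend with the current probability.
    pub fn should_pend(&mut self) -> bool {
        let mut x = self.rng;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng = x;
        let uniform = (x >> 11) as f64 / (1u64 << 53) as f64; // in [0, 1)
        uniform < self.pend_prob
    }
}
```

With threshold 0.6 and sensitivity 0.5, two samples at 80% usage raise the probability to about 0.1 and then 0.2, and a following sample at 40% brings it back to about 0.1. Because the probability accumulates rather than being recomputed from scratch, sustained overload keeps pushing it up until the clamp, which is what closes the loop.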

v0y4g3r · 2024-08-05T09:14:52Z

@ActivePeter we can start by introducing a new wrapper runtime that can assign different priorities to different tasks, even if it's not yet referenced in the codebase. Feel free to draft a PR once you're ready.

feat: add throttle runtime (#3685)

v0y4g3r added the C-feature, C-performance, and Difficulty: Hard labels Apr 10, 2024

tisonkun removed the Difficulty: Hard label May 15, 2024

v0y4g3r assigned ActivePeter Aug 5, 2024

zyy17 unassigned ActivePeter Aug 20, 2024

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 28, 2024

feat: Limit CPU in runtime (GreptimeTeam#3685)

dd234c2

ActivePeter mentioned this issue Sep 28, 2024

feat: Limit CPU in runtime (#3685) #4782

Merged


ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 28, 2024

feat: Limit CPU in runtime (GreptimeTeam#3685)

dd76151

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 28, 2024

feat: Limit CPU in runtime (GreptimeTeam#3685)

46991de

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 28, 2024

feat: Limit CPU in runtime (GreptimeTeam#3685)

e23f85e

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 28, 2024

feat: abstract for alternative runtime (GreptimeTeam#3685)

6e98474

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 28, 2024

feat: abstract for alternative runtime (GreptimeTeam#3685)

ba4ede3

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

9187e7c

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

51d91b0

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

8c5054e

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

98ba161

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

ac00a8e

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

b659487

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

c60462c

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

6fc6c52

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

a45e154

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

bce877e

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Sep 29, 2024

feat: add throttle runtime (GreptimeTeam#3685)

9ff081f

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Oct 20, 2024

feat: add throttle runtime (GreptimeTeam#3685)

1f4ba19

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Oct 23, 2024

feat: add throttle runtime (GreptimeTeam#3685)

8ecf99e

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Oct 23, 2024

feat: add throttle runtime (GreptimeTeam#3685)

c37ee8d

ActivePeter added a commit to ActivePeter/greptimedb that referenced this issue Oct 23, 2024

feat: add throttle runtime (GreptimeTeam#3685)

c69b984

github-merge-queue bot pushed a commit that referenced this issue Oct 24, 2024

feat: Limit CPU in runtime (#3685) (#4782)

9d3ee63
