update

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
pingcap · Feb 26, 2023 · ff6edfd · ff6edfd
1 parent 1d2040a
commit ff6edfd
Showing 1 changed file with 148 additions and 0 deletions.
diff --git a/docs/design/2022-11-24-resource-manage-for-ddl.md b/docs/design/2022-11-24-resource-manage-for-ddl.md
@@ -0,0 +1,148 @@
+# Proposal: Resource management for DDL
+- Author(s): [hawkingrei](https://github.com/hawkingrei) 
+- Tracking issue: #38025
+
+## Table of Contents
+
+<!-- TOC -->
+* [Abstract](#abstract)
+* [Background](#background)
+* [Design](#design)
+  * [Component](#component)
+  * [Dynamic goroutine pool](#dynamic-goroutine-pool)
+  * [Resource manager](#resource-manager)
+    * [scheduler interface](#scheduler-interface)
+    * [simple CPU utilization-based scheduler](#simple-cpu-utilization-based-scheduler)
+    * [Task scheduler](#task-scheduler)
+      * [Task meta](#task-meta)
+      * [Task meta management](#task-meta-management)
+      * [Task schedule](#task-schedule)
+
+<!-- TOC -->
+
+## Abstract
+
+This document proposes a design for controlling CPU usage during DDL operations, so that concurrent DDL does not 
+affect queries.
+
+## Background
+
+Currently, TiDB does not control the resources that DDL consumes, so DDL operations may cause high CPU usage. For 
+example, adding a column to a table with 1000 columns causes high CPU usage. Concurrent DDL operations may 
+affect queries running at the same time. We want to prevent this.
+
+## Design
+
+### Component
+
+1. Dynamic goroutine pool
+
+Many background tasks (for example, DDL tasks) can run under a producer-consumer pattern. We can use a dynamic goroutine pool to control the number of goroutines. 
+The pool is easy for developers to use: an API controls the global concurrency or the concurrency of a single task. The pool 
+collects runtime statistics for every job and submits them asynchronously to the resource manager.
+
+2. Resource manager
+
+The resource manager receives the statistics from the pool and analyzes them for the scheduler. Different schedulers can 
+be injected into the manager, which can start, stop, or switch schedulers dynamically.
+
+### Dynamic goroutine pool
+
+#### Implementation
+
+The dynamic goroutine pool has two main functions:
+
+- Limit the global concurrency, which can be tuned at any time.
+- Reuse goroutines to avoid the cost of creating them.
+
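The two functions listed above can be illustrated with a minimal sketch of a resizable worker pool. The names `Pool`, `Tune`, `Submit`, `Size`, and `Close` are hypothetical and chosen for this example only; this is not the TiDB pool, just one simple way to combine goroutine reuse with a concurrency limit that can change at runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical resizable worker pool (not the TiDB implementation).
type Pool struct {
	mu      sync.Mutex
	tasks   chan func()   // submitted work, consumed by long-lived workers
	stop    chan struct{} // each value asks exactly one worker to exit
	workers int
	wg      sync.WaitGroup
}

// NewPool starts a pool with n workers.
func NewPool(n int) *Pool {
	p := &Pool{tasks: make(chan func()), stop: make(chan struct{})}
	p.Tune(n)
	return p
}

// Tune grows or shrinks the pool to n workers; it can be called at any time.
func (p *Pool) Tune(n int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.workers < n {
		p.workers++
		p.wg.Add(1)
		go p.worker()
	}
	for p.workers > n {
		p.stop <- struct{}{} // blocks until a worker is free to exit
		p.workers--
	}
}

// worker is reused for many tasks, so no goroutine is created per task.
func (p *Pool) worker() {
	defer p.wg.Done()
	for {
		select {
		case <-p.stop:
			return
		case task, ok := <-p.tasks:
			if !ok {
				return
			}
			task()
		}
	}
}

// Submit hands a task to an idle worker, blocking until one is available.
func (p *Pool) Submit(task func()) { p.tasks <- task }

// Size reports the current number of workers.
func (p *Pool) Size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.workers
}

// Close stops accepting tasks and waits for all workers to exit.
func (p *Pool) Close() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	p := NewPool(2)
	var mu sync.Mutex
	var done sync.WaitGroup
	sum := 0
	for i := 1; i <= 10; i++ {
		i := i
		done.Add(1)
		p.Submit(func() {
			defer done.Done()
			mu.Lock()
			sum += i
			mu.Unlock()
		})
	}
	p.Tune(4) // raise the limit while the pool is in use
	p.Tune(1) // lower it again
	done.Wait()
	fmt.Println("sum:", sum, "workers:", p.Size())
	p.Close()
}
```

In this sketch `Tune` holds the lock while it waits for workers to exit, which keeps the code short but would delay other callers if tasks run for a long time.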
+### Resource manager
+
+#### scheduler interface
+
+```go
+// Scheduler is a scheduler interface
+type Scheduler interface {
+   Tune(component util.Component, p util.GoroutinePool) Command
+}
+```
+
+`Component` is the type tag for the TiDB component. `GoroutinePool` is the interface for the pool, from which 
+the scheduler gets the status and running statistics of the pool. `Command` is the scheduling command; currently 
+there are three commands: `Downclock`, `Hold`, and `Overclock`.
+
+#### simple CPU utilization-based scheduler
+
+```go
+// CPUScheduler is a cpu scheduler
+type CPUScheduler struct{}
+
+// NewCPUScheduler is to create a new cpu scheduler
+func NewCPUScheduler() *CPUScheduler {
+   return &CPUScheduler{}
+}
+
+// Tune is to tune the goroutine pool
+func (*CPUScheduler) Tune(_ util.Component, pool util.GoroutinePool) Command {
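+   // Hold if too little time has elapsed since the last tuning, to avoid scheduling too frequently.
+   if time.Since(pool.LastTunerTs()) < util.MinSchedulerInterval.Load() {
+      return Hold
+   }
+   // Read the CPU usage once so that both thresholds are checked against the same sample.
+   usage := cpu.GetCPUUsage()
+   if usage < 0.5 {
+      return Overclock
+   }
+   if usage > 0.7 {
+      return Downclock
+   }
+   // Usage between 0.5 and 0.7 leaves the concurrency unchanged.
+   return Hold
+}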
+```
+
+##### Task scheduler 
+
+After scheduling, the scheduler returns a command, and the pool executes it to change its concurrency. If the pool is 
+overclocked, we need to create new tasks and add task workers to the pool. If the pool is downclocked, we 
+need to stop existing tasks and remove task workers from the pool.
+
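A minimal, self-contained sketch of how a pool might turn scheduler commands into a concurrency change. The 0.5 and 0.7 thresholds come from the `CPUScheduler` shown earlier; the helper names `decide` and `apply`, the step of one worker per command, and the lower and upper bounds are assumptions made for illustration.

```go
package main

import "fmt"

// Command mirrors the three scheduling commands named in the proposal.
type Command int

const (
	Downclock Command = iota
	Hold
	Overclock
)

// decide maps one CPU usage sample to a command, using the 0.5 and 0.7
// thresholds of the CPUScheduler. Samples in between produce Hold.
func decide(usage float64) Command {
	switch {
	case usage < 0.5:
		return Overclock
	case usage > 0.7:
		return Downclock
	default:
		return Hold
	}
}

// apply returns the pool concurrency after executing cmd. The step of one
// worker and the [lo, hi] bounds are assumptions made for this sketch.
func apply(cmd Command, current, lo, hi int) int {
	switch cmd {
	case Overclock:
		if current < hi {
			return current + 1
		}
	case Downclock:
		if current > lo {
			return current - 1
		}
	}
	return current
}

func main() {
	concurrency := 2
	for _, usage := range []float64{0.3, 0.4, 0.6, 0.8, 0.9, 0.6} {
		concurrency = apply(decide(usage), concurrency, 1, 4)
		fmt.Printf("cpu=%.1f -> concurrency=%d\n", usage, concurrency)
	}
}
```

The gap between the two thresholds acts as a dead band: usage that hovers around a single value does not make the pool alternate between growing and shrinking.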
+
+###### Task meta
+
+After the pool generates a task, it produces as many taskboxes as the task's concurrency. Each taskbox contains the context 
+and communication channels necessary to execute the task.
+
+```go
+// TaskBox is a box which contains all info about pool task.
+type TaskBox[T any, U any, C any, CT any, TF Context[CT]] struct {
+   constArgs   C               // constant parameters at runtime
+   contextFunc TF              // context
+   wg          *sync.WaitGroup // waits for the task to finish
+   task        chan Task[T]    // task queue
+   resultCh    chan U          // result queue
+   taskID      uint64          // task ID
+   status      atomic.Int32    // lets the task manager mark this task as stopped, waiting, or running
+}
+```
+
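To show how the `status` field could be used, here is a hedged sketch in which a worker loop checks an atomic status before taking the next item. The status constants, the reduced `box` type, and the `run` and `fill` helpers are hypothetical; the proposal only states that the task manager can make a task stop, wait, or run.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical status values for this sketch.
const (
	statusRunning int32 = iota
	statusWaiting
	statusStopped
)

// box keeps only the status field of TaskBox.
type box struct {
	status atomic.Int32
}

// run takes items from the queue until it is closed or the box is stopped,
// and returns how many items were processed.
func run(b *box, queue <-chan int) int {
	processed := 0
	for b.status.Load() != statusStopped {
		if _, ok := <-queue; !ok {
			break
		}
		processed++
	}
	return processed
}

// fill returns a closed queue that holds n items.
func fill(n int) <-chan int {
	q := make(chan int, n)
	for i := 0; i < n; i++ {
		q <- i
	}
	close(q)
	return q
}

func main() {
	b := &box{} // the zero value is statusRunning
	fmt.Println("running box processed:", run(b, fill(3)))

	b.status.Store(statusStopped) // what a task manager would do on Downclock
	fmt.Println("stopped box processed:", run(b, fill(3)))
}
```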
+###### Task meta management
+
+Task meta management registers the task meta information and all of its taskboxes with TaskManage when a task starts, 
+and removes all meta information for the task when it ends.
+
+Task meta is stored in a sharded map to avoid lock contention. The key is the task ID, and the value is the task meta.
+
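A sharded map keyed by task ID can be sketched as follows. The shard count, the `Meta` type, and the method names are assumptions for illustration; the point is that each shard has its own lock, so operations on tasks in different shards do not contend.

```go
package main

import (
	"fmt"
	"sync"
)

// shardCount is an arbitrary choice for this sketch.
const shardCount = 8

// Meta is a hypothetical stand-in for the task meta.
type Meta struct {
	Concurrency int
}

// shard is one independently locked part of the map.
type shard struct {
	mu    sync.RWMutex
	items map[uint64]*Meta
}

// ShardedMetaMap spreads task IDs over several shards.
type ShardedMetaMap struct {
	shards [shardCount]shard
}

// NewShardedMetaMap allocates the inner map of every shard.
func NewShardedMetaMap() *ShardedMetaMap {
	m := &ShardedMetaMap{}
	for i := range m.shards {
		m.shards[i].items = make(map[uint64]*Meta)
	}
	return m
}

// shardFor picks the shard that owns taskID.
func (m *ShardedMetaMap) shardFor(taskID uint64) *shard {
	return &m.shards[taskID%shardCount]
}

// Register stores the meta when a task starts.
func (m *ShardedMetaMap) Register(taskID uint64, meta *Meta) {
	s := m.shardFor(taskID)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[taskID] = meta
}

// Get looks up the meta of a task.
func (m *ShardedMetaMap) Get(taskID uint64) (*Meta, bool) {
	s := m.shardFor(taskID)
	s.mu.RLock()
	defer s.mu.RUnlock()
	meta, ok := s.items[taskID]
	return meta, ok
}

// Delete removes the meta when a task ends.
func (m *ShardedMetaMap) Delete(taskID uint64) {
	s := m.shardFor(taskID)
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.items, taskID)
}

func main() {
	m := NewShardedMetaMap()
	var wg sync.WaitGroup
	for id := uint64(1); id <= 100; id++ {
		wg.Add(1)
		go func(id uint64) {
			defer wg.Done()
			m.Register(id, &Meta{Concurrency: int(id % 4)})
		}(id)
	}
	wg.Wait()

	meta, ok := m.Get(42)
	fmt.Println("task 42 registered:", ok, "concurrency:", meta.Concurrency)
	m.Delete(42)
	_, ok = m.Get(42)
	fmt.Println("task 42 present after delete:", ok)
}
```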
+###### Task schedule
+
+Task scheduling is based on task meta management. When the pool concurrency changes, we need to check the task meta.
+
+* When we overclock the pool
+
+We need to find the most suitable task in the pool, create a new taskbox for it, and add it to the task queue.
+
+1. A task that is running with less than its scheduled concurrency has the highest priority for expansion.
+2. The task that has been running for the shortest time is selected.
+
+* When we downclock the pool
+
+1. A task that is running with more than its scheduled concurrency has the highest priority for shrinking.
+2. The task that has been running for the longest time is selected.
+