# Proposal: Resource management for DDL
- Author(s): [hawkingrei](https://github.com/hawkingrei)
- Tracking issue: #38025

## Table of Contents

<!-- TOC -->
* [Abstract](#abstract)
* [Background](#background)
* [Design](#design)
* [Component](#component)
* [Dynamic goroutine pool](#dynamic-goroutine-pool)
* [Resource manager](#resource-manager)
* [scheduler interface](#scheduler-interface)
    * [simple CPU utilization-based scheduler](#simple-cpu-utilization-based-scheduler)
* [Task scheduler](#task-scheduler)
* [Task meta](#task-meta)
* [Task meta management](#task-meta-management)
* [Task schedule](#task-schedule)

<!-- TOC -->

## Abstract

This document proposes a design for controlling CPU usage during DDL operations, so that concurrent DDL does not
affect queries.

## Background

Currently, TiDB DDL is not well controlled and may consume a large amount of CPU during DDL operations. For example,
adding a column to a table with 1000 columns causes high CPU usage, and concurrent DDL operations may then affect
queries. We want to prevent this.

## Design

### Component

1. Dynamic goroutine pool

Many background tasks (such as DDL tasks) naturally follow a producer-consumer pattern, so we can use a dynamic goroutine pool to control the number of goroutines.
The pool is easy for developers to use: its API controls both global concurrency and per-task concurrency. Runtime statistics
are collected for every job in the pool and submitted asynchronously to the resource manager (a sketch of the pool is given in the Dynamic goroutine pool section below).

2. Resource manager

The resource manager receives statistics from the pools and analyzes them for the scheduler. The manager can inject
different schedulers and dynamically start, stop, or switch them.
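
To make this division of labor concrete, here is a minimal, self-contained sketch of the manager's dispatch loop. The `Pool`, `Scheduler`, and `Command` types are simplified stand-ins for the interfaces described later in this document, and `Register`, `SetScheduler`, and `Resize` are hypothetical names rather than TiDB's actual API.

```go
package sketch

// Command is the scheduling decision; the concrete definition here is an
// assumption, only the three command names come from this document.
type Command int

const (
	Downclock Command = iota
	Hold
	Overclock
)

// Pool is a simplified view of a goroutine pool that the manager can resize.
type Pool interface {
	Resize(delta int)
}

// Scheduler decides how a pool's concurrency should change.
type Scheduler interface {
	Tune(p Pool) Command
}

// ResourceManager holds the registered pools and the currently injected scheduler.
// SetScheduler can be called at runtime to start, stop, or switch scheduling policies.
type ResourceManager struct {
	scheduler Scheduler
	pools     []Pool
}

func (rm *ResourceManager) Register(p Pool)          { rm.pools = append(rm.pools, p) }
func (rm *ResourceManager) SetScheduler(s Scheduler) { rm.scheduler = s }

// dispatchOnce runs one scheduling round: ask the scheduler for a command per pool and apply it.
func (rm *ResourceManager) dispatchOnce() {
	if rm.scheduler == nil {
		return // scheduling is effectively stopped until a scheduler is injected
	}
	for _, p := range rm.pools {
		switch rm.scheduler.Tune(p) {
		case Overclock:
			p.Resize(+1)
		case Downclock:
			p.Resize(-1)
		case Hold:
			// keep the current concurrency
		}
	}
}
```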

### Dynamic goroutine pool

#### Implementation

The dynamic goroutine pool has two main functions, sketched in code after the list below:

- Limit the global concurrency, which can be tuned at any time.
- Reuse goroutines to avoid the cost of creating new ones.
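
A minimal sketch of these two properties, using a mutex and condition variable as the concurrency limit; the `Pool`, `Submit`, and `Tune` names here are illustrative, not the actual pool API:

```go
package sketch

import "sync"

// Pool reuses a fixed set of worker goroutines and bounds how many tasks may run at once.
// Tune can raise or lower that bound at any time without restarting the workers.
type Pool struct {
	mu    sync.Mutex
	cond  *sync.Cond
	limit int         // current global concurrency limit
	inUse int         // tasks currently executing
	tasks chan func() // pending tasks
}

// NewPool starts `workers` long-lived goroutines; `limit` caps how many run a task at once.
func NewPool(limit, workers int) *Pool {
	p := &Pool{limit: limit, tasks: make(chan func(), 128)}
	p.cond = sync.NewCond(&p.mu)
	for i := 0; i < workers; i++ {
		go p.worker() // created once and reused for every task
	}
	return p
}

// Submit queues a task for execution.
func (p *Pool) Submit(task func()) { p.tasks <- task }

// Tune changes the global concurrency limit; it can be called at any time.
func (p *Pool) Tune(limit int) {
	p.mu.Lock()
	p.limit = limit
	p.mu.Unlock()
	p.cond.Broadcast()
}

func (p *Pool) worker() {
	for task := range p.tasks {
		p.acquire()
		task()
		p.release()
	}
}

func (p *Pool) acquire() {
	p.mu.Lock()
	for p.inUse >= p.limit {
		p.cond.Wait() // block the reused goroutine instead of destroying it
	}
	p.inUse++
	p.mu.Unlock()
}

func (p *Pool) release() {
	p.mu.Lock()
	p.inUse--
	p.mu.Unlock()
	p.cond.Broadcast()
}
```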

### Resource manager

#### scheduler interface

```go
// Scheduler is a scheduler interface
type Scheduler interface {
	Tune(component util.Component, p util.GoroutinePool) Command
}
```

Component is a type tag for the TiDB component. GoroutinePool is the interface for the pool, from which
the scheduler gets the pool's status and runtime statistics. Command is the scheduling command; currently
there are three types of commands: Downclock, Hold, and Overclock.
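
The exact shape of `util.GoroutinePool` is not spelled out in this document. A rough sketch, inferred from the scheduler code in the next section (only `LastTunerTs` appears there; `Cap` and `Running` are assumed extras that may not match TiDB's actual API):

```go
package sketch

import "time"

// GoroutinePool is a plausible shape for util.GoroutinePool, for illustration only.
type GoroutinePool interface {
	LastTunerTs() time.Time // time of the last tuning decision, used to rate-limit scheduling
	Cap() int               // configured concurrency of the pool (assumed)
	Running() int           // workers currently executing a task (assumed)
}
```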

#### simple CPU utilization-based scheduler

```go
// CPUScheduler is a cpu scheduler
type CPUScheduler struct{}

// NewCPUScheduler is to create a new cpu scheduler
func NewCPUScheduler() *CPUScheduler {
	return &CPUScheduler{}
}

// Tune is to tune the goroutine pool
func (*CPUScheduler) Tune(_ util.Component, pool util.GoroutinePool) Command {
	// Check the time elapsed since the last tuning point to avoid frequent scheduling.
	if time.Since(pool.LastTunerTs()) < util.MinSchedulerInterval.Load() {
		return Hold
	}
	if cpu.GetCPUUsage() < 0.5 {
		return Overclock
	}
	if cpu.GetCPUUsage() > 0.7 {
		return Downclock
	}
	return Hold
}
```

### Task scheduler

After scheduling, the scheduler returns a command, and the pool executes it to change the
concurrency of the pool. If the pool is overclocked, we need to add new task workers to the pool. If the pool is downclocked, we
need to stop existing task workers and remove them from the pool.
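
As a hedged sketch of this step, the pool might translate a command into worker membership changes as follows; the `taskPool` type and the quit-channel mechanism are assumptions, not the actual pool internals, and `Command` repeats the earlier sketch so this block stands alone:

```go
package sketch

// Command repeats the definition sketched in the Resource manager section.
type Command int

const (
	Downclock Command = iota
	Hold
	Overclock
)

// taskPool keeps one quit channel per running task worker so that workers
// can be added or stopped when the scheduler changes the pool's concurrency.
type taskPool struct {
	tasks chan func()
	quits []chan struct{}
}

// apply executes a scheduling command by changing the number of task workers.
func (p *taskPool) apply(cmd Command) {
	switch cmd {
	case Overclock:
		quit := make(chan struct{})
		p.quits = append(p.quits, quit)
		go p.worker(quit) // add one task worker
	case Downclock:
		if n := len(p.quits); n > 0 {
			close(p.quits[n-1]) // ask one worker to exit after its current task
			p.quits = p.quits[:n-1]
		}
	case Hold:
		// keep the current concurrency
	}
}

func (p *taskPool) worker(quit chan struct{}) {
	for {
		select {
		case <-quit:
			return
		case task, ok := <-p.tasks:
			if !ok {
				return
			}
			task()
		}
	}
}
```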


#### Task meta

After the pool generates a task, a taskbox is produced for each unit of the task's concurrency; each taskbox contains the context
and communication channels necessary to execute the task.

```go
// TaskBox is a box which contains all info about a pool task.
type TaskBox[T any, U any, C any, CT any, TF Context[CT]] struct {
	constArgs   C               // constant parameters at runtime
	contextFunc TF              // context
	wg          *sync.WaitGroup // wait for the task to finish
	task        chan Task[T]    // task queue
	resultCh    chan U          // result queue
	taskID      uint64          // task ID
	status      atomic.Int32    // the task manager can make this task stop, wait, or run
}
```
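
The document does not show the values stored in `status`. A plausible set of states, building on the `TaskBox` definition above (the constant and method names are assumptions, not TiDB's actual code), might look like this:

```go
// Possible values for TaskBox.status; the names are illustrative only.
const (
	taskRunning int32 = iota // the taskbox is consuming tasks from its task channel
	taskWaiting              // the taskbox is paused and waits to be resumed
	taskStopped              // the task manager has asked the taskbox to exit
)

// SetStatus lets the task manager switch a taskbox between running, waiting, and stopped.
func (t *TaskBox[T, U, C, CT, TF]) SetStatus(s int32) {
	t.status.Store(s)
}

// GetStatus is checked by the worker loop before taking the next task.
func (t *TaskBox[T, U, C, CT, TF]) GetStatus() int32 {
	return t.status.Load()
}
```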

#### Task meta management

Task meta management registers a task's meta information and all of its taskboxes with TaskManage when the task starts,
and removes all of the task's meta information when the task ends.

Task meta management is implemented as a sharded map to avoid lock contention: the key is the task ID, and the value is the task meta.
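
A minimal sketch of such a sharded map; the shard count and the `TaskMeta` placeholder are assumptions:

```go
package sketch

import "sync"

// TaskMeta is a placeholder for the per-task meta information (taskboxes, concurrency, ...).
type TaskMeta struct{}

const shardCount = 16 // assumed; the real shard count may differ

// shard holds part of the key space behind its own lock, so that concurrent
// task registration and removal rarely contend on the same mutex.
type shard struct {
	mu    sync.Mutex
	metas map[uint64]*TaskMeta
}

// ShardedTaskMap maps task ID -> task meta across shardCount shards.
type ShardedTaskMap struct {
	shards [shardCount]shard
}

func NewShardedTaskMap() *ShardedTaskMap {
	m := &ShardedTaskMap{}
	for i := range m.shards {
		m.shards[i].metas = make(map[uint64]*TaskMeta)
	}
	return m
}

func (m *ShardedTaskMap) shardFor(taskID uint64) *shard {
	return &m.shards[taskID%shardCount]
}

// Register is called when a task starts.
func (m *ShardedTaskMap) Register(taskID uint64, meta *TaskMeta) {
	s := m.shardFor(taskID)
	s.mu.Lock()
	s.metas[taskID] = meta
	s.mu.Unlock()
}

// Remove is called when a task ends.
func (m *ShardedTaskMap) Remove(taskID uint64) {
	s := m.shardFor(taskID)
	s.mu.Lock()
	delete(s.metas, taskID)
	s.mu.Unlock()
}
```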

#### Task schedule

Task scheduling is based on task meta management. When the pool concurrency changes, we need to check the task meta and pick a task as follows (a selection sketch in Go follows the list).

* When we overclock the pool

We need to find the most suitable task in the pool, create a new taskbox for it, and add the taskbox to the task queue.

1. The highest priority goes to a task whose running concurrency is below its scheduled concurrency.
2. Among those, the task that has been running for the shortest time is selected.

* When we downclock the pool

1. The highest priority goes to a task whose running concurrency is above its scheduled concurrency.
2. Among those, the task that has been running for the longest time is selected.
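
A hedged sketch of these selection rules; the `taskMeta` fields are assumptions about what task meta management records, and tasks that do not satisfy the primary criterion are simply skipped here:

```go
package sketch

import "time"

// taskMeta is an assumed view of what task meta management records per task.
type taskMeta struct {
	running   int       // taskboxes currently running for this task
	scheduled int       // concurrency the task was scheduled with
	startTime time.Time // when the task started running
}

// pickToExpand selects the task to give an extra taskbox to when the pool is overclocked:
// prefer tasks running below their scheduled concurrency, then the most recently started one.
func pickToExpand(tasks []*taskMeta) *taskMeta {
	var best *taskMeta
	for _, t := range tasks {
		if t.running >= t.scheduled {
			continue // already at or above its scheduled concurrency
		}
		if best == nil || t.startTime.After(best.startTime) {
			best = t // shortest running time so far
		}
	}
	return best
}

// pickToShrink selects the task to take a taskbox from when the pool is downclocked:
// prefer tasks running above their scheduled concurrency, then the longest-running one.
func pickToShrink(tasks []*taskMeta) *taskMeta {
	var best *taskMeta
	for _, t := range tasks {
		if t.running <= t.scheduled {
			continue // not running above its scheduled concurrency
		}
		if best == nil || t.startTime.Before(best.startTime) {
			best = t // longest running time so far
		}
	}
	return best
}
```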
