Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: part 1 of add-index integrating dxf dynamic modify param #59046

Merged

Conversation

D3Hunter
Copy link
Contributor

@D3Hunter D3Hunter commented Jan 20, 2025

What problem does this PR solve?

Issue Number: ref #57497, ref #57229

Problem Summary:

What changed and how does it work?

This is the first part, will add integrate test later

  • add a routine in executeDistTask to detect whether there are param mismatch between DDL job and DXF task, and we call ModifyTaskByID to let DXF handle the rest param modify work if detected
    • there is a way to achieve this without start a routine, but requires refactor transitOneJobStep to start txn on need first
// Note: we can achieve the same effect by calling ModifyTaskByID directly inside
// the process of 'ADMIN ALTER DDL JOB xxx', so we can eliminate the goroutine,
// but if the task hasn't been created we need to make sure the task is created
// with config after ALTER DDL JOB is executed. A possible solution is to make
// the DXF task submission and 'ADMIN ALTER DDL JOB xxx' txn conflict with each
// other when they overlap in time, by modify the job at the same time when submit
// task, as we are using optimistic txn. But this will cause WRITE CONFLICT with
// outer txn in transitOneJobStep.
  • remove some useless code such as skipWriteBinlog
  • some refactor such as create static errors using errors.New of golang to avoid stack overhead of pingcap/errors and replace error equal check with goerrors.Is

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 20, 2025
Copy link

tiprow bot commented Jan 20, 2025

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

codecov bot commented Jan 20, 2025

Codecov Report

Attention: Patch coverage is 88.82353% with 19 lines in your changes missing coverage. Please review.

Project coverage is 75.7585%. Comparing base (444c38f) to head (6c7230a).
Report is 61 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #59046        +/-   ##
================================================
+ Coverage   73.0574%   75.7585%   +2.7010%     
================================================
  Files          1689       1743        +54     
  Lines        466990     485676     +18686     
================================================
+ Hits         341171     367941     +26770     
+ Misses       104844      95371      -9473     
- Partials      20975      22364      +1389     
Flag Coverage Δ
integration 49.8536% <10.2941%> (?)
unit 73.0140% <88.8235%> (+0.7559%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 64.2598% <ø> (+18.8511%) ⬆️

@D3Hunter D3Hunter changed the title [WIP]ddl: part 1 of add-index integrating dxf dynamic modify param ddl: part 1 of add-index integrating dxf dynamic modify param Jan 21, 2025
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2025
@D3Hunter
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jan 21, 2025

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jan 21, 2025

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Copy link
Contributor Author

/retest

Copy link

tiprow bot commented Jan 22, 2025

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Copy link
Contributor Author

D3Hunter commented Feb 7, 2025

@lance6716 @tangenta

Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 / 19 files viewed

@D3Hunter
Copy link
Contributor Author

@lance6716 ptal

Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

err = g.Wait()
return err
}

// Note: we can achieve the same effect by calling ModifyTaskByID directly inside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review most of DXF codes so I'm not sure what's the meaning od this function comment. Just to check:

it says we only allow some functions to write to the system table of DXF, especially, we don't allow ADMIN ALTER DDL JOB to write because it's heavier to rollback on conflict?

Can you explain in this design, which functions are allowed to write the system table? I want to check if they still can conflict with each other.

zap.String("state", currTask.State.String()))
continue
}
if err = taskManager.ModifyTaskByID(ctx, taskID, &proto.ModifyParam{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when checking the implementation of ModifyTaskByID I see the "old values" are not added to the UPDATE .. WHERE clause. So double check that it is expected for two concurrent call of ModifyTaskByID, the final task arguments can be updated as any of the call?

Copy link
Contributor Author

@D3Hunter D3Hunter Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModifyTaskByID will transit the task into modifying state if it's in the prev state, so only 1 will success for concurrent calls

if err != nil {
return 0, err
}
return min(workerCnt, cpuCount), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the reason why we limit the concurrency? If the work is not a pure compute workload, increase the concurrency to more that CPU count can make use of IO time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only abstract logic in here https://github.com/pingcap/tidb/pull/59046/files#diff-45ad039a8b87835b475a52615355c444538d3a1f31a84d7b3ac94e15b1ce8721L2550-L2554

valid range of tidb_ddl_reorg_worker_cnt is [1, 256], but for a DXF task, valid range of task concurrency or slots is [1, cpuCount]

@D3Hunter
Copy link
Contributor Author

ping @lance6716

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 14, 2025
jobID, taskID int64,
lastConcurrency, lastBatchSize, lastMaxWriteSpeed int,
) {
logger := logutil.DDLLogger().With(zap.Int64("jobId", jobID), zap.Int64("taskId", taskID))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger := logutil.DDLLogger().With(zap.Int64("jobId", jobID), zap.Int64("taskId", taskID))
logger := logutil.DDLLogger().With(zap.Int64("jobID", jobID), zap.Int64("taskID", taskID))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #59343

taskManager storage.Manager,
done chan struct{},
jobID, taskID int64,
lastConcurrency, lastBatchSize, lastMaxWriteSpeed int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use "old" / "new" to replace "last" / "latest"?

Copy link

ti-chi-bot bot commented Feb 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lance6716, tangenta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 17, 2025
Copy link

ti-chi-bot bot commented Feb 17, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-02-14 07:13:53.737282322 +0000 UTC m=+599876.133504384: ☑️ agreed by lance6716.
  • 2025-02-17 03:38:55.936594453 +0000 UTC m=+846178.332816516: ☑️ agreed by tangenta.

@ti-chi-bot ti-chi-bot bot merged commit baf1600 into pingcap:master Feb 17, 2025
27 checks passed
@D3Hunter D3Hunter deleted the add-index/integrate-dxf-dynamic-param branch February 18, 2025 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants