Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session: migrate test-infra to testify for tidb_test.go and bootstrap_test.go #28555

Merged
merged 8 commits into from
Oct 11, 2021

Conversation

tisonkun
Copy link
Contributor

@tisonkun tisonkun commented Oct 7, 2021

This closes #28322 .
This closes #28329 .

Release note

None

…_test.go

This closes pingcap#28322 .
This closes pingcap#28329 .

Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Oct 7, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lance6716
  • morgo

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 7, 2021
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2021
@tisonkun
Copy link
Contributor Author

tisonkun commented Oct 7, 2021

/cc @lance6716 @morgo

@lance6716 this PR changes the failpoint you introduced in #18714 to avoid goroutine leak. Please review.

@@ -569,7 +569,7 @@ func (d *ddl) doDDLJob(ctx sessionctx.Context, job *model.Job) error {
i := 0
for {
failpoint.Inject("storeCloseInLoop", func(_ failpoint.Value) {
d.cancel()
_ = d.Stop()
Copy link
Contributor Author

@tisonkun tisonkun Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lance6716 at this line only, so that we don't just cancel the context but trigger a graceful shutdown. I'm unsure whether or not it is the original failpoint want to do. In another way, we can ignore the leak.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change.

The original failpoint is to imitate that the TiDB-as-a-library is being closed when handling a DDL, to check if the goroutine is stuck inside this for-loop. My original test code is #18714 , but after some iterations it becomes current invasive failpoint 😂

I'll review rest of code tomorrow

Comment on lines +292 to +298
oomAction := config.GetGlobalConfig().OOMAction
defer func() {
config.UpdateGlobal(func(conf *config.Config) {
conf.OOMAction = oomAction
})
}()
ctx := context.Background()
Copy link
Contributor Author

@tisonkun tisonkun Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this store and restore for

  • TestUpgrade
  • TestIssue17979_1
  • TestIssue20900_2

Otherwise TestIssue17979_2 failed as TestUpgrade set the default oomAction to "log". Previously we don't meet this issue possibly due to different tests executed order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names like TestIssue17979_1 is not good right? how about give them proper name and add link in comments?

}

for _, ca := range cases {
func() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap in a function so that defer works properly.

Comment on lines -42 to -43
s.dbName = "test_bootstrap"
s.dbNameBootstrap = "test_main_db_bootstrap"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't actually make use of them

Comment on lines +36 to +37
SetIndexUsageSyncLease(1)
defer SetIndexUsageSyncLease(0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes global variable and thus run in serial.

Comment on lines +76 to +77
require.NoError(t, err)
return se
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

se.Auth is for TestForIssue23387 only.

Comment on lines -148 to -150
func removeStore(c *C, dbPath string) {
os.RemoveAll(dbPath)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really in use.

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun
Copy link
Contributor Author

tisonkun commented Oct 7, 2021

/run-check_dev_2

1 similar comment
@tisonkun
Copy link
Contributor Author

tisonkun commented Oct 7, 2021

/run-check_dev_2

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 8, 2021
Copy link
Contributor

@morgo morgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

session/bootstrap_serial_test.go Outdated Show resolved Hide resolved
session/bootstrap_serial_test.go Show resolved Hide resolved
session/bootstrap_serial_test.go Show resolved Hide resolved
Comment on lines +114 to +116
if v.Scope != variable.ScopeSession {
count++
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit mask of options, testing for ScopeSession might be buggy if we introduce any other scopes like ScopeInstance. Using helper funcs is better suited:

// HasNoneScope returns true if the scope for the sysVar is None.
func (sv *SysVar) HasNoneScope() bool {
return sv.Scope == ScopeNone
}
// HasSessionScope returns true if the scope for the sysVar includes session.
func (sv *SysVar) HasSessionScope() bool {
return sv.Scope&ScopeSession != 0
}
// HasGlobalScope returns true if the scope for the sysVar includes global.
func (sv *SysVar) HasGlobalScope() bool {
return sv.Scope&ScopeGlobal != 0
}

I did not realize it, but ScopeNone variables are also persisted to the table (i.e. != ScopeSession is both Global and None). This looks to be intentional because system_time_zone needs to be persisted to the cluster. There might actually be bugs here because a Get on a ScopeNone returns sv.Value and doesn't check the persisted value:

// GetSessionOrGlobalSystemVar gets a system variable.
// If it is a session only variable, use the default value defined in code.
// Returns error if there is no such variable.
func GetSessionOrGlobalSystemVar(s *SessionVars, name string) (string, error) {
sv := GetSysVar(name)
if sv == nil {
return "", ErrUnknownSystemVar.GenWithStackByArgs(name)
}
if sv.HasNoneScope() {
return sv.Value, nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some additional research, and all the ScopeNone variables look to be safe to non-persist in the system table. They are usually overwritten by each TiDB server during startup.

I've forked this to #28667

tisonkun and others added 2 commits October 9, 2021 13:57
Co-authored-by: Morgan Tocker <tocker@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
@tisonkun
Copy link
Contributor Author

tisonkun commented Oct 9, 2021

@morgo thanks for your review. Your comments are addressed.

For #28667 , let's keep track on the issue.

@morgo
Copy link
Contributor

morgo commented Oct 9, 2021

/run-check_dev_2

@tisonkun
Copy link
Contributor Author

@morgo CI fixed. Please give this PR one more round review.

@morgo morgo self-requested a review October 11, 2021 02:16
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 11, 2021
@morgo
Copy link
Contributor

morgo commented Oct 11, 2021

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: ac6a0db

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 11, 2021
@ti-chi-bot ti-chi-bot merged commit 2f9d591 into pingcap:master Oct 11, 2021
@tisonkun tisonkun deleted the session-testify branch October 11, 2021 02:24
conf.TiKVClient.AsyncCommit.SafeWindow = 0
conf.TiKVClient.AsyncCommit.AllowedClockDrift = 0
})
tikv.EnableFailpoints()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change cause a performance regression.
In the old code, EnableFailpoints only works on the normal test cases.
After the change, EnableFailpoints will affect the benchmark tests.

When the the failpoint is not found in the benchmark test code, it trigger this code path

image

Copy link
Contributor

@tiancaiamao tiancaiamao Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the chart here http://www.zenlife.tk:18081/

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiancaiamao is there a way configuring tests and benchmarks separately? Previous we turn on failpoints in test, but it is a global state change. I guess we don't observe "regression" then because it occasionally run benchmark at first?

@disksing IIUC we discuss about tikv failpoints regressions ever, could you also take a look at this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiancaiamao emmm, it is because we already run benchmark & tests separately, but TestMain affects either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check if it is running bench or test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiancaiamao hacking and found several candidates to address this regression, please check #28709

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check if it is running bench or test?

@disksing I share the same preference with you and here is one possible solution #28709 . Please check.

I have to say that "check if it is running bench or test" isn't supported out-of-the-box by Golang, but we MAY achieve it manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
7 participants