Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txn: allow larger initial chunk size for read check-ts for read-consistency #37226

Open
cfzjywxk opened this issue Aug 18, 2022 · 8 comments
Open
Assignees
Labels
sig/transaction SIG:Transaction type/enhancement The issue or PR belongs to an enhancement.

Comments

@cfzjywxk
Copy link
Contributor

Enhancement

When the read check-ts feature is used, there's a constraint that the statement retry could work only when conflicts are detected and there're no result packets are responded to the client yet.

However, the init chunk size has a limit upper bound which is 32, which means the select statement may respond error to the client if the total result row number is larger than 32 and there's possible conflict after the first chunk is responded. As a result, the client may receive errors and have to process these errors if the read check-ts feature is enabled and the whole result row number is large, even the conflict happens rarely.

An enhancement for the read check-ts feature is to allow the init chunk size to be adjusted to a larger value like 1024 so the client does not need to handle errors when the conflict happens rarely and the select result row number is large.

@cfzjywxk cfzjywxk added type/enhancement The issue or PR belongs to an enhancement. sig/transaction SIG:Transaction labels Aug 18, 2022
@cfzjywxk
Copy link
Contributor Author

@TonsnakeLin
This enhancement could be considered together with the current ongoing ts optimization for read-consistency isolation level.

@TonsnakeLin
Copy link
Contributor

However, the init chunk size has a limit upper bound which is 32, which means the select statement may respond error to the client if the total result row number is larger than 32
What's the way TiDB process the result sets whose rows are more than 32? Get all the rows at once, or get 32 rows once time and loop to finished?

@TonsnakeLin
Copy link
Contributor

I found that adjusting tidb_init_chunk_size can't reslove this problem. The tidb_init_chunk_size can't control the rows fetched from selectResp. Even tidb_init_chunk_size is 32, it cant fetch more than 32 at once, which is depended on tidb_max_chunk_size.

@cfzjywxk
Copy link
Contributor Author

Looks like the init_chunk_size would be deprecated in the future, and does it mean that we could adjust the tidb_max_chunk_size to work around the write conflict error?

@TonsnakeLin
Copy link
Contributor

Looks like the init_chunk_size would be deprecated in the future, and does it mean that we could adjust the tidb_max_chunk_size to work around the write conflict error?

I'm not sure how to use init_chunk_size for the other scenes, but in (r *selectResult) Next(ctx context.Context, chk *chunk.Chunk) which is used to get results from tikv for select statement, It can store more rows than init_chunk_size.

Can adjusting tidb_max_chunk_size work around the write conflict error? IMO, it can't except using paging mode, every cop_task receives all data from TiKV at once, (r *selectResult) Next fetches the data from cop_task response in one or more rounds which depends on max_chunk_size .

@cfzjywxk
Copy link
Contributor Author

We need to figure out the workaround strategy and document it, especially when the feature scope of the lazy-tso would be expanded to DMLs. I'm not quite sure if there would be extra development work to adapt it.

@cfzjywxk
Copy link
Contributor Author

cfzjywxk commented Aug 24, 2022

Another diagnosis enhancement requirement is that some metrics or slow log information is needed to show the impact of the retry, for example, if the user may mistakenly enable the RCReadCheckTS but actually there're quite a lot of conflicts.

@TonsnakeLin
Copy link
Contributor

This is the proposals to add statistics for WriteConflict caused by RCCheckTS isolation.

  • Collect the statistic in function (p *PessimisticRCTxnContextProvider) handleAfterQueryError and (p *PessimisticRCTxnContextProvider) handleAfterPessimisticLockError
    • We increase the counter if the resaon of WriteConfilictError is RCCheckTS in the two functions above.
    • We can also add a new flag indicating the retry is caused by WriteConfilictError whose reason is RCCheckTS, and increase the counter in function (p *PessimisticRCTxnContextProvider) OnStmtRetry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/transaction SIG:Transaction type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

2 participants