*: parse the data source directly into data and skip the KV encoder #145
Conversation
Force-pushed from ce043f9 to f0dadce
Force-pushed from f81aead to 44db439
@lonng Some metrics are temporarily removed; we need to decide whether to tweak the metrics or the process. The old process:
New process:
The following metrics were affected by this change and may need to be repurposed:
@kennytm I think we could add some new metrics about:
Force-pushed from b83efb5 to 5624ac0
This skips the more complex pingcap/parser, and speeds up parsing by 50%. We have also refactored the KV delivery mechanism to use channels directly, and revamped the metrics:
- The metrics about engines now live in their own `engines` counter; the `tables` counter is exclusively about tables.
- Removed `block_read_seconds`, `block_read_bytes`, and `block_encode_seconds`, since the concept of "block" no longer applies; they are replaced by equivalents named `row_***`.
- Removed `chunk_parser_read_row_seconds`, which overlapped with `row_read_seconds`.
- Changed `block_deliver_bytes` into a histogram vec with kind=index or kind=data, and introduced `block_deliver_kv_pairs`.
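A minimal sketch of what the reshaped deliver metrics could look like with prometheus/client_golang (the exported names, help strings, and bucket choices here are illustrative, not the PR's actual values):

```go
package metric

import "github.com/prometheus/client_golang/prometheus"

var (
	// block_deliver_bytes as a histogram vec, partitioned by kind=data/index.
	BlockDeliverBytesHistogram = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "lightning_block_deliver_bytes",
			Help:    "number of bytes delivered to the importer in one batch",
			Buckets: prometheus.ExponentialBuckets(512, 2, 10),
		},
		[]string{"kind"},
	)
	// The newly introduced block_deliver_kv_pairs, with the same kind label.
	BlockDeliverKVPairsHistogram = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "lightning_block_deliver_kv_pairs",
			Help:    "number of KV pairs delivered to the importer in one batch",
			Buckets: prometheus.ExponentialBuckets(16, 2, 10),
		},
		[]string{"kind"},
	)
)

func init() {
	prometheus.MustRegister(BlockDeliverBytesHistogram, BlockDeliverKVPairsHistogram)
}
```

Observations would then be recorded per kind, e.g. `BlockDeliverBytesHistogram.WithLabelValues("data").Observe(float64(size))`.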
Only kill Lightning if the whole chunk is imported exactly. The chunk checkpoint may be recorded before a chunk is fully written, which would hit the failpoint more than 5 times.
This helps debug some mysterious cancellations where the log is inhibited. Added IsReallyContextCanceledError() for code logic affected by the error type.
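A sketch of what such a helper might look like, assuming pingcap/errors for cause extraction (the actual unwrapping logic in the PR may differ):

```go
package common

import (
	"context"

	"github.com/pingcap/errors"
)

// IsReallyContextCanceledError reports whether err is caused by a genuine
// context cancellation, so callers can branch on the error type instead of
// matching log strings. Wrapped errors are unwrapped to their root cause.
func IsReallyContextCanceledError(err error) bool {
	return errors.Cause(err) == context.Canceled
}
```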
Force-pushed from 5624ac0 to 216c812
/run-all-tests
I feel that Lightning lacks unit tests.
lightning/restore/checkpoints.go (Outdated)
	_, err = chunkStmt.ExecContext(
		c, tableName, engineID,
-		value.Key.Path, value.Key.Offset, value.Columns, value.ShouldIncludeRowID,
+		value.Key.Path, value.Key.Offset, colPerm,
I only know we call InsertEngineCheckpoints after calling populateChunks. If so, the columns column is always empty.
Removed this and the checksum (always 0 too).
func (cr *chunkRestore) saveCheckpoint(t *TableRestore, engineID int32, rc *RestoreController) {
	rc.saveCpCh <- saveCp{
		tableName: t.tableName,
		merger: &RebaseCheckpointMerger{
Is it enough to only save it in L529? It seems AllocBase wouldn't change.
Unfortunately no. AllocBase needs to be larger than every value of _tidb_rowid (or the integer primary key), which cannot be determined until we've read all the data. Added a comment for this.
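To illustrate the point, a sketch only (`rebaseAfterChunk` and `maxRowID` are our names, and autoid.Allocator's Rebase signature has varied across TiDB versions): the base can only be raised once the maximum row ID of the data is known.

```go
package restore

import "github.com/pingcap/tidb/meta/autoid"

// rebaseAfterChunk is illustrative: maxRowID is the largest _tidb_rowid (or
// integer primary key) observed while encoding the chunk. Raising the base
// past it guarantees freshly allocated row IDs cannot collide with imported
// ones; allocIDs=false means we only move the base without reserving a range.
func rebaseAfterChunk(alloc autoid.Allocator, tableID, maxRowID int64) error {
	return alloc.Rebase(tableID, maxRowID, false)
}
```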
{"t2", "CREATE TABLE `t2` (`c1` varchar(30000) NOT NULL)", "failed to ExecDDLSQL `mockdb`.`t2`:.*"}, | ||
{"t3", "CREATE TABLE `t3-a` (`c1-a` varchar(5) NOT NULL)", ""}, | ||
{"t1", "CREATE TABLE `t1` (`c1` varchar(5) NOT NULL)"}, | ||
// {"t2", "CREATE TABLE `t2` (`c1` varchar(30000) NOT NULL)"}, // no longer able to create this kind of table. |
Is this case meaningless now? Or should we add some error cases?
In this PR we no longer parse the CREATE TABLE DDL, and instead directly unmarshal the JSON result from TiDB (calling tables.TableFromMeta). So yeah, this case becomes meaningless: either you can't create a VARCHAR(30000) in TiDB, or you can and it produces proper JSON which Lightning accepts without error. Anyway, added a separate unit test to ensure malformed table info produces an error.
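Roughly, the new flow looks like this (a sketch: `loadTable` is our name, and the exact TableFromMeta signature depends on the TiDB version vendored):

```go
package restore

import (
	"encoding/json"

	"github.com/pingcap/parser/model"
	"github.com/pingcap/tidb/meta/autoid"
	"github.com/pingcap/tidb/table"
	"github.com/pingcap/tidb/table/tables"
)

// loadTable decodes the table info JSON fetched from TiDB and builds a
// table.Table from it. A malformed or inconsistent TableInfo surfaces as an
// error here instead of during DDL parsing.
func loadTable(raw []byte, alloc autoid.Allocator) (table.Table, error) {
	var tblInfo model.TableInfo
	if err := json.Unmarshal(raw, &tblInfo); err != nil {
		return nil, err
	}
	return tables.TableFromMeta(alloc, &tblInfo)
}
```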
@@ -19,11 +19,12 @@ import (
	"strings"

	. "github.com/pingcap/check"
+	"github.com/pingcap/errors"
We don't add more tests for it, only the old cases?
The old unit tests didn't compile since the interface has changed. There are no new or deleted tests in this file.
Rest LGTM
/run-all-tests
Rest LGTM
@GregoryIan PTAL again
LGTM
/run-all-tests
What problem does this PR solve?
In our testing with a 4.1 TB workload, we found that parsing SQL takes almost half of the time needed to encode a row. Since we have already used a parser to extract each row, parsing it again wastes computing resources. Additionally, for CSV we had to perform the complex and unnecessary chain Parse CSV → Reconstruct SQL → Parse SQL.
What is changed and how it works?
We change the Lightning parsers to directly produce an array of `types.Datum` for both CSV and SQL. We also get rid of the abstraction layer `KvEncoder` (since it only accepts SQL statements), and directly use `(*table.Table).AddRecord` to convert the `[]types.Datum` into KV pairs. This slashes half of the encoding time according to our experiments.
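As a sketch of the new path (the helper name and surrounding plumbing are ours, and AddRecord's exact signature varies across TiDB versions):

```go
package kv

import (
	"github.com/pingcap/tidb/sessionctx"
	"github.com/pingcap/tidb/table"
	"github.com/pingcap/tidb/types"
)

// encodeRow feeds a parsed row straight into the table's KV encoder: the
// parser already produced []types.Datum, and AddRecord encodes it into KV
// pairs in the session's transaction buffer, with no SQL text in between.
func encodeRow(se sessionctx.Context, tbl table.Table, row []types.Datum) (int64, error) {
	return tbl.AddRecord(se, row)
}
```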
Check List
Tests
Side effects
Related changes