executor: utilities for disk-based hash join #12116
var n int64
numRows := chk.NumRows()
chk.offsetsOfRows = make([]int64, 0, numRows)
var format *diskFormatRow
s/format/rowInDiskFormat?
util/chunk/disk.go
}

// toRow deserializes diskFormatRow to Row.
func (format *diskFormatRow) toRow(fields []*types.FieldType) Row {
I'm not sure, but would MutRow help here?
I've rewritten it as a zero-copy implementation, though it feels a little hacky.
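A rough sketch of the zero-copy idea, assuming a simplified row made of raw column byte slices (rowInDisk, fromColumns and the nil-means-NULL convention are illustrative assumptions, not the PR's code): cells alias the source bytes instead of copying them, and a reuse argument lets the caller recycle one struct across rows, similar in spirit to convertFromRow(row, reuse).

```go
package main

import "fmt"

// rowInDisk is a simplified, hypothetical stand-in for diskFormatRow.
// sizesOfColumns[i] is the byte length of column i, or -1 for NULL;
// cells[i] aliases the column's raw bytes (a shallow copy, no data is copied).
type rowInDisk struct {
	sizesOfColumns []int64
	cells          [][]byte
}

// fromColumns converts raw column values (nil means NULL in this sketch).
// When reuse is non-nil its slices are truncated and refilled, so converting
// many rows in a loop stops allocating once capacity is large enough.
func fromColumns(cols [][]byte, reuse *rowInDisk) *rowInDisk {
	r := reuse
	if r == nil {
		r = &rowInDisk{
			sizesOfColumns: make([]int64, 0, len(cols)),
			cells:          make([][]byte, 0, len(cols)),
		}
	}
	r.sizesOfColumns = r.sizesOfColumns[:0]
	r.cells = r.cells[:0]
	for _, col := range cols {
		size := int64(len(col))
		if col == nil {
			size = -1
		}
		r.sizesOfColumns = append(r.sizesOfColumns, size)
		r.cells = append(r.cells, col) // shallow copy: shares col's backing array
	}
	return r
}

func main() {
	data := []byte("hello")
	r := fromColumns([][]byte{data, nil}, nil)
	fmt.Println(r.sizesOfColumns) // [5 -1]

	data[0] = 'j'                   // mutating the source...
	fmt.Println(string(r.cells[0])) // ...is visible through the cell: jello

	r2 := fromColumns([][]byte{[]byte("x")}, r)
	fmt.Println(r2 == r, r2.sizesOfColumns) // true [1]
}
```

The aliasing shown in main is the "hacky" part of zero-copy: a converted row is only valid while its source buffer stays untouched.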
func convertFromRow(row Row, reuse *diskFormatRow) (format *diskFormatRow) {
	numCols := row.Chunk().NumCols()
	if reuse != nil {
		format = reuse
s/format/rowInDiskFormat?
I think it's better to keep the names of short-lived variables short.
bufReader.Reset(r)
defer bufReaderPool.Put(bufReader)

format := rowInDisk{numCol: len(l.fieldTypes)}
s/format/rowInDiskFormat?
util/chunk/disk.go
	New: func() interface{} { return bufio.NewReaderSize(nil, readBufSize) },
}

var tmpDir = path.Join(os.TempDir(), "tidb-server-hashJoin")
It's better to include the PID as part of the file name. Consider the case where one machine runs multiple tidb-server processes.
I've included os.Args[0] as part of the file name so that we can clean up temp files left uncleared by the last run.
Codecov Report
@@               Coverage Diff                @@
##             master     #12116       +/-   ##
================================================
- Coverage   81.6069%   81.3039%   -0.3031%
================================================
  Files           452        453         +1
  Lines         98064      96940      -1124
================================================
- Hits          80027      78816      -1211
- Misses        12393      12459        +66
- Partials       5644       5665        +21
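The report's rows are internally consistent: Lines equals Hits + Misses + Partials, and the Coverage row equals Hits / Lines, so partially covered lines count against coverage. A small Go check of that arithmetic (coverage is a hypothetical helper written for this note):

```go
package main

import "fmt"

// coverage returns Hits / (Hits + Misses + Partials) as a percentage;
// partially covered lines count as not covered, matching the report's figures.
func coverage(hits, misses, partials int) float64 {
	lines := hits + misses + partials
	return 100 * float64(hits) / float64(lines)
}

func main() {
	fmt.Printf("master: %.4f%%\n", coverage(80027, 12393, 5644)) // master: 81.6069%
	fmt.Printf("#12116: %.4f%%\n", coverage(78816, 12459, 5665)) // #12116: 81.3039%
}
```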
/run-all-tests
71da850 to 4b30699
/run-all-tests
/build
/run-unit-test
LGTM
util/chunk/disk_test.go
chks := make([]*Chunk, 0, numChk)
for chkIdx := 0; chkIdx < numChk; chkIdx++ {
	chk := NewChunkWithCapacity(fields, 2)
Suggested change:
- chk := NewChunkWithCapacity(fields, 2)
+ chk := NewChunkWithCapacity(fields, numRow)
util/chunk/disk_test.go
	"github.com/pingcap/tidb/types/json"
)

func initChunks(numChk, numRow int) ([]*Chunk, []*types.FieldType, error) {
No need to return an error?
util/chunk/disk_test.go
defer func() {
	err := l.Close()
	c.Check(err, check.IsNil)
	c.Check(l.disk, check.Not(check.IsNil))
Suggested change:
- c.Check(l.disk, check.Not(check.IsNil))
+ c.Check(l.disk, check.NotNil)
n, err := chk2.WriteTo(l.bufWriter)
l.offWrite += n
if err != nil {
	return
It's better to use the return err form directly to improve code readability.
util/chunk/disk.go
// sizesOfColumns stores the size of each column in a row.
// -1 means the value of this column is null.
sizesOfColumns []int64 // -1 means null
cells [][]byte
Suggested change:
- cells [][]byte
+ cells [][]byte // raw data is shallow copied to a cell
22b83f1 to 9373cdd
LGTM
/run-all-tests
What problem does this PR solve?
Part of #11607. This PR adds the utilities for disk-based hash join.
What is changed and how it works?
Check List
Tests
Code changes
Side effects
Related changes
Release note