Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: utilities for disk-based hash join #12116

Merged
merged 7 commits into from
Sep 11, 2019

Conversation

SunRunAway
Copy link
Contributor

@SunRunAway SunRunAway commented Sep 10, 2019

What problem does this PR solve?

part of #11607
The utilities for disk-based hash join

What is changed and how it works?

goos: darwin
goarch: amd64
pkg: github.com/pingcap/tidb/util/chunk
BenchmarkListInDiskAdd-12       	  300000	      4681 ns/op	     360 B/op	       4 allocs/op
BenchmarkListInDiskGetRow-12    	  500000	      2878 ns/op	     994 B/op	      21 allocs/op
BenchmarkListMemoryUsage-12     	2000000000	         0.37 ns/op	       0 B/op	       0 allocs/op
BenchmarkListAdd-12             	20000000	       104 ns/op	      41 B/op	       0 allocs/op
BenchmarkListGetRow-12          	500000000	         4.13 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/pingcap/tidb/util/chunk	31.892s

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • None

Related changes

  • None

Release note

  • None

util/chunk/disk.go Outdated Show resolved Hide resolved
util/chunk/disk.go Outdated Show resolved Hide resolved
Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

var n int64
numRows := chk.NumRows()
chk.offsetsOfRows = make([]int64, 0, numRows)
var format *diskFormatRow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ format/ rowInDiskFormat ?

util/chunk/disk.go Outdated Show resolved Hide resolved
}

// toRow deserializes diskFormatRow to Row.
func (format *diskFormatRow) toRow(fields []*types.FieldType) Row {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that will MutRow help here?

Copy link
Contributor Author

@SunRunAway SunRunAway Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten a zero-copy implementation, while feels a little hack.

func convertFromRow(row Row, reuse *diskFormatRow) (format *diskFormatRow) {
numCols := row.Chunk().NumCols()
if reuse != nil {
format = reuse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ format/ rowInDiskFormat ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep the short-lived value short.

bufReader.Reset(r)
defer bufReaderPool.Put(bufReader)

format := rowInDisk{numCol: len(l.fieldTypes)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ format/ rowInDiskFormat ?

New: func() interface{} { return bufio.NewReaderSize(nil, readBufSize) },
}

var tmpDir = path.Join(os.TempDir(), "tidb-server-hashJoin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to include pid as part of the file name. Consider the situation that one machine have multiply tidb-server processes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included os.Args[0] as part of the file name so that we can clean the uncleared temp file during the last run.

@codecov
Copy link

codecov bot commented Sep 11, 2019

Codecov Report

Merging #12116 into master will decrease coverage by 0.303%.
The diff coverage is 84.6666%.

@@               Coverage Diff                @@
##             master     #12116        +/-   ##
================================================
- Coverage   81.6069%   81.3039%   -0.3031%     
================================================
  Files           452        453         +1     
  Lines         98064      96940      -1124     
================================================
- Hits          80027      78816      -1211     
- Misses        12393      12459        +66     
- Partials       5644       5665        +21

@SunRunAway
Copy link
Contributor Author

/run-all-tests

@SunRunAway
Copy link
Contributor Author

/run-all-tests

1 similar comment
@SunRunAway
Copy link
Contributor Author

/run-all-tests

@zyxbest
Copy link
Contributor

zyxbest commented Sep 11, 2019

/build

@SunRunAway
Copy link
Contributor Author

/run-unit-test

Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


chks := make([]*Chunk, 0, numChk)
for chkIdx := 0; chkIdx < numChk; chkIdx++ {
chk := NewChunkWithCapacity(fields, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
chk := NewChunkWithCapacity(fields, 2)
chk := NewChunkWithCapacity(fields, numRow)

"github.com/pingcap/tidb/types/json"
)

func initChunks(numChk, numRow int) ([]*Chunk, []*types.FieldType, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to return an error?

defer func() {
err := l.Close()
c.Check(err, check.IsNil)
c.Check(l.disk, check.Not(check.IsNil))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c.Check(l.disk, check.Not(check.IsNil))
c.Check(l.disk, check.NotNil)

n, err := chk2.WriteTo(l.bufWriter)
l.offWrite += n
if err != nil {
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to directly use the return err form to improve code readability.

// sizesOfColumns stores the size of each column in a row.
// -1 means the value of this column is null.
sizesOfColumns []int64 // -1 means null
cells [][]byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cells [][]byte
cells [][]byte // raw data is shallow copied to a cell

@SunRunAway SunRunAway force-pushed the disk-join-issue11607-disk branch from 22b83f1 to 9373cdd Compare September 11, 2019 09:28
Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. and removed status/PTAL labels Sep 11, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 11, 2019

/run-all-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants