Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: support group_concat under new aggregation evaluation framework #7032

Merged
merged 18 commits into from
Jul 17, 2018

Conversation

XuHuaiyu
Copy link
Contributor

What have you changed? (mandatory)

implement
groupConcat4DistinctString
groupConcat4String

What is the type of the changes? (mandatory)

Improvement

  • New feature (non-breaking change which adds functionality)
  • Improvement (non-breaking change which is an improvement to an existing feature)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How has this PR been tested? (mandatory)

The existing test cases.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

No

Does this PR affect tidb-ansible update? (mandatory)

No

Does this PR need to be added to the release notes? (mandatory)

No

Refer to a related PR or issue link (optional)

#6952

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. sig/execution SIG execution labels Jul 12, 2018
@zz-jason zz-jason added this to the 2.1 milestone Jul 12, 2018
// limitations under the License.
package aggfuncs

import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reorganize the import packages

baseAggFunc
}

func (e *baseGroupConcat4String) AppendFinalResult2Chunk(sctx sessionctx.Context, pr PartialResult, chk *chunk.Chunk) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put functions belonging to baseGroupConcat4String together

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baseGroupConcat4String only has one function.

p.buffer.WriteString(s)
}
}
// TODO: if total length is greater than global var group_concat_max_len, truncate it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this as an issue and put the issue number into this comment.


func (e *groupConcat4String) ResetPartialResult(pr PartialResult) {
p := (*partialResult4ConcatString)(pr)
p.sep, p.sepInited, p.buffer = "", false, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.sep should be a field of groupConcat4String, not the partial result. The separator can only be a constant and should be consider as a property of the group_concat function.

@XuHuaiyu
Copy link
Contributor Author

/run-all-tests

@XuHuaiyu
Copy link
Contributor Author

PTAL @zz-jason @winoros


func (e *groupConcat4String) UpdatePartialResult(sctx sessionctx.Context, rowsInGroup []chunk.Row, pr PartialResult) (err error) {
p := (*partialResult4ConcatString)(pr)
if !e.sepInited {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.sep can be initialized when we create the groupConcat4String object?

}
}
base := baseAggFunc{
args: aggFuncDesc.Args,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about args: aggFuncDesc.Args[:len(aggFuncDesc.Args)-1]

@lysu lysu changed the title execuotr: support group_concat under new aggregation evaluation framework executor: support group_concat under new aggregation evaluation framework Jul 12, 2018
}
if isNull {
continue
}

This comment was marked as resolved.

@XuHuaiyu
Copy link
Contributor Author

PTAL @zz-jason
The previous implementation may be more readable?

@zz-jason
Copy link
Member

@XuHuaiyu We need some benchmarks to decide whether it is worthy to make that change.

@jackysp
Copy link
Member

jackysp commented Jul 13, 2018

/run-all-tests tidb-test=pr/573

@XuHuaiyu
Copy link
Contributor Author

@zz-jason
Benchmark is interesting:

func BenchmarkString(b *testing.B) {
	buffer := &bytes.Buffer{}
	array := []string{"abc", "bcd", "efg", "hij"}
	b.ResetTimer()
	valBuf := make([]string, len(array))
	for i := 0; i < b.N; i ++{
		for j := range array {
			valBuf[j] = array[j]
		}
		for j := range valBuf{
			buffer.WriteString(valBuf[j])
		}
	}
}

func BenchmarkString2(b *testing.B) {
	buffer := &bytes.Buffer{}
	array := []string{"abc", "bcd", "efg", "hij"}
	b.ResetTimer()
	for i := 0; i < b.N; i ++{
		for j := range array {
			buffer.WriteString(array[j])
		}
	}
}

BenchmarkString-4 20000000 70.1 ns/op 28 B/op 0 allocs/op
BenchmarkString2-4 30000000 59.8 ns/op 37 B/op 0 allocs/op

1 similar comment
@XuHuaiyu
Copy link
Contributor Author

@zz-jason
Benchmark is interesting:

func BenchmarkString(b *testing.B) {
	buffer := &bytes.Buffer{}
	array := []string{"abc", "bcd", "efg", "hij"}
	b.ResetTimer()
	valBuf := make([]string, len(array))
	for i := 0; i < b.N; i ++{
		for j := range array {
			valBuf[j] = array[j]
		}
		for j := range valBuf{
			buffer.WriteString(valBuf[j])
		}
	}
}

func BenchmarkString2(b *testing.B) {
	buffer := &bytes.Buffer{}
	array := []string{"abc", "bcd", "efg", "hij"}
	b.ResetTimer()
	for i := 0; i < b.N; i ++{
		for j := range array {
			buffer.WriteString(array[j])
		}
	}
}

BenchmarkString-4 20000000 70.1 ns/op 28 B/op 0 allocs/op
BenchmarkString2-4 30000000 59.8 ns/op 37 B/op 0 allocs/op

@zz-jason
Copy link
Member

zz-jason commented Jul 16, 2018

(70.1-59.8)/59.8, about 17%, I think it's worth to refine.

@zz-jason
Copy link
Member

we need to reset buffer before for j := range array {?

@XuHuaiyu
Copy link
Contributor Author

it's ok to not reset buffer,
group_concat would not reset buffer if there is no group-by.

p.buffer.WriteString(e.sep)
}
}
p.buffer.Truncate(p.buffer.Len() - 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed?

}
p.buffer.WriteString(v)
}
if isWriteSep {

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

if isNull {
continue
}
valsBuf[i] = v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valsBuf can be optimized out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This valsBuf should be maintained,
cause we need to get all the values here for checking distinct later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make valsBuf be an object of bytes.Buffer? Maybe it's faster than strings.Join?

@XuHuaiyu
Copy link
Contributor Author

/run-all-tests

@XuHuaiyu
Copy link
Contributor Author

PTAL @zz-jason @winoros

sep, _, err := c.EvalString(nil, nil)
// This err will never happen, we check it here for passing the errcheck.
if err != nil {
log.Warning("Error happened when buildGroupConcat:", errors.Trace(err).Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return nil or panic here to avoid further execution.

@XuHuaiyu
Copy link
Contributor Author

PTAL @zz-jason

@XuHuaiyu
Copy link
Contributor Author

/run-all-tests

@@ -38,6 +38,9 @@ var (
// All the AggFunc implementations for "MAX" are listed here.
// All the AggFunc implementations for "MIN" are listed here.
// All the AggFunc implementations for "GROUP_CONCAT" are listed here.
_ AggFunc = (*groupConcat4DistinctString)(nil)
_ AggFunc = (*groupConcat4String)(nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group_concat always executes on the string inputs, I think we can remove the 4String suffix. How about groupConcat and groupConcatDistinct?

buffer *bytes.Buffer
}

type partialResult4ConcatString struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about partialResult4GroupConcat?

v, isNull := "", false
for _, row := range rowsInGroup {
isWriteSep := false
for i, l := 0, len(e.args); i < l; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for _, arg := range e.args could be simpler.

p := (*partialResult4ConcatString)(pr)
v, isNull := "", false
for _, row := range rowsInGroup {
isWriteSep := false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about s/isWriteSep/writeSep/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the current name.😄

return nil
}

type partialResult4ConcatDistinctString struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about partialResult4GroupConcatDistinct?

p := (*partialResult4ConcatDistinctString)(pr)
v, isNull, valsBuf, joinedVals := "", false, make([]string, len(e.args)), ""
for _, row := range rowsInGroup {
for i, l := 0, len(e.args); i < l; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

if isNull {
continue
}
valsBuf[i] = v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make valsBuf be an object of bytes.Buffer? Maybe it's faster than strings.Join?

@XuHuaiyu
Copy link
Contributor Author

@AndreMouche is working on the failed case.

@XuHuaiyu
Copy link
Contributor Author

comments addressed, PTAL @zz-jason

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 17, 2018
base := baseAggFunc{
args: aggFuncDesc.Args[:len(aggFuncDesc.Args)-1],
ordinal: ordinal,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove base(line#201~204) to below default(line#209) ?

p := (*partialResult4GroupConcat)(pr)
v, isNull := "", false
for _, row := range rowsInGroup {
isWriteSep := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about define isWriteSep out of for loop?

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 17, 2018
@zz-jason
Copy link
Member

/run-all-tests

@zz-jason zz-jason merged commit b29d52b into pingcap:master Jul 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants