Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(execute/table): Refactored the table builder interfaces to support null value creation. #790

Merged
merged 1 commit into from
Jan 10, 2019

Conversation

aanthony1243
Copy link
Contributor

Done checklist

  • docs/SPEC.md updated
  • Test cases written

@aanthony1243 aanthony1243 force-pushed the feat/builder-null-safe branch from a59b56b to 3f3339b Compare January 9, 2019 22:23
Copy link
Contributor

@jsternberg jsternberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I don't like this change to the builder. I think the underlying problem to address is that invalid methods were used with arrow during the conversion and we should just fix those usage points rather than turn the Append methods into array copy methods. I think it makes more sense to instead have a utility method that will copy an arrow array similar to how copy() and io.Copy() exist in the Go library.

I would much prefer that these fix the utility methods in arrow to account for nils and fix the few places in the codebase that have the wrong usage.

I also think that this change makes it easier to inadvertently cause memory leaks.

I do think there's some good stuff in here though that I wish to see split. There's some fixes like having ValueForRow account for null values which is desperately needed mixed with the API change and I think those two are separate.

@@ -187,7 +187,7 @@ func (t *differenceTransformation) Process(id execute.DatasetID, tbl flux.Table)
d := differences[j]
switch c.Type {
case flux.TBool:
if err := arrow.AppendBools(builder, j, arrow.BoolSlice(cr.Bools(j), firstIdx, l)); err != nil {
if err := builder.AppendBools(j, arrow.BoolSlice(cr.Bools(j), firstIdx, l)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this got in originally, but it's a memory leak. You have to release the slice after creating it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will let this get fixed in a currently open PR for difference.

functions/transformations/keys.go Outdated Show resolved Hide resolved
execute/table.go Outdated
val = ValueForRow(cr, i, colMap[j])
err = builder.AppendValue(j, val)
} else {
switch c.Type {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why AppendNil can't be used in this section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. I will fix it.

@affo
Copy link
Contributor

affo commented Jan 10, 2019

@aanthony1243 @jsternberg

Hey guys, I am not against this change as discussed in Slack. I leave the discussion to you and follow the thread.

However, @jsternberg I can't see the point of having utilities to convert a call that uses go slices to one that uses arrow arrays, given that we should evolve to an arrow-oriented implementation and avoid the use of slices. Am I missing something?
Are you suggesting that we perform this step post-alpha and, for now, we only fix utils?

At the moment, I wrote down a fix to utils in #777
However, from the arrow package in flux, I can't write, like @aanthony1243 did:

col := b.cols[j].(*intColumnBuilder)
nullOffset := len(col.data)

And use the offset of that specific column to set nils. So, I must append values one by one.
Should we move those utils into execute?

@jsternberg For the memory leaks in difference and limit (it also uses slicing utils and it is missing in this PR), I'll fix them now and put you as reviewer. Thank you.

@aanthony1243
Copy link
Contributor Author

I see the issues and I have some ideas. we can talk offline and then update this PR.

@affo
Copy link
Contributor

affo commented Jan 10, 2019

For difference memory leaks, see #792

@jsternberg
Copy link
Contributor

I talked with @aanthony1243 offline and both of us seemed to be in agreement that the current table builder API needed to be revisited and we would revisit it after the alpha.

There were a few points that are important:

  1. The arrow builder should be exposed directly.
  2. The utility methods are a bit of an anti-pattern since nobody knows they exist.
  3. The table builder being an interface is likely an anti-pattern.

I'll give this another once over but let's just merge it so we have null support for the alpha and we'll revisit it later. We're not freezing any Go APIs yet anyway.

@aanthony1243 aanthony1243 force-pushed the feat/builder-null-safe branch from 07fa9ef to 4c05461 Compare January 10, 2019 16:39
…rt null value creation.

Also fixed group and pivot tests that now get nulls correctly

update spec
@aanthony1243 aanthony1243 force-pushed the feat/builder-null-safe branch from 4de003f to a9c1a30 Compare January 10, 2019 20:17
@aanthony1243 aanthony1243 merged commit 2451c02 into master Jan 10, 2019
@aanthony1243 aanthony1243 deleted the feat/builder-null-safe branch January 10, 2019 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants