Skip to content

Conversation

@huaxingao
Copy link
Contributor

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@huaxingao
Copy link
Contributor Author

@gszadovszky @shangxinli @rdblue Could you please take a look at this PR when you have time? Thanks a lot!

@huaxingao
Copy link
Contributor Author

also cc @chenjunjiedada

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments in the code. All of them are more to open discussions so I neither approve nor disapprove for now.
Otherwise the code seems good to me. Thanks a lot for working on it!

}
}

// base class for In and NotIn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a better comment since it is public method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

this.values = Objects.requireNonNull(values, "values cannot be null");
checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");

String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have a 'toString' to cache but do we see generally this is reused multiple times? If no, proactively converting to string will be a waste.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

iter++;
}
String valueStr = values.size() <= 100 ? str.substring(0, str.length() - 2) : str + "...";
this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + valueStr + ")";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to merge lines 272 and 273 into the above code of that building? the string? String operations sometimes consume a lot of memory like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just enough to replace str + "..." to str.append("...").toString?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str.substring(0, str.length() - 2) is still StringBuilder operation. Seems fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can replace line 273 with StringBuilder operation too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can just 'return this.getClass() == o.getClass()'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
SetColumnFilterPredicate<?> that = (SetColumnFilterPredicate<?>) o;
return column.equals(that.column) && values.equals(that.values) && Objects.equals(toString, that.toString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is toString comparison still needed here? It seems toString have (values and class). You can just compare class here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed toString comparison

@huaxingao
Copy link
Contributor Author

@gszadovszky @shangxinli @dbtsai Thank you all very much for reviewing! I have changed the code to generate the visit methods for in/notIn and also added the default by throwing Exception. Will address the rest of the comments tomorrow or the day after tomorrow.

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some more concrete comments in the code. Some more work is needed but I think it is going to the good direction. Thanks a lot for your efforts.

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of more comments/requests.
Sorry if I am a bit strict here but filtering is not an easy topic and can have serious issues (lost of data).

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments for the new MinMax class but otherwise everything seems great!

T element = iterator.next();
if (max == null) {
max = element;
} else if (max != null && element != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are already in the else path so do not need to check for max != null.

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor issues only. Thanks a lot for your efforts to implement this! I think it is a great improvement for the query engines.

@gszadovszky gszadovszky merged commit 98e3e1a into apache:master Sep 30, 2021
@huaxingao
Copy link
Contributor Author

@gszadovszky @shangxinli @viirya @dbtsai Thank you so much for all your help!!

@huaxingao huaxingao deleted the in_filter branch September 30, 2021 14:22
@gszadovszky
Copy link
Contributor

Thank you for your contribution, @huaxingao! Great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants