Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Enabling ascii encoding fast track based on row selection. #21

Closed
wants to merge 1 commit into from

Conversation

kgpai
Copy link
Contributor

@kgpai kgpai commented Aug 11, 2021

Notes:

  • The central idea here is we keep a mask (selectivity vector) for every SimpleVector which stores all ascii string information. For propagating asciness we simply intersect across the vectors.
  • All the string functions test pass (some tests which test for mostly ascii etc have been disabled)
  • Theres a bunch of code that is no longer relevant (like string encoding tracker etc, or the old getStringEncode functions etc ). I am going to remove them in subsequent commits.
  • Quite a few of the functions have to be renamed and they make sense in only old context.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 11, 2021
@kgpai kgpai requested a review from mbasmanova August 11, 2021 09:13
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgpai Thanks for working on this optimization. Some initial questions below.

@@ -196,7 +196,7 @@ class SubstrFunction : public exec::VectorFunction {
BaseVector* stringsVector = args[0].get();
BaseVector* startsVector = args[1].get();
BaseVector* lengthsVector = noLengthVector ? nullptr : args[2].get();
auto stringArgStringEncoding = getStringEncodingOrUTF8(stringsVector);
auto stringArgStringEncoding = getStringEncodingOrUTF8(stringsVector, rows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this method no longer take "rows"? The function runs only on "rows" and doesn't need to know whether the whole vector is ASCII, just whether these particular rows are ASCII.


auto otherIter = SelectivityIterator(other);
vector_size_t idx;
while (otherIter.next(idx)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be more efficient way to combine two bitmaps. Would you take a look at SelectivityVector.h and BitUtil.h if there is an existing function? If not, let's make one.


// If T is f4d::StringView, we create a bitmap to store asciiness for each
// string. A set bit means the corresponding string is ascii.
std::optional<std::shared_ptr<SelectivityVector>> asciiMap_ = std::nullopt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason not to use empty SelectivityVector as the default?

} else {
return simpleVector->getStringEncoding().value();

if (simpleVector->hasStringAsciiMap()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove..

@@ -236,6 +236,10 @@ class SimpleVector : public BaseVector {
copyStringEncodingFrom(const BaseVector* vector) {
encodingMode_ =
vector->asUnchecked<SimpleVector<StringView>>()->getStringEncoding();
auto source = vector->asUnchecked<SimpleVector<StringView>>();
if (source->template hasStringAsciiMap()) {
asciiMap_ = std::optional(std::move(source->asciiMap_.value()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy..

@kgpai kgpai closed this Sep 1, 2021
zhouyuan pushed a commit to zhouyuan/velox that referenced this pull request Jun 19, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jun 24, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jun 30, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jul 20, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Aug 2, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Aug 12, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Aug 22, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Sep 7, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants