[WIP] Enabling ascii encoding fast track based on row selection. #21
@kgpai Thanks for working on this optimization. Some initial questions below.
@@ -196,7 +196,7 @@ class SubstrFunction : public exec::VectorFunction {
     BaseVector* stringsVector = args[0].get();
     BaseVector* startsVector = args[1].get();
     BaseVector* lengthsVector = noLengthVector ? nullptr : args[2].get();
-    auto stringArgStringEncoding = getStringEncodingOrUTF8(stringsVector);
+    auto stringArgStringEncoding = getStringEncodingOrUTF8(stringsVector, rows);
Why does this method no longer take "rows"? The function runs only on "rows", so it doesn't need to know whether the whole vector is ASCII, only whether these particular rows are ASCII.
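To make the point about row selection concrete, here is a minimal sketch of an ASCII check restricted to the selected rows. It uses standard containers as stand-ins; `isAscii` and `selectedRowsAreAscii` are hypothetical names for illustration and are not functions from this PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if every byte of `s` is 7-bit ASCII.
inline bool isAscii(const std::string& s) {
  for (unsigned char c : s) {
    if (c >= 0x80) {
      return false;
    }
  }
  return true;
}

// Hypothetical stand-in for an encoding check limited to selected rows.
// `selected[i]` marks row i as active; unselected rows are never inspected,
// so a non-ASCII string outside the selection does not disable the fast path.
inline bool selectedRowsAreAscii(
    const std::vector<std::string>& strings,
    const std::vector<bool>& selected) {
  for (std::size_t i = 0; i < strings.size() && i < selected.size(); ++i) {
    if (selected[i] && !isAscii(strings[i])) {
      return false;
    }
  }
  return true;
}
```

With this shape, a vector that contains a single non-ASCII string can still take the ASCII path whenever that string's row is not selected.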
auto otherIter = SelectivityIterator(other);
vector_size_t idx;
while (otherIter.next(idx)) {
There should be a more efficient way to combine two bitmaps than iterating bit by bit. Would you check SelectivityVector.h and BitUtil.h for an existing function? If there isn't one, let's add one.
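As a rough sketch of what a word-at-a-time combine could look like, the example below intersects two bitmaps stored as 64-bit words. `andBitmaps` is a hypothetical helper, not an existing function in SelectivityVector.h or BitUtil.h, and it assumes the intended combination is an intersection; a union would use `|=` in the same loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: intersects `other` into `target` one 64-bit word at a
// time, handling 64 rows per operation instead of one row per iteration.
inline void andBitmaps(
    std::vector<uint64_t>& target,
    const std::vector<uint64_t>& other) {
  const std::size_t common = std::min(target.size(), other.size());
  for (std::size_t i = 0; i < common; ++i) {
    target[i] &= other[i];
  }
  // Assumption: words missing from `other` count as all-unset bits.
  for (std::size_t i = common; i < target.size(); ++i) {
    target[i] = 0;
  }
}
```

Compared with stepping through set bits with an iterator, this touches each word once regardless of how many bits are set.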
// If T is f4d::StringView, we create a bitmap to store asciiness for each
// string. A set bit means the corresponding string is ascii.
std::optional<std::shared_ptr<SelectivityVector>> asciiMap_ = std::nullopt;
Any particular reason not to use an empty SelectivityVector as the default?
} else {
  return simpleVector->getStringEncoding().value();

if (simpleVector->hasStringAsciiMap()) {
remove..
@@ -236,6 +236,10 @@ class SimpleVector : public BaseVector {
   copyStringEncodingFrom(const BaseVector* vector) {
     encodingMode_ =
         vector->asUnchecked<SimpleVector<StringView>>()->getStringEncoding();
+    auto source = vector->asUnchecked<SimpleVector<StringView>>();
+    if (source->template hasStringAsciiMap()) {
+      asciiMap_ = std::optional(std::move(source->asciiMap_.value()));
Copy..
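A minimal sketch of the copy variant, assuming the intent is that the source vector keeps its map after the call. `Bitmap` and `Holder` are hypothetical stand-ins for SelectivityVector and SimpleVector; only the `std::optional<std::shared_ptr<...>>` member shape is taken from the diff.

```cpp
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for SelectivityVector.
using Bitmap = std::vector<bool>;

// Hypothetical stand-in for the vector class that owns the ASCII map.
struct Holder {
  std::optional<std::shared_ptr<Bitmap>> asciiMap;

  // Copies the shared_ptr rather than moving it: both holders then refer to
  // the same bitmap, and the source is left unchanged.
  void copyFrom(const Holder& source) {
    if (source.asciiMap.has_value()) {
      asciiMap = source.asciiMap;
    }
  }
};
```

Copying a `std::shared_ptr` only bumps a reference count, so this costs little more than a move while leaving the source in a valid, populated state.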
Notes: