-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic implementation of Unnest operator #27
Conversation
@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
This pull request was exported from Phabricator. Differential Revision: D30264256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -34,4 +34,53 @@ const std::vector<std::shared_ptr<const PlanNode>>& ExchangeNode::sources() | |||
return EMPTY_SOURCES; | |||
} | |||
|
|||
UnnestNode::UnnestNode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basic Question, does Unnest here mean hive explode or does it mean something else ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::vector<TypePtr> types; | ||
|
||
for (const auto& variable : replicateVariables_) { | ||
names.emplace_back(variable->name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Might I suggest std::vector<Pair<string, TypePtr>> so we can use an iterator to access both instead of an index ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type constructor takes two vectors, hence, no flexibility here.
ChannelIndex outputChannel = 0; | ||
for (const auto& variable : unnestNode->replicateVariables()) { | ||
identityProjections_.emplace_back( | ||
inputType->getChildIdx(variable->name()), outputChannel++); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what ChannelIndex, childIdx here are ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps to think about input data as a table made of a columns. Each column has a name and an ordinal zero-based position called channel. inputType is a RowType made of of column types. childIdx = channel and ChannelIndex is a typedef for the type of channel.
using ChannelIndex = uint32_t;
// Create "indices" buffer to repeat rows as many times as there are elements | ||
// in the array. | ||
BufferPtr repeatedIndices = | ||
AlignedBuffer::allocate<vector_size_t>(numElements, pool()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for vector_size_t to be 0 here ? Should we also do some some max size check ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector_size_t is a type, hence, cannot be zero, but numElements can be zero. Nice catch. Will update the code to handle that case properly.
velox/exec/tests/UnnestTest.cpp
Outdated
assertQuery( | ||
op, | ||
std::vector<std::shared_ptr<TempFilePath>>{}, | ||
"SELECT c0, x FROM tmp, UNNEST(ARRAY[0, 1, 2]) as t(x)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for map ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This initial cut of UNNEST doesn't support maps.
This pull request was exported from Phabricator. Differential Revision: D30264256 |
@kgpai Krishna, thank you for review. I replied to questions and updated the code to handle numElements == 0 explicitly. Would you take another look? |
This pull request was exported from Phabricator. Differential Revision: D30264256 |
This pull request was exported from Phabricator. Differential Revision: D30264256 |
Summary: This is an initial cut which supports only one unnest column of type ARRAY and doesn't support WITH ORDINALITY clause. Pull Request resolved: facebookincubator/velox#27 Differential Revision: D30264256 Pulled By: mbasmanova fbshipit-source-id: adf0c138d0a48a37437eb11e628c90532b21bdf5
This pull request was exported from Phabricator. Differential Revision: D30264256 |
@mbasmanova merged this pull request in 545a610. |
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
…incubator#27) * Filter validation for Parquet reader at runtime * Style * Style * Format Removed special handling for avg (facebookincubator#31) [OPPRO-173] Make batch size configurable (facebookincubator#32) support dwrf format
This is an initial cut which supports only one unnest column of type ARRAY and doesn't support WITH ORDINALITY clause.