[SPARK-20960][SQL] make ColumnVector public #20116
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,32 +14,39 @@ | |
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package org.apache.spark.sql.execution.vectorized; | ||
| package org.apache.spark.sql.vectorized; | ||
|
|
||
| import org.apache.spark.sql.catalyst.util.MapData; | ||
| import org.apache.spark.sql.types.DataType; | ||
| import org.apache.spark.sql.types.Decimal; | ||
| import org.apache.spark.unsafe.types.UTF8String; | ||
|
|
||
| /** | ||
| * This class represents in-memory values of a column and provides the main APIs to access the data. | ||
| * It supports all the types and contains get APIs as well as their batched versions. The batched | ||
| * versions are considered to be faster and preferable whenever possible. | ||
| * An interface representing in-memory columnar data in Spark. This interface defines the main APIs | ||
| * to access the data, as well as their batched versions. The batched versions are considered to be | ||
| * faster and preferable whenever possible. | ||
| * | ||
| * To handle nested schemas, ColumnVector has two types: Arrays and Structs. In both cases these | ||
| * columns have child columns. All of the data are stored in the child columns and the parent column | ||
| * only contains nullability. In the case of Arrays, the lengths and offsets are saved in the child | ||
| * column and are encoded identically to INTs. | ||
| * Most of the APIs take the rowId as a parameter. This is the batch local 0-based row id for values | ||
| * in this ColumnVector. | ||
| * | ||
| * Maps are just a special case of a two field struct. | ||
| * ColumnVector supports all the data types including nested types. To handle nested types, | ||
| * ColumnVector can have children and is a tree structure. For struct type, it stores the actual | ||
|
Member
nit: child -> children
Contributor (Author)
it's already
Member
Sorry for my mistake. |
||
| * data of each field in the corresponding child ColumnVector, and only stores null information in | ||
| * the parent ColumnVector. For array type, it stores the actual array elements in the child | ||
| * ColumnVector, and stores null information, array offsets and lengths in the parent ColumnVector. | ||
| * | ||
| * Most of the APIs take the rowId as a parameter. This is the batch local 0-based row id for values | ||
| * in the current batch. | ||
| * ColumnVector is expected to be reused during the entire data loading process, to avoid allocating | ||
| * memory again and again. | ||
| * | ||
| * ColumnVector is meant to maximize CPU efficiency but not to minimize storage footprint. | ||
| * Implementations should prefer computing efficiency over storage efficiency when designing the | ||
| * format. Since it is expected to reuse the ColumnVector instance while loading data, the storage | ||
| * footprint is negligible. | ||
| */ | ||
| public abstract class ColumnVector implements AutoCloseable { | ||
|
|
||
| /** | ||
| * Returns the data type of this column. | ||
| * Returns the data type of this column vector. | ||
| */ | ||
| public final DataType dataType() { return type; } | ||
|
|
||
|
|
||
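The array layout described in the revised Javadoc (elements in the child vector; null flags, offsets and lengths in the parent) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyIntVector`, `ToyArrayVector` and `ArrayLayoutSketch` are hypothetical names invented for illustration and are not Spark classes, and the real `ColumnVector` API is considerably richer.

```java
import java.util.Arrays;

/** Toy model of the array layout described in the Javadoc; NOT Spark's actual classes. */
public class ArrayLayoutSketch {

    /** Hypothetical child vector: holds the flattened elements of all arrays. */
    static final class ToyIntVector {
        private final int[] data;

        ToyIntVector(int[] data) { this.data = data; }

        int getInt(int rowId) { return data[rowId]; }
    }

    /** Hypothetical parent vector: holds only null flags, offsets and lengths. */
    static final class ToyArrayVector {
        private final boolean[] isNull;
        private final int[] offsets;
        private final int[] lengths;
        private final ToyIntVector child;

        ToyArrayVector(boolean[] isNull, int[] offsets, int[] lengths, ToyIntVector child) {
            this.isNull = isNull;
            this.offsets = offsets;
            this.lengths = lengths;
            this.child = child;
        }

        boolean isNullAt(int rowId) { return isNull[rowId]; }

        /** rowId is the batch-local, 0-based row id, as in the Javadoc. */
        int[] getArray(int rowId) {
            if (isNull[rowId]) {
                return null;
            }
            int[] out = new int[lengths[rowId]];
            for (int i = 0; i < out.length; i++) {
                // Elements live in the child; the parent only says where to look.
                out[i] = child.getInt(offsets[rowId] + i);
            }
            return out;
        }
    }

    /** Builds a column holding the three rows [1, 2], null, [3, 4, 5]. */
    static ToyArrayVector example() {
        ToyIntVector child = new ToyIntVector(new int[] {1, 2, 3, 4, 5});
        return new ToyArrayVector(
            new boolean[] {false, true, false},  // null flags per row
            new int[] {0, 2, 2},                 // start offset of each row in the child
            new int[] {2, 0, 3},                 // number of elements in each row
            child);
    }

    public static void main(String[] args) {
        ToyArrayVector col = example();
        for (int rowId = 0; rowId < 3; rowId++) {
            System.out.println(rowId + " -> " + Arrays.toString(col.getArray(rowId)));
        }
    }
}
```

A struct column would follow the same idea with one child per field and only null flags in the parent.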
Remove the space before :
This is the standard Java foreach code style
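For readers unfamiliar with the convention under discussion, here is a minimal sketch of an enhanced `for` loop written with a space on both sides of the colon. The class and method names (`ForeachStyleSketch`, `sum`) are hypothetical and do not come from the pull request; the line the reviewers were commenting on is not shown in this excerpt.

```java
import java.util.Arrays;
import java.util.List;

public class ForeachStyleSketch {

    /** Sums the values using an enhanced for loop. */
    static int sum(List<Integer> values) {
        int total = 0;
        // Common Java style: one space before and one space after the colon.
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));
    }
}
```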