Release v0.3.2 · Qbeast-io/qbeast-spark

Bip bip! New version of qbeast-spark with some awesome algorithm improvements 🥳

What's changed

Better file sizes! The final size of the cubes now corresponds to the cubeSize used to write the data. You can find more information about the algorithm changes and performance numbers in the merged PR #156.
Register Operation Metrics and Per-file Statistics [Delta]. Per-column statistics (min, max, nullCount) are now gathered to enable better data skipping.
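
As a rough illustration of why per-file min/max statistics help, the sketch below filters out files whose value range cannot match a range predicate. It is a simplified model under stated assumptions, not the Delta or qbeast-spark implementation: FileStats, mayContain and the file names are hypothetical.

```scala
object DataSkippingSketch {
  // Hypothetical per-file statistics for a single column (illustrative names only).
  final case class FileStats(path: String, min: Double, max: Double, nullCount: Long)

  // For the predicate `lo <= col <= hi`, a file may hold matching rows
  // only if its [min, max] range overlaps [lo, hi]; otherwise it can be skipped.
  def mayContain(stats: FileStats, lo: Double, hi: Double): Boolean =
    stats.max >= lo && stats.min <= hi

  val files: Seq[FileStats] = Seq(
    FileStats("part-0.parquet", min = 0.0, max = 4.0, nullCount = 0L),
    FileStats("part-1.parquet", min = 5.0, max = 10.0, nullCount = 2L)
  )

  // Keep only the files whose range overlaps [6, 8].
  val toRead: Seq[String] = files.filter(mayContain(_, 6.0, 8.0)).map(_.path)
}
```

Here the first file is pruned without being opened, because its maximum (4.0) is below the lower bound of the query.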

Option for specifying min/max values of the indexed columns. This allows more flexible creation of a Revision, so that it can cover values that might not be present in the newly indexed DataFrame.

df.write.format("qbeast")
  .option("columnsToIndex", "a,b")
  .option("columnStats", """{"a_min":0,"a_max":10,"b_min":20.0,"b_max":70.0}""")
  .save("/tmp/table")

The enforced structure of the JSON is:

{
    "columnName_min" : value,
    "columnName_max" : value
}
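
If the bounds are computed in code, the JSON string can be assembled rather than typed by hand. The following is a minimal sketch, assuming numeric columns and column names that need no JSON escaping; bounds and columnStatsJson are hypothetical helpers, not part of qbeast-spark.

```scala
object ColumnStatsSketch {
  // Hypothetical helper: renders the two entries for one column
  // using the "<column>_min" / "<column>_max" key layout shown above.
  def bounds[T: Numeric](column: String, min: T, max: T): Seq[String] =
    Seq(s""""${column}_min":$min""", s""""${column}_max":$max""")

  // Joins the entries of all columns into a single JSON object string.
  def columnStatsJson(columns: Seq[String]*): String =
    columns.flatten.mkString("{", ",", "}")

  // Same bounds as in the write example: an integer column and a double column.
  val stats: String = columnStatsJson(bounds("a", 0, 10), bounds("b", 20.0, 70.0))
}
```

The resulting string can then be passed as the value of the columnStats option, in place of the literal used in the write example.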

Minor changes


Fixed #165. Create External Table with Qbeast without specifying the schema.
Fixed #149. Update metadata through MetadataManager.

Contributors

Special thanks to @Jiaweihu08, who took the Qbeast Format files to the next level with the Domain-Driven algorithm!
@cugni @osopardo1
Full Changelog: v0.3.1...v0.3.2
