Evaluate use of INT64 (and setting pq schema as UINT_64) #33

Open
wseaton opened this issue Oct 26, 2021 · 0 comments

Comments


wseaton commented Oct 26, 2021

The INT64 Go type is being used in a lot of places, which translates to the UINT_64 logical type in parquet. This type is incompatible w/ Spark <= 3.1:

org.apache.spark.sql.AnalysisException: Parquet type not supported: INT64 (UINT_64)

This on its own is fine, but it seems to be used in cases where the underlying data is not very large (say, a max value of 2048.0). Is there a way to run an aggregation over the entire series before export and downcast to the smallest suitable logical type? Or maybe even issue a Prometheus query across a wider date range to grab the max, to help prevent schema (read: type) drift? I think this may be partly to blame for the large memory usage on export.
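As a rough sketch of what I mean (hypothetical helper, not the project's actual code): take one pass over the series to find its max, then pick the narrowest Parquet integer annotation that fits, only falling back to UINT_64 when the signed 64-bit range is actually exceeded:

```go
package main

import "fmt"

// pickIntLogicalType is a hypothetical illustration of the downcasting idea:
// scan the exported series once for its maximum value and choose the
// narrowest Parquet integer converted type that can hold it, instead of
// defaulting everything to UINT_64.
func pickIntLogicalType(values []uint64) string {
	var max uint64
	for _, v := range values {
		if v > max {
			max = v
		}
	}
	switch {
	case max <= 1<<31-1:
		return "INT_32" // readable by Spark <= 3.1
	case max <= 1<<63-1:
		return "INT_64" // signed 64-bit is also fine for Spark <= 3.1
	default:
		return "UINT_64" // only needed for values beyond the signed 64-bit range
	}
}

func main() {
	series := []uint64{0, 17, 2048}
	fmt.Println(pickIntLogicalType(series)) // INT_32
}
```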

There's currently no great Spark workaround for the above, since we need to use spark.read.option("mergeSchema", "true") to account for the schema drift internal to Prometheus. The best solution is to use bleeding-edge Spark 3.2.0, which has its own problems 😬

@wseaton wseaton changed the title Evaluate use of INT64 (UINT_64) Evaluate use of INT64 (and setting pq schema as UINT_64) Oct 28, 2021