The INT64 Go type is used in a lot of places, which translates to the UINT_64 logical type in Parquet. This type is incompatible with Spark <= 3.1:
org.apache.spark.sql.AnalysisException: Parquet type not supported: INT64 (UINT_64)
On its own this is fine, but it seems to be used in cases where the underlying data is not very large (say, a max value of 2048.0). Is there a way to run an aggregation on the entire series before export and downcast to the smallest suitable logical type? Or maybe even issue a Prometheus query across a wider date range to grab the max, to help prevent schema (read: type) drift? I think this may be partly to blame for the large memory usage on export.
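To illustrate the downcast idea, here is a minimal sketch: scan the series once for its max, then pick the narrowest Parquet unsigned-integer logical type that can hold it. The function name and the type table are hypothetical, not part of the exporter.

```python
def smallest_uint_type(max_value: float) -> str:
    """Pick the narrowest Parquet unsigned-int logical type that fits max_value.

    Hypothetical helper: a real exporter would feed this the per-series max
    computed before (or during) export, then write the column with the
    returned logical type instead of defaulting to UINT_64.
    """
    bounds = [
        ("UINT_8", 2**8 - 1),
        ("UINT_16", 2**16 - 1),
        ("UINT_32", 2**32 - 1),
        ("UINT_64", 2**64 - 1),
    ]
    for name, bound in bounds:
        if max_value <= bound:
            return name
    raise ValueError("value exceeds unsigned 64-bit range")


# A series whose observed max is 2048.0 only needs UINT_16.
print(smallest_uint_type(2048.0))
```

Querying Prometheus over a wider date range for the max, as suggested above, would make the chosen type more stable across export batches and reduce drift.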
There's currently not a great Spark workaround for the above, since we need to use spark.read.option("mergeSchema", "true") to account for the schema drift internal to Prometheus. The best solution is to use bleeding-edge Spark 3.2.0, which has its own problems 😬
wseaton changed the title from "Evaluate use of INT64 (UINT_64)" to "Evaluate use of INT64 (and setting pq schema as UINT_64)" on Oct 28, 2021.