[Parquet][C++][Python] Bump the default format version from 2.4 -> 2.6 #35746
Comments
I found that our implementation already supports this:
/// \brief Allowed for physical type INT64.
class PARQUET_EXPORT TimestampLogicalType : public LogicalType {
public:
static std::shared_ptr<const LogicalType> Make(bool is_adjusted_to_utc,
LogicalType::TimeUnit::unit time_unit,
bool is_from_converted_type = false,
bool force_set_converted_type = false);
bool is_adjusted_to_utc() const;
LogicalType::TimeUnit::unit time_unit() const;
/// \brief If true, will not set LogicalType in Thrift metadata
bool is_from_converted_type() const;
/// \brief If true, will set ConvertedType for micros and millis
/// resolution in legacy ConvertedType Thrift metadata
bool force_set_converted_type() const;
private:
TimestampLogicalType() = default;
};
And timeunit has
Do we already support them? /cc @wgtmac |
I've found out that C++ implements Parquet 2.6, but it needs the flag |
Yes, we do already support that version, but we default to 2.4 at the moment. I edited the title to make clear that the issue is about changing the default version. The default in C++: arrow/cpp/src/parquet/properties.h Lines 207 to 221 in 6d2df07
And similarly in the Python bindings the default is also "2.4". |
It seems that parquet-cpp has implemented features (e.g. modular encryption and BYTE_STREAM_SPLIT encoding) beyond version 2.6. We probably need to update supported versions and manage features based on the version. https://github.com/apache/parquet-format/blob/master/CHANGES.md |
We can open an issue about this; I can take the time to fix it. |
Ok, so parquet-mr implemented nanosecond precision timestamps in 2018: apache/parquet-java#519 I think this makes it ok to bump the default to 2.6. |
For reference (and to see where things need to be changed), here is the commit from the previous version bump: 797c88a |
#36137)
Change the default parquet version to 2.6.
Discussed in:
* [ML](https://lists.apache.org/thread/027g366yr3m03hwtpst6sr58b3trwhsm)
* [Issue](#35746)
* Closes: #35746
Lead-authored-by: anjakefala <anja@voltrondata.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: mwish <1506118561@qq.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Describe the enhancement requested
Parquet format version 2.6 introduces the NanoSecond time unit for Time and Timestamp logical types.
Component(s)
Parquet