You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
While using parquet::file::reader::SerializedFileReader, calling to_json_value on a parquet Row, containing a date before 1970/01/01 panics.
A similar issue #2897 was reported earlier and closed after #2899 solved it. But it only handled Field::TimestampMillis and Field::TimestampMicros types, Field::Date still breaks.
To Reproduce
Inside test parquet::record::api::tests::test_convert_date_to_string, add an extra call to function check_date_conversion with any date before 1970/01/01. Another way is to use a parquet file with at least one Row containing such value.
This panics with the message
thread 'record::api::tests::test_convert_date_to_string' panicked at 'No such local time'
pub enum Field {
...
Date(u32),
TimestampMillis(u64),
TimestampMicros(u64),
...
}
When i32 or i64 are interpreted as their unsigned counterparts, the value underflows due to the now lack of a sign bit, e.g., -7766_i32 becomes 4294959530_u32.
When the new value is interpreted as its signed variant again, we get the original value (since the bytes never changed). Therefore assert_eq!(4294959530_u32 as i32, -7766_i32) succeeds.
#2899 utilizes this fact to solve the problem. But the same code doesn't resolve it for Field::Date in the below snippet, because it holds a u32, and promoting it an i64 changes the value itself (since it treats the MSB of the u32 for value, and not sign, which was crucial). Below code doesn't work.
fn convert_date_to_string(value: u32) -> String {
static NUM_SECONDS_IN_DAY: i64 = 60 * 60 * 24;
let dt = Utc
.timestamp_opt(value as i64 * NUM_SECONDS_IN_DAY, 0)
.unwrap();
format!("{}", dt.format("%Y-%m-%d %:z"))
}
The correct way would be to read value to i32 before promoting it to i64. Therefore below code works.
fn convert_date_to_string(value: u32) -> String {
static NUM_SECONDS_IN_DAY: i64 = 60 * 60 * 24;
let dt = Utc
.timestamp_opt(value as i32 as i64 * NUM_SECONDS_IN_DAY, 0)
.unwrap();
format!("{}", dt.format("%Y-%m-%d %:z"))
}
Though my personal opinion is to change the type definition of Field and use the signed integer types for Date, TimestampMillis and TimestampMicros, because we can always have dates/times before Unix epoch. Thus, negative timestamps are required.
The text was updated successfully, but these errors were encountered:
I would be happy with either approach, changing the definition of field to be signed to match thr parquet datatypes makes a lot of sense to me
tustvold
changed the title
Parquet API: Cannot convert date before Unix epoch to json
Parquet Record API: Cannot convert date before Unix epoch to json
Jan 3, 2023
Great. I thought the same. Will write the code and commit in a few hours.
BTW another question that I forgot to ask above. Does it make sense to have a timezone while displaying just the date? The current implementation does it through format!("{}", dt.format("%Y-%m-%d %:z")), but since the date is always initialized from Utc, the timezone will always be +00:00.
Describe the bug
While using
parquet::file::reader::SerializedFileReader
, callingto_json_value
on a parquet Row, containing a date before 1970/01/01 panics.A similar issue #2897 was reported earlier and closed after #2899 solved it. But it only handled
Field::TimestampMillis
andField::TimestampMicros
types,Field::Date
still breaks.To Reproduce
Inside test
parquet::record::api::tests::test_convert_date_to_string
, add an extra call to functioncheck_date_conversion
with any date before 1970/01/01. Another way is to use a parquet file with at least oneRow
containing such value.This panics with the message
Expected behavior
Be able to call
Row::to_json_value
without panic.Additional context
Why #2899 works:
enum Field
definesDate, TimestampMillis, TimestampMicros
fields as:When
i32
ori64
are interpreted as their unsigned counterparts, the value underflows due to the now lack of a sign bit, e.g.,-7766_i32
becomes4294959530_u32
.When the new value is interpreted as its signed variant again, we get the original value (since the bytes never changed). Therefore
assert_eq!(4294959530_u32 as i32, -7766_i32)
succeeds.#2899 utilizes this fact to solve the problem. But the same code doesn't resolve it for
Field::Date
in the below snippet, because it holds au32
, and promoting it ani64
changes the value itself (since it treats the MSB of theu32
for value, and not sign, which was crucial). Below code doesn't work.The correct way would be to read
value
toi32
before promoting it to i64. Therefore below code works.Though my personal opinion is to change the type definition of
Field
and use the signed integer types forDate
,TimestampMillis
andTimestampMicros
, because we can always have dates/times before Unix epoch. Thus, negative timestamps are required.The text was updated successfully, but these errors were encountered: