Describe the bug
I check my home directory into Git. My home directory contains `.cargo`, my `CARGO_HOME` directory. When I write a Parquet file, its `FileMetaData` contains:
```
created_by: Some(
    "parquet-rs version 5.0.0 (build 3ef76a677716df403a13964a58351abe37c1754d)",
),
```
That SHA belongs to a commit in my home directory repository, not in Parquet and not in the project that uses Parquet.
I have a test in the project that checks the size of the Parquet file data. It was failing for me because the content was 49 bytes larger than expected, which is exactly the length of the extra build suffix above. I verified that the test passes in CI, where the `FileMetaData` under test contains:
```
created_by: Some(
    "parquet-rs version 5.0.0",
),
```
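The 49-byte difference can be checked directly. Below is a minimal Rust sketch, assuming the string is formatted as `parquet-rs version <version>` with an optional ` (build <sha>)` suffix, as in the two outputs above. `created_by_string` is a hypothetical helper for illustration, not part of the parquet crate's API.

```rust
/// Hypothetical helper reproducing the two `created_by` strings shown above.
/// `build_sha` is `Some(..)` only when the build script found a Git HEAD.
fn created_by_string(version: &str, build_sha: Option<&str>) -> String {
    match build_sha {
        Some(sha) => format!("parquet-rs version {} (build {})", version, sha),
        None => format!("parquet-rs version {}", version),
    }
}

fn main() {
    let sha = "3ef76a677716df403a13964a58351abe37c1754d";
    let local = created_by_string("5.0.0", Some(sha)); // home directory in Git
    let ci = created_by_string("5.0.0", None); // no enclosing repository

    // The suffix is " (build " (8 bytes) + 40 hex digits + ")" (1 byte).
    let extra = local.len() - ci.len();
    // prints "local: 73 bytes, CI: 24 bytes, extra: 49 bytes"
    println!("local: {} bytes, CI: {} bytes, extra: {} bytes", local.len(), ci.len(), extra);
    assert_eq!(extra, 49);
}
```

The suffix adds 8 bytes for ` (build `, 40 for the hexadecimal SHA-1, and 1 for the closing parenthesis. Together these match the 49 extra bytes that the size test reported.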
To Reproduce
1. Check your home directory into Git, or alternatively set `CARGO_HOME` to a directory inside a Git repository.
2. Generate a Parquet file and inspect its metadata.
3. Observe that `created_by` contains a commit hash from the Git repository that `CARGO_HOME` is in.
I'm not sure it will be possible to write a failing test for this, given the environmental aspect. The current test only checks that the `created_by` value equals the `PARQUET_CREATED_BY` environment variable, but the problem is what ends up in `PARQUET_CREATED_BY` in the first place.
Expected behavior
I expected to get the exact same Parquet file content whether my home directory is checked into Git or not 🤣
Additional context
The `PARQUET_CREATED_BY` environment variable gets the build hash in the build script whenever `git rev-parse HEAD` returns a value. Because this only happens with a non-standard setup like mine, I think this logic should just be removed entirely. I'm going to prepare a PR with this solution for discussion :)
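This is likely why the hash comes from the home directory. Cargo runs a dependency's build script with the working directory set to that package's source directory, which for a crates.io dependency lives under `$CARGO_HOME/registry/src/`. `git rev-parse HEAD` then searches parent directories for a repository. The sketch below approximates the logic described above. It is not parquet-rs's actual `build.rs`, and `git_head_sha` and `created_by` are hypothetical names. The version is hard-coded for illustration; a real build script would read the `CARGO_PKG_VERSION` environment variable instead.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: runs `git rev-parse HEAD` in `dir` and returns the trimmed
/// SHA, or `None` if Git is missing, `dir` is unusable, or HEAD does not resolve.
fn git_head_sha(dir: &Path) -> Option<String> {
    let output = Command::new("git")
        .arg("rev-parse")
        .arg("HEAD")
        .current_dir(dir) // Git walks up from here to find an enclosing repository
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let sha = String::from_utf8(output.stdout).ok()?.trim().to_string();
    if sha.is_empty() {
        None
    } else {
        Some(sha)
    }
}

/// Hypothetical helper: builds the created-by string, appending the build hash
/// only when `dir` sits somewhere inside a Git work tree.
fn created_by(version: &str, dir: &Path) -> String {
    let mut s = format!("parquet-rs version {}", version);
    if let Some(sha) = git_head_sha(dir) {
        s.push_str(&format!(" (build {})", sha));
    }
    s
}

fn main() {
    // For a real build script, Cargo sets the working directory to the package
    // source directory, e.g. somewhere under $CARGO_HOME/registry/src/.
    let dir = std::env::current_dir().expect("current directory should be readable");
    let value = created_by("5.0.0", &dir);
    // A build script would print this line so Cargo sets the variable for rustc.
    println!("cargo:rustc-env=PARQUET_CREATED_BY={}", value);
}
```

If `dir` is outside any repository, or Git is not installed, this yields the plain `parquet-rs version 5.0.0` string seen in CI. If `dir` is anywhere inside a repository, the string gets that repository's HEAD appended, even when the repository has nothing to do with Parquet.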
So that Parquet files will contain the same content whether or not your
home directory is checked into Git ;)
Fixes #589.
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>