fix: properly decode percent-encoded file paths coming from parquet checkpoints #1970
…oming from parquet checkpoints, to prevent tombstone and file paths mismatch (e.g. file path is read from checkpoint while tombstone path is read from JSON)
#[test]
fn test_encode_path() {
    let cases = [
        (
            "string=$%25&%2F()%3D%5E%22%5B%5D%23%2A%3F.%3A/part-00023-4b06bc90-0678-4a63-94a2-f09af1adb945.c000.snappy.parquet",
            "string=$%2525&%252F()%253D%255E%2522%255B%255D%2523%252A%253F.%253A/part-00023-4b06bc90-0678-4a63-94a2-f09af1adb945.c000.snappy.parquet",
        ),
        (
            "string=$%25&%2F()%3D%5E%22<>~%5B%5D%7B}`%23|%2A%3F%2F%5Cr%5Cn.%3A/part-00023-e0a68495-8098-40a6-be5f-b502b111b789.c000.snappy.parquet",
            "string=$%2525&%252F()%253D%255E%2522%3C%3E~%255B%255D%257B%7D%60%2523%7C%252A%253F%252F%255Cr%255Cn.%253A/part-00023-e0a68495-8098-40a6-be5f-b502b111b789.c000.snappy.parquet"
        ),
        (
            "string=$%25&%2F()%3D%5E%22<>~%5B%5D%7B}`%23|%2A%3F%2F%5Cr%5Cn.%3A_-/part-00023-346b6795-dafa-4948-bda5-ecdf4baa4445.c000.snappy.parquet",
            "string=$%2525&%252F()%253D%255E%2522%3C%3E~%255B%255D%257B%7D%60%2523%7C%252A%253F%252F%255Cr%255Cn.%253A_-/part-00023-346b6795-dafa-4948-bda5-ecdf4baa4445.c000.snappy.parquet"
        )
    ];
It seems we are losing test coverage here. Why are we deleting this file?
This file is duplicated in src\kernel\actions\serde_path.rs, including the test. Likely a leftover.
Description
When read from parquet checkpoints, the Add and Remove file paths are not percent-decoded, unlike when they are read from JSON transaction logs. This causes file path mismatches (e.g. a percent-encoded Add file path is read from a checkpoint while the tombstone path for the same file is read from JSON), which can result in tombstoned files being counted as active.
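
To illustrate the mismatch described above, here is a minimal, self-contained sketch using only the Rust standard library. It is not the code from this PR: `percent_decode` and `active_files` are hypothetical helpers, and the shortened path is made up for the example (modelled on the test cases in the diff). It shows how an Add path left percent-encoded fails to match the already-decoded tombstone path, and how decoding it first restores the match.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the crate's implementation): decode `%XX`
/// escapes once, leaving malformed escapes untouched.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(h), Some(l)) = (hi, lo) {
                out.push((h * 16 + l) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Hypothetical helper: Add paths that are not cancelled by a tombstone.
fn active_files(adds: &[String], tombstones: &HashSet<String>) -> Vec<String> {
    adds.iter()
        .filter(|p| !tombstones.contains(*p))
        .cloned()
        .collect()
}

fn main() {
    // Assumed scenario: the Add path comes from a parquet checkpoint (still
    // percent-encoded), the tombstone from a JSON commit (already decoded).
    let add_from_checkpoint = "string=$%2525&%252F()/part-00023.parquet".to_string();
    let mut tombstones = HashSet::new();
    tombstones.insert("string=$%25&%2F()/part-00023.parquet".to_string());

    // Without decoding, the removed file is still reported as active.
    let raw = active_files(&[add_from_checkpoint.clone()], &tombstones);
    println!("active without decoding: {}", raw.len());

    // Decoding the checkpoint path first makes the tombstone match.
    let decoded = active_files(&[percent_decode(&add_from_checkpoint)], &tombstones);
    println!("active with decoding: {}", decoded.len());
}
```

The sketch decodes exactly once on purpose: the paths in the test cases contain literal `%25` sequences after decoding, so decoding a second time would corrupt them.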