-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve normalization of VendoredPath
s
#11991
Conversation
CodSpeed Performance ReportMerging #11991 will improve performances by 4.89%Comparing Summary
Benchmarks breakdown
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the many extra error types are worth the complexity, especially considering that vendored paths isn't something that comes from externally.
I generally like narrow error types, but they can be challenging to work with. metadata
and read
now return two different, disjoint errors. This can be frustrating because a function that calls both read
and metadata
now must define its own error type (or use anyhow
) to propagate both errors, but that's as good as just using io::Error
.
The other question is. Is it useful to know that reading a file failed because the path was invalid or because there's an other IO error? Would a caller handle these two cases differently? I would say probably not? I would probably call unwrap
for invalid paths because I know they are well formed (we control the construction) and there's nothing I can do about IO errors other than propagating them.
I'm leaning towards returning NotFound
if a path has too many ../
components. Prefixes are a bit more awkward. I'm leaning toward just panicking, because this is a dev error.
The alternative is to handle prefixes similar to root
where you take the starting prefix (win only) and concatenate it with the next component to get "UNIX" like semantics.
#[derive(Debug, thiserror::Error)] | ||
pub enum MetadataLookupError { | ||
#[error("{0:?} does not exist in the zip archive")] | ||
NonExistentPath(VendoredPathBuf), | ||
#[error("{0}")] | ||
InvalidPath(#[from] InvalidVendoredPathError), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dealing with different errors can often times be more difficult than just having one error, even if it contains more variant than are actually possible.
For example, someone calling metdata
and read
now needs to handle both MetadataLookupError
and ZipFileReadError
but there's no way to convert one error to the other. It forces the caller to define its own error type that is a union over the two.
if normalized_parts.pop().is_none() { | ||
return Err(InvalidVendoredPathError::EscapeFromZipFile); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly checked what the normal FS operations return when you navigate outside the cwd
. They just return a NotFound
error, which is probably also enough for us.
unsupported => { | ||
return Err(InvalidVendoredPathError::UnsupportedComponent( | ||
unsupported.to_string(), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that this is only to handle prefixes? I guess the problem we run into here is that camino::Utf8Path
path parsing is platform-dependent. I'm not sure how to solve this best but I'm also not sure if this special case is worth returning an Error
(we control the path creation, it's not that they're provided from externally)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a camino::Utf8Component::RootDir
... which I don't think is possible in the middle of a file, but that can't be validated using the type system
TIL: This is what if path.as_os_str().is_empty() {
Err(io::const_io_error!(io::ErrorKind::InvalidInput, "cannot make an empty path absolute",))
} Uh, this is funny. Too many assert_eq!(
std::path::Path::new("a"),
std::path::Path::new(
"../../../../../../../../../../../../../../../../../../../../../../home"
)
.canonicalize()
.unwrap()
); Fails, and the path resolves to Taking all this into consideration. I would return |
Huh. Right, yeah, that makes sense. We're not trying to "escape from the zip archive"; we've just reached the root of the zip archive, and we can't go any further. |
I think the conclusion here is that none of the changes here are needed:
Thanks for talking this through with me, I appreciate it! |
Summary
This is a competing PR to #11989. Instead of making
VendoredPath
normalization eager, it makes the lazy normalization that we currently do more principled:..
path components that attempt to "escape" from the zip archive altogether (e.g.VendoredPath("..")
should be rejected)Result
instead of panicking if theVendoredPath
is invalid.Test Plan
cargo test -p ruff_db