-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot]: Add a VendoredFileSystem implementation #11863
Conversation
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
00edf56
to
0cbc590
Compare
8bc3384
to
4441c9e
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work.
Overall looks good, but we should address the following items before merging:
- Use regular functions for tests instead of macros
- Remove
FileSystem
implementation and useVendoredFilePath
andVendoredFilePathBuf
(and move the definitions into `vendored.rs) - Move the implementation out of the
vendored
directory. - If we can, remove the probing where we append a slash at the end of the path when reading failed. This seems like trial and error.
One open question for me is whether we should create a dedicated ruff_vendored
crate that also contains all vendored files (and contains the build.rs
). The benefit of that would be that integration tests can use the actual vendored files, which I think would be desired (but use stubed vendored files for unit tests). If that's undesired, I think leaving it in ruff_db
is fine.
/// A path that has been normalized via the `normalize_vendored_path` function. | ||
/// | ||
/// Trailing slashes are normalized away by `camino::Utf8PathBuf`s, | ||
/// but trailing slashes are crucial for distinguishing between | ||
/// files and directories inside zip archives. | ||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
struct NormalizedVendoredPath(String); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could combine this with the VendoredFilePath
and VendoredFilePathBuf
where we require the use of VendoredFilePathBuf::new
(that normalizes the string).
I'm guessing you mean |
821cbef
to
bd94c60
Compare
I wasn't sure while writing this the extent to which we'd want to tie the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overall looks good but I think we can make it much simpler by removing all code that's only necessary for handling directories. From what I understand, we won't need directory support to implement module resolution. All we need is to lookup a specific file path. Or are there operations that need to know about the existence of a directory?
|
||
#[derive(Debug)] | ||
pub struct VendoredFileSystem { | ||
inner: VendoredFileSystemInner, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would remove the inner file type. It doesn't provide that much functionality but the nesting becomes somewhat hard to understand. It also allows to remove other wrapper types, reducing the overall code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got rid of one of the other wrapper types in f4f97d4, but I think I would prefer to keep using an inner file type for this. I tried changing it to a type alias, and in my opinion it made the VendoredFileSystem
unduly coupled to the inner representation that used a lock/RefCell; I also found it less readable.
fn lookup_path(&mut self, path: &NormalizedVendoredPath) -> Result<ZipFile> { | ||
Ok(self.0.by_name(path.as_str())?) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should lookup_path
take a &VendoredPath
instead and do all the normalization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a nice idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the "trailing slashes are semantically significant for zip archives" thing combined with the "ZipFile::by_name()
requires a mutable reference" thing, it's hard for me to see how we could do this while keeping the borrow checker happy and avoiding doing path normalization twice. It doesn't seem possible to call self.0.by_name()
twice in lookup_path()
while keeping the borrow checker happy, unfortunately (unless I'm missing something), and because of the trailing-slash thing, we need to call it twice in lookup_path()
if we're going to do the normalization inside lookup_path()
.
Co-authored-by: Micha Reiser <micha@reiser.io>
96159d9
to
6702893
Compare
6702893
to
b0fe7c0
Compare
b0fe7c0
to
f05bcb2
Compare
2348fe6
to
2cd137b
Compare
2cd137b
to
a43a8e6
Compare
crates/ruff_db/src/vendored.rs
Outdated
let normalized = normalize_vendored_path(path); | ||
let inner_locked = self.inner.lock(); | ||
let mut archive = inner_locked.borrow_mut(); | ||
archive.lookup_path(&normalized).is_ok() | ||
|| archive | ||
.lookup_path(&normalized.with_trailing_slash()) | ||
.is_ok() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe just self.metadata(path).is_some()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't resist micro-optimising this 😄 (this avoids calling zip_file.crc32()
and zip_file.is_dir()
, which aren't necessary if you only need to know whether the path exists or not)
Co-authored-by: Micha Reiser <micha@reiser.io>
38990d5
to
5d511bb
Compare
Summary
Work towards #11653.
This PR adds a
VendoredFileSystem
implementation to theruff_db
crate, the idea being that this is the filesystem we would use for resolving vendored typeshed stubs.Some of this feels a bit awkward, but I'm not sure there's a better way. I ran into two problems while writing this PR:
ZipArchive
struct exposed by thezip
crate takes&mut self
for most of the methods that query the underlying archive for information about files or directories within the archive. This means that in order to query zipped data from trait method implementations that take&self
, we end up having to clone the underlying archive. I think that's okay, as thezip
docs say that it should be cheap to clone (at least right now), but it didn't fill me with joy.camino
crate strips trailing slashes in a lot of its methods, as they're not semantically significant for most paths. However, for zip archives, trailing slashes are very significant: if astdlib/
directory exists inside the zip,zip.by_name("stdlib")
will not resolve to that directory (onlyzip.by_name("stdlib/")
will). I've attempted to abstract over this difference to other kinds of paths, so that theVendoredFileSystem
is consistent with the otherFileSystem
implementations in this respect. But it was slightly tricky to do so.Test Plan
I added lots of parameterized tests.