Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Encapsulate module resolution logic in module.rs #11767

Merged
merged 4 commits into from
Jun 6, 2024

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Jun 5, 2024

This PR encapsulates all module-resolution logic in module.rs. Currently a Vec of search paths is passed into the routines in module.rs, but this opens the door to the search paths being passed in an incorrect order. The order of search paths should be maintained as an invariant via a common routine to avoid the possiblity of error.

ResolvedSearchPathOrder::new() implements the module resolution order given at https://typing.readthedocs.io/en/latest/spec/distributing.html#import-resolution-ordering, with some small differences (applying the changes proposed in https://discuss.python.org/t/pep-561-s-module-resolution-order-seems-incorrect/55048).

Work towards #11653

@AlexWaygood AlexWaygood force-pushed the refactor-set-module-paths branch from d514307 to f1facf4 Compare June 6, 2024 10:41
Copy link
Contributor

github-actions bot commented Jun 6, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood AlexWaygood force-pushed the refactor-set-module-paths branch from f1facf4 to a2bb2e5 Compare June 6, 2024 10:56
@AlexWaygood AlexWaygood marked this pull request as ready for review June 6, 2024 13:13
@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Jun 6, 2024
@AlexWaygood
Copy link
Member Author

(Note that this PR essentially implements --custom-typeshed-dir internally, even though we don't yet support resolving to our vendored typeshed stubs.)

@@ -25,6 +25,7 @@ ctrlc = { version = "3.4.4" }
dashmap = { workspace = true }
hashbrown = { workspace = true }
indexmap = { workspace = true }
is-macro = { workspace = true }
Copy link
Member

@MichaReiser MichaReiser Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, nooo xD. This is a huge setback on my quest to remove is-macro. I prefer writing the is methods manually. Rust-analyzer and other ideas often struggle with what is-macro generats, but that's a personal preference. So, up to you if y ou want to keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, I should have been more opinionated. It hurts that this is merged haha

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha, I'm sorry. I've never had a problem with Rust-analyzer not being able to understand these generated methods locally!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like what @MichaReiser really means by "rust-analyzer and other IDEs often struggle ..." is "RustRover, which I use, often struggles ..." ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(That said, I'm fine with avoiding this kind of thing -- it's also easier for humans to read explicit methods than a proc macro.)

Comment on lines 48 to 49
let resolved_search_paths =
ResolvedSearchPathOrder::new(vec![], workspace_search_path, None, None);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without having read the implementation. Passing an empty vec and None twice seems a bit weird. Is this temporary until we e.g. have typing-shed paths?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading more through the code. Ignore my comment. What's nice about ResolvedSearchPathOrder is that I think this can be modeled very easily with salsa when we have settings.

All we need to do is to move it into a query that takes the type checker settings and it returns the resolved search paths.

Comment on lines +434 to +451
extra_paths
.into_iter()
.map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::Extra))
.chain(std::iter::once(ModuleSearchPath::new(
workspace_root,
ModuleSearchPathKind::FirstParty,
)))
// TODO fallback to vendored typeshed stubs if no custom typeshed directory is provided by the user
.chain(
custom_typeshed.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::StandardLibrary)
}),
)
.chain(site_packages.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::SitePackagesThirdParty)
}))
// TODO vendor typeshed's third-party stubs as well as the stdlib and fallback to them as a final step
.collect(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not using chain here would be easier to read

Suggested change
extra_paths
.into_iter()
.map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::Extra))
.chain(std::iter::once(ModuleSearchPath::new(
workspace_root,
ModuleSearchPathKind::FirstParty,
)))
// TODO fallback to vendored typeshed stubs if no custom typeshed directory is provided by the user
.chain(
custom_typeshed.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::StandardLibrary)
}),
)
.chain(site_packages.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::SitePackagesThirdParty)
}))
// TODO vendor typeshed's third-party stubs as well as the stdlib and fallback to them as a final step
.collect(),
let mut paths = extra_paths.into_iter().map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::Extra));
paths.push(ModuleSearchPath::new(
workspace_root,
ModuleSearchPathKind::FirstParty,
)));
paths.extend(custom_typeshed.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::StandardLibrary)
}));
paths.extend(site_packages.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::SitePackagesThirdParty)
}));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, to me it seems a fair bit more verbose but no more readable :/

--- a/crates/red_knot/src/module.rs
+++ b/crates/red_knot/src/module.rs
@@ -430,26 +430,40 @@ impl ResolvedSearchPathOrder {
         site_packages: Option<PathBuf>,
         custom_typeshed: Option<PathBuf>,
     ) -> Self {
-        Self(
+        let mut paths: Vec<ModuleSearchPath> = Vec::with_capacity(
+            extra_paths.len()
+                + 1
+                + usize::from(site_packages.is_some())
+                + usize::from(custom_typeshed.is_some()),
+        );
+
+        paths.extend(
             extra_paths
                 .into_iter()
-                .map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::Extra))
-                .chain(std::iter::once(ModuleSearchPath::new(
-                    workspace_root,
-                    ModuleSearchPathKind::FirstParty,
-                )))
-                // TODO fallback to vendored typeshed stubs if no custom typeshed directory is provided by the user
-                .chain(
-                    custom_typeshed.into_iter().map(|path| {
-                        ModuleSearchPath::new(path, ModuleSearchPathKind::StandardLibrary)
-                    }),
-                )
-                .chain(site_packages.into_iter().map(|path| {
-                    ModuleSearchPath::new(path, ModuleSearchPathKind::SitePackagesThirdParty)
-                }))
-                // TODO vendor typeshed's third-party stubs as well as the stdlib and fallback to them as a final step
-                .collect(),
-        )
+                .map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::Extra)),
+        );
+
+        paths.push(ModuleSearchPath::new(
+            workspace_root,
+            ModuleSearchPathKind::FirstParty,
+        ));
+
+        // TODO fallback to vendored typeshed stubs if no custom typeshed directory is provided by the user
+        paths.extend(
+            custom_typeshed
+                .into_iter()
+                .map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::StandardLibrary)),
+        );
+
+        paths.extend(
+            site_packages.into_iter().map(|path| {
+                ModuleSearchPath::new(path, ModuleSearchPathKind::SitePackagesThirdParty)
+            }),
+        );
+
+        // TODO vendor typeshed's third-party stubs as well as the stdlib and fallback to them as a final step
+
+        Self(paths)
     }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I probably would forgo the with_capacity optimisation which you also won't get with std::iter::chain because ExactSizeIterator isn't implemented for chained (because it could overflow).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which you also won't get with std::iter::chain because ExactSizeIterator isn't implemented for chained (because it could overflow)

Ahh, I didn't realise that! But it makes sense now that you mention it.

I still kinda like the chain() personally, though 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could also shrink_to_fit the paths before returning, but probably doesn't matter given that few of these are ever created.

/// - `custom_typeshed` is a path to standard-library typeshed stubs.
/// Currently this has to be a directory that exists on disk.
/// (TODO: fall back to vendored stubs if no custom directory is provided.)
impl ResolvedSearchPathOrder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably call this ResolvedSearchPathOrder because an instance of it doesn't implement an order, it's the ordered search paths.

An alternative would be that SearchPathOrder sorts the paths: Given a list of ModuleSearchPath, sort the paths according to PEP-561 (using the `ModuleSearchPathKind)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably call this ResolvedSearchPathOrder

Not sure I understand -- that's what it's already called

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you mean, though. I renamed it to OrderedSearchPaths in 4f0bcf5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry. That should have been *I probably wouldn't *. I didn't sleep well tonight and it is really showing. Sorry for this.

Comment on lines +427 to +433
pub fn new(
extra_paths: Vec<PathBuf>,
workspace_root: PathBuf,
site_packages: Option<PathBuf>,
custom_typeshed: Option<PathBuf>,
) -> Self {
Self(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a slightly nicer API (requires a few more allocations but whathever, we build this once) is to use a builder:

ResolvedSearchPathBuilder::workspace(workspace_root)
    .extras(extras)
    .site_packages(site_packages)
    .typeshed(typeshed)
    .build()

where ResolvedSearchPathBuilder is

struct ResolvedSearchPathBuilder {
    extras: Vec<PathBuf>,
    workspace_root: PathBuf,
    typeshed: Option<PathBuf>,
    site_packages: Option<PathBuf>
}

But I don't think it realy maters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I considered this, but here I think I prefer to force us to explicitly state whether we're passing in site-packages or whether we want to run in "isolated mode", ignoring all site-packages packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did almost comment on the call to OrderedSearchPaths::new above that it's hard to read/understand the call in code review, without IDE support, and without Python keyword arguments. Not sure how much this matters. Another way to address it would be taking in a config struct as argument instead of just positional args.

Copy link
Member

@MichaReiser MichaReiser Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can ignore it for now. Ultimately what we'll do is to read the settings and take the fields from there (or fall back to other defaults).

@AlexWaygood AlexWaygood enabled auto-merge (squash) June 6, 2024 14:27
@AlexWaygood AlexWaygood merged commit 303ef02 into main Jun 6, 2024
18 checks passed
@AlexWaygood AlexWaygood deleted the refactor-set-module-paths branch June 6, 2024 14:31

// e.g. site packages
ThirdParty,
/// Project dependency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the word "dependency" here confusing. I wouldn't think of first-party code as a "dependency" -- the dependencies probably live in site-packages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just carried over the pre-existing comment here, but yeah, agree that this isn't the best terminology. I'll fix it up in a followup.

@@ -25,6 +25,7 @@ ctrlc = { version = "3.4.4" }
dashmap = { workspace = true }
hashbrown = { workspace = true }
indexmap = { workspace = true }
is-macro = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(That said, I'm fine with avoiding this kind of thing -- it's also easier for humans to read explicit methods than a proc macro.)


// e.g. built-in modules, typeshed
/// e.g. built-in modules, typeshed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the stdlib portion of typeshed

@@ -225,22 +225,23 @@ struct ModuleSearchPathInner {
kind: ModuleSearchPathKind,
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash, is_macro::Is)]
pub enum ModuleSearchPathKind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a comment that these are listed here in priority order? And even a link to the relevant page of the typing spec (soon hopefully to be updated to match what we're doing here)?

Comment on lines +427 to +433
pub fn new(
extra_paths: Vec<PathBuf>,
workspace_root: PathBuf,
site_packages: Option<PathBuf>,
custom_typeshed: Option<PathBuf>,
) -> Self {
Self(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did almost comment on the call to OrderedSearchPaths::new above that it's hard to read/understand the call in code review, without IDE support, and without Python keyword arguments. Not sure how much this matters. Another way to address it would be taking in a config struct as argument instead of just positional args.

Comment on lines +434 to +451
extra_paths
.into_iter()
.map(|path| ModuleSearchPath::new(path, ModuleSearchPathKind::Extra))
.chain(std::iter::once(ModuleSearchPath::new(
workspace_root,
ModuleSearchPathKind::FirstParty,
)))
// TODO fallback to vendored typeshed stubs if no custom typeshed directory is provided by the user
.chain(
custom_typeshed.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::StandardLibrary)
}),
)
.chain(site_packages.into_iter().map(|path| {
ModuleSearchPath::new(path, ModuleSearchPathKind::SitePackagesThirdParty)
}))
// TODO vendor typeshed's third-party stubs as well as the stdlib and fallback to them as a final step
.collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could also shrink_to_fit the paths before returning, but probably doesn't matter given that few of these are ever created.

Comment on lines +673 to +675
if module_search_path.kind().is_standard_library() {
package_path = package_path.join(TYPESHED_STDLIB_DIRECTORY);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is so when people point to a custom typeshed, they can just point to the root of the typeshed repo?

What about instead resolving this once upfront in the way we set our stdlib and typeshed-third-party search paths, so we don't have to do it on every resolution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is so when people point to a custom typeshed, they can just point to the root of the typeshed repo?

Yup. That's what mypy and pyright do currently, and it's what I would expect a --custom-typeshed-dir argument to do.

What about instead resolving this once upfront in the way we set our stdlib and typeshed-third-party search paths, so we don't have to do it on every resolution?

Could do! I decided to go this way so that calling .path() on the ModuleSearchPath instance would give you the path for typeshed rather than the path for typeshed/stdlib. That felt like it made the most sense here. As discussed elsewhere, I think we're going to have to do different things for each different kind of module search path anyway, so we're not going to be able to treat all paths in the resolved vector the same way even if this module search path pointed to typeshed/stdlib rather than just typeshed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it seems more useful if .path() on the ModuleSearchPath instance points to the actual path imports will be resolved from.

And agreed that foo-stubs packages mean special handling for resolutions in site-packages specifically, but in general the more work we can do upfront and the less work we can do on resolution of each package, the better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, no strong opinion from me on this; I can change this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants