-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the module structure of the namespace module #6291
base: master
Are you sure you want to change the base?
Conversation
…ate-namespace-module
…ernal-modules-to-root
…ernal-modules-to-root
CodSpeed Performance ReportMerging #6291 will degrade performances by 10.82%Comparing Summary
Benchmarks breakdown
|
…package_root (for UseStatement)
|
||
/// The set of items that represent the namespace context passed throughout type checking. | ||
#[derive(Clone, Debug, Default)] | ||
#[derive(Clone, Debug)] | ||
pub struct Namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the need for the separate Namespace
and Root
types now, but probably better to leave that be for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a way to keep track of the current module path during the traversal. Namespace
is currently responsible for that, but I guess we could do that some other way. It doesn't feel like it belongs in Root
, though, since it's responsibility is to contain the module tree of the package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I agree that's an issue that's better to leave for a future PR.
self.is_contract_package | ||
} | ||
|
||
pub fn add_span_to_root_module(&mut self, span: Span) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this single-use one liner method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced it in order to improve the encapsulation of the namespace module. Without it we need to introduce current_package_root_module_mut
, and set the span on the resulting Module
.
I've committed the change for review, and I guess it's closer to idiomatic Rust this way, but I have to say I dislike the solution. I prefer to call methods that cause the owner of an object to mutate the object, rather than call methods that return mutable borrows of the object. In this case it's just a single call site, but generally we end up with mutating code all over the place, and I do find it harder to maintain.
// The contents of the package being compiled. | ||
current_package: Module, | ||
// True if the current package is a contract, false otherwise. | ||
is_contract_package: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about dropping this out of here?
That would simplify the constructor here, and clean contracts out of the namespace.
It does mean we'd have to do the implicit import of CONTRACT_ID
out of this file, but seems like it would be worth it and clean some complexity here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I'm not sure how I feel about that.
I think the implicit import of CONTRACT_ID
is an improvement over what we had before, where a new declaration of CONTRACT_ID
was inserted into each module. We need to perform that import for each module, so we have to have is_contract_package
(or logic that computes it) in the namespace module somewhere.
I suppose we could move it to the Namespace
struct instead. The consequence would be that we have to pass is_contract_package
through from forc-pkg
down to the place where we create Namespace
structs in parsed_to_ast
.
Alternatively the Namespace
constructor could inspect the root module to see whether CONTRACT_ID
is declared there, but that feels like a bit of a hack.
Co-authored-by: João Matos <joao@tritao.eu>
…ernal-modules-to-root
…Labs/sway into jjcnn/move-external-modules-to-root
Description
Fixes #5498.
This PR contains a major refactoring of the namespace module, which is responsible for keeping track of the symbol bindings in the program. The refactoring is intended to reduce memory allocation during compilation, and also fixes a number of minor bugs related to path resolution.
Advice to the reviewer:
I recommend reviewing this PR in 3 steps:
sway-core/src/semantic_analysis/namespace/root.rs
,sway-core/src/semantic_analysis/namespace/namespace.rs
andsway-core/src/language/call_path.rs
. These files contain the major datastructure changes, and if anything in those files needs to be addressed, then there is limited value in reviewing the rest of the code changes.Remaining issues
Changes made by this PR
Representation of the dependency graph (main file:
sway-core/src/semantic_analysis/namespace/root.rs
)Previous: External libraries were represented as submodules of every module in the program. This meant that every module in every external package was cloned for every module in the program, and recursively so, since external libraries could also have dependencies that required clones for every module. In particular,
std
hascore
as a dependency:Changes in this PR: External libraries are now represented only at the package root. The root module is given the same name as the package so that paths beginning with the package name can be resolved as a module path. If multiple libraries depend on the same external library, then they each contain a clone (this can be resolved a at later stage by introducing a global table of libraries).
Reasoning: This change dramatically reduces the amount of cloning of external libraries. The cloning is not eliminated completely, since multiple libraries may have the same library as a dependency (e.g.,
core
is used as a dependency for both the program andstd
, so if the program depends onstd
, then there will be two clones ofcore
in the tree). The remaining cloning can be eliminated by introducing a global table of libraries, which should be doable without making significant changes to the path resolution algorithm.Callpath representation (main file: sway-core/src/language/call_path.rs)
Previous: Paths to externals were considered relative to current module. This worked because all externals existed a submodules of every module. Paths inside the current package were ambiguous, and were either interpreted relative to current module or relative to the root module. The root module did not have a name, so the empty path was used to indicate the root module.
Now: All full paths start with the name of the package, which is also the name of the root module. If the first identifier is not the name of the current package, then we look in the externals of that package, and if the name isn't found there we throw an error. Note that the name of an external is defined in Forc.toml, and may be different from the name used inside the external library (e.g., a library may call itself
X
, but a project may useX
as an external under the nameY
). This means that any path that escapes from a library must have its package name changed.Reasoning: This is all a conseqence of the change to the representation of the dependency graph. We need to be able to the rely on the package name in order to resolve the path within the correct pacakge. Additionally, if a path escapes from a dependency to a dependent, then the package name must be changed to what the package is called in the dependent.
CallPath.callpath_type (main file: sway-core/src/language/call_path.rs)
Previous: The CallPath struct contained a flag
is_absolute
, which, if set, was used to mean two different things. For parsed paths it was used to signal that the path started with::
, i.e., that the path should be resolved relative to the current package root. For paths generated during typechecking the flag was used to indicate that the path was a full path, i.e., that it should either be resolved relative to the current package root or that it was a path to an external library.Now:
is_absolute
has been replaced withcallpath_type
, which has 3 possible values:RelativeToPackageRoot
, which indicates that the path started with::
,Full
, which indicates that the path starts with the package name, andAmbiguous
, which indicates that the path should either be resolved relative to the current module, or that it is a full path referring to an external library.Reasoning: A path has 3 different states, so a boolean flag is not sufficient to represent all states.
Path representation (main file: sway-core/src/language/call_path.rs)
Previous: Full paths always pointed to the module containing the declaration of the thing that the path resolves to.
Now: Full paths point to a module where the suffix is bound. This may not be the module containing the declaration of the thing the suffix is bound to, because the binding may be the result of an import. The trait map is special in that the path to the actual declaration is used as a key in the map, so for traits we use the canonical path, which point to the module where the declaration resides.
Reasoning: The previous definition of full paths required a duplication of the symbol resolution algorithm in order to construct the full path, and it was getting very difficult to maintain. In the vast majority of cases we only need the full path to indicate somewhere where the symbol can be resolved, and don't actually need to know where the declaration resides. In the few cases where we do need to know the path to the declaration (the trait map uses these paths as keys) we can generate the canonical path by first converting to the full path, and then resolving the path and using the path stored with the binding in the lexical scope, since this is the path to the actual declaration. Additionally, the previous solution was only able to generate full paths for paths where the suffix is an identifier - for other suffix types we could not generate a full path. This has now been changed so that we can generate full paths for all paths, although canonical paths are still restricted to paths with an identifier as the suffix. This also moves us closer to a situation where we can represent semantic (canonical) paths separately from syntactic callpaths.
Namespace.init removed (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)
Previous: The
Namespace
struct contained aninit
module which was used as a template for new modules, a clone of which was created every time the collection pass or the typechecker entered a submodule. This template contained all external libraries as submodules, as well as (for contract pacakges) a (new) declaration ofCONTRACT_ID
.Now: The
init
module has been removed. When the collection pass or the typechecker enters a new submodule,Namespace
creates a newModule
(if none exists yet for the relevant submodule), and populates it with implicit imports (core::prelude::*
andstd::prelude::*
, and for contract packagesCONTRACT_ID
).Reasoning: The current solution makes it significantly clearer what is in a new
Module
object. Additionally, the declaration ofCONTRACT_ID
is no longer duplicated in each module - it is now declared once in the root module, and imported everywhere else.namespace encapsulation (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)
Previous: The
Namespace
struct contained methods to extract the underlying datastructures in thenamespace
module.Now: An attempt has been made to encapsulate the
namespace
module a little bit more, so that changes and lookups to the datastructures in thenamespace
module are done by calling methods on the Namespace struct rather than letting the caller extract the datastructures and manipulating them directly.Reasoning: When the datastructures are manipulated in many different places it makes it very hard to track down all the places that need to be changed when making changes to the datastructure. The current solution is not perfect, but it does encapsulate things a bit better than before.
Minor changes and improvements
A few minor improvements have been made:
core
,std
, etc.) is now referred to as a package. (Note: TheRoot
struct should probably be renamed toPackage
for consistency)type_resolve.rs
had to be updated because path resolution was changed. The current version is not perfect, but it does improve the faulty check we had before.TyModule
so far contained a clone of theNamespace
struct in the state it was in when typechecking of the relevant module was done. TheNamespace
object was never used except for the one in the root module, so I have moved theNamespace
clone toTyProgram
, thus eliminating quite a bit of unnecessary cloning (the Namespace struct owns the current packageRoot
object, and thus cloning theNamespace
struct clones the entire dependency graph).Updated tests
A number of tests have been updated for various reasons:
core::str_slice
was referenced using the path::core::str_slice
. This is invalid, but sincecore
used to be treated as a submodule of every module in the program, the path resolved anyway. (This also shows that the invalid paths in issue Fix performance and semantic problem when creatingModule
s and makeModule::name
mandatory #5498 are now actually treated as invalid - I have not added regression tests of that issue, since it seems unlikely that we will reintroduce the bug).Unresolved issues
The
Namespace
struct is cloned before the symbol collection phase. This is expensive, since theNamespace
struct at that point contains the entire dependency graph (minus the modules of the current package). Ideally we would like the symbol collection phase and the typechecker to use the sameNamespace
struct, but at the moment the Module struct is unable to perform updates from parsed to typechecked declarations, so this isn't possible. The cloning happens insway-core/src/lib.rs
andsway-core/src/semantic_analysis/namespace/contract_helpers.rs
. Once we represent externals as a global table we should be able to change theRoot.externals
map to only use references into the global table, which will make the cloning step much cheaper.contract_helpers.rs
throws a regular compiler error if typechecking fails, but that should be an ICE, since it is generated code that is failing to typecheck.Paths may in some cases escape from external packages to the current package. If this happens across multiple packages, then this may result in the path no longer being valid. For instance, if A depends on B, which in turn depends on C, then a path may escape from C via B into A, but since A does not have C as a direct dependency, then the path can no longer be resolved. This again can be fixed once we introduce a global table of externals.
The visibility check in
type_resolve.rs
is insufficient, because it doesn't handle enum variants and associated types correctly.Checklist
Breaking*
orNew Feature
labels where relevant.