Restricted import system with named roots (path spec v2) #11138
Comments
Couldn't yet fully digest this. In short, you are taking the remapping concept one level higher and are making the Can you maybe write some pseudo-code that starts with an import statement and ends with loading a file from the host filesystem and storing it under a vfs path name? |
Yes. There are no real changes in import syntax or Standard JSON content or how you invoke the compiler on the CLI but there are new limitations and some changes in how paths you specify in all these places are interpreted. Basically, this forces the user to use remappings when the situation gets more complex than a single directory with all files under it. And ensures that all projects will use and interpret these remappings the same way.
No, that's exactly what I meant. But for example in Remix you would want both because that's the situation where you want URLs both in imports and in the "paths" that the callback gets. The main difference with the current situation would be that the compiler internally sees
Sure. Here's a short and very informal version:
For an import statement you would call If you wanted to get a VFS path from something specified on the command line or in |
Also, here's a more concrete version of the pseudocode, using Python syntax. I wrote this one first, but then I thought it might be a bit too long and detailed, so I rewrote it into the simplified one I posted above. If the less formal version above is enough and explains everything, you can just skip this one.
def parse_remappings(remappings):
    import_remappings = {}
    root_remappings = {}
    # A remapping whose target is a named root is an import remapping;
    # anything else maps a named root to a location (root remapping).
    for remapping in remappings:
        (context, prefix, target) = split_remapping(remapping)
        if target.startswith("@") and not target.startswith("@@"):
            import_remappings[prefix] = {"context": context, "target": target}
        else:
            if not prefix.startswith("@") or prefix.startswith("@@"):
                raise InvalidRootRemapping()
            root_remappings[prefix] = target
    for url_prefix_or_root, import_remapping in import_remappings.items():
        validate_root(import_remapping["target"])
        if url_prefix_or_root.startswith("@") and not url_prefix_or_root.startswith("@@"):
            # Root-to-root remapping: the context must be a known root.
            validate_root(url_prefix_or_root)
            validate_root(import_remapping["context"])
            if import_remapping["context"] not in root_remappings:
                raise InvalidRemapping()
        else:
            # URL-to-root remapping: no context allowed.
            if import_remapping["context"] != "" or not is_url(url_prefix_or_root):
                raise InvalidRemapping()
    for root, root_remapping in root_remappings.items():
        validate_root(root)
    # The unnamed root defaults to the current working directory.
    if "@/" not in root_remappings:
        root_remappings["@/"] = get_current_working_directory()
    return (import_remappings, root_remappings)

(IMPORT_REMAPPINGS, ROOT_REMAPPINGS) = parse_remappings(remappings)

def import_path_to_source_unit_name(import_path, current_source_unit_name):
    local_import_remappings = filter_remappings(IMPORT_REMAPPINGS, context=get_root(current_source_unit_name))
    if import_path.startswith("@/"):
        # Unnamed root: stands for the root of the importing file.
        root = get_root(current_source_unit_name)
        source_path = import_path.removeprefix("@/")
    elif import_path.startswith("@") and not import_path.startswith("@@"):
        # Direct import with a named root.
        root = get_root(import_path)
        source_path = import_path.removeprefix(root)
    elif import_path.startswith("./"):
        # Relative import: resolved against the importing file's directory.
        root = get_root(current_source_unit_name)
        source_path = dirname(current_source_unit_name.removeprefix(root)) + import_path.removeprefix("./")
    elif is_url(import_path):
        # Remote import: the longest remapped URL prefix acts as the root.
        (prefix, remainder) = find_longest_matching_prefix(import_path, local_import_remappings.keys())
        root = prefix
        source_path = remainder
    else:
        raise InvalidImportPath()
    if normalize_import_path(source_path) != source_path:
        raise InvalidImportPath()
    # Each entry is a dict with "context" and "target" (see parse_remappings).
    if not validate_root(local_import_remappings[root]["target"]):
        raise InvalidImportPath()
    return local_import_remappings[root]["target"] + normalize_import_path(source_path)

def source_unit_name_to_filesystem_path(source_unit_name):
    root = get_root(source_unit_name)
    source_path = get_source_path(source_unit_name)
    if root not in ROOT_REMAPPINGS:
        raise RootResolutionFailed()
    return ROOT_REMAPPINGS[root] + source_path

def filesystem_path_to_source_unit_name(filesystem_path):
    if filesystem_path.startswith("@"):
        if not filesystem_path.startswith("@@"):
            filesystem_path = source_unit_name_to_filesystem_path(filesystem_path)
        else:
            filesystem_path = filesystem_path.removeprefix("@")
    filesystem_path = normalize_path(to_unix_format(make_absolute(filesystem_path, relative_to=get_current_working_directory())))
    root = find_root_containing_path(filesystem_path, ROOT_REMAPPINGS)
    source_path = filesystem_path.removeprefix(ROOT_REMAPPINGS[root])
    return root + source_path
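The pseudocode above leaves split_remapping() undefined. A minimal runnable sketch of it could look like the following. It assumes the existing context:prefix=target remapping syntax and, as an assumption of this sketch only, treats the part before the first colon as a context only when the left-hand side starts with a single @, so that URL prefixes keep their colon.

```python
def split_remapping(remapping):
    """Split 'context:prefix=target' into (context, prefix, target).

    Assumption of this sketch: a context is always a named root, so it
    starts with a single '@' and contains no ':'. Anything else on the
    left of '=' is taken as the prefix as a whole (e.g. a URL).
    """
    left, separator, target = remapping.partition("=")
    if separator == "" or left == "" or target == "":
        raise ValueError(f"Malformed remapping: {remapping!r}")
    context, prefix = "", left
    if left.startswith("@") and not left.startswith("@@") and ":" in left:
        context, _, prefix = left.partition(":")
    return (context, prefix, target)


print(split_remapping("@lib:@dep=@dep-v2"))
print(split_remapping("https://example.com/contracts=@example"))
```

The URL and root names in the usage lines are placeholders.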
A small update: I just realized that context is only useful in remappings of the form |
We discussed the spec today on the channel and later on the call. Some changes were proposed. Here's what will change: From the channel:
From the call:
Also, some general feedback from the call:
Related to #11105 and #9353.
Abstract / Motivation
This proposal changes the way paths in import statements, on the CLI and in Standard JSON are handled by the compiler and translated into internal source unit IDs.
The goal is to make imports more intuitive by directly exposing the user to the way the compiler identifies files internally. The current system hides the abstraction that happens between the actual filesystem and the compiler's virtual filesystem and makes users expect import paths to behave like filesystem paths even though they work differently.
The change is meant to preserve a forward-compatible subset of the old syntax, so that the same files can compile with both the old and the new compiler by only changing the remappings and compiler options.
The syntax for named roots intentionally follows the established convention of using @ placeholders in imports.
To avoid changing the meaning of existing syntax in a confusing way, relative imports of the form import "project/contract.sol"; are disallowed rather than made equivalent to import "./project/contract.sol"; even though having both work the same way would be quite intuitive.
Examples
- import "/project/contract.sol"; looks like an absolute path and indeed will load /project/contract.sol in the simplest case. It is however relative to --base-path, just like import "project/contract.sol"; is.
- import "./contract.sol"; is relative to the current source file while import "contract.sol"; is relative to the base path (or current working directory if base path is not set). This distinction is not obvious since in the shell both paths are equivalent and lead to the same file.
- project/contract.sol and project//contract.sol are seen as two completely different files (and actually can be different files when the source is provided via Standard JSON) but cause the same file to be loaded from the filesystem. The resulting errors are confusing if the user is not aware of how the compiler decides whether files are distinct or not.
- ../ or ./ are normalized, but only partially. If ../project//contract.sol is imported from /work//contracts/../token.sol, the path resolves into /work//contracts/contract.sol. Note .. being treated as an actual directory and // in one part not being replaced with /.
- solc contract.sol will be seen by the compiler as the same as import "contract.sol"; but if we go to the parent directory and compile it as solc dir/contract.sol it will be seen as a different file and compiled twice.
- ../ no longer works as relative to the source file. It's now relative to the current working directory because the remapping happens after the relative paths are resolved.
There are more examples listed in #11036. While they were originally reported as bugs, ultimately most of them are actually just unintuitive side-effects of the current design that mostly show up in corner cases.
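The project/contract.sol versus project//contract.sol case from the list above can be illustrated outside the compiler. The sketch below is not compiler code. It assumes, as the example describes, that source unit names are compared as plain strings, and it uses Python's posixpath.normpath as a stand-in for what the filesystem does with the two spellings. Both function names are made up for this illustration.

```python
import posixpath


def same_source_unit(name_a, name_b):
    # Source unit names are compared verbatim, with no normalization.
    return name_a == name_b


def same_file_on_disk(path_a, path_b):
    # The filesystem ignores duplicate slashes, so both spellings
    # lead to the same file.
    return posixpath.normpath(path_a) == posixpath.normpath(path_b)


a = "project/contract.sol"
b = "project//contract.sol"
print(same_source_unit(a, b))   # False: two distinct source units
print(same_file_on_disk(a, b))  # True: one file loaded twice
```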
Specification
Overview
Paths given in import statements, on the command line and in Standard JSON are used for two purposes:
A source unit ID consists of a named or unnamed root and a source path, e.g. @openzeppelin/utils/math/Math.sol or @/contracts/token.sol.
There are several ways files can get into the virtual filesystem. The most important one is an import statement. Paths in import statements can be specified in three ways:
- Direct import: @openzeppelin/utils/math/Math.sol.
- Relative import: ./math/Math.sol is equivalent to @openzeppelin/utils/math/Math.sol when imported from @openzeppelin/utils/Arrays.sol.
- Remote import: https://github.com/OpenZeppelin/openzeppelin-contracts/contracts/utils/math/Math.sol.
For a remote import to be valid, the user needs to assign a named root to a matching prefix (on the CLI or in Standard JSON). For example https://github.com/OpenZeppelin/openzeppelin-contracts/contracts=@openzeppelin. After the remapping, the path is processed as if it were a direct import. It's also possible to remap one named root to another (e.g. @openzeppelin=@oz). Every remapping to a named root becomes a part of contract metadata because the mapping happens between the import path and the source unit ID and changing it may affect the result of the compilation even if the source stays the same.
In typical usage named roots represent libraries or independent submodules of your project. The main project itself is represented just by @.
@ is special in that it can represent different directories, depending on where it is used. When used in a file located under some named root it represents that root. This way, when writing a library you can safely refer to its root just as @ (i.e. import "@/utils/math/Math.sol";). A standalone project using your library can refer to library files via a named root (import "@openzeppelin/utils/math/Math.sol";) and use @ for its own files without a conflict. The substitution happens when the import path is translated into a source unit ID: in the virtual filesystem the source IDs of library files always contain the full named root.
To be able to locate the file and load it, the compiler passes its source unit ID to the source loader. The loader determines how roots translate to specific locations. In the case of the command-line compiler, locations must be existing directories. All named roots must be explicitly mapped for a contract to be compilable. The unnamed root is by default mapped to the compiler's working directory but can also be explicitly remapped.
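The two steps described above, substituting @ with the root of the importing file and then letting the loader map a root to a directory, can be sketched roughly as follows. This is only an illustration of the description, not the compiler's implementation. The function names, the convention of writing roots with a trailing slash, and the directories are assumptions of this sketch.

```python
def get_root(source_unit_id):
    """Return the root including its trailing slash, e.g. '@openzeppelin/'."""
    if not source_unit_id.startswith("@") or "/" not in source_unit_id:
        raise ValueError(f"No root in {source_unit_id!r}")
    return source_unit_id[: source_unit_id.index("/") + 1]


def expand_unnamed_root(import_path, importing_unit_id):
    """Replace a leading '@/' with the root of the importing file."""
    if not import_path.startswith("@/"):
        return import_path
    return get_root(importing_unit_id) + import_path[len("@/"):]


def locate(source_unit_id, root_directories):
    """Loader side: translate a source unit ID into a filesystem path."""
    root = get_root(source_unit_id)
    if root not in root_directories:
        raise KeyError(f"Root {root!r} is not mapped to a directory")
    return root_directories[root].rstrip("/") + "/" + source_unit_id[len(root):]


# Hypothetical layout: the project in /work/project, the library in /work/lib/oz.
roots = {"@/": "/work/project", "@openzeppelin/": "/work/lib/oz"}
unit = expand_unnamed_root("@/utils/math/Math.sol", "@openzeppelin/utils/Arrays.sol")
print(unit)
print(locate(unit, roots))
```

Inside the library the import resolves against @openzeppelin/, while the same @/ prefix used in a file under the unnamed root stays under @/.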
Files on the command line can be specified in two ways:
- As a filesystem path, e.g. ../contracts/contract.sol or C:\project\contracts\contract.sol.
- As a source unit ID, e.g. @openzeppelin/utils/math/Math.sol, and have the compiler resolve it by passing it to the source loader.
When supplying files using Standard JSON, you always specify source IDs yourself. These IDs must of course contain a named or unnamed root. E.g. math/Math.sol is not a valid source unit ID.
Instead of supplying the source as a part of the JSON file (via the content key) you can specify its location (via the urls key). It can be a path or a URL, and whether it can be successfully resolved depends on the compiler interface you use. The command-line interface can only resolve filesystem paths and source unit IDs. The JavaScript interface can also handle URLs or even arbitrary identifiers: it's all up to the user-defined callback.
Many details in the above description were intentionally omitted to keep it concise. Additional sections below clarify finer points of the new system.
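To make the Standard JSON case concrete, here is a small Python sketch that builds an input whose source IDs all carry a root. The language, sources, content and urls keys are those of the existing Standard JSON format. The source IDs, the file path and the has_root helper are made up for illustration, and the check only mirrors the rule stated above rather than any actual compiler validation.

```python
import json


def has_root(source_unit_id):
    """True if the ID starts with a named or unnamed root such as '@lib/' or '@/'."""
    return (
        source_unit_id.startswith("@")
        and not source_unit_id.startswith("@@")
        and "/" in source_unit_id
    )


standard_json_input = {
    "language": "Solidity",
    "sources": {
        # Source supplied inline via "content".
        "@/contracts/token.sol": {
            "content": 'import "@openzeppelin/utils/math/Math.sol";\ncontract Token {}\n'
        },
        # Source located via "urls"; resolving it is up to the interface in use.
        "@openzeppelin/utils/math/Math.sol": {
            "urls": ["/work/lib/openzeppelin/utils/math/Math.sol"]
        },
    },
}

for source_unit_id in standard_json_input["sources"]:
    if not has_root(source_unit_id):
        raise ValueError(f"Source unit ID without a root: {source_unit_id!r}")

serialized = json.dumps(standard_json_input, indent=2)
print(serialized)
```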
Normalization
Source unit IDs used internally are always in a normalized form:
- [...] _ and -.
- [...] @ and ends with /.
- @ and : are not allowed inside root name.
- [...] . or contain any ./ or ../ segments,
Source IDs specified in Standard JSON must be already normalized. In other contexts compiler may automatically apply some normalization rules:
- [...] ./, which is stripped by the compiler.
- ./ segments are stripped,
- ../ segments are collapsed.
- urls in standard JSON behave just like the ones specified on the command line (though they are never used to form source unit IDs so the only thing that matters is which file they resolve to).
@ escaping
An escaping mechanism is needed to discern named roots from paths starting with @ character in contexts where both are allowed. For that purpose a leading @@ is always interpreted as a single @ and causes the following value not to be seen as a root.
Relative imports
- [...] ./ is stripped from the import path. Then they are combined.
- [...] ./, the path must be normalized according to the same rules as source path in source unit IDs.
- [...] ./.
- [...] ../ is not allowed.
Remote imports
- [...] protocol://, where protocol can be anything except for file.
- [...] protocol:// part.
Import remapping vs root remapping
There are two kinds of remappings:
- [...] @ is not allowed.
- @a=@b @b=@c @c=@d will remap @a to @b, not @d.
- [...] @abc=@abc) is allowed and can be used as a way to prevent a shorter remapping from matching (e.g. adding @contract=@token to @con=@pro will prevent @contract from being remapped to @protract).
Remapping context
To solve conflicts caused by different libraries referring to their dependencies in the same way, it's possible to qualify import remappings with a context.
If an import remapping has a context, the substitution is only performed on imports found inside the files located under the named root used as context.
Examples:
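As a rough illustration of how a context narrows an import remapping, here is a minimal sketch. It assumes remappings are given as (context, prefix, target) tuples, that matching is a plain string-prefix match with the longest prefix winning (as the @con=@pro example suggests), and that a context-qualified remapping beats an unqualified one with the same prefix. That last tie-break rule, the function name and the root names are assumptions of this sketch, not part of the proposal.

```python
def apply_import_remapping(import_path, importing_root, remappings):
    """Apply at most one remapping to import_path (remappings are not recursive).

    remappings: list of (context, prefix, target) tuples; an empty context
    means the remapping applies to imports from any root.
    """
    candidates = [
        (len(prefix), context != "", prefix, target)
        for context, prefix, target in remappings
        if context in ("", importing_root) and import_path.startswith(prefix)
    ]
    if not candidates:
        return import_path
    _, _, prefix, target = max(candidates)
    return target + import_path[len(prefix):]


# Two hypothetical libraries that both import "@math/..." but need different versions.
remappings = [
    ("@lib-a", "@math", "@math-v1"),
    ("@lib-b", "@math", "@math-v2"),
]
print(apply_import_remapping("@math/Math.sol", "@lib-a", remappings))
print(apply_import_remapping("@math/Math.sol", "@lib-b", remappings))
print(apply_import_remapping("@math/Math.sol", "@", remappings))
```

Imports found under @lib-a and @lib-b are redirected to different roots, while the same import in the main project is left untouched because neither context matches.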
Supplying files on the CLI
All filesystem paths specified on the CLI that lead to files to be compiled must be located within one of the roots.
Since the unnamed root is by default mapped to current working directory, files from that directory can still be conveniently compiled without specifying any remappings in simple cases.
The source unit ID for the file is constructed by normalizing the path and finding the root that is mapped to the longest matching prefix.
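The lookup described above (normalize the path, then pick the root mapped to the longest matching prefix) can be sketched as follows. This is an illustration under stated assumptions, not the compiler's code: it handles POSIX-style paths only, writes roots with a trailing slash, and the function name and directory layout are hypothetical.

```python
import posixpath


def filesystem_path_to_source_unit_id(path, root_directories, cwd):
    """Map a CLI path to a source unit ID via the longest matching root directory."""
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    matches = []
    for root, directory in root_directories.items():
        prefix = posixpath.normpath(directory).rstrip("/") + "/"
        if absolute.startswith(prefix):
            matches.append((len(prefix), root, prefix))
    if not matches:
        raise ValueError(f"{path!r} is not located within any root")
    _, root, prefix = max(matches)  # longest directory prefix wins
    return root + absolute[len(prefix):]


roots = {
    "@/": "/work/project",
    "@openzeppelin/": "/work/project/lib/openzeppelin",
}
print(filesystem_path_to_source_unit_id("contracts/token.sol", roots, "/work/project"))
print(filesystem_path_to_source_unit_id("lib/openzeppelin/utils/math/Math.sol", roots, "/work/project"))
```

The second path lies inside both directories, so the more specific @openzeppelin/ root is chosen over the unnamed one.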
The CLI supports source unit IDs but not direct imports. I.e. @ never refers to a named root and import remappings are not taken into account.
Supplying files via Standard JSON
Source unit IDs specified in Standard JSON must be already normalized and contain a root. As a special case it can also be equal to <stdin>. Any other form of an ID is disallowed.
URLs specified in sources.urls are treated as raw URLs, not remote imports. I.e. remappings are not applied to them. Source unit IDs specified there are also not direct imports.
Standard input
A special source ID <stdin> is reserved for the content of compiler's standard input.
- [...] - command-line flag is specified.
- [...] @.
Base path
The base path has no function in the new system but could be retained for backwards-compatibility. --base-path <dir> would have the same effect as remapping @=<dir>.
Allowed paths
The --allowed-paths option is also still available. It is the only way to compile the project when the directory a root is mapped to contains symlinks that lead outside of it.
Possible extensions
Library path
Specifying a mapping for every named root may be tedious. To make it more convenient we could introduce the concept of a library path. It would be defined by a variable called SOLIDITYPATH and work in a way similar to PATH in Bash or PYTHONPATH in Python. All subdirectories of directories listed in SOLIDITYPATH would automatically become valid named roots.
Backwards-compatibility
The proposal only restricts current syntax and does not introduce any new elements.
- [...] ../ and / are no longer allowed.
- [...] @.
As such it's not backwards-compatible, but any file compilable after the change should also be compilable with older compilers given the right remappings.
Filesystem paths on the CLI will now produce different source unit IDs because paths are made absolute and then converted to be relative to a root (though, arguably, this is how it was originally supposed to work with --base-path and could be considered a bug instead: #11038 (comment)).
To use URLs as imports an intermediate mapping to and from a named root is required. This makes it impossible to support arbitrary URLs (though arbitrary URLs within a single protocol are still possible).
The reader callback passed to the JavaScript interface now receives files after root remapping. Before, it was getting source unit IDs directly. This will affect Remix IDE.