Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restricted import system with named roots (path spec v2) #11138

Closed
cameel opened this issue Mar 22, 2021 · 6 comments
Closed

Restricted import system with named roots (path spec v2) #11138

cameel opened this issue Mar 22, 2021 · 6 comments
Assignees
Labels
breaking change ⚠️ language design :rage4: Any changes to the language, e.g. new features

Comments

@cameel
Copy link
Member

cameel commented Mar 22, 2021

Related to #11105 and #9353.

Abstract / Motivation

This proposal changes the way paths in import statements, on the CLI and in Standard JSON are handled by the compiler and translated into internal source unit IDs.

The goal is to make imports more intuitive by directly exposing user to the way compiler identifies files internally. The current system hides the abstraction that happens between the actual filesystem and compiler's virtual filesystem and makes users expect import paths to behave like filesystem paths even though they work differently.

The change is meant to preserve a forward-compatible subset of the old syntax to make it possible to have the same files compile with both old and new compiler by only changing the remappings and compiler options.

The syntax for named roots intentionally follows the established convention of using @ placeholders in imports.

To avoid changing the meaning of existing syntax in a confusing way relative imports of the form import "project/contract.sol"; are disallowed rather than made equivalent to import "./project/contract.sol"; even though having both work the same way would be quite intiuitive.

Examples

  • Many cases of users being confused by the current system (including me :)) can be found in the bug tracker: [CLI] --base-path unexpectedly effects both import and source URLs #9346, Standard JSON compilation cannot find or read files. #2266, Non-deterministic output according to filesystem path #9790, Different bytecodes for the same contracts #6487, fail to import the sol in parent path #4914, Compiler ignores files located under the base path in Standard JSON #11038, https://gitter.im/ethereum/solidity?at=60521c4883533831b4e736eb.
  • The path in import "/project/contract.sol"; looks like an absolute path and indeed will load /project/contract.sol in the simplest case. It is however relative to --base-path, just like import "project/contract.sol";.
  • import "./contract.sol"; is relative to the current source file while import "contract.sol"; is relative to the base path (or current working directory if base path is not set). This distinction is not obvious since in the shell both paths are equivalent and lead to the same file.
  • Paths are not normalized which means that project/contract.sol and project//contract.sol are seen as two completely different files (and actually can be different files when the source is provided via Standard JSON) but cause the same file to be loaded from the filesystem. The resulting errors are confusing if the user is not aware of how the compiler decides whether files are distinct or not.
  • Relative paths starting with ../ or ./ are normalized, but only partially. If ../project//contract.sol is imported from /work//contracts/../token.sol, the path resolves into /work//contracts/contract.sol. Note .. being treated as an actual directory and // in one part not being replaced with /.
  • The way a file is referenced on the command line affects whether it matches an import. For example solc contract.sol will be seen by the compiler as the same as import "contract.sol"; but if we go to the parent directory and compile it as solc dir/contract.sol it will be seen as a different file and compiled twice.
    • What's worse, it affects the metadata - both cases produce different bytecode because different paths end up in the metadata - even though the files are identical and still reside in the same directories.
  • When used as a target of a remapping, ../ no longer works as relative to the source file. It's now relative to the current working directory because the remapping happens after the relative paths are resolved.

There are more examples listed in #11036. While they were originally reported as bugs, ultimately most of them are actually just unintuitive side-effects of the current design that mostly show up in corner cases.

Specification

Overview

Paths given in import statements, on the command line and in Standard JSON are used for two purposes:

  • to find and load the source code into compiler's virtual filesystem,
  • to generate a unique source unit ID that determines whether two paths actually refer to the same source unit.

A source unit ID consists of a named or unnamed root and a source path. E.g. @openzeppelin/utils/math/Math.sol or @/contracts/token.sol.

There are several ways files can get into the virtual filesystem. The most important one is an import statement. Paths in import statements can be specified in three ways:

  1. Direct import: specifies the root and source path explicitly: @openzeppelin/utils/math/Math.sol.
  2. Relative import: specifies only a part of the source path, relative to the source ID of the importing file: ./math/Math.sol is equivalent to @openzeppelin/utils/math/Math.sol when imported from @openzeppelin/utils/Arrays.sol.
  3. Remote import: uses an URL in place of the root: https://github.com/OpenZeppelin/openzeppelin-contracts/contracts/utils/math/Math.sol.

For a remote import to be valid, user needs to assign a named root to a matching prefix (on the CLI or in Standard JSON). For example https://github.com/OpenZeppelin/openzeppelin-contracts/contracts=@openzeppelin. After the remapping, the path is processed as if it were a direct import. It's also possible to remap one named root to another (e.g. @openzeppelin=@oz). Every remapping to a named root becames a part of contract metadata because the mapping happens between the import path and the source unit ID and changing it may affect the result of the compilation even if the source stays they same.

In typical usage named roots represent libraries or independent submodules of your project. The main project itself is represented just by @. @ is special in that it can represent different directories, depending on where it is used. When used in a file located under some named root it represents that root. This way, when writing a library you can safely refer to its root just as @ (i.e. import "@/utils/math/Math.sol";). A standalone project using your library can refer to library files via a named root (import "@openzeppelin/utils/math/Math.sol";) and use @ for its own files without a conflict. The substitution happens when import path is translated into a source unit ID - in the virtual filesystem the source IDs of library files always contain the full named root.

To be able to locate the file and load it the compiler passes its source unit ID to the source loader. The loader determines how roots translate to specific locations. In case of the command-line compiler, locations must be existing directories. All named roots must be explicitly mapped for a contract to be compilable. The unnamed root is by default mapped to compiler's working directory but can also be explicitly remapped.

solc ../contracts/contract.sol @openzeppelin=node_modules/openzeppelin/contracts/ @=../contracts/

Files on the command line can be specified in two ways:

  1. As filesystem paths: these are platform-specific and can be relative to the current working directory. E.g. ../contracts/contract.sol or C:\project\contracts\contract.sol.
  2. As source unit IDs: instead of specifying the path directly you specify a source unit ID like @openzeppelin/utils/math/Math.sol and have the compiler resolve it by passing it to the source loader.

When supplying files using Standard JSON, you always specify source IDs yourself. These IDs must of course contain a named or unnamed root. E.g. math/Math.sol is not a valid source unit ID.

Instead of supplying the source as a part of the JSON file (via the content key) you can specify its location (via the urls key). It can be a path or an URL and whether it can be successfully resolved depends on the compiler interface you use. The command-line interface can only resolve filesystem paths and source unit IDs. The JavaScript interface can also handle URLs or even arbitrary identifiers - it's all up to the user-defined callback.

Many details in the above description were intentionally omitted to keep it concise. Additional sections below clarify finer points of the new system.

Normalization

Source unit IDs used internally are always in a normalized form:

  • Root name can contain only letters, numbers and maybe some safe special characters like _ and -.
    • It starts with @ and ends with /.
    • @ and : are not allowed inside root name.
  • Source path:
    • is case-sensitive and in UNIX format regardless of the underlying platform,
    • cannot start with . or contain any ./ or ../ segments,
    • does not contain sequences of multiple slashes, trailing slashes, leading/trailing whitespace.

Source IDs specified in Standard JSON must be already normalized. In other contexts compiler may automatically apply some normalization rules:

  • Relative imports start with ./, which is stripped by the compiler.
  • The part of the import that is the prefix is never normalized. Source path (the part left after stripping the prefix) must be normalized like in any other path.
  • Filesystem paths given on the command line undergo the usual normalization expected from shell commands:
    • multiple slashes are squashed into one.
    • relative paths are relative to the current working directory and converted into absolute ones.
    • ./ segments are stripped,
    • ../ segments are collapsed.
    • The path is also converted from a platform-specific format into the UNIX format before it is used to construct source unit ID.
  • Source unit IDs given on the command line must be already normalized.
  • Filesystem paths and source unit IDs given in urls in standard JSON behave just like the ones specified on the command line (though they are never used to form source unit IDs so the only thing that matters is which file they resolve to).

@ escaping

An escaping mechanism is needed to discern named roots from paths starting with @ character in contexts where both are allowed. For that purpose a leading @@ is always interpreted as a single @ and causes the following value not to be seen as a root.

Relative imports

  • Relative imports work by taking the source unit ID of the importing module as a base. Everything after the last slash is removed from the importing ID and ./ is stripped from the import path. Then they are combined.
  • Apart from the leading ./, the path must be normalized according to the same rules as source path in source unit IDs.
  • Relative imports must start with ./. ../ is not allowed.

Remote imports

  • A remote import must start with protocol://, where protocol can be anything except for file.
  • The prefix of any remote import must be mapped to a named root. The length of the prefix is up to the user but it must at least include the protocol:// part.
  • The part left after stripping the prefix becomes the source path in the VFS.
  • If multiple remappings match the same prefix, the longer one wins.

Import remapping vs root remapping

There are two kinds of remappings:

  • Import remapping: always remaps something to a named root.
    • This is allowed from URL prefixes or other named roots.
    • These remappings are included in contract metadata.
    • Remapping to @ is not allowed.
    • Remappings are not recursive. @a=@b @b=@c @c=@d will remap @a to @b, not @d.
    • Remapping a root to itself (@abc=@abc) is allowed and can be used as a way to prevent a shorter remapping from matching (e.g. adding @contract=@token to @con=@pro will prevent @contract from being remapped to @protract.
  • Root remapping: always remaps a root to something that is not a root
    • The target must be something that combined with the source path is recognized by source loader as a valid source location.
      • On the CLI it must be a path. If the path is relative, it's interpreted as relative to current working directory, converted into an absolute one and normalized.
      • In the JavaScript interface it could be an URL or something completely arbitrary.
    • These remappings are system-dependent and not stored in the metadata. Checksums stored in metadata are used instead to ensure that compiler input is the same regardless of the system.

Remapping context

To solve conflicts caused by different libraries referring to their dependencies in the same way, it's possible to qualify import remappings with a context.

  • the context must be a named or unnamed root (filesystem paths are not allowed),
  • context can only be used for import remappings from named roots,
  • using context when remapping URL prefixes is not allowed (URLs by their nature are expected to be absolute).

If an import remapping has a context, the substitution is only performed on imports found inside the files located under the named root used as context.

Examples:

@libA:@oz=@openzeppelin
@libB:@oz=@oz
@oz=@australia

Supplying files on the CLI

All filesystem paths specified on the CLI that lead to files to be compiled must be located within one of the roots.
Since the unnamed root is by default mapped to current working directory, files from that directory can still be conveniently compiled without specifying any remappings in simple cases.
The source unit ID for the file is constructed by normalizing the path and finding the root that is mapped to the longest matching prefix.

The CLI supports source unit IDs but not direct imports. I.e. @ never refers to a named root and import remappings are not taken into account.

Supplying files via Standard JSON

Source unit IDs specified in Standard JSON must be already normalized and contain a root. As a special case it can also be equal to <stdin>. Any other form of an ID is disallowed.

URLs specified in sources.urls are treated as raw URLs, not remote imports. I.e. remappings are not applied to them. Source unit IDs specified there are also not direct imports.

Standard input

A special source ID <stdin> is reserved for the content of compiler's standard input.

  • It is present in the VFS only when the - command-line flag is specified.
  • It cannot be used in remappings.
  • The parent source ID used when resolving relative imports is @.
  • Its content can be provided explicitly in Standard JSON (to ensure feature parity between Standard JSON and CLI).

Base path

The base path has no function in the new system but could be retained for backwards-compatibility. --base-path <dir> would have the same effect as remapping @=<dir>.

Allowed paths

  • All directories mapped to named roots are automatically added to the list of allowed directories.
  • --allowed-paths option is also still available. It is the only way to compile the project when the directory a root is mapped to contains symlinks that lead outside of it.

Possible extensions

Library path

Specifying mapping for all named roots may be tedious. To make it more convenient we could introduce the concept of library path. It would be defined by a variable called SOLIDITYPATH and work in a way similar to PATH in Bash or PYTHONPATH in Python. All subdirectories of directories listed in SOLIDITYPATH would automatically become valid named roots.

Backwards-compatibility

The proposal only restricts current syntax and does not introduce any new elements.

  • Imports starting with ../ and / are no longer allowed.
  • Non-normalized paths are no longer allowed in many contexts
  • Arbitrary mapping targets and prefixes are no longer allowed.
  • Mapping context is must now start with @.

As such it's not backwards-compatible but any file compilable after the change should also be compilable with older compilers given the right remappings.

Filesystem paths on the CLI will now produce different source unit IDs because paths are absolute and converted to relative to a root (though, arguably, this is how it was originally supposed to work with --base-path and could be considered a bug instead: #11038 (comment)).

To use URLs as imports an intermediate mapping to and from a named root is required. This makes it impossible to support arbitrary URLs (though arbitrary URLs within a single protocol are still possible). Reader callback passed to the JavaScript interface now receives files after root remapping. Before it was getting source unit IDs directly. This will affect Remix IDE.

@cameel cameel added breaking change ⚠️ language design :rage4: Any changes to the language, e.g. new features labels Mar 22, 2021
@cameel cameel self-assigned this Mar 22, 2021
@chriseth
Copy link
Contributor

Couldn't yet fully digest this. In short, you are taking the remapping concept one level higher and are making the @ convention explicit, did I understand that correctly? In https://github.com/OpenZeppelin/openzeppelin-contracts/contracts=@openzeppelin are you actually saying @openzeppelin=https://github.com/OpenZeppelin/openzeppelin-contracts/contracts?

Can you maybe write some pseudo-code that starts with an import statement and ends with loading a file from the host filesystem and storing it under a vfs path name?

@cameel
Copy link
Member Author

cameel commented Mar 24, 2021

In short, you are taking the remapping concept one level higher and are making the @ convention explicit, did I understand that correctly?

Yes. There are no real changes in import syntax or Standard JSON content or how you invoke the compiler on the CLI but there are new limitations and some changes in how paths you specify in all these places are interpreted.

Basically, this forces the user to use remappings when the situation gets more complex than a single directory with all files under it. And ensures that all projects will use and interpret these remappings the same way.

In https://github.com/OpenZeppelin/openzeppelin-contracts/contracts=@openzeppelin are you actually saying @openzeppelin=https://github.com/OpenZeppelin/openzeppelin-contracts/contracts?

No, that's exactly what I meant. But for example in Remix you would want both because that's the situation where you want URLs both in imports and in the "paths" that the callback gets. The main difference with the current situation would be that the compiler internally sees @openzeppelin and not an URL.

Can you maybe write some pseudo-code that starts with an import statement and ends with loading a file from the host filesystem and storing it under a vfs path name?

Sure. Here's a short and very informal version:

parse_remappings(remappings):
    for every remapping
        split into context, prefix and target
        decide if it is a root remapping or import remapping

    validate import remappings
    validate root remappings
    
    if no root remapping for @, add one pointing at current working dir

    return root remappings and import remappings as separate dicts

import_path_to_source_unit_name(import_path, current_source_unit_name):
    filter out import remappings with context no matching the importing source unit

    if direct import
        if root is @
            take root from importing module
        remap root using filtered import remappings
    elif relative import
        take both root and parent dir from importing module
        remap root using filtered import remappings
    elif remote import
        find import remapping matching URL prefix
        remap prefix to a root using filtered import remappings        
    else
        FAIL

    if path not normalized or root has funny characters, FAIL        
    return root + source path

source_unit_name_to_filesystem_path(source_unit_name):
    split into root and source path
    if no root remapping for root, FAIL
    return remapped root + source path
    
filesystem_path_to_source_unit_name(filesystem_path):
    if path starts with @, pass it through source_unit_name_to_filesystem_path() to convert it into actual path
    make path absolute (relative to CWD), convert to UNIX format and normalize it
    find root that contains the path
    return root + path relative to root

For an import statement you would call source_unit_name_to_filesystem_path() to get the name the compiler uses its VFS. Then you could call import_path_to_source_unit_name() on that to get actual filesystem path.

If you wanted to get a VFS path from something specified on the command line or in sources.urls, you would call filesystem_path_to_source_unit_name().

@cameel
Copy link
Member Author

cameel commented Mar 24, 2021

Also, here's a more concrete version of the pseudocode, using Python syntax). I wrote this one first but then I thought it might a bit too long and detailed and I rewrote it into the simplified one I posted above.

If the less formal version above is enough and explains everything, you can just skip this one.

(IMPORT_REMAPPINGS, ROOT_REMAPPINGS) = parse_remappings(remappings)

def parse_remappings(remappings):
    import_remappings = {}
    root_remappings = {}

    for remapping in remappings:
        (context, prefix, target) = split_remapping(remapping)

        if target.startswith("@") and not target.startswith("@@"):
            import_remappings[prefix] = {"context": context, "target": target}
        else:
            if not prefix.startswith("@") or prefix.startswith("@@"):
                raise InvalidRootRemapping()

            root_remappings[prefix] = target

    for url_prefix_or_root, import_remapping in import_remappings:
        validate_root(import_remapping.target)

        if url_prefix_or_root.startswith("@") and not url_prefix_or_root.startswith("@@"):
            validate_root(url_prefix_or_root)
            validate_root(import_remapping.context)

            if root_remapping.context not in root_remappings:
                raise InvalidRemapping()
        else:
            if import_remapping.context != "" or not is_url(url_prefix_or_root):
                raise InvalidRemapping()    
            
    for root, root_remapping in root_remappings:
        validate_root(root)

    if "@/" not in root_remappings:
        root_remappings["@/"] = get_current_working_directory()

    return (import_remappings, root_remappings)

def import_path_to_source_unit_name(import_path, current_source_unit_name):
    local_import_remappings = filter_remappings(IMPORT_REMAPPINGS, context=get_root(current_source_unit_name))

    if import_path.startswith("@/"):
        root = get_root(current_source_unit_name)
        source_path = import_path.removeprefix("@/")
    elif import_path.startswith("@") and not import_path.startswith("@@"):
        root = get_root(import_path)
        source_path = import_path.removeprefix(root)
    elif import_path.startswith("./"):
        root = get_root(current_source_unit_name)
        source_path = dirname(current_source_unit_name.removeprefix(root)) + import_path.removeprefix("./")
    elif is_url(import_path):
        (prefix, remainder) = find_longest_matching_prefix(import_path, local_import_remappings.keys())
        root = prefix
        source_path = remainder
    else:
        raise InvalidImportPath()
        
    if normalize_import_path(source_path) != source_path:
        raise InvalidImportPath()

    if  not validate_root(local_import_remappings[root]):
        raise InvalidImportPath()
        
    return local_import_remappings[root] + normalize_import_path(source_path)

def source_unit_name_to_filesystem_path(source_unit_name):
    root = get_root(source_unit_name)
    source_path = get_source_path(source_unit_name)

    if root not in ROOT_REMAPPINGS:
        raise RootResolutionFailed()

    return ROOT_REMAPPINGS[root] + source_path
    
def filesystem_path_to_source_unit_name(filesystem_path):
    if filesystem_path.startswith("@"):
        if not filesystem_path.startswith("@@"):
            filesystem_path = source_unit_name_to_filesystem_path(filesystem_path)
        else:
            filesystem_path = filesystem_path.removeprefix("@")

    filesystem_path = normalize_path(to_unix_format(make_absolute(filesystem_path, relative_to=get_current_working_directory())))
    root = find_root_containing_path(filesystem_path, ROOT_REMAPPINGS)
    source_path = filesystem_path.removeprefix(ROOT_REMAPPINGS[root])

    return root + source_path

@cameel
Copy link
Member Author

cameel commented Mar 24, 2021

A small update: I just realized that context is only useful in remappings of the form @a=@b. I have updated the spec and the pseudocode.

@cameel
Copy link
Member Author

cameel commented Mar 24, 2021

We discussed the spec today on the channel and later on the call. Some changes were proposed. Here's what will change:

From the channel:

  • Allow relative imports starting with ../ (but disallow going above the root)
  • Make import "/project/contract.sol" and import "project/contract.sol" translate to import "@/project/contract.sol".
    • @/ can be further translated to a named root if the importing file is inside one.
  • Allow URLs to be another form of keys in the VFS and do not normalize them at all.
    • Make CLI file loader recognize URLs and refuse to load them.
    • The JS callback can do with them whatever it likes.
    • Disallow mapping URL prefixes to roots

From the call:

  • Make things more backwards-compatible so that most of the existing code that uses imports in a typical way can still compile:
    • Allow arbitrary remappings.
    • Allow arbitrary import paths.
    • Allow arbitrary identifiers as source unit names with only a few exceptions:
      • Disallow ../ and ./ in them to avoid weird interactions with relative paths.
      • Disallow source unit names starting with /.

Also, some general feedback from the call:

  • We should just suggest the recommended way to use the import system in the docs (including remappings for libraries) instead of forcing users to use them.
    • Prefer warnings for patterns that are not recommended instead of completely disallowing them.
  • It would be best to have an example of how the new system would be used in practice for the next time we discuss it.
  • The spec should ideally describe how the VFS works first and only then go on to explain the interaction with file loader, Standard JSON, etc.

@cameel
Copy link
Member Author

cameel commented May 19, 2021

Closing in favor or smaller issues addressing individual points: #11408, #11409, #11410, #11411, #11412, #11413.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ⚠️ language design :rage4: Any changes to the language, e.g. new features
Projects
None yet
Development

No branches or pull requests

2 participants