Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path resolution docs + proposed changes (path spec v1) #11105

Closed
wants to merge 2 commits into from

Conversation

cameel
Copy link
Member

@cameel cameel commented Mar 15, 2021

Related to #11036 and PR #9353.
Also related to a bunch of smaller bug reports/feature requests related to paths: #9346, #2738, #4702, #5146, #11042, #11039, #11038, #10980, #4623, #5281.

This PR is an attempt to explain how I think the compiler is currently supposed to handle paths used in imports, on the command line, in Standard JSON, on the standard input, etc. This means that it does not really reflect how it all works now. I ignored known bugs and introduced some changes to make it match my expectations better and I'm going point out the differences in review comments. This is meant as a starting point for discussion and I think it should be merged into the docs once we agree whether the changes I'm proposing make sense and we implement them.

The docs ended up being pretty long. All the available features and abstractions make this surprisingly complex and produce many weird corner cases, though after working through it systematically many things I considered bugs while reporting #11036 suddenly appear to make sense. They definitely do not seem as such when you first encounter them so unfortunately I think that a long explanation is necessary if we expect users to understand how it all works.

I introduced some new terminology in the PR (import keys, import paths, virtual filesystem, imports relative to base/source) which might be a bit controversial (I'm open to renaming them) but expanding the vocabulary is badly needed here to be able to refer to things without being ambiguous or having to clarify terms with a full sentence every time they appear.

@cameel cameel self-assigned this Mar 15, 2021
@cameel cameel force-pushed the path-resolution-docs branch 5 times, most recently from 8a049ab to c4f3c20 Compare March 15, 2021 16:16
Comment on lines 320 to 340

When using ``--standard-json`` you cannot provide additional source files as command-line
arguments but it does not mean that the compiler will not load any extra files from disk.
If a contract imports a file that is not present in `sources`, the compiler will use the file
loader as in any other situation, which may result in the source being read from disk.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually desirable? Is there a practical use case for this? Honestly, I expected the compiler to never touch the actual filesystem when using Standard JSON and I suspect many people would expect that too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer this, but I think it was a user request. It might be very useful because it means that the compiler performs import resolution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I guess it's this one: #4688?

Maybe we could introduce a command-line option or something in settings in the JSON file so that you have to explicitly enable this?

The path specified in ``urls`` is only passed to the file loader and used to locate the file.
It does not affect the import key and is not included in contract metadata.

Paths in ``urls`` are affected by base path and any other transformations performed by the file loader.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been reported as a bug in #9346. I think the behavior will be fine if make base path affect only relative imports (as assumed in the rest of these docs).


If no file is present under that key in the virtual filesystem, the file loader will also use it as
is for filesystem lookup.
The resulting filesystem path is not affected by the value of base path.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently not the case. Base path is prepended to both absolute and relative imports. I think it's reasonable to expect the file loader not to prepend base path if the import key starts with /.

Comment on lines 436 to 445
Base path also affects the way paths specified on the command line are converted into import keys.
The import key normally is the absolute, normalized path to the file in the UNIX format but if the
file happens to be inside the directory designated as the base path or one of its subdirectories
the prefix is stripped from its import key and it becomes relative to base.

.. code-block:: bash

cd /home/user
solc /project/contract.sol # Import key: /project/contract.sol
solc /project/contract.sol --base-path /project # Import key: contract.sol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not how it works now but the current behavior has been reported as a bug in #4702.

#4702 (comment) pretty clearly indicates that stripping of base path is desirable. It states that the file path should not be converted to an absolute one though. I think it should because this way we will get consistent import keys regardless of where a file is being compiled from. When using a relative path on the command line you have to make it relative to the current working directory and there is no other choice. Therefore you cannot always use relative paths on the command line to match relative paths in your imports. I think that making paths inside base path relative and others absolute is a better mechanism.

Comment on lines 447 to 453
Note that if you do not specify base path, it is by default equal to the current working directory:

.. code-block:: bash

cd /project
solc /home/user/contract.sol # Import key: contract.sol
solc /home/user/contract.sol --base-path /project # Import key: contract.sol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also not true. Base path is by default "". This does not matter when the file loader does filesystem lookup (the result will be the same) but does make a difference for the purpose of stripping base path from file paths specified on the command line.

Comment on lines 605 to 615
- This means that you cannot remap ``./`` or ``../`` directly since they are replaced during
translation to import keys but you can remap the source locations they resolve into:

.. code-block:: bash

solc ./=a /project=b /project/contract.sol

.. code-block:: solidity
:caption: /project/contract.sol

import "./util.sol" as util; // Import key: b/util.sol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that being able to remap the resolved path is actually useful. See my comment from #5281 (comment):

Also, I wonder if the current behavior is actually useful in practice. In normal circumstances it seems redundant. For example if you do /a/b=/x/y then you change import "/a/b/c.sol" to import "/x/y/c.sol". If d.sol then contains import "./d.sol", the compiler will look for d.sol in /x/y/ (because that's where c.sol is physically located) so the remapping does not play any role here.

The only thing you can achieve by remapping the expanded path is the unintuitive behavior where import "./d.sol" ends up importing a file that's not in the same directory as c.sol. This would happen for example if you used /a/b=/x/y /x/y=/a/b in the example above.

For that reason I think it would be better to just completely forbid remapping . and ... To me it looks like its main use would be writing underhanded code :)

Comment on lines +636 to +665
- Remapping happens after paths relative to the source directory have already been resolved.
This means that targets starting with ``./`` and ``../`` have no special meaning and are
relative to the base directory rather than to the source location.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if might be better to disallow remapping targets to start with ./ or ../. I think they are not useful in practice and they are ambiguous (I originally assumed they would actually work like imports relative to source but they do not).

#. **At most one remapping can be applied to a single import.**

- If multiple remappings match the same key, the one with the longest matching prefix is chosen.
- If prefixes are identical, the one specified last wins.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that using duplicate prefixes with the same context should be an error. It should still be possible to have identical prefixes in different contexts though.

I haven't checked what happens in Standard JSON when there are multiple identical prefixes but I'm assuming it's the same as on the command line (I will need to include that case in tests).

Comment on lines 729 to 768
``file://`` prefix is stripped from import paths and from filesystem paths specified in
``urls`` in Standard JSON. It is **not** stripped from filesystem paths provided on the command
line.
For example the following will not result in ``contract.sol`` being loaded:

::
.. code-block:: bash

solc file://contract.sol

The compiler will instead try to find it in a directory called ``file:`` and fail if such a
directory does not exist or does not contain ``contract.sol``.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should strip file:// from command-line paths too if we're stripping it in other situations.

Comment on lines 423 to 427
The compiler sees both as opaque identifiers and there is no normalization involved:

::

import "lib/../lib///math.sol" as math; // Import key: lib/../lib///math.sol
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing this all down, I think that not normalizing imports makes behavior more complex than it needs to be and causes tons of unexpected corner cases. Even more so given that we do normalize them in some cases (i.e. in imports relative to source). I think that we should either do some limited normalization on all imports (i.e. remove duplicate slashes, collapse ./ and ../ segments where possible, etc.), or import paths to be already normalized.

Virtual Filesystem
~~~~~~~~~~~~~~~~~~

The compiler maintains an internal database (virtual filesystem) where each compiled module is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use the word "module"? (I would have expected "source unit')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it. But doesn't source unit also include the stuff it imports?

Copy link
Member Author

@cameel cameel Mar 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the docs and I see that we use "source unit" most of the time. "Module" is used very rarely in that sense. This page was actually the one that used the term "module" the most (in the old version) :)

I replaced all mentions of "module" with "source unit" in the text except for a few cases where I think it is used in a more general sense.

~~~~~~~~~~~~~~~~~~

The compiler maintains an internal database (virtual filesystem) where each compiled module is
assigned a unique *import key* which is an opaque and unstructured identifier.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can say that, since . and .. actually use structure inside these identifiers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these docs I distinguish between "import paths" (paths you specify in import statement) and "import key" (the ID that ends up assigned to the file in the virtual filesystem). AFAIK . and .. only have special meaning in import paths, not import keys.

}

The ``sources`` dictionary specifies the initial content of the virtual filesystem and you
can use import keys directly there.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should really use term "import key" here. These values are the names of the source files as the compiler calls them. The import statement is just one way to reference to them.

Copy link
Member Author

@cameel cameel Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"import key" is exactly the term I am using for "names of the source files as the compiler calls them" :) And is different from "import path", which is the exact text you put inside import "...";.

Is there another name that would be clearer? Maybe "source unit ID"? Or "VFS key"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed "import key" to "source unit ID" everywhere in the text.

.. warning::

The compiler uses import keys to determine whether imports refer to the same module or not.
If you refer to a file in multiple ways that translate to different keys, it will be compiled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like it would only generate the code twice, but the problem is that you might also get symbol collisions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and --metadata shows three different instances of the token contract.

You're right about the collision. I need to add the as part to the imports to avoid them.

import "/project/lib/util.sol" as util; // Import key: /project/lib/util.sol
import "/project/lib/../lib///math.sol" as math; // Import key: /project/lib/../lib///math.sol

In the above you might expect the import key to be reduced to ``/project/lib/math.sol`` but it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fix this,shouldn't we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we agree that it should not work that way. From #2738 I got the impression that things are not being normalized on purpose.

@cameel
Copy link
Member Author

cameel commented Mar 17, 2021

We discussed import paths with @chriseth today. I posted a summary on the channel but I'm pasting it here as well because it's related:

  • Should absolute imports (i.e. import "/a/b/c.sol") be affected by --base-path? Currently this will import /project/a/b/c.sol if you use --base-path /project.
    • We expect people to keep stuff in a single base directory so this is by design. Paths in imports are not meant to be seen as filesystem paths.
    • I think that such imports are seen by users as absolute in the underlying filesystem anyway. Having base path affect absolute imports is unexpected and confuses them.
  • Currently import remappings have the form context:prefix=target. We agree that the context part is not very useful in practice and we could remove it to simplify the whole system.
    • context was meant to let users apply remappings selectively but this did not turn out to be a common use case.
  • We discussed making bigger (and potentially breaking) changes in how imports and remappings work to make them simpler and more intuitive:
    • We could move import remapping from CompilerStack (where it is applied when import paths are converted to internal source unit IDs) to file loader (where it would be performed on source unit IDs that are converted to actual filesystem paths). This way remappings would no longer be part of metadata.
      • I think this is a good idea but we should also drop current remappings. Having both would make already complex system even more complex.
      • This would also force JS tools using the compiler to handle remappings on their own in the callback (currently they can rely on compiler doing that).
    • We should forbid imports relative to the source file that go a level up in the hierarchy (i.e. imports starting with ../). We already disourage using them.
    • We could allow only imports that do not start with /, ./ or ../ and then require the first directory on the path to be explicitly remapped if it is not located inside the base directory.
      • We discussed several other variations of this and couldn't really agree on the specifics and which one would work best so this needs more brainstorming.
    • I suggested that we could require top-level directories in relative imports have to start with @ (and be remapped).
      • It is a nice convention but it should stay just that. We don't want to orce everyone to change their imports to that.
      • There was a concern about about a library forcing the user to use the same @dir that it uses internally. I'm not sure I completely understood why this is a problem.
  • Path normalization
    • We should require import paths to be normalized. So no duplicate slashes, no ../ segments in the middle of the path, etc.
    • Not having them normalized means that it's pretty easy to import stuff in a way that seems interchangeable to the user but isn't and results in files being compiled twice (or not compiling at all due to name clashes).
    • We might also be ok with just normalizing them automatically since we already do that for imports starting with ./ and ../.
    • One problem with normalization is that we want to allow using URLs as imports (with the intention of having them remapped to actual paths). Normalizing everything would replace the :// with :/.
      • I think we should just make URLs a special case. If the import starts with something://, then the import path would not be normalized at all. Otherwise it would be normalized as a whole.
      • I did not mention it then but I think that file:// would have to be another special case. The path after it should be normalized.
  • We agreed that path being relative/absolute on the command line should not affect whether the compiler sees it as relative or absolute. It should depend on whether the file is inside the base path(s) or not.

filesystem, and the path can also map to resources such as ipfs, http or git.
.. index:: source unit ID, import path, filesystem path

Virtual Filesystem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still all inside "Layout of a source file". I think we should only explain the syntax of the import statement and the different variants there and move the path-related stuff somewhere else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #11263. This is now a separate section called "Path Resolution" in the side menu. The layout page only contains a short summary.

In case of the command-line compiler the file loader simply uses it as a path.
The `JavaScript interface <https://github.com/ethereum/solc-js>`_ is a bit more flexible in that
regard and allows the user to provide a callback to perform this operation.
In this case the IDs can be arbitrary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IDs can always be arbitrary, the cmdline interface is just unlikely to find appropriate files. Also standard-json allows arbitrary Ids doesn't it?

@cameel cameel force-pushed the path-resolution-docs branch from 6cd8eae to 0e47450 Compare April 15, 2021 23:13
@cameel cameel changed the title Path resolution docs Path resolution docs + proposed changes v1 Apr 15, 2021
@cameel cameel changed the title Path resolution docs + proposed changes v1 Path resolution docs + proposed changes (path spec v1) Apr 15, 2021
@cameel
Copy link
Member Author

cameel commented Apr 15, 2021

I have just pushed some minimal changes: fixes for some of the comments, renaming of "source unit ID" to "source unit name" and small tweaks I made while working on #11263. None of it changes the text substantially. This is only to keep it clean in case I need to get back to it.

I'm closing this PR in favor of #11263. The text has changed a lot so I have decided to create a new PR and keep this one for reference. Also, the goal of the PR has changed - the new one only documents the existing situation without any proposed modifications.

@cameel cameel closed this Apr 15, 2021
@cameel cameel deleted the path-resolution-docs branch May 20, 2021 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants