-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Path resolution docs + proposed changes (path spec v1) #11105
Conversation
8a049ab
to
c4f3c20
Compare
docs/layout-of-source-files.rst
Outdated
|
||
When using ``--standard-json`` you cannot provide additional source files as command-line | ||
arguments but it does not mean that the compiler will not load any extra files from disk. | ||
If a contract imports a file that is not present in `sources`, the compiler will use the file | ||
loader as in any other situation, which may result in the source being read from disk. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually desirable? Is there a practical use case for this? Honestly, I expected the compiler to never touch the actual filesystem when using Standard JSON and I suspect many people would expect that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer this, but I think it was a user request. It might be very useful because it means that the compiler performs import resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I guess it's this one: #4688?
Maybe we could introduce a command-line option or something in settings
in the JSON file so that you have to explicitly enable this?
The path specified in ``urls`` is only passed to the file loader and used to locate the file. | ||
It does not affect the import key and is not included in contract metadata. | ||
|
||
Paths in ``urls`` are affected by base path and any other transformations performed by the file loader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been reported as a bug in #9346. I think the behavior will be fine if make base path affect only relative imports (as assumed in the rest of these docs).
|
||
If no file is present under that key in the virtual filesystem, the file loader will also use it as | ||
is for filesystem lookup. | ||
The resulting filesystem path is not affected by the value of base path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently not the case. Base path is prepended to both absolute and relative imports. I think it's reasonable to expect the file loader not to prepend base path if the import key starts with /
.
docs/layout-of-source-files.rst
Outdated
Base path also affects the way paths specified on the command line are converted into import keys. | ||
The import key normally is the absolute, normalized path to the file in the UNIX format but if the | ||
file happens to be inside the directory designated as the base path or one of its subdirectories | ||
the prefix is stripped from its import key and it becomes relative to base. | ||
|
||
.. code-block:: bash | ||
|
||
cd /home/user | ||
solc /project/contract.sol # Import key: /project/contract.sol | ||
solc /project/contract.sol --base-path /project # Import key: contract.sol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how it works now but the current behavior has been reported as a bug in #4702.
#4702 (comment) pretty clearly indicates that stripping of base path is desirable. It states that the file path should not be converted to an absolute one though. I think it should because this way we will get consistent import keys regardless of where a file is being compiled from. When using a relative path on the command line you have to make it relative to the current working directory and there is no other choice. Therefore you cannot always use relative paths on the command line to match relative paths in your imports. I think that making paths inside base path relative and others absolute is a better mechanism.
docs/layout-of-source-files.rst
Outdated
Note that if you do not specify base path, it is by default equal to the current working directory: | ||
|
||
.. code-block:: bash | ||
|
||
cd /project | ||
solc /home/user/contract.sol # Import key: contract.sol | ||
solc /home/user/contract.sol --base-path /project # Import key: contract.sol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also not true. Base path is by default ""
. This does not matter when the file loader does filesystem lookup (the result will be the same) but does make a difference for the purpose of stripping base path from file paths specified on the command line.
docs/layout-of-source-files.rst
Outdated
- This means that you cannot remap ``./`` or ``../`` directly since they are replaced during | ||
translation to import keys but you can remap the source locations they resolve into: | ||
|
||
.. code-block:: bash | ||
|
||
solc ./=a /project=b /project/contract.sol | ||
|
||
.. code-block:: solidity | ||
:caption: /project/contract.sol | ||
|
||
import "./util.sol" as util; // Import key: b/util.sol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that being able to remap the resolved path is actually useful. See my comment from #5281 (comment):
Also, I wonder if the current behavior is actually useful in practice. In normal circumstances it seems redundant. For example if you do
/a/b=/x/y
then you changeimport "/a/b/c.sol"
toimport "/x/y/c.sol"
. Ifd.sol
then containsimport "./d.sol"
, the compiler will look ford.sol
in/x/y/
(because that's wherec.sol
is physically located) so the remapping does not play any role here.The only thing you can achieve by remapping the expanded path is the unintuitive behavior where
import "./d.sol"
ends up importing a file that's not in the same directory asc.sol
. This would happen for example if you used/a/b=/x/y /x/y=/a/b
in the example above.For that reason I think it would be better to just completely forbid remapping
.
and..
. To me it looks like its main use would be writing underhanded code :)
- Remapping happens after paths relative to the source directory have already been resolved. | ||
This means that targets starting with ``./`` and ``../`` have no special meaning and are | ||
relative to the base directory rather than to the source location. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if might be better to disallow remapping targets to start with ./
or ../
. I think they are not useful in practice and they are ambiguous (I originally assumed they would actually work like imports relative to source but they do not).
docs/layout-of-source-files.rst
Outdated
#. **At most one remapping can be applied to a single import.** | ||
|
||
- If multiple remappings match the same key, the one with the longest matching prefix is chosen. | ||
- If prefixes are identical, the one specified last wins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that using duplicate prefixes with the same context should be an error. It should still be possible to have identical prefixes in different contexts though.
I haven't checked what happens in Standard JSON when there are multiple identical prefixes but I'm assuming it's the same as on the command line (I will need to include that case in tests).
docs/layout-of-source-files.rst
Outdated
``file://`` prefix is stripped from import paths and from filesystem paths specified in | ||
``urls`` in Standard JSON. It is **not** stripped from filesystem paths provided on the command | ||
line. | ||
For example the following will not result in ``contract.sol`` being loaded: | ||
|
||
:: | ||
.. code-block:: bash | ||
|
||
solc file://contract.sol | ||
|
||
The compiler will instead try to find it in a directory called ``file:`` and fail if such a | ||
directory does not exist or does not contain ``contract.sol``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should strip file://
from command-line paths too if we're stripping it in other situations.
docs/layout-of-source-files.rst
Outdated
The compiler sees both as opaque identifiers and there is no normalization involved: | ||
|
||
:: | ||
|
||
import "lib/../lib///math.sol" as math; // Import key: lib/../lib///math.sol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After writing this all down, I think that not normalizing imports makes behavior more complex than it needs to be and causes tons of unexpected corner cases. Even more so given that we do normalize them in some cases (i.e. in imports relative to source). I think that we should either do some limited normalization on all imports (i.e. remove duplicate slashes, collapse ./
and ../
segments where possible, etc.), or import paths to be already normalized.
docs/layout-of-source-files.rst
Outdated
Virtual Filesystem | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
The compiler maintains an internal database (virtual filesystem) where each compiled module is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use the word "module"? (I would have expected "source unit')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it. But doesn't source unit also include the stuff it imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the docs and I see that we use "source unit" most of the time. "Module" is used very rarely in that sense. This page was actually the one that used the term "module" the most (in the old version) :)
I replaced all mentions of "module" with "source unit" in the text except for a few cases where I think it is used in a more general sense.
docs/layout-of-source-files.rst
Outdated
~~~~~~~~~~~~~~~~~~ | ||
|
||
The compiler maintains an internal database (virtual filesystem) where each compiled module is | ||
assigned a unique *import key* which is an opaque and unstructured identifier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we can say that, since .
and ..
actually use structure inside these identifiers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these docs I distinguish between "import paths" (paths you specify in import statement) and "import key" (the ID that ends up assigned to the file in the virtual filesystem). AFAIK .
and ..
only have special meaning in import paths, not import keys.
docs/layout-of-source-files.rst
Outdated
} | ||
|
||
The ``sources`` dictionary specifies the initial content of the virtual filesystem and you | ||
can use import keys directly there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should really use term "import key" here. These values are the names of the source files as the compiler calls them. The import statement is just one way to reference to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"import key" is exactly the term I am using for "names of the source files as the compiler calls them" :) And is different from "import path", which is the exact text you put inside import "...";
.
Is there another name that would be clearer? Maybe "source unit ID"? Or "VFS key"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed "import key" to "source unit ID" everywhere in the text.
docs/layout-of-source-files.rst
Outdated
.. warning:: | ||
|
||
The compiler uses import keys to determine whether imports refer to the same module or not. | ||
If you refer to a file in multiple ways that translate to different keys, it will be compiled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like it would only generate the code twice, but the problem is that you might also get symbol collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked and --metadata
shows three different instances of the token contract.
You're right about the collision. I need to add the as
part to the imports to avoid them.
docs/layout-of-source-files.rst
Outdated
import "/project/lib/util.sol" as util; // Import key: /project/lib/util.sol | ||
import "/project/lib/../lib///math.sol" as math; // Import key: /project/lib/../lib///math.sol | ||
|
||
In the above you might expect the import key to be reduced to ``/project/lib/math.sol`` but it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should fix this,shouldn't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we agree that it should not work that way. From #2738 I got the impression that things are not being normalized on purpose.
c4f3c20
to
54df574
Compare
We discussed import paths with @chriseth today. I posted a summary on the channel but I'm pasting it here as well because it's related:
|
filesystem, and the path can also map to resources such as ipfs, http or git. | ||
.. index:: source unit ID, import path, filesystem path | ||
|
||
Virtual Filesystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still all inside "Layout of a source file". I think we should only explain the syntax of the import statement and the different variants there and move the path-related stuff somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #11263. This is now a separate section called "Path Resolution" in the side menu. The layout page only contains a short summary.
docs/layout-of-source-files.rst
Outdated
In case of the command-line compiler the file loader simply uses it as a path. | ||
The `JavaScript interface <https://github.com/ethereum/solc-js>`_ is a bit more flexible in that | ||
regard and allows the user to provide a callback to perform this operation. | ||
In this case the IDs can be arbitrary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IDs can always be arbitrary, the cmdline interface is just unlikely to find appropriate files. Also standard-json allows arbitrary Ids doesn't it?
6cd8eae
to
0e47450
Compare
I have just pushed some minimal changes: fixes for some of the comments, renaming of "source unit ID" to "source unit name" and small tweaks I made while working on #11263. None of it changes the text substantially. This is only to keep it clean in case I need to get back to it. I'm closing this PR in favor of #11263. The text has changed a lot so I have decided to create a new PR and keep this one for reference. Also, the goal of the PR has changed - the new one only documents the existing situation without any proposed modifications. |
Related to #11036 and PR #9353.
Also related to a bunch of smaller bug reports/feature requests related to paths: #9346, #2738, #4702, #5146, #11042, #11039, #11038, #10980, #4623, #5281.
This PR is an attempt to explain how I think the compiler is currently supposed to handle paths used in imports, on the command line, in Standard JSON, on the standard input, etc. This means that it does not really reflect how it all works now. I ignored known bugs and introduced some changes to make it match my expectations better and I'm going point out the differences in review comments. This is meant as a starting point for discussion and I think it should be merged into the docs once we agree whether the changes I'm proposing make sense and we implement them.
The docs ended up being pretty long. All the available features and abstractions make this surprisingly complex and produce many weird corner cases, though after working through it systematically many things I considered bugs while reporting #11036 suddenly appear to make sense. They definitely do not seem as such when you first encounter them so unfortunately I think that a long explanation is necessary if we expect users to understand how it all works.
I introduced some new terminology in the PR (import keys, import paths, virtual filesystem, imports relative to base/source) which might be a bit controversial (I'm open to renaming them) but expanding the vocabulary is badly needed here to be able to refer to things without being ambiguous or having to clarify terms with a full sentence every time they appear.