Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stripping --base-path from CLI paths + CLI path normalization (path spec v3) #11408

Closed
cameel opened this issue May 19, 2021 · 4 comments
Closed
Assignees

Comments

@cameel
Copy link
Member

cameel commented May 19, 2021

Closes #4702.
Related to #9353.
Part of a set of issues that replaces #11138.

Abstract

Currently the paths specified on the command line are used directly as source unit names in compiler's virtual filesystem. This means that the paths embedded in contract metadata (and as a consequence also the generated bytecode) can be different depending on where the compiler is invoked from and whether an absolute or a relative path was used. This may lead to problems with reproducing the same bytecode for verification. Another undesirable effect is that when the same file is used in an import, its source unit name may be different leading to it being seen by the compiler as a different file and being compiled again (#4702).

Many users are not even aware of the distinction between CLI paths and source unit names (see #11263 for a detailed description of how paths/imports work currently) which makes it hard to understand and resolve the resulting issues.

My proposed solution is to always convert paths specified on the CLI to a canonical form and, if possible, make them relative to base path.

Motivation

Let's assume we have /tmp/contract.sol with the following content:

import "tokens/token.sol"; // source unit name: tokens/token.sol

/tmp/tokens/token.sol contains:

contract Token {}

and we compile them with:

cd /tmp
solc --metadata contract.sol /tmp/tokens/token.sol tokens///token.sol # source unit names: /tmp/tokens/token.sol, tokens///token.sol

token.sol ends up in the virtual filesystem under three different source unit names even though all the paths refer to the same file in the underlying filesystem:

======= /tmp/tokens/token.sol:Token =======
Metadata:
{"compiler":{"version":"0.8.4+commit.c7e474f2"},"language":"Solidity","output":{"abi":[],"devdoc":{"kind":"dev","methods":{},"version":1},"userdoc":{"kind":"user","methods":{},"version":1}},"settings":{"compilationTarget":{"/tmp/tokens/token.sol":"Token"},"evmVersion":"istanbul","libraries":{},"metadata":{"bytecodeHash":"ipfs"},"optimizer":{"enabled":false,"runs":200},"remappings":[]},"sources":{"/tmp/tokens/token.sol":{"keccak256":"0x0ef316769f47a1402d4b77e0eee0e4b048f2928077bd178d3a0c32015b0ee9e9","urls":["bzz-raw://a4123fed04d2dccefceb39a66f625e7f3bad8f8c40def33580cfce2bafd395ad","dweb:/ipfs/QmcngRtGbNaXjQc7PQU3qvMV3jg1MQjYKaZtGKzDxsZ2vb"]}},"version":1}

======= tokens///token.sol:Token =======
Metadata:
{"compiler":{"version":"0.8.4+commit.c7e474f2"},"language":"Solidity","output":{"abi":[],"devdoc":{"kind":"dev","methods":{},"version":1},"userdoc":{"kind":"user","methods":{},"version":1}},"settings":{"compilationTarget":{"tokens///token.sol":"Token"},"evmVersion":"istanbul","libraries":{},"metadata":{"bytecodeHash":"ipfs"},"optimizer":{"enabled":false,"runs":200},"remappings":[]},"sources":{"tokens///token.sol":{"keccak256":"0x0ef316769f47a1402d4b77e0eee0e4b048f2928077bd178d3a0c32015b0ee9e9","urls":["bzz-raw://a4123fed04d2dccefceb39a66f625e7f3bad8f8c40def33580cfce2bafd395ad","dweb:/ipfs/QmcngRtGbNaXjQc7PQU3qvMV3jg1MQjYKaZtGKzDxsZ2vb"]}},"version":1}

======= tokens/token.sol:Token =======
Metadata:
{"compiler":{"version":"0.8.4+commit.c7e474f2"},"language":"Solidity","output":{"abi":[],"devdoc":{"kind":"dev","methods":{},"version":1},"userdoc":{"kind":"user","methods":{},"version":1}},"settings":{"compilationTarget":{"tokens/token.sol":"Token"},"evmVersion":"istanbul","libraries":{},"metadata":{"bytecodeHash":"ipfs"},"optimizer":{"enabled":false,"runs":200},"remappings":[]},"sources":{"tokens/token.sol":{"keccak256":"0x0ef316769f47a1402d4b77e0eee0e4b048f2928077bd178d3a0c32015b0ee9e9","urls":["bzz-raw://a4123fed04d2dccefceb39a66f625e7f3bad8f8c40def33580cfce2bafd395ad","dweb:/ipfs/QmcngRtGbNaXjQc7PQU3qvMV3jg1MQjYKaZtGKzDxsZ2vb"]}},"version":1}

Specification

Paths to files specified on the command line should be processed according to the following rules:

  1. Make the path canonical:
    • Make it absolute if it's relative (treating as relative to the current working directory).
    • Replace platform-specific path separators with /.
    • Collapse ./ and ../ segments.
    • Squash sequences of multiple consecutive slashes into a single slash.
    • If the filesystem is case-insensitive, use the case the filesystem reports (rather than the exact case used on the command line).
  2. If the file is located inside the directory designated as --base-path, the path is made relative to that directory.
  3. Implement the same logic in solcjs.

Backwards Compatibility

The change is fully backwards-compatible. It does not change semantics or break any contracts. It also affects only files given on the command line (--standard-json remains completely unaffected).

Any CLI invocation that worked before will still work but now it may produce slightly different bytecode due to a change in metadata.

@cameel cameel self-assigned this May 19, 2021
@cameel cameel changed the title [path spec v3] Stripping --base-path from CLI paths + CLI path normalization Stripping --base-path from CLI paths + CLI path normalization (path spec v3) May 19, 2021
@chriseth
Copy link
Contributor

Does this also include "default --base-path to current directory"? Otherwise, this would worsen the situation when you use solc x.sol.

Can you also expand the specification to which logic is applied for the file load callback (or just explain how it behaves wrt base path currently)?

@cameel
Copy link
Member Author

cameel commented May 20, 2021

Does this also include "default --base-path to current directory"? Otherwise, this would worsen the situation when you use solc x.sol.

Good point. I did not take that into account. The default for --base-path is an empty path and this proposal does not change that so nothing would be stripped from CLI paths by default. solc x.sol would unfortunately give you an absolute path in metadata.

In that case I think we should treat empty --base-path specially. I.e. when it's empty strip current working directory instead.

A better solution in my opinion would be to change the default since the ability to use an empty path is just a weird loophole. But that would break importing from absolute paths even before #11410 so I'd go with the special case instead.

Can you also expand the specification to which logic is applied for the file load callback (or just explain how it behaves wrt base path currently)?

Current behavior is very simple: just take the exact string you got on the CLI and use boost::filesystem::path::generic_string() on it. This simply converts backslashes to forward slashes but otherwise leaves the path unchanged.

So for example if you give solc any of these: /project/test.sol, test.sol, ./././..////test.sol, you will end up with that exact string in the metadata. On the other hand if you use C:\project\test.sol on Windows, you'll get C:/project/test.sol in metadata.

I have updated the spec section to mention the slash conversion.

@cameel
Copy link
Member Author

cameel commented Jul 5, 2021

The solc part of this issue is now mostly implemented in #11617 (without CLI tests) and #11545 (with CLI tests).

Things that were not implemented or were implemented differently than specified here:

  • Extra normalization: on Windows drive letter is removed from the path if possible.
  • On case-sensitive case-preserving filesystems does not use the case from disk.
  • It was not mentioned in the description but symlinks in base path and in source file paths should not be resolved. The PR upholds this with one exception: if the path to the current working directory contains symlinks, a version with resolved symlinks is used when converting a relative path to an absolute one.
  • UNC path are a source of a lot of corner cases and they are not portable anyway. I think we should just disallow them.

One downside of having this implemented without #11410 is that the normalization may result in absolute paths in cases where the user might not expect them and the user will likely not notice:

  • solc ../contract.sol will use the absolute path for contract.sol.
  • If the path to the current working directory contains a symlink, solc --base-path="$PWD/a/b" a/b/c.sol might result in an absolute path or not, depending on whether the symlink in $PWD is resolved or not.

@cameel
Copy link
Member Author

cameel commented Aug 27, 2021

Implemented by #11545 and ethereum/solc-js#533.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants