Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stripping base path from CLI paths (without CLI refactor) #11617

Closed

Conversation

cameel
Copy link
Member

@cameel cameel commented Jul 5, 2021

Fixes #4702.
Implements the solc part of #11408. solc-js changes will be submitted in that repo.

This is a version of #11545 that does not depend on the CLI refactor PRs (#11518, #11520, #11544) and does not include test cases for the CLI.

Things that were not implemented or were implemented differently than specified in the issue:

  • Extra normalization: on Windows drive letter is removed from the path if possible.
  • On case-sensitive case-preserving filesystems does not use the case from disk.

@cameel cameel self-assigned this Jul 5, 2021
@cameel
Copy link
Member Author

cameel commented Jul 5, 2021

In case anyone is wondering why this is so much more complex than my initial PR, here's a non-exhaustive list of path-related weirdness it has to account for:

  • /a/b/c on Windows is a relatve path. Only C:/a/b/c (or C:\a\b\c) is absolute.
  • A path consisting only of a drive letter (C:) is relative and not equivalent to the path to the root directory (C:\, \ or /).
  • If the path to the working directory contains symlinks, absolute() creates an absolute path that may or may not have these links resolved.
    • On macOS they are resolved and /var/ (which contains the tmp dir in my tests) is a symlink to /private/var/.
    • On Windows they are not resolved.
  • On Windows it is possible for / and \ to represent two different directory trees.
  • Windows does support symbolic links. But:
    • There is a special type of symlink for directories. A non-directory symlink can still point at a directory, but it is not recognized as a directory (you can't use it as a part of a path for example).
    • If the symlink target is a relative path, only \ works as a path separator. If it's absolute, both / and \ work.
  • Boost's canonical() and weakly_canonical() both resolve symlinks. There's no direct equivalent that does not.
  • Boost's lexically_normal() does not convert /../../a/b/c into /a/b/c. It has to be done manually.
  • Boost's normalization always includes drive letter in Windows paths. This makes canonical normalized paths different on different between Windows and other platforms. Drive letters need to be stripped manually.
  • UNC paths (e.g. //host/b/c or \\host\b\c on Windows) are relative despite having a root name.
  • On Windows, when the current directory is an UNC path, \ is not a valid path (at least in the shell) and does not refer to the root directory.
  • Making an UNC path absolute results in the current path being appended at the end rather than at the beginning like with normal paths.
  • Even though duplicate path separators (a//b/c.sol) and . at the end of a path (a/b/c/.) are ignored when Boost compares paths, they are still stored in the path object.

@@ -83,7 +84,51 @@ class FileReader
return [this](std::string const& _kind, std::string const& _path) { return readFile(_kind, _path); };
}

/// Normalizes a filesystem path in a way that removes small, inconsequential differences. Specifically:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just sommarize in the header and put the full description in the cpp file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I thought the docstring would be a better place for this. It is a bit detailed but on the other hand it still describes the effects visible to the user of the function rather than implementation details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

docs/path-resolution.rst Outdated Show resolved Hide resolved
docs/path-resolution.rst Outdated Show resolved Hide resolved
docs/path-resolution.rst Outdated Show resolved Hide resolved
docs/path-resolution.rst Outdated Show resolved Hide resolved
docs/path-resolution.rst Outdated Show resolved Hide resolved
@chriseth
Copy link
Contributor

chriseth commented Jul 6, 2021

Did not review the actual code.

@cameel cameel force-pushed the stripping-base-path-from-cli-without-cli-refactor branch from 46ac037 to a9c06fb Compare July 6, 2021 13:40
Copy link

@Jessejames1180 Jessejames1180 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

);

string rootName = normalizedPath.root_name().string();
boost::filesystem::path normalizedRootPath =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really correct to define normalizedRootPath here again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It should not be redeclared. Weird that it did not break any tests though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The normalization from \\host and //host for UNC paths was broken because of this. Tests did not break because boost ignores slashes is comparisons. Slashes do not matter as long as we use generic_string() on the path before using it but better have it normalized in case someone uses native() instead.

@cameel cameel force-pushed the stripping-base-path-from-cli-without-cli-refactor branch 3 times, most recently from 0d67e71 to 6eaf232 Compare July 13, 2021 08:20
Copy link
Member

@leonardoalt leonardoalt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, looks good in general

libsolidity/interface/FileReader.h Outdated Show resolved Hide resolved
/// Paths are treated as case-sensitive. Does not require the path to actually exist in the
/// filesystem and does not follow symlinks. Only considers whole segments, e.g. /abc/d is not
/// considered a prefix of /abc/def. Both paths must be non-empty.
static bool isPathPrefix(boost::filesystem::path _prefix, boost::filesystem::path const& _path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason _prefix is not const&?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm modifying it in place inside the function. I could add a local variable but it was just more convenient to modify the parameter.

libsolidity/interface/FileReader.h Outdated Show resolved Hide resolved
libsolidity/interface/FileReader.h Outdated Show resolved Hide resolved
/// If @a _prefix is actually a prefix of @p _path, removes it from @a _path to make it relative.
/// Otherwise returns @a _path unchanged.
/// Returns '.' if @a _path and @_prefix are identical.
static boost::filesystem::path stripPrefixIfPresent(boost::filesystem::path _prefix, boost::filesystem::path const& _path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re const&

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_prefix.remove_filename();

boost::filesystem::path strippedPath = _path.lexically_relative(_prefix);
solAssert(strippedPath.empty() || *strippedPath.begin() != "..", "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So each element of strippedPath is a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually path but strings are implicitly convertible to that type.

Boost let you iterate over path segments while skipping separators. E.g. in /a/bc/def.sol you can iterate over a, bc and def.sol.

rootName.size() == 2 ||
(rootName.size() > 2 && rootName[2] != rootName[1])
) && (
(rootName[0] == '/' && rootName[1] == '/')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean "why both characters should be /" or "why both // and \\"?

If it's the former - this is meant to detect UNC paths of the form //host/a/bc/def.sol.
If the latter: Windows accepts both //host and \\host as UNC path root name. Outside of windows only the one starting with // is treated specially.

BOOST_TEST(FileReader::normalizeCLIPathForVFS("/a/b/c/../") == "/a/b/");

BOOST_TEST(FileReader::normalizeCLIPathForVFS("/a/b/c/../../..") == "/");
BOOST_TEST(FileReader::normalizeCLIPathForVFS("/a/b/c/../../../") == "/");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about some error cases? Like /a/b/c/../../../../

Copy link
Member Author

@cameel cameel Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function has no error cases :) The function is supposed to be able to normalize every possible path.

/a/b/c/../../../../ will reduce to /. I do have a test for that. In fact lots of tests - see normalizeCLIPathForVFS_path_beyond_root below. I grouped them together not to have monster test cases spanning half of the file :)

test/libsolidity/interface/FileReader.cpp Outdated Show resolved Hide resolved

// NOTE: If path to work dir contains symlinks (of then the case on macOS), boost might resolve
// them, making the path different from tempDirPath.
boost::filesystem::path expectedWorkDir = boost::filesystem::current_path().parent_path().parent_path().parent_path();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why parent 3 times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the working directory is set to tempDir.path() / "x/y/z" and the paths I'm testing below sometimes go up a few levels so they do not always preserve the x/y/z part. So I basically want tempDir.path() but normalized the way a working directory would be (e.g. on MacOS the temp dir I get usually has a symlink in it that I need to resolve; on Windows it has a drive letter).

But I just realized that now it's no longer working dir so the name is a bit misleading so maybe that's what you meant. Renamed to expectedPrefix.

@cameel cameel force-pushed the stripping-base-path-from-cli-without-cli-refactor branch 2 times, most recently from 44b33eb to f6fde4e Compare July 16, 2021 10:10
@cameel cameel requested review from leonardoalt and aarlt July 16, 2021 10:12
@cameel cameel force-pushed the stripping-base-path-from-cli-without-cli-refactor branch from f6fde4e to b8e7786 Compare July 24, 2021 03:19
@cameel
Copy link
Member Author

cameel commented Jul 24, 2021

Similar to #11544 (comment), I'm adding some refactors and small changes from work on #11688 as fixups (previous fixups squashed). Again, functionality did not change and almost all the changes were just in tests:

  • Docstring tweaks: document that isPathPrefix() ignores the trailing slash and that normalizeCLIPathForVFS() always adds a slash after . + test cases for this.
  • dotDotPrefix moved moved to a better spot in in normalizeCLIPathForVFS() (does not affect functionality).
  • Unused code removed from case-sensitivity test.
  • Root name stripping test now is not completely skipped outside of Windows (helps to make sure it at least compiles).
  • TEST_CASE_NAME.
  • BOOST_TEST() replaced with BOOST_CHECK_EQUAL() when comparing paths. The former for some reason does not print boost::filesystem::path values when a test fails, which makes debugging CI failures much more tedious.
    • Extra parentheses removed from BOOST_TEST() in a few cases where the << operator actually exists (these parentheses were a way to make the the macro not try to print stuff).

@cameel
Copy link
Member Author

cameel commented Jul 27, 2021

I'm closing this in favor of #11545. The only point of this PR was to be able to merge it without having to review all the dependencies first. Now that the dependencies are already merged, there's no point in keeping it around.

I kept these PRs in sync at all times so if you already reviewed this, the only extra thing to review in #11545 is test/solc/CommandLineInterface.cpp.

@cameel cameel closed this Jul 27, 2021
@cameel cameel deleted the stripping-base-path-from-cli-without-cli-refactor branch July 27, 2021 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CLI] Relative paths result in different contracts
5 participants