FileReader: Normalize base path and strip it from normalized source paths
cameel committed Jul 12, 2021
1 parent dec43b6 commit 76a250f
Showing 6 changed files with 693 additions and 10 deletions.
1 change: 1 addition & 0 deletions Changelog.md
@@ -5,6 +5,7 @@ Language Features:

Compiler Features:
* AssemblyStack: Also run opcode-based optimizer when compiling Yul code.
* Commandline Interface: Normalize paths specified on the command line and make them relative whenever files are located inside base path.
* Yul EVM Code Transform: Do not reuse stack slots that immediately become unreachable.
* Yul EVM Code Transform: Also pop unused argument slots for functions without return variables (under the same restrictions as for functions with return variables).
* Yul Optimizer: Move function arguments and return variables to memory with the experimental Stack Limit Evader (which is not enabled by default).
48 changes: 45 additions & 3 deletions docs/path-resolution.rst
@@ -71,8 +71,10 @@ The initial content of the VFS depends on how you invoke the compiler:
solc contract.sol /usr/local/dapp-bin/token.sol
-The source unit name of a file loaded this way is simply the specified path after shell expansion
-and with platform-specific separators converted to forward slashes.
The source unit name of a file loaded this way is constructed by converting its path to a
canonical form and making it relative to the base path if it is located inside.
See :ref:`Base Path Normalization and Stripping <base-path-normalization-and-stripping>` for
a detailed description of this process.

.. index:: standard JSON

@@ -313,6 +315,46 @@ interpreted as absolute paths on disk.
If the base path itself is relative, it is also interpreted as relative to the current working
directory of the compiler.

.. _base-path-normalization-and-stripping:

Base Path Normalization and Stripping
-------------------------------------

When source file paths are specified on the command line, the base path affects the source unit
names assigned to them in the compiler's VFS.
To compute the names, both base path and source file paths must first be converted to a canonical form.
This ensures that the result is predictable and as platform-independent as possible:

- If a path is relative, it is made absolute by prepending the current working directory to it.

  - If the path to the working directory contains symbolic links, they are resolved into actual
    directories.

- Internal ``.`` and ``..`` segments are collapsed.
- Platform-specific path separators are replaced with forward slashes.
- Sequences of multiple consecutive path separators are squashed into a single separator (unless
  they are the leading slashes of an `UNC path <https://en.wikipedia.org/wiki/Path_(computing)#UNC>`_).
- If the path includes a root name (e.g. a drive letter on Windows) and the root is the same as the
  root of the current working directory, the root is replaced with ``/``.
- Symbolic links in the path itself are **not** resolved.
- The original case of the path is preserved even if the filesystem is case-insensitive but
  `case-preserving <https://en.wikipedia.org/wiki/Case_preservation>`_ and the actual case on
  disk is different.
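
The purely lexical part of the rules above can be approximated in a few lines. This is a minimal sketch, not the compiler's implementation: it uses C++17 ``std::filesystem`` rather than Boost, the name ``canonicalizeSketch`` is hypothetical, and the working directory is passed in explicitly so that the result is reproducible. It does not deal with root names, UNC paths or symbolic links in the working directory.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of solc): makes `path` absolute against `workDir`,
// collapses `.`/`..` segments and repeated separators, and returns the result
// with forward slashes. Symlinks are never touched because the work is lexical.
std::string canonicalizeSketch(fs::path const& path, fs::path const& workDir)
{
    // Assumption of this sketch: the caller supplies a rooted working directory.
    assert(workDir.has_root_directory());

    // Step 1: prepend the working directory to relative paths.
    fs::path rooted = path.has_root_directory() ? path : workDir / path;

    // Step 2: lexical normalization (no filesystem access).
    fs::path normalized = rooted.lexically_normal();

    // Step 3: forward slashes on every platform.
    return normalized.generic_string();
}
```

On a POSIX system, ``canonicalizeSketch("a/./b/../c.sol", "/work")`` yields ``/work/a/c.sol``, and a path that is already rooted is returned with only its redundant segments removed.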

.. note::

    There are situations where paths cannot be made platform-independent.
    For example, on Windows the compiler can avoid using drive letters by referring to the root
    directory of the current drive as ``/``, but drive letters are still necessary for paths leading
    to other drives.
    You can avoid such situations by ensuring that all the files are available within a single
    directory tree on the same drive.

Once canonicalized, the base path is stripped from all source file paths that start with it,
and the result becomes the source unit name.
If the base path is empty (e.g. if it is not explicitly provided), it is treated as if it were equal
to the path to the current working directory with all symbolic links resolved.
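
The stripping step can be illustrated with a short sketch. It is written against C++17 ``std::filesystem`` rather than Boost, and ``stripBasePathSketch`` is a hypothetical name chosen for this example. It assumes both arguments have already been canonicalized as described above.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of solc): returns `sourcePath` relative to
// `basePath` when it is located inside it, otherwise returns it unchanged.
// Both paths are assumed to be canonicalized already.
std::string stripBasePathSketch(fs::path const& basePath, fs::path const& sourcePath)
{
    assert(!basePath.empty() && !sourcePath.empty());

    fs::path relative = sourcePath.lexically_relative(basePath);

    // An empty result means no relative path could be computed; a leading `..`
    // means the file lies outside the base path. Keep the path as it is then.
    if (relative.empty() || *relative.begin() == fs::path(".."))
        return sourcePath.generic_string();

    return relative.generic_string();
}
```

With a base path of ``/project``, the file ``/project/contract.sol`` becomes ``contract.sol`` while ``/usr/local/dapp-bin/token.sol`` keeps its full path. Only whole segments are compared, so ``/project-old/a.sol`` is not treated as being inside ``/project``.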

.. index:: ! remapping; import, ! import; remapping, ! remapping; context, ! remapping; prefix, ! remapping; target
.. _import-remapping:

@@ -414,7 +456,7 @@ Here are the detailed rules governing the behaviour of remappings:

.. code-block:: bash
-solc /project/=/contracts/ /project/contract.sol --base-path /project # source unit name: /project/contract.sol
solc /project/=/contracts/ /project/contract.sol --base-path /project # source unit name: contract.sol
.. code-block:: solidity
:caption: /project/contract.sol
162 changes: 161 additions & 1 deletion libsolidity/interface/FileReader.cpp
@@ -22,6 +22,8 @@
#include <libsolutil/CommonIO.h>
#include <libsolutil/Exceptions.h>

#include <boost/algorithm/string/predicate.hpp>

using solidity::frontend::ReadCallback;
using solidity::langutil::InternalCompilerError;
using solidity::util::errinfo_comment;
@@ -31,9 +33,20 @@ using std::string;
namespace solidity::frontend
{

void FileReader::setBasePath(boost::filesystem::path const& _path)
{
if (_path.empty())
m_basePath = "";
else
m_basePath = normalizeCLIPathForVFS(_path);
}

void FileReader::setSource(boost::filesystem::path const& _path, SourceCode _source)
{
-m_sourceCodes[_path.generic_string()] = std::move(_source);
boost::filesystem::path normalizedPath = normalizeCLIPathForVFS(_path);
boost::filesystem::path prefix = (m_basePath.empty() ? normalizeCLIPathForVFS(".") : m_basePath);

m_sourceCodes[stripPrefixIfPresent(prefix, normalizedPath).generic_string()] = std::move(_source);
}

void FileReader::setSources(StringMap _sources)
@@ -92,5 +105,152 @@ ReadCallback::Result FileReader::readFile(string const& _kind, string const& _so
}
}

boost::filesystem::path FileReader::normalizeCLIPathForVFS(boost::filesystem::path const& _path)
{
// Detailed normalization rules:
// - Makes the path either be absolute or have slash as root (note that on Windows paths with
// slash as root are not considered absolute by Boost). If it is empty, it becomes
// the current working directory.
// - Collapses redundant . and .. segments.
// - Removes leading .. segments from an absolute path (i.e. /../../ becomes just /).
// - Squashes sequences of multiple path separators into one.
// - Ensures that forward slashes are used as path separators on all platforms.
// - Removes the root name (e.g. drive letter on Windows) when it matches the root name in the
// path to the current working directory.
//
// Also note that this function:
// - Does NOT resolve symlinks (except for symlinks in the path to the current working directory).
// - Does NOT check if the path refers to a file or a directory. If the path ends with a slash,
// the slash is preserved even if it's a file.
// - Preserves case. Even if the filesystem is case-insensitive but case-preserving and the
// case differs, the actual case from disk is NOT detected.

boost::filesystem::path canonicalWorkDir = boost::filesystem::weakly_canonical(boost::filesystem::current_path());

// NOTE: On UNIX systems the path returned from current_path() has symlinks resolved while on
// Windows it does not. To get consistent results we resolve them on all platforms.
boost::filesystem::path absolutePath = boost::filesystem::absolute(_path, canonicalWorkDir);

// NOTE: boost path preserves certain differences that are ignored by its operator ==.
// E.g. "a//b" vs "a/b" or "a/b/" vs "a/b/.". lexically_normal() does remove these differences.
boost::filesystem::path normalizedPath = absolutePath.lexically_normal();
solAssert(normalizedPath.is_absolute() || normalizedPath.root_path() == "/", "");

// lexically_normal() will not squash paths like "/../../" into "/". We have to do it manually.
boost::filesystem::path dotDotPrefix = absoluteDotDotPrefix(normalizedPath);

// If the path is on the same drive as the working dir, for portability we prefer not to
// include the root name. Do this only for non-UNC paths - my experiments show that on Windows
// when the working dir is an UNC path, / does not actually refer to the root of the UNC path.
// For UNC paths only normalize the root name to start with //.
boost::filesystem::path normalizedRootPath = normalizedPath.root_path();
if (!isUNCPath(normalizedPath))
{
boost::filesystem::path workingDirRootPath = canonicalWorkDir.root_path();
if (normalizedRootPath == workingDirRootPath)
normalizedRootPath = "/";
}
else
{
solAssert(
boost::starts_with(normalizedPath.root_name().string(), "//") ||
boost::starts_with(normalizedPath.root_name().string(), "\\\\"),
""
);

string rootName = normalizedPath.root_name().string();
boost::filesystem::path normalizedRootPath =
normalizedPath.root_directory().string() +
"//" +
(rootName.size() > 2 ? rootName.substr(2, string::npos) : "");
}

boost::filesystem::path normalizedPathNoDotDot = normalizedPath;
if (dotDotPrefix.empty())
normalizedPathNoDotDot = normalizedRootPath / normalizedPath.relative_path();
else
normalizedPathNoDotDot = normalizedRootPath / normalizedPath.lexically_relative(normalizedPath.root_path() / dotDotPrefix);
solAssert(!hasDotDotSegments(normalizedPathNoDotDot), "");

// NOTE: On Windows lexically_normal() may leave platform-specific separators in the path.
// generic_string() converts them to forward slashes. Separators do not affect path comparison
// but remain in the internal representation returned by native().
normalizedPathNoDotDot = normalizedPathNoDotDot.generic_string();

// For some reason boost considers "/." different than "/" even though for other directories
// the trailing dot is ignored.
if (normalizedPathNoDotDot == "/.")
return "/";

return normalizedPathNoDotDot;
}

bool FileReader::isPathPrefix(boost::filesystem::path _prefix, boost::filesystem::path const& _path)
{
solAssert(!_prefix.empty() && !_path.empty(), "");
// NOTE: On Windows paths starting with a slash (rather than a drive letter) are considered relative by boost.
solAssert(_prefix.is_absolute() || isUNCPath(_prefix) || _prefix.root_path() == "/", "");
solAssert(_path.is_absolute() || isUNCPath(_path) || _path.root_path() == "/", "");
solAssert(_prefix == _prefix.lexically_normal() && _path == _path.lexically_normal(), "");
solAssert(!hasDotDotSegments(_prefix) && !hasDotDotSegments(_path), "");

// Before 1.72.0 lexically_relative() was not handling paths with empty, dot and dot dot segments
// correctly (see https://github.com/boostorg/filesystem/issues/76). The only case where this
// is possible after our normalization is a directory name ending in a slash (filename is a dot).
if (_prefix.filename_is_dot())
_prefix.remove_filename();

boost::filesystem::path strippedPath = _path.lexically_relative(_prefix);
return !strippedPath.empty() && *strippedPath.begin() != "..";
}

boost::filesystem::path FileReader::stripPrefixIfPresent(boost::filesystem::path _prefix, boost::filesystem::path const& _path)
{
if (!isPathPrefix(_prefix, _path))
return _path;

if (_prefix.filename_is_dot())
_prefix.remove_filename();

boost::filesystem::path strippedPath = _path.lexically_relative(_prefix);
solAssert(strippedPath.empty() || *strippedPath.begin() != "..", "");
return strippedPath;
}

boost::filesystem::path FileReader::absoluteDotDotPrefix(boost::filesystem::path const& _path)
{
solAssert(_path.is_absolute() || _path.root_path() == "/", "");

boost::filesystem::path _pathWithoutRoot = _path.relative_path();
boost::filesystem::path prefix;
for (boost::filesystem::path const& segment: _pathWithoutRoot)
if (segment.filename_is_dot_dot())
prefix /= segment;

return prefix;
}

bool FileReader::hasDotDotSegments(boost::filesystem::path const& _path)
{
for (boost::filesystem::path const& segment: _path)
if (segment.filename_is_dot_dot())
return true;

return false;
}

bool FileReader::isUNCPath(boost::filesystem::path const& _path)
{
string rootName = _path.root_name().string();

return (
rootName.size() == 2 ||
(rootName.size() > 2 && rootName[2] != rootName[1])
) && (
(rootName[0] == '/' && rootName[1] == '/')
#if defined(_WIN32)
|| (rootName[0] == '\\' && rootName[1] == '\\')
#endif
);
}

}
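
The role of `absoluteDotDotPrefix()` above can be illustrated in isolation. The following is a minimal sketch under stated assumptions: it uses C++17 `std::filesystem` instead of Boost, the name `leadingDotDotPrefixSketch` is hypothetical, and it stops at the first segment that is not `..`, which is one possible reading of "starts with a number of .. segments" rather than a copy of the loop in the commit.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of solc): collects the `..` segments that
// directly follow the root of a rooted path, e.g. "/../../a" gives "../..".
std::string leadingDotDotPrefixSketch(fs::path const& path)
{
    // Mirrors the precondition in the commit: the path must have a root.
    assert(path.has_root_directory());

    fs::path prefix;
    for (fs::path const& segment: path.relative_path())
    {
        // Stop at the first ordinary segment; later `..` segments are not leading.
        if (segment != fs::path(".."))
            break;
        prefix /= segment;
    }
    return prefix.generic_string();
}
```

A caller can then make the path relative to `root / prefix` to drop the meaningless leading `..` segments, which is the approach `normalizeCLIPathForVFS()` takes with `lexically_relative()`.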
48 changes: 42 additions & 6 deletions libsolidity/interface/FileReader.h
@@ -45,27 +45,28 @@ class FileReader
boost::filesystem::path _basePath = {},
FileSystemPathSet _allowedDirectories = {}
):
-m_basePath(std::move(_basePath)),
m_allowedDirectories(std::move(_allowedDirectories)),
m_sourceCodes()
-{}
{
setBasePath(_basePath);
}

-void setBasePath(boost::filesystem::path _path) { m_basePath = std::move(_path); }
void setBasePath(boost::filesystem::path const& _path);
boost::filesystem::path const& basePath() const noexcept { return m_basePath; }

void allowDirectory(boost::filesystem::path _path) { m_allowedDirectories.insert(std::move(_path)); }
FileSystemPathSet const& allowedDirectories() const noexcept { return m_allowedDirectories; }

StringMap const& sourceCodes() const noexcept { return m_sourceCodes; }

-/// Retrieves the source code for a given source unit ID.
/// Retrieves the source code for a given source unit name.
SourceCode const& sourceCode(SourceUnitName const& _sourceUnitName) const { return m_sourceCodes.at(_sourceUnitName); }

-/// Resets all sources to the given map of source unit ID to source codes.
/// Resets all sources to the given map of source unit name to source codes.
/// Does not enforce @a allowedDirectories().
void setSources(StringMap _sources);

-/// Adds the source code for a given source unit ID.
/// Adds the source code under a source unit name created by normalizing the file path.
/// Does not enforce @a allowedDirectories().
void setSource(boost::filesystem::path const& _path, SourceCode _source);

@@ -83,7 +84,42 @@ class FileReader
return [this](std::string const& _kind, std::string const& _path) { return readFile(_kind, _path); };
}

/// Normalizes a filesystem path to make it include all components up to the filesystem root,
/// remove small, inconsequential differences that do not affect the meaning and make it look
/// the same on all platforms (if possible). Symlinks in the path are not resolved.
/// The resulting path uses forward slashes as path separators, has no redundant separators,
/// has no redundant . or .. segments and has no root name if removing it does not change the meaning.
/// The path does not have to actually exist.
static boost::filesystem::path normalizeCLIPathForVFS(boost::filesystem::path const& _path);

/// Returns true if all the path components of @a _prefix are present at the beginning of @a _path.
/// Both paths must be absolute (or have slash as root) and normalized (no . or .. segments, no
/// multiple consecutive slashes).
/// Paths are treated as case-sensitive. Does not require the path to actually exist in the
/// filesystem and does not follow symlinks. Only considers whole segments, e.g. /abc/d is not
/// considered a prefix of /abc/def. Both paths must be non-empty.
static bool isPathPrefix(boost::filesystem::path _prefix, boost::filesystem::path const& _path);

/// If @a _prefix is actually a prefix of @a _path, removes it from @a _path to make it relative.
/// Otherwise returns @a _path unchanged.
/// Returns '.' if @a _path and @a _prefix are identical.
static boost::filesystem::path stripPrefixIfPresent(boost::filesystem::path _prefix, boost::filesystem::path const& _path);

// Returns true if the specified path is an UNC path.
// UNC paths start with // followed by a name (on Windows they can also start with \\).
// They are used for network shares on Windows. On UNIX systems they do not have the same
// functionality but usually they are still recognized and treated in a special way.
static bool isUNCPath(boost::filesystem::path const& _path);

private:
/// If @a _path starts with a number of .. segments, returns a path consisting only of those
/// segments (root name is not included). Otherwise returns an empty path. @a _path must be
/// absolute (or have slash as root).
static boost::filesystem::path absoluteDotDotPrefix(boost::filesystem::path const& _path);

/// Returns true if the path contains any .. segments.
static bool hasDotDotSegments(boost::filesystem::path const& _path);

/// Base path, used for resolving relative paths in imports.
boost::filesystem::path m_basePath;

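The UNC rule described in the comment on `isUNCPath()` (two leading slashes followed by a name) can be sketched on a plain string. This is an illustration only: `looksLikeUNCRootNameSketch` is a hypothetical name, the function takes the root name as a string instead of a `boost::filesystem::path`, and, unlike the commit, it accepts backslashes on every platform rather than only on Windows.

```cpp
#include <string>

// Hypothetical helper (not part of solc): returns true if `rootName` starts
// with exactly two identical separators, i.e. "//name" or "\\name", but not
// "///name". Assumption: backslashes are accepted on all platforms here.
bool looksLikeUNCRootNameSketch(std::string const& rootName)
{
    if (rootName.size() < 2)
        return false;

    bool twoSlashes = rootName[0] == '/' && rootName[1] == '/';
    bool twoBackslashes = rootName[0] == '\\' && rootName[1] == '\\';

    // A third identical separator means this is just a run of redundant slashes.
    bool noThirdSeparator = rootName.size() == 2 || rootName[2] != rootName[1];

    return (twoSlashes || twoBackslashes) && noThirdSeparator;
}
```

Taking the root name as a string keeps the sketch independent of how a particular filesystem library splits a path into root name and root directory, which differs between platforms.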
1 change: 1 addition & 0 deletions test/CMakeLists.txt
@@ -103,6 +103,7 @@ set(libsolidity_sources
libsolidity/SyntaxTest.h
libsolidity/ViewPureChecker.cpp
libsolidity/analysis/FunctionCallGraph.cpp
libsolidity/interface/FileReader.cpp
)
detect_stray_source_files("${libsolidity_sources}" "libsolidity/")
