Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Handle extra white-space in MatchSpec #3456

Merged
merged 15 commits into from
Sep 19, 2024
54 changes: 40 additions & 14 deletions libmamba/src/specs/match_spec.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -502,32 +502,58 @@ namespace mamba::specs

auto MatchSpec::parse(std::string_view str) -> expected_parse_t<MatchSpec>
{
auto parse_error = [&str](std::string_view err) -> tl::unexpected<ParseError>
std::string raw_match_spec_str = std::string(str);
raw_match_spec_str = util::strip(raw_match_spec_str);

// Remove any with space after binary operators, such as:
// - `openmpi-4.1.4-ha1ae619_102`'s improperly encoded `constrains`: "cudatoolkit >= 10.2"
// - `pytorch-1.13.0-cpu_py310h02c325b_0.conda`'s improperly encoded
// `constrains`: "pytorch-cpu = 1.13.0", "pytorch-gpu = 99999999"
// - `fipy-3.4.2.1-py310hff52083_3.tar.bz2`'s improperly encoded `constrains` or
// `dep`: ">=4.5.2"
// - `infokonoha-4.6.3-pyhd8ed1ab_0.tar.bz2`'s `kytea >=0.1.4, 0.2.0` -> `kytea
// >=0.1.4,0.2.0`
// TODO: this solution reallocates memory several times potentially, but the
// number of operators is small and the strings are short, so it must be fine.
// If needed it can be optimized so that the string is only copied once.
for (const std::string& op : { ">=", "<=", "==", ">", "<", "!=", "=", "==", "," })
{
const std::string& bad_op = op + " ";
while (raw_match_spec_str.find(bad_op) != std::string::npos)
{
raw_match_spec_str = raw_match_spec_str.substr(0, raw_match_spec_str.find(bad_op)) + op
+ raw_match_spec_str.substr(
raw_match_spec_str.find(bad_op) + bad_op.size()
);
}
}

Comment on lines +505 to +530
Copy link
Member

@Hind-M Hind-M Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we won't have a choice and we will need to fix this this way, but I really think that we should do this properly and stop postponing everything to later, because it will just increase the complexity (making it harder to change things afterwards), especially regarding the MatchSpec...
IIRC this should be rather handled here, so we need to adapt the logic accordingly (and try to keep the string_view).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your proposal although there is some complexity with the low level parser: if we want to handle all the current cases while excluding the PEP 508 environment markers, I am afraid that the amount of complexity to manage will be far more complex that this horrible yet working and short solution.

In my opinion, the long-term robust solution (as discussed in the past) is to define a grammar for MatchSpec and use lexers to generate parsers in applications. But this goes far beyond the scope of this PR or the time we have at hand.

auto parse_error = [&raw_match_spec_str](std::string_view err) -> tl::unexpected<ParseError>
{
return tl::make_unexpected(
ParseError(fmt::format(R"(Error parsing MatchSpec "{}": {}")", str, err))
);
return tl::make_unexpected(ParseError(
fmt::format(R"(Error parsing MatchSpec "{}": {}")", raw_match_spec_str, err)
));
};

static constexpr auto npos = std::string_view::npos;
str = util::strip(str);
if (str.empty())
raw_match_spec_str = util::strip(raw_match_spec_str);
if (raw_match_spec_str.empty())
{
return {};
}

// A plain URL like https://conda.anaconda.org/conda-forge/linux-64/pkg-6.4-bld.conda
if (has_archive_extension(str))
if (has_archive_extension(raw_match_spec_str))
{
return MatchSpec::parse_url(str);
return MatchSpec::parse_url(raw_match_spec_str);
}

// A URL with hash, generated by `mamba env export --explicit` like
// https://conda.anaconda.org/conda-forge/linux-64/pkg-6.4-bld.conda#7dbaa197d7ba6032caf7ae7f32c1efa0
if (const auto idx = str.rfind(url_md5_sep); idx != npos)
if (const auto idx = raw_match_spec_str.rfind(url_md5_sep); idx != npos)
{
auto url = str.substr(0, idx);
auto hash = str.substr(idx + 1);
auto url = raw_match_spec_str.substr(0, idx);
auto hash = raw_match_spec_str.substr(idx + 1);
if (has_archive_extension(url))
{
return MatchSpec::parse_url(url).transform(
Expand All @@ -552,7 +578,7 @@ namespace mamba::specs
// - ``namespace``
// - ``spec >=3 [attr="val", ...]``
{
auto maybe_chan_ns_spec = split_channel_namespace_spec(str);
auto maybe_chan_ns_spec = split_channel_namespace_spec(raw_match_spec_str);
if (!maybe_chan_ns_spec)
{
return parse_error(maybe_chan_ns_spec.error().what());
Expand All @@ -572,7 +598,7 @@ namespace mamba::specs
out.m_channel = std::move(maybe_chan).value();
}

str = spec_str;
raw_match_spec_str = spec_str;
}

// Parse and apply bracket attributes ``attr="val"`` in ``pkg >=3 =mkl [attr="val", ...]``.
Expand All @@ -581,7 +607,7 @@ namespace mamba::specs
auto ver_str = std::string_view();
auto bld_str = std::string_view();
{
auto maybe_pkg_ver_bld = rparse_and_set_matchspec_attributes(out, str);
auto maybe_pkg_ver_bld = rparse_and_set_matchspec_attributes(out, raw_match_spec_str);
if (!maybe_pkg_ver_bld)
{
return parse_error(maybe_pkg_ver_bld.error().what());
Expand Down
60 changes: 60 additions & 0 deletions libmamba/tests/src/specs/test_match_spec.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,66 @@ TEST_SUITE("specs::match_spec")
CHECK_EQ(ms.str(), "xtensor>=0.12.3");
}

SUBCASE("python > 3.11")
{
auto ms = MatchSpec::parse("python > 3.11").value();
CHECK_EQ(ms.name().str(), "python");
CHECK_EQ(ms.version().str(), ">3.11");
CHECK(ms.build_string().is_explicitly_free());
CHECK(ms.build_number().is_explicitly_free());
CHECK_EQ(ms.str(), "python>3.11");
}

SUBCASE("numpy < 2.0")
{
auto ms = MatchSpec::parse("numpy< 2.0").value();
jjerphan marked this conversation as resolved.
Show resolved Hide resolved
CHECK_EQ(ms.name().str(), "numpy");
CHECK_EQ(ms.version().str(), "<2.0");
CHECK(ms.build_string().is_explicitly_free());
CHECK(ms.build_number().is_explicitly_free());
CHECK_EQ(ms.str(), "numpy<2.0");
}

SUBCASE("pytorch-cpu = 1.13.0")
{
auto ms = MatchSpec::parse("pytorch-cpu = 1.13.0").value();
CHECK_EQ(ms.name().str(), "pytorch-cpu");
CHECK_EQ(ms.version().str(), "=1.13.0");
CHECK(ms.build_string().is_explicitly_free());
CHECK(ms.build_number().is_explicitly_free());
CHECK_EQ(ms.str(), "pytorch-cpu=1.13.0");
}

SUBCASE("scipy >= 1.5.0, < 2.0.0")
{
auto ms = MatchSpec::parse("scipy >= 1.5.0, < 2.0.0").value();
CHECK_EQ(ms.name().str(), "scipy");
CHECK_EQ(ms.version().str(), ">=1.5.0,<2.0.0");
CHECK(ms.build_string().is_explicitly_free());
CHECK(ms.build_number().is_explicitly_free());
CHECK_EQ(ms.str(), "scipy[version=\">=1.5.0,<2.0.0\"]");
}

SUBCASE("scikit-learn >1.0.0")
{
auto ms = MatchSpec::parse("scikit-learn >1.0.0").value();
CHECK_EQ(ms.name().str(), "scikit-learn");
CHECK_EQ(ms.version().str(), ">1.0.0");
CHECK(ms.build_string().is_explicitly_free());
CHECK(ms.build_number().is_explicitly_free());
CHECK_EQ(ms.str(), "scikit-learn>1.0.0");
}

SUBCASE("kytea >=0.1.4, 0.2.0")
{
auto ms = MatchSpec::parse("kytea >=0.1.4, 0.2.0").value();
CHECK_EQ(ms.name().str(), "kytea");
CHECK_EQ(ms.version().str(), ">=0.1.4,==0.2.0");
CHECK(ms.build_string().is_explicitly_free());
CHECK(ms.build_number().is_explicitly_free());
CHECK_EQ(ms.str(), "kytea[version=\">=0.1.4,==0.2.0\"]");
}

SUBCASE("_libgcc_mutex 0.1 conda_forge")
{
auto ms = MatchSpec::parse("_libgcc_mutex 0.1 conda_forge").value();
Expand Down
8 changes: 8 additions & 0 deletions micromamba/tests/env-extra-white-space.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
channels:
- conda-forge
dependencies:
- python > 3.11
- numpy < 2.0
- pytorch-cpu = 1.13.0
- scipy >= 1.5.0, < 2.0.0
- scikit-learn >1.0.0
42 changes: 42 additions & 0 deletions micromamba/tests/test_env.py
Original file line number Diff line number Diff line change
Expand Up @@ -361,6 +361,7 @@ def test_env_update_pypi_with_conda_forge(tmp_home, tmp_root_prefix, tmp_path):
## See: https://github.com/mamba-org/mamba/issues/2059
pip_list_output = helpers.umamba_run("-p", env_prefix, "pip", "list", "--format=json")
pip_packages_list = yaml.safe_load(pip_list_output)

# When numpy 2.0.0 is installed using mamba,
# `numpy-2.0.0.dist-info/` is still created in `env_prefix/lib/pythonx.x/site-packages/` alongside `numpy-1.26.4.dist-info`
# therefore causing an unexpected result when listing the version.
Expand All @@ -370,3 +371,44 @@ def test_env_update_pypi_with_conda_forge(tmp_home, tmp_root_prefix, tmp_path):
pkg["name"] == "numpy" and Version(pkg["version"]) >= Version("1.26.4")
for pkg in pip_packages_list
)


@pytest.mark.parametrize("shared_pkgs_dirs", [True], indirect=True)
def test_env_create_whitespace(tmp_home, tmp_root_prefix, tmp_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this test, tests in test_match_spec are enough to make sure the parsing is done correctly.
Adding this one is redundant and is not relevant in my opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, I just wanted to have a non-regression test for the reported issue.

# Non-regression test for: https://github.com/mamba-org/mamba/issues/3453

env_prefix = tmp_path / "env-extra-white-space"

create_spec_file = tmp_path / "env-extra-white-space.yaml"

shutil.copyfile(__this_dir__ / "env-extra-white-space.yaml", create_spec_file)

res = helpers.run_env("create", "-p", env_prefix, "-f", create_spec_file, "-y", "--json")
assert res["success"]

# Check that the env was created
assert env_prefix.exists()
# Check that the env has the right packages
packages = helpers.umamba_list("-p", env_prefix, "--json")

assert any(
package["name"] == "python" and Version(package["version"]) > Version("3.11")
for package in packages
)
assert any(
package["name"] == "numpy" and Version(package["version"]) < Version("2.0")
for package in packages
)
assert any(
package["name"] == "pytorch-cpu" and Version(package["version"]) == Version("1.13.0")
for package in packages
)
assert any(
package["name"] == "scipy"
and Version("1.5.0") <= Version(package["version"]) < Version("2.0.0")
for package in packages
)
assert any(
package["name"] == "scikit-learn" and Version(package["version"]) > Version("1.0.0")
for package in packages
)
Loading