Hydra composition writes scientific float without decimal point which is parsed by YAML as str #8466

Closed
aschuh-hf opened this issue Oct 24, 2022 · 6 comments


aschuh-hf commented Oct 24, 2022

Bug Report

Description

When using Hydra composition, floats written in scientific notation in the input config files, e.g., 1.0e-5, are written to the generated params.yaml file as 1e-5. The latter is parsed as a str instead of a float because the decimal point is missing.

Reproduce

conf/config.yaml

number: 1.0e-5

test.py

from pathlib import Path
import yaml

text = Path("params.yaml").read_text()
config = yaml.load(text, Loader=yaml.SafeLoader)
assert isinstance(config["number"], float)

dvc.yaml

stages:
  test:
    cmd: python test.py

dvc exp run test

Generated params.yaml:

number: 1e-5

Expected

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.30.0 (rpm)
---------------------------------
Platform: Python 3.8.3 on Linux-3.10.0-1160.15.2.el7.x86_64-x86_64-with-glibc2.14
Subprojects:

Supports:
        azure (adlfs = None, knack = 0.10.0, azure-identity = None),
        gdrive (pydrive2 = 1.14.0),
        gs (gcsfs = None),
        hdfs (fsspec = None, pyarrow = 9.0.0),
        http (aiohttp = None, aiohttp-retry = 2.8.3),
        https (aiohttp = None, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = None, boto3 = None),
        ssh (sshfs = 2022.6.0),
        webdav (webdav4 = 0.9.7),
        webdavs (webdav4 = 0.9.7),
        webhdfs (fsspec = None)
Cache types: hardlink, symlink
Cache directory: xfs on /dev/md124
Caches: local, s3
Remotes: s3, s3
Workspace directory: xfs on /dev/md124
Repo: dvc (subdir), git

Additional Information (if any):

@daavoo
Contributor

daavoo commented Oct 25, 2022

Hi @aschuh-hf! Thanks for the detailed report.

This is kind of expected.

DVC requires YAML 1.2 (https://dvc.org/doc/command-reference/params#description). I believe you are using PyYAML, which only supports YAML 1.1, and that is what causes the issue.

If you change the code to use dvc.api.params_show (or any other library supporting YAML 1.2) it should fix the error:

import dvc.api

config = dvc.api.params_show()
assert isinstance(config["number"], float)

The above code uses ruamel.yaml internally.

Closing the issue as "won't fix", as it is expected behavior, but don't hesitate to comment.

@daavoo closed this as not planned on Oct 25, 2022

@dberenbaum
Collaborator

I think it's a duplicate of #5971. @aschuh-hf If you want to comment, it might have more visibility there.

@aschuh-hf
Author

aschuh-hf commented Oct 25, 2022

Thanks, indeed this is a duplicate. I would just add that I installed DVC from the official YUM repo. Does that mean DVC has been packaged with an incompatible YAML version?

In my own code, I am using pyyaml=6.0 from conda-forge. But as noted, DVC was installed in Docker as follows:

RUN wget https://dvc.org/rpm/dvc.repo -O /etc/yum.repos.d/dvc.repo \
    && rpm --import https://dvc.org/rpm/iterative.asc \
    && yum update -y \
    && yum install -y dvc-2.28.0-1 \
    && yum clean all \
    && rm -rf /var/cache/yum

@dberenbaum
Collaborator

@aschuh-hf The problem is that pyyaml is still stuck on YAML 1.1 (see yaml/pyyaml#116), which is why DVC uses ruamel.yaml. So if you use pyyaml and scientific notation, you are likely to hit this problem.
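
To illustrate the difference, here is a rough sketch using only the standard library. The two regular expressions are simplified paraphrases of the float rule PyYAML applies (YAML 1.1 style) and of the YAML 1.2 core schema float rule; they are not the libraries' actual patterns, and the names YAML11_FLOAT, YAML12_FLOAT and resolves_as_float are made up for this example.

```python
import re

# Simplified patterns: plain decimal and scientific forms only
# (no .inf/.nan, no underscores, no sexagesimal values).

# YAML 1.1 style, as PyYAML resolves it: a "." is mandatory,
# and an exponent must carry an explicit sign.
YAML11_FLOAT = re.compile(r"^[-+]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][-+][0-9]+)?$")

# YAML 1.2 core schema: the fractional part and the exponent sign are optional.
# (Real resolvers try the integer rule first, so "1" would still be an int.)
YAML12_FLOAT = re.compile(r"^[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?$")


def resolves_as_float(scalar: str, pattern: re.Pattern) -> bool:
    """Return True if the plain scalar matches the given float pattern."""
    return pattern.match(scalar) is not None


for scalar in ("1.0e-5", "1e-5"):
    print(
        scalar,
        "1.1:", resolves_as_float(scalar, YAML11_FLOAT),
        "1.2:", resolves_as_float(scalar, YAML12_FLOAT),
    )
```

Under these simplified rules, 1.0e-5 matches both patterns, while 1e-5 matches only the YAML 1.2 one, which is why a YAML 1.1 reader falls back to a string.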

@aschuh-hf
Author

But when I install DVC from YUM repo, does it not use its own virtual environment / libraries in /usr/lib/dvc?

@aschuh-hf
Author

OK, I think I understand now. The conclusion is that the params.yaml you produce requires a YAML 1.2 compatible reader. PyYAML, which I currently use in my code, is not one. Switching to ruamel.yaml will fix it. Got it :-)
