Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switching from PyYAML to ruamel.yaml #47

Open
kenloen opened this issue Mar 5, 2024 · 2 comments
Open

Switching from PyYAML to ruamel.yaml #47

kenloen opened this issue Mar 5, 2024 · 2 comments

Comments

@kenloen
Copy link
Collaborator

kenloen commented Mar 5, 2024

The current implementation is using PyYAML, which is is no longer maintained (although it has recently had some activity, but it still seems that it do not have a development group).
The PyYAML implementation is wrapping a C implementation and hence it can lead to issues when installing, which I have seen for some Windows users.
I would suggest switching to ruamel.yaml instead. It is a pure python version and hence do not suffer from the same installation issue. As an added benefit it can also do round-trip conversion which means it will preserve comments and white space. It can also use the PyYAML parser that is faster by changing the parser type (e.g. typ="safe").

As the implementation aims to make a default parser I would suggest the following for the yaml settings for the yaml object:

from ruamel.yaml import YAML
yaml_obj = YAML()

yaml_obj.default_flow_style = False
yaml_obj.width = 1e6
yaml_obj.indent(mapping=4, sequence=6, offset=3)
yaml_obj.allow_unicode = False

# Convert numpy types to build in data types
yaml_obj.Representer.add_representer(
    np.str_, lambda dumper, data: dumper.represent_str(str(data))
)
yaml_obj.Representer.add_representer(
    np.float64, lambda dumper, data: dumper.represent_float(float(data))
)
yaml_obj.Representer.add_representer(
    np.float32, lambda dumper, data: dumper.represent_float(float(data))
)

# Write 1D list with flow-style
def list_rep(dumper, data):
    if isinstance(data[0], (list, dict)):
        return dumper.represent_sequence(
            "tag:yaml.org,2002:seq", data, flow_style=False
        )
    return dumper.represent_sequence(
        "tag:yaml.org,2002:seq", data, flow_style=True
    )
yaml_obj.Representer.add_representer(list, list_rep)

# Convert numpy arrays to lists
def ndarray_rep(dumper, data):
    return list_rep(dumper, data.tolist())
yaml_obj.Representer.add_representer(np.ndarray, ndarray_rep)
@gbarter
Copy link
Contributor

gbarter commented Mar 5, 2024

Love all of the ideas you have been posting here, @kenloen . For this one and others like it, I would suggest just moving straight to a PR as I don't imagine anyone would object.

@kenloen
Copy link
Collaborator Author

kenloen commented Mar 5, 2024

Great to hear the you appreciate my effort here.

I know that some of the issues that I have posted might seem small, and that creating these issues might take the same amount of time as just creating a PR and doing it my self. But as always with code development, the time that one expect it will take is always much more than what it ends up taking.
Furthermore, the purpose of all these issues is to map all the things that should be added/changed so that we can priotize which things are more urgent.

I am sorry if this comes across as me trying to not do it my self. My aim was to first map some of the issues and I hope that would motivate others to also add issues for some of the things they would like to add or change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants