Folded block scalars with whitespace at the end causes problems #86

Open
braydonk opened this issue Jan 12, 2023 · 4 comments
Labels
yaml_v3_problem A bug in the underlying yaml library. These issues are vastly harder to fix.

Comments

@braydonk
Collaborator

braydonk commented Jan 12, 2023

While investigating #84, I realized that the yaml library misbehaves when lines in a folded block scalar end with trailing whitespace.

When scan_folded_as_literal: false, you get the original bug shown in issue #84.

When scan_folded_as_literal: true, you get the following with the same input:

```yaml
Foobar:
  baz: "Lorem Ipsum is simply dummy text of the printing_and_typesetting industry.
    \n#magic___^_^___line\n"
Foobaz:
  baz: "foobar"
```

I'll need to figure out why whitespace at the end of a line causes the library to treat the scalar as not printable.
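One plausible explanation, not confirmed in this thread: libyaml-derived emitters such as yaml.v3's appear to refuse block style for a scalar in which a space immediately precedes a line break, and fall back to a double-quoted scalar with escapes, which would match the output above. The Go sketch below models that assumed rule. `hasSpaceBeforeBreak` and `blockStyleAllowed` are hypothetical names, not yaml.v3 functions, and the real emitter checks more conditions than this.

```go
package main

import "strings"

// hasSpaceBeforeBreak reports whether any line of s ends with a space
// immediately before its line break. This is a simplified, hypothetical
// version of the kind of check a libyaml-style emitter is assumed to make
// before allowing block style; it is not a yaml.v3 function.
func hasSpaceBeforeBreak(s string) bool {
	lines := strings.Split(s, "\n")
	// The final element is whatever follows the last "\n" (or the whole
	// string when there is no "\n"), so it is never followed by a break.
	for _, line := range lines[:len(lines)-1] {
		if strings.HasSuffix(line, " ") {
			return true
		}
	}
	return false
}

// blockStyleAllowed applies the assumed rule: a multi-line string can be
// emitted as a block scalar only if no line ends in a space before a break.
// Otherwise the emitter is assumed to fall back to a double-quoted scalar.
func blockStyleAllowed(s string) bool {
	return !hasSpaceBeforeBreak(s)
}
```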

@braydonk braydonk added this to the v0.8.0 milestone Jan 12, 2023
@braydonk
Collaborator Author

This is probably going to be challenging, so I'm going to push it back in favor of the easier items I have slated for v0.8.0.

@braydonk
Collaborator Author

Realized I never wrote the explanation for why this hasn't been resolved yet.

This is caused by the workaround for yaml.v3 not retaining blank-line information. Before the document is parsed into yaml.v3's node structure, yamlfmt inserts a magic string in place of each blank line; after the new output is produced, it uses those magic strings to put the blank lines back. However, block scalars are one place where newline information actually is retained, so a magic line string landing inside one gets serialized as part of the scalar's content. I can't think of a way around this without getting rid of the hack altogether, and my past attempts to fix this in yaml.v3 itself came up short.

I've been wanting to build my own YAML parser to get rid of the yaml.v3 dependency entirely; that would make this fix, and many other fixes and features I've wanted to implement, possible. I haven't had the time to invest to make that happen, though.

@juliusl

juliusl commented Jun 26, 2024

@braydonk naive question, but does yaml.v3 still not retain plain line-break information if the newLineStr is "\r\n" instead of "\n"?

@braydonk
Collaborator Author

I'm pretty sure it won't, though I'd be pleasantly surprised to be wrong because that would be a glimmer of hope. It's been a while since I last looked at it, but the problem, if I recall correctly, is that the AST doesn't keep any information about empty lines. yamlfmt parses the YAML document into the library's yaml.Node representation and then feeds that into the library's emitter; by the time the emitter runs, the blank lines were already lost during parsing, so there is nothing in the AST to re-emit. That's why yamlfmt's solution is to insert a magic string that survives the whole process and is then replaced after the new output is created.
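To illustrate the reasoning with a deliberately tiny toy model (not yaml.v3): if the parsed representation has no field for blank lines, switching the line ending from "\n" to "\r\n" only changes the terminator the emitter writes, and the blank line is dropped either way. `pair`, `parseFlat`, and `emitFlat` are hypothetical and handle only flat `key: value` lines.

```go
package main

import "strings"

// pair is a toy stand-in for a parsed mapping entry. Like the AST described
// above, it has no field recording blank lines. Hypothetical, not yaml.Node.
type pair struct{ key, value string }

// parseFlat parses "key: value" lines with either "\n" or "\r\n" endings.
// Blank lines carry no data, so they produce no entry and are lost here.
func parseFlat(doc string) []pair {
	var out []pair
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimSuffix(line, "\r")
		if strings.TrimSpace(line) == "" {
			continue
		}
		k, v, _ := strings.Cut(line, ": ")
		out = append(out, pair{k, v})
	}
	return out
}

// emitFlat writes the entries back out using the given line ending.
// It can only emit what parseFlat kept, so no blank lines reappear.
func emitFlat(pairs []pair, newline string) string {
	var b strings.Builder
	for _, p := range pairs {
		b.WriteString(p.key + ": " + p.value + newline)
	}
	return b.String()
}
```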
