Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly remove multiline trailing whitespace #103

Merged
merged 2 commits into from
Jul 27, 2022

Conversation

darichey
Copy link
Contributor

Summary & Motivation

Previously, trailing whitespace on the last line of a multiline string was always removed, but this is not necessarily correct. For example, the string ''hello '' should keep its trailing whitespace.

This PR attempts to mimic the logic used in the reference implementation of the parser to instead only remove the final line if it consists solely of whitespace.

src/value.rs Outdated
// Last index
remove_trailing(text);
match text.rfind('\n') {
Some(p) if text[p + 1..].chars().all(|c| c == ' ') => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not recognize tabs and such as whitespace, I assume this is intentional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, really I should say "spaces" and not "whitespace." If the last line has some other whitespace, like a tab character, it should not be removed. This matches the behavior of the reference implementation:

# test.nix
''
hello
	'' # this is a tab
$ nix-instantiate --parse test.nix
"hello\n\t"

@fogti fogti mentioned this pull request Jul 22, 2022
@Ma27
Copy link
Member

Ma27 commented Jul 23, 2022

If we do this, we should do it properly from the start, i.e. remove all whitespace prefixes. In fact I have started on a patch for that, but it needs some more polishing, expect a PR ~next week :)

@fogti
Copy link
Contributor

fogti commented Jul 23, 2022

don't we already do that? The only difference necessary is that the first and last line are handled differently (in them, if they contain only spaces ' ', they're omitted)

@darichey
Copy link
Contributor Author

Right, I believe that is already handled. That said, I also think that this entire logic could use a rework. I started by trying to just dumbly port the reference implementation in this commit: darichey@d56b15f

I confirmed that it at least parses all multiline strings in nixpkgs (see #107), but honestly the code is kind of ugly, and I haven't done any kind of benchmarks, so I didn't bother making a PR yet. Please feel free to use it as inspiration @Ma27 :)

@oberblastmeister oberblastmeister merged commit 226eb24 into nix-community:master Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants