Fix line ending bug #90

kekavc24 · 2024-07-01T13:17:19Z

I debugged #89 further while submitting #91 and noted several issues that are closely coupled:

Trailing line-break only applied to YamlList or YamlMap and never a YamlScalar
Dangling comments and line-breaks left behind if a comment spans multiple lines.

This PR:

Applies a line break after each YamlNode encoded within a block.
Fixes Literal and folded strings cannot handle "\n \n" at the end of a string #89.
Introduces a normalizeEncodeBlock function that dynamically prunes the additional line break if:
- The deepest YamlScalar isn't encoded as ScalarStyle.LITERAL or ScalarStyle.FOLDED. Also fixes Literal and folded strings cannot handle "\n \n" at the end of a string #89
- The old node didn't have a trailing line-break.
Introduces a skipAndExtractComments function which greedily skips any comments and whitespace belonging to the YamlNode being replaced.

Further changes made in #93

I’ve reviewed the contributor guide and applied the relevant portions to this PR.

kekavc24 · 2024-07-04T17:21:33Z

@jonasfj ~~I have improved this PoC further by:~~

~~1. Lazily looking ahead for comments as that would bring the draft closer to the how YamlEditor currently does its edits.~~
2. Refactored the functions that edit YamlNodes encoded as block elements to take advantage of skipAndExtractComments, which makes the code more straightforward and easier to understand (entirely subjective until you take a look) 😅

~~All in all, the existing and currently failing tests kinda vouch for this PoC but I'm ready to push its limit to prove its worth with stronger and more convoluted tests.~~

See #90, #93 & #94

jonasfj · 2024-07-04T21:45:29Z

I'm down for taking a look at this.
I'm currently traveling, and I fear this requires a bit of attention to understand, so I'll try to take a look next week.

I saw you had nice comments explaining some of the methods. That's awesome.

Feel free to give some review hints in the PR description :)

Also if any of these changes could be done independently, please do consider submitting smaller independent PRs. It's much easier to review, which often leads to faster turnaround time.

kekavc24 · 2024-07-05T18:48:02Z

I've tried to split the 22 commits over 3 PRs based on the changes I made. #90 -> #93 -> #94.

lib/src/equality.dart

jonasfj · 2024-07-09T09:17:37Z

lib/src/utils.dart

+/// [YamlList] or a top-level [YamlScalar] of the [yaml] string provided.
+///
+/// [currentEndOffset] represents the end offset of [YamlScalar] or [YamlList]
+/// or [YamlMap] being replaced, that is, `end + 1`.


This is probably better illustrated with an example.

/// ```dart
/// final yaml = 'my_key: "replaced-value"\n';
/// //                                    ^--- currentEndOffset points to the newline
/// final currentEndOffset = 24;
/// ```

Of course, I might have counted wrong here 🤣

Also does this mean that currentEndOffset can be greater than the length of the yaml document? (currentEndOffset == yaml.length, so yaml[currentEndOffset] might throw)

Example: k: v will have currentEndOffset: 4 (if I'm replacing v, right).

Sorry, if I'm asking dumb questions here, but it might be worth making some examples.

And I think we should make sure we're doing it right; if we get the invariants wrong, we'll have a lot of bugs to fix :D

> Also does this mean that currentEndOffset can be greater than the length of the yaml document? (currentEndOffset == yaml.length, so yaml[currentEndOffset] might throw)

currentEndOffset shouldn't be greater than yaml.length when passed in as an argument. Maybe I should throw if it is?

> Example: k: v will have currentEndOffset: 4 (if I'm replacing v, right).

I think 3 since it's always zero-based.

> Sorry, if I'm asking dumb questions here, but it might be worth making some examples.
>
> And I think we should make sure we're doing it right; if we get the invariants wrong, we'll have a lot of bugs to fix :D

Sure, will add examples.
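
To make the two readings of the offset concrete, here is a minimal sketch using only `dart:core` string operations. The helper `exclusiveEndOf` is hypothetical and not part of yaml_edit; it assumes the value occurs exactly once in the document and treats the end offset as `end + 1`, as the doc comment under review states.

```dart
/// Hypothetical helper (not a yaml_edit API): returns the exclusive end
/// offset (`end + 1`) of [value] within [yaml], that is, the index just past
/// the last character of [value].
///
/// Assumes [value] occurs exactly once in [yaml].
int exclusiveEndOf(String yaml, String value) {
  final start = yaml.indexOf(value);
  if (start == -1) {
    throw ArgumentError.value(value, 'value', 'not found in yaml');
  }
  return start + value.length;
}

/// Returns true if [offset] can be used to index into [yaml] without
/// throwing a RangeError.
bool isIndexable(String yaml, int offset) =>
    offset >= 0 && offset < yaml.length;
```

For `'my_key: "replaced-value"\n'` and the quoted value, `exclusiveEndOf` returns 24, which is the index of the newline. For `'k: v'` and `v` it returns 4, which equals `yaml.length`, so `yaml[4]` is not indexable; the inclusive index of `v` itself is 3.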

jonasfj · 2024-07-09T10:06:00Z

lib/src/utils.dart

+/// Normalizes an encoded [YamlNode] encoded as a string by pruning any
+/// dangling line-breaks.
+///
+/// This function checks the last `YamlNode` of the [update] that is a
+/// `YamlScalar` and removes any unwanted line-break within the
+/// [updateAsString].
+///
+/// This is achieved by obtaining the chunk of the [yaml] that is after the
+/// current node being replaced using its [nodeToReplaceEndOffset]. If:
+///   1. The chunk has any trailing line-break, then it is left untouched.
+///   2. The node being replaced with [update] is not the last node, then it
+///      is left untouched.
+///   3. The terminal node in [update] is a `YamlScalar`, that is,
+///      the last [YamlNode] within the [update] that is not a collection.


Please consider adding some example.

Also why is it that we don't do this normalization when we create the updateAsString? Wouldn't that be more correct -- I don't know -- I probably don't understand all the context here, that's why I'm asking :D

Sure, I’ll add an example.

I do that because yamlEncodeBlock has no context of which method called it or the current structure of the yaml document. It could be:

updateInList or _appendToBlockList or _insertInBlockList for lists

_replaceInBlockMap or _addToBlockMap for block maps

Yaml.update for top level YamlNode.

Thus, the callers should make a call to the function to prune the trailing line break because they have an idea of the yaml's structure.

Additionally, I found it quite advantageous to include the line break of the existing YamlNode being replaced in the edit. This reduces bugs and removes the need to guess where the next line break is.

It's better to include it and check later whether it was there, just before suggesting an edit using a definite endOffset.

jonasfj · 2024-10-22T17:22:34Z

Feel free to mark comments as "Resolved" once you've:

addressed the comment, or,
replied explaining why you don't want to do anything (not all comments need be addressed).

I think it would be useful to add some examples in the code. Especially for skipAndExtractCommentsInBlock and normalizeEncodeBlock.
I'm not sure I fully understand what they are even supposed to do. To be fair, maybe I just didn't look hard enough at it yet 🤣

> Update function doc and inline comments
> Make function return named record
> Prefer named parameters
> Refactor existing code referencing function

kekavc24 · 2024-10-26T18:20:08Z

I've added commits I had moved to the other PRs. Without the changes, the skipAndExtractCommentsInBlock and normalizeEncodedBlock won't make sense. (see failing tests for bonus points)

The existing code was only scanning for the first instance of # and \n, but YAML allows comments spanning multiple lines. A key issue was with block lists in YAML, since it scanned for -. But comments can also contain -. Case in point:

# Existing code works great here for edits
- valueToRemove # Comment with hyphen (-)
- nextValue


# Existing code fails horribly here for edits

- valueToRemove # Comment with hyphen (-)
                     # but spanning (oops, hyphen "-")
                     #
                     # multiple lines (another hyphen "-")
- nextValue

skipAndExtractCommentsInBlock:

Scans ahead until it skips all comments whether it's a single line or spans multiple lines
Can do it lazily (remembers the first \n it saw while scanning ahead if no comments are present) but also greedily (skips both \n and/or comments)

I realized we can do it lazily when making updates within maps or lists, i.e. for inserts, splices, etc., and greedily exclusively when removing elements from maps/lists.
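
A simplified sketch of the greedy case is shown here. This is not the PR's `skipAndExtractCommentsInBlock`: the name `skipCommentsGreedily` and its exact behaviour are illustrative assumptions. It assumes the starting offset sits just past a node's last character, so every `#` it meets after whitespace starts a comment, and it does not try to restore the indentation of the next node.

```dart
/// Simplified, hypothetical sketch (not the PR's implementation).
///
/// Starting at [offset], assumed to be just past a node's last character,
/// skips whitespace, line breaks and comments, including comments that span
/// several lines.
///
/// Returns the offset of the first character that is neither whitespace nor
/// part of a comment (or `yaml.length` if there is none), together with the
/// comments that were skipped.
({int endOffset, List<String> comments}) skipCommentsGreedily(
  String yaml,
  int offset,
) {
  final comments = <String>[];
  var i = offset;

  while (i < yaml.length) {
    final char = yaml[i];
    if (char == ' ' || char == '\t' || char == '\n' || char == '\r') {
      // Whitespace and line breaks never end the scan in greedy mode.
      i++;
    } else if (char == '#') {
      // A comment runs to the end of the current line.
      var lineEnd = yaml.indexOf('\n', i);
      if (lineEnd == -1) lineEnd = yaml.length;
      comments.add(yaml.substring(i, lineEnd).trimRight());
      i = lineEnd;
    } else {
      // First character of real content, e.g. the `-` of the next element.
      break;
    }
  }

  return (endOffset: i, comments: comments);
}
```

Because the scan only looks at the first non-whitespace character of each stretch, hyphens inside a comment are never mistaken for the `-` of the next list element.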

normalizeEncodedBlock just helps us avoid always falling back to double quotes, even for strings that can be styled as folded or literal.

jonasfj · 2024-10-28T14:56:09Z

lib/src/utils.dart

+/// Returns the `endOffset` of the last comment extracted that is `end + 1`
+/// and a `List<String> comments`. It is recommended (but not necessary) that
+/// the caller checks the `endOffset` is still within the bounds of the [yaml].
+({int endOffset, List<String> comments}) skipAndExtractCommentsInBlock(


Could you give me a few examples of how this works.

Or maybe a few unit tests, even if we don't keep them, I'd like to understand a bit more what the intention here is.

It seems like it takes a string slice, and returns:

Comments from within the slice (not sure why)

Offset where the next non-comment thing in the slice begins, or ends? (or did I misunderstand that?)

> It seems like it takes a string slice, and returns:
>
> Comments from within the slice (not sure why)
>
> Offset where the next non-comment thing in the slice begins, or ends? (or did I misunderstand that?)

Yes. It returns where the non-comment starts but with a nuanced approach based on the bool argument you pass into the greedy parameter.

> Could you give me a few examples of how this works.

Yeah, sure. See the examples below.

- valueToRemove # Comment with hyphen (-)
                     # but spanning (oops, hyphen "-")
- nextValue

greedy as true returns the offset of the - indicating the start of the next value in the list.

greedy as false returns the offset of the last \n (line-break) + 1 which is just after the last closing bracket )

> Or maybe a few unit tests, even if we don't keep them, I'd like to understand a bit more what the intention here is.

A nice example even without the unit tests would be an element in a list or map with multi-line comments being removed. For example:

Performing YamlEditor(yaml).remove([0]); on the string below:

- valueToRemove # Comment with hyphen (-)
                     # but spanning (oops, hyphen "-")
                     #
                     # multiple lines (another hyphen "-")
- nextValue

# Expected output without this PR
                     # but spanning (oops, hyphen "-")
                     #
                     # multiple lines (another hyphen "-")
- nextValue

# Expected output with PR
- nextValue

The offending code can be found here which is similar to _removeFromBlockMap:

yaml_edit/lib/src/list_mutations.dart

Lines 343 to 374 in 35f4248

  if (start > 0) {
    final lastHyphen = yaml.lastIndexOf('-', start - 1);
    final lastNewLine = yaml.lastIndexOf('\n', start - 1);
    if (lastHyphen > lastNewLine) {
      start = lastHyphen + 2;

      /// If there is a `-` before the node, we need to check if we have
      /// to update the indentation of the next node.
      if (index < list.length - 1) {
        /// Since [end] is currently set to the next new line after the current
        /// node, check if we see a possible comment first, or a hyphen first.
        /// Note that no actual content can appear here.
        ///
        /// We check this way because the start of a span in a block list is
        /// the start of its value, and checking from the back leaves us
        /// easily confused if there are comments that have dashes in them.
        final nextHash = yaml.indexOf('#', end);
        final nextHyphen = yaml.indexOf('-', end);
        final nextNewLine = yaml.indexOf('\n', end);

        /// If [end] is on the same line as the hyphen of the next node
        if ((nextHash == -1 || nextHyphen < nextHash) &&
            nextHyphen < nextNewLine) {
          end = nextHyphen;
        }
      }
    } else if (lastNewLine > lastHyphen) {
      start = lastNewLine + 1;
    }
  }

  return SourceEdit(start, end - start, '');

Disclaimer: These thoughts are not fully conceived, I'm still trying to understand.

So if the input for YamlEditor(yaml).remove([0]) is:

- valueToRemove # Comment with hyphen (-)
                     # but spanning (oops, hyphen "-")
                     #
                     # multiple lines (another hyphen "-")
# nextValue is awesome:
- nextValue

Then with this PR we'd get:

- nextValue

Right?

I'm just highlighting this example, because detecting that a comment is spanning multiple lines is difficult.

It's kind of obvious to us humans that the comment # nextValue is awesome: is associated with - nextValue and that all the comments with equal indentation are related to - valueToRemove.

But if we tried to make a heuristic to detect this, such heuristic would fail if comments aren't perfectly aligned 🙈 I can see no end of corner cases 🤣

So maybe we have to ask, when faced with YamlEditor(yaml).remove([0]) on:

# A
  # B
- valueToRemove # C
                # D
  # E

  # F
# G
- nextValue # H
            # I

What comments do we wish to remove / preserve?

At the top of my head, I'm thinking:

H and I should definitely be preserved.

Removing C seems reasonable.

Do we want to treat A and B the same way?

Would we want to detect that A and B are different based on indentation?

Do we want to treat D, E, F and G the same way?

Would we want to detect that D and E are not the same based on indentation?

Would we want to detect that E and F are not the same based on empty line?

Would we want to detect that F and G are not the same based on indentation?

I'm suspecting that maybe it's simplest to say that:

A and B are treated the same way: we preserve them.

D, E, F and G are treated the same way: we preserve them.

It could be that it's best to say: retain comments unless they are inside the value we're removing (or on the same line). We risk leaving some half-garbled comments, but which is worse:

Removing too many comments?

Retaining half a comment related to a value that's been removed?

If we make heuristics for all this to be more accurate, don't we risk things becoming extremely complicated? 🤣

> Right?

Yes.

Funnily enough, I totally understand your point of view 😄. That’s how my head was spinning before I wrote skipAndExtractComments.

However, I reduced this PR’s scope to actually do what YamlEditor currently WANTS to do while ensuring we can have folded/literal strings. This is still a PoC.

From the PR, you can see I didn’t change how YamlEditor works just nudged it to do what existing code wants to do.

Personally, I don’t see it as complexity but rather as a challenge to live up to the package’s description the RIGHT WAY since if we provide a way to preserve comments, wouldn’t it be great to at least provide a subjective way to remove them?

If you are open to it, I can come up with an issue by the end of the week with my view and where we can both float ideas on how to tackle this and once we settle on something I will submit a PR. If not, then please indicate what we can keep from this PR and what not to keep.

> If not, then please indicate what we can keep from this PR and what not to keep.

I'll have to admit that I'm not entirely sure what this PR does 🙈

Also, the fact that we have things like skipAndExtractCommentsInBlock but only use endOffset from it scares me a bit. It might actually make sense if I understood what the future plans for skipAndExtractCommentsInBlock were 🤣

> If you are open to it, I can come up with an issue by the end of the week with my view and where we can both float ideas on how to tackle this and once we settle on something I will submit a PR.

I think that's a good idea!

Let's outline:

What problems we're trying to solve.

What the limitations to our approach will be.

What will the downsides be? (if we're doing heuristics, there will be scenarios where it works less than perfect, what do those scenarios look like).

> once we settle on something I will submit a PR.

Small PRs are MUCH easier to land!

If we can't do it all in one PR, maybe we should create an intermediate branch, try to land small PRs there and then eventually merge the branch into main.

There isn't a whole lot of development on main, so it's not like there'll be a lot of merge conflicts 🤣 (benefits of working in a small repository).

Of course, there is a non-trivial risk that we get stuck somewhere along the way and don't land anything.

But if we try this route we could perhaps consider writing test cases upfront.
Maybe we could extend the setup in test/testdata to support more kinds of tests, and maybe also support tests that are known to fail now, but that we want to fix.

Sometimes it's a lot easier to discuss things through examples.

> Personally, I don’t see it as complexity but rather as a challenge to live up to the package’s description the RIGHT WAY since if we provide a way to preserve comments, wouldn’t it be great to at least provide a subjective way to remove them?

Yes and no. Also scratching my head and wondering how ambitious a package description we wrote 🙈

Some background, this package started as part of GSoC, the original aim was to enable us to create a dart pub add <package> command.

Since then it's also been used for dart pub upgrade --major-versions, dart pub upgrade --tighten, and a few other things left and right.

We needed some way to preserve comments because, while JSON is a subset of YAML, I don't think anyone would want to use dart pub add if it parsed your pubspec.yaml, added the package, and then saved pubspec.yaml formatted as JSON, since people frequently have comments in their pubspec.yaml.

But we also don't really care to be perfectionists. Ideally, we just want to avoid changing comments and whitespace in parts of the YAML file unrelated to the modification we're making. But if a comment next to the thing we're changing becomes garbled or whitespace changes a bit, then perhaps that's okay'ish.

In theory I wouldn't object to this package having support for reading, stripping, modifying and removing comments. In theory I also wouldn't mind this package using heuristics to guess whether a comment belongs to a value being removed or not.

In practice this package is used to make small changes to YAML configuration files. If we don't limit the scope of the package, then it'll have an endless stream of bugs and feature requests.

It might be better to declare that some behavior or feature is out of scope than to have a buggy implementation. It's not like we have a lot of resources to invest in the maintenance of this package.

IMO, we should aim for (in order of priority):

(1) Reliably produce:

Valid YAML

A semantically correct mutation
(_performEdit ensures this, by parsing and comparing the result, throwing if there are errors)

(2) Avoid internal errors

Bugs

Inconsistencies caught by _performEdit

(3) Stable API and predictable behavior.

(4) Preserve comments and whitespace in parts of the YAML document unrelated to the mutation.

(5) Reasonable heuristics for preservation or removal of comments and whitespace in part of the YAML document affected by the mutation.

I think that when adding features or tweaking (5) we should consider what implication it might have for maintainability and remember that the objective is to modify small configuration files.

> I'll have to admit that I'm not entirely sure what this PR does 🙈

We can now encode strings with a trailing line-break as ScalarStyle.FOLDED or ScalarStyle.LITERAL instead of falling back to double quotes. See:

Literal and folded strings cannot handle "\n \n" at the end of a string #89

Fix fold literal encoding with trailing line break #91

Because we can do 1 above, it comes with a cost. The edit we suggest may have a line-break already since we ensure we preserve any dangling line-breaks after calling normalizeEncodedBlock which:

Preserves line-breaks for ScalarStyle.FOLDED or ScalarStyle.LITERAL or if the value of the YamlScalar itself has a line-break

Checks if the node being replaced had a line-break and preserves it in the new encoded block

The cost is to ensure we include any existing line-break in the edit without trying to introduce any breaking behavior to the current YamlEditor functionality.
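
The pruning rule described above can be sketched roughly as follows. This is one reading of the description, not the PR's `normalizeEncodedBlock`: the enum `Style`, the function name `pruneTrailingLineBreak` and its parameters are all hypothetical, and the sketch assumes the line break is dropped only when the scalar is not literal/folded and the replaced node had no trailing line break.

```dart
/// Hypothetical stand-in for the scalar styles mentioned in the discussion.
enum Style { plain, doubleQuoted, literal, folded }

/// Simplified, hypothetical sketch (not the PR's implementation).
///
/// Every encoded block is assumed to end with one extra `\n`. That line
/// break is kept when the terminal scalar is literal or folded, or when the
/// node being replaced already had a trailing line break. Otherwise it is
/// pruned.
String pruneTrailingLineBreak(
  String encoded, {
  required Style terminalScalarStyle,
  required bool oldNodeHadTrailingLineBreak,
}) {
  final isBlockScalar = terminalScalarStyle == Style.literal ||
      terminalScalarStyle == Style.folded;

  if (isBlockScalar || oldNodeHadTrailingLineBreak) return encoded;
  if (!encoded.endsWith('\n')) return encoded;

  // Drop exactly one trailing line break.
  return encoded.substring(0, encoded.length - 1);
}
```

Keeping the decision in one small function lets each caller pass in what it knows about the surrounding document, which matches the earlier point that the encoder itself has no such context.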

From my previous comment: YamlEditor CURRENTLY wants to get rid of inline comments in the terminal node but does it poorly, which causes issues with edits, and some of the tests actually enable the bug. (See some of the failing tests when this PR is run. I’ll highlight them on request.)

That's why I added skipAndExtractComments, which skips the inline comments if that’s what YamlEditor wanted to do.

I must admit, maybe the function is a bit overzealous and returns the comments it skipped but I thought why not.

Also, this PR was just a PoC. Nothing more than that.

@kekavc24 how about you start an issue outlining what we want to do?
And we try to reach consensus on that first.

I think perhaps it's best to try to write tests and make small changes (many small PRs are infinitely better than large ones). It's more work to split logic into separate PRs. But the challenge is also getting there without introducing bugs (actually, it's getting there while convincing maintainers of this package that we're not introducing new bugs).

kekavc24 added 10 commits June 30, 2024 17:06

Add function to skip and/or extract comments

7501cb4

Add function to normalize trailing line breaks in encode block

647329c

Return index getting key node in map

502fa0a

Apply line-break after each encoded yaml block

c3056c8

Encode folded/literal strings based on c3056c8

e610993

Skip comments and include \n in map mutations

e659cb9

Skip comments and remove additional \n added in list mutations

53f9637

Normalize top level edits

edd8d38

Remove defensive encoding function after fix in e659cb9 and 53f9637

f5a259b

Run dart format

5a51cbe

kekavc24 mentioned this pull request Jul 1, 2024

Literal and folded strings cannot handle "\n \n" at the end of a string #89

Open

kekavc24 added 2 commits July 2, 2024 14:50

Skip comments for top-level edits

c7aec85

Lazily look ahead for comments

157063a

kekavc24 added 10 commits July 5, 2024 10:30

Refactor function to normalize encoded blocks

f7fe2d3

Prevent pruning in YamlScalar with ScalarStyles plain, any, folded, literal

9101d79

Allow comments to be skipped greedily or lazily

3d99caf

Ensure _appendToBlockList appends after last comment

4034652

Fix issue where loop never exits

7ada07e

Avoid skipping line break eagerly when extracting comments

ee6a29e

Add utility method to reclaim indent skipped

422731d

Refactor _removeFromBlockList to correctly skip comments

f3a265e

Refactor _removeFromBlockMap to correctly skip comments

814672b

Use span length to determine true state of null

7378e92

kekavc24 force-pushed the fix-line-ending-bug branch from 2555b32 to c7aec85 Compare July 5, 2024 09:35

kekavc24 marked this pull request as ready for review July 5, 2024 10:07

This was referenced Jul 5, 2024

Refactor comment skipping #93

Closed

Refactor list and map mutations #94

Closed