Vendor and incorporate a minimal edit distance algorithm for diffing lists #2295

t0yv0 · 2024-08-09T15:05:26Z

Build up on #2294 by improving array diffs to show minimal changes. This implies using a minimum edit distance over PropertyValue[] sequences. Consider alternatives here in either coding from scratch or finding a suitable Go library to vendor. Earlier I could not find a suitable generic Go implementation, although there are very good implementations for diffing text specifically, so wrote my own, that could be one candidate to code-review and vendor: https://github.com/t0yv0/godifft

Resolves #2239

…ge (#2405) This PR adds a new algorithm for calculating the detailed diff which acts on the pulumi property values for the SDKv2 bridge, comparing the planned state returned by `Diff` to the prior state. This is flagged under the existing `DiffEqualDecisionOverride` feature flag. The results look very promising so far - all the detailed diff integration tests pass and the issues previously reported are almost all fixed by this. ## Why ## The current detailed diff acts on the `InstanceDiff` structure which is internal to the plugin-sdk. This has a few shortcomings: - TF does not actually refer to this for the detailed diff, so it might point to diffs which are not present in TF. - It refers to TF attribute paths, which are tricky to translate back in some cases. - It does not compare the planned state with the prior state but compares the news vs olds - this misses properties added by TF planning. ## Implementation ## The new algorithm is under `pkg/tfbridge/detailed_diff.go` and used in `pkg/tfbridge/provider.go:Diff` only for the SDKv2 and only if the `DiffEqualDecisionOverride` is enabled. The main entrypoint is `makePulumiDetailedDiffV2` - which in turn calls `makePropDiff` on each property. That branches on the type of the property and we have a separate function responsible for calculating the detailed diff for each property type. There's a few interesting bits here: - We always walk the full tree even when comparing against a nil property and simplify the diff after in `simplifyDiff`. This is in order to get replaces correct. More on that later. - When returning the diff to the engine we only return the simplest possible diff which explains the changes. So instead of `prop: Update, prop.subprop: Add`, we only return `prop.subprop: Add`. This seems to work much better in the engine and goes around some weird behaviour in the detailed diff display (see #2234 and #2400). Moreover, the first can be inferred from the second, so there is no reason for the bridge to return the full tree if only a leaf was changed. - We can not correctly identify diffs between nil and empty collections because of how the TF SDKv2 works without additional work. This is studied in `TestNilVsEmptyListProperty` and `TestNilVsEmptyMapProperty` in `pkg/cross-tests/diff_cross_test.go`. This is probably fine for now and a full fix is not possible. We can make a partial fix for non-computed properties by inspecting the pulumi inputs, before the plan. - There's a bit of an edge case with Unknowns and Replaces - we might not have enough information to tell the user they'll get a replace because the property which causes the replaces is nested inside an unknown. There's not much to do here, except to choose which side to err on. The algorithm currently does not say there's a replace. ### On Replaces ### We do not short-circuit detailed diffs when comparing non-nil properties against nil ones. The reason for that is that a replace might be triggered by a `ForceNew` inside a nested property of a non-`ForceNew` property. We instead always walk the full tree even when comparing against a nil property. We then later do a simplification step for the detailed diff in `simplifyDiff` in order to reduce the diff to what the user expects to see. For example: This is a list of objects with two properties, one of which is `ForceNew` ``` schema = { "list_prop": { Type: List, Elem: { "prop": String "force_new_prop": StringWithForceNew } } } ``` We are diffing an unspecified list against a list with a single element ``` olds = {} news = { "list_prop": [ { "prop": "val", "force_new_prop" : "val" } ] ``` The user expects to see: ``` + list_prop ``` or because of how collections work in TF SDKv2 (see #2233) ``` + list_prop[0] ``` An element was added to the list. When calculating the detailed diff we can short-circuit the diff when comparing the two lists, as we can see they have different lengths. This would identify the correct element to be displayed to the user as the reason for the diff but would fail to identify the diff as a replace, since we never saw the `ForceNew` on the nested property `force_new_prop` of the list elements. That is why, instead of short-circuiting the diff, we walk the full tree down and compare every property against a nil if it is not specified on the other side. We then to a simplification pass over the detailed diff, which respects any replaces triggered by nested properties and bubbles these up. There is a full case study of the TF behaviour around replaces in `pkg/cross-tests/diff_cross_test.go` `TestAttributeCollectionForceNew`, `TestBlockCollectionForceNew`, `TestBlockCollectionElementForceNew`. ## Testing ## Unit tests for the detailed diff algorithm in `pkg/tfbridge/detailed_diff_test.go` - this tries to cover all meaningful permutations of schemas: - `TestBasicDetailedDiff` tests all the meaningful pairs between nil values, empty values, non-empty values and computed for each TF type. - `TestDetailedDiffObject`, `TestDetailedDiffList`, `TestDetailedDiffMap`, `TestDetailedDiffSet` covers the cases not covered above for object and collection types. - `TestDetailedDiffTFForceNewPlain`, `TestDetailedDiffTFForceNewAttributeCollection`, `TestDetailedDiffTFForceNewBlockCollection`, `TestDetailedDiffTFForceNewElemBlockCollection`, `TestDetailedDiffTFForceNewObject` cover `ForceNew` behaviour in all TF types. - `TestDetailedDiffPulumiSchemaOverride` covers pulumi schema overrides Integration tests in `pkg/tests/schema_pulumi_test.go`, mostly `TestDetailedDiffPlainTypes` and `TestUnknownBlocks`. Note that most of the edge cases and issues previously discovered here are resolved by this PR. ## Follow-up Work ## Not done but will follow-up in separate PRs: - Non-trivial set diffing - sets are currently diffed the same as lists, which has all the previous issues with set diffs. #2200 - Non-trivial list diffing - we can do something like #2295 here. Note that we still need to investigate how this interacts with ForceNew and how TF preserves or does not preserve list element identity. We likely need to respect that in order not to have confusing unexplained replaces caused by list element changes. ## Related Issues ## fixes: - fixes #2294 - fixes #2296 - fixes #1504 - fixes #1895 - fixes #2141 - fixes #2235 - fixes #2325 - fixes #2400 - fixes #2234 - fixes #2427 does not fix: - #2399 - we must either fix the saved state to not contain redundant nils or fix the display logic in the engine to ignore these. - #2233 - This works the same as TF and seems to be a limitation of the SDKv2.

VenelinMartinov · 2024-10-29T17:16:35Z

I believe we first need to exhaustively test the TF behaviour on lists here, as we need to match it. The reason for that is replaces - we should trigger a diff against an element only if TF does as that informs the replace/no replace decision.

We can in turn use replaces to reverse-engineer the TF behaviour

VenelinMartinov · 2024-10-29T17:21:36Z

@mjeffryes @iwahbe @t0yv0 should we prioritize this before PF accurate previews? I believe without this we could be showing wrong replace decisions in the detailed diff in certain cases

VenelinMartinov · 2024-10-29T17:22:19Z

We already have cross-tests for diff so it should be somewhat straight-forward to test list behaviour.

t0yv0 · 2024-10-29T20:20:15Z

It's on the bubble, I think PF support is slightly more important.

t0yv0 · 2024-11-12T16:24:28Z

Having some more thoughts here, this might be a little finicky to get exactly right. Vanilla minimal edit distance works well on exact equality when items are removed/added, but when items are deeply modified in-place it will mis-classify this change into a removed/added where an ideal diff would show an in-place deep update. Perhaps some heuristics here could be admissible.

VenelinMartinov · 2025-01-21T14:10:18Z

Did some digging in opentofu and found their code responsible for diffing lists: https://github.com/opentofu/opentofu/blob/cb2e9119aa75eeb8e1fa175e2b7a205a4fef129f/internal/command/jsonformat/collections/slice.go#L39

I believe we might need to match that in order to present the correct diff in the presence of replacements. For example:

Schema:
   prop: List(
      nested: String,
      other: String, Optional, ForceNew
   )

We have a schema with a list of objects. One of the properties of the nested object is ForceNew.

If we have the following change:

before = {prop: [
  {
    nested: val1
  },
  {
    nested: val2,
    other: other
  }
]}

after = {prop: [
  {
    nested: val2,
    other: other
  }
]}

It matters if we present this as:

(Current Pulumi behaviour)

0: Update
1: Delete

or
(Current Terraform behaviour)

0: Delete

In the first case that would be a replacement and the second one wouldn't.

We can use this technique to easily test and compare this behaviour. I'll write up some tests around this.

VenelinMartinov · 2025-01-21T14:31:21Z

They use this LongestCommonSubsequence algorithm to determine a sequence of unchanged values:
https://github.com/opentofu/opentofu/blob/cb2e9119aa75eeb8e1fa175e2b7a205a4fef129f/internal/plans/objchange/lcs.go#L38

They then mark all the values outside of this LCS which are in before but not after as "deleted" and all the values outside of the LCS which are in after but not before as "added".

Additionally, we already vendor the module this function is in, we can easily vendor this too.

VenelinMartinov · 2025-01-21T16:41:07Z

Seems like I was incorrect about my assumption about the Terraform behaviour. They only use the LongestCommonSubsequence method for List Attributes but not for List Blocks (lists of objects).

See https://github.com/pulumi/pulumi-terraform-bridge/blob/413419b267d84aa48ff87144ea954440741ba72b/pkg/tests/diff_test/testdata/TestSDKv2DetailedDiffList/list_attribute/long_list_added_front.golden for attributes
See
https://github.com/pulumi/pulumi-terraform-bridge/blob/413419b267d84aa48ff87144ea954440741ba72b/pkg/tests/diff_test/testdata/TestSDKv2DetailedDiffList/list_block/long_list_added_front.golden for the block behaviour.

Since blocks are unaffected, we can choose to present the diff in another way here instead of matching TF. This should never lead to incorrectly explained replacements.

VenelinMartinov · 2025-01-21T19:06:17Z

There's two alternative implementations here:

Detailed diff use longest common sequence for list attribute diffs #2862 uses the same algorithm which TF uses for list attributes.
Detailed diff use minimal edit distance for list attribute diffs #2863 uses a minimal edit distance algorithm from https://github.com/t0yv0/godifft for computing the diff.

The first implementation would match exactly what TF does and the second one should handle more complicated cases better.

t0yv0 · 2025-01-21T21:48:58Z

I'd like to understand a bit more detail on what happens to blocks in TF and whether we can do any better. Thanks.

t0yv0 changed the title ~~[2] Vendor and incorporate a minimal edit distance algorithm for diffing lists~~ Vendor and incorporate a minimal edit distance algorithm for diffing lists Aug 9, 2024

t0yv0 self-assigned this Aug 21, 2024

mjeffryes added the kind/task Work that's part of an ongoing epic label Aug 23, 2024

mjeffryes added this to the 0.110 milestone Sep 12, 2024

VenelinMartinov mentioned this issue Sep 26, 2024

Schema-aware pulumi-level detailed diff calculation in the SDKv2 bridge #2405

Merged

mjeffryes removed this from the 0.110 milestone Oct 2, 2024

iwahbe assigned VenelinMartinov and unassigned t0yv0 Oct 29, 2024

mjeffryes added this to the 0.112 milestone Oct 30, 2024

mjeffryes modified the milestones: 0.112, 0.113 Nov 13, 2024

VenelinMartinov mentioned this issue Nov 25, 2024

PF detailed diff #2629

Merged

mjeffryes modified the milestones: 0.113, 0.114 Dec 3, 2024

mjeffryes modified the milestones: 0.114, 0.115 Jan 17, 2025

This was referenced Jan 21, 2025

Detailed diff use minimal edit distance for list attribute diffs #2863

Merged

Detailed diff use longest common sequence for list attribute diffs #2862

Closed

VenelinMartinov closed this as completed in #2863 Jan 27, 2025

VenelinMartinov closed this as completed in 93de27b Jan 27, 2025

pulumi-bot added the resolution/fixed This issue was fixed label Jan 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vendor and incorporate a minimal edit distance algorithm for diffing lists #2295

Vendor and incorporate a minimal edit distance algorithm for diffing lists #2295

t0yv0 commented Aug 9, 2024 •

edited

Loading

VenelinMartinov commented Oct 29, 2024

VenelinMartinov commented Oct 29, 2024

VenelinMartinov commented Oct 29, 2024

t0yv0 commented Oct 29, 2024

t0yv0 commented Nov 12, 2024

VenelinMartinov commented Jan 21, 2025

VenelinMartinov commented Jan 21, 2025 •

edited

Loading

VenelinMartinov commented Jan 21, 2025 •

edited

Loading

VenelinMartinov commented Jan 21, 2025

t0yv0 commented Jan 21, 2025

Vendor and incorporate a minimal edit distance algorithm for diffing lists #2295

Vendor and incorporate a minimal edit distance algorithm for diffing lists #2295

Comments

t0yv0 commented Aug 9, 2024 • edited Loading

VenelinMartinov commented Oct 29, 2024

VenelinMartinov commented Oct 29, 2024

VenelinMartinov commented Oct 29, 2024

t0yv0 commented Oct 29, 2024

t0yv0 commented Nov 12, 2024

VenelinMartinov commented Jan 21, 2025

VenelinMartinov commented Jan 21, 2025 • edited Loading

VenelinMartinov commented Jan 21, 2025 • edited Loading

VenelinMartinov commented Jan 21, 2025

t0yv0 commented Jan 21, 2025

t0yv0 commented Aug 9, 2024 •

edited

Loading

VenelinMartinov commented Jan 21, 2025 •

edited

Loading

VenelinMartinov commented Jan 21, 2025 •

edited

Loading