Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong diff in merge view #7106

Open
cjayyy opened this issue Sep 9, 2024 · 10 comments
Open

Wrong diff in merge view #7106

cjayyy opened this issue Sep 9, 2024 · 10 comments

Comments

@cjayyy
Copy link

cjayyy commented Sep 9, 2024

Hi,

I ran into weird diff behavior when it shows change for non-changed line when it follows new line ending with comma. I tried to reproduce the issue in demo of underlying library (here) but with no success, therefore I think the problem might be with implementation to CM5.

Screen.Recording.2024-09-09.at.13.33.38.mov

Can you please confirm the bug and hopefully suggest fix?

Thanks!

@marijnh
Copy link
Member

marijnh commented Sep 9, 2024

Can you please confirm the bug

Without the inputs that lead to it, probably not.

@tomasfejfar
Copy link

It happened to me with the following:

Left:

CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    "Review_Count",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;

Right:

CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
    HAVING
        COUNT("Airline_Reviews"."id") > 1
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;

The Right line 27 with "Rating_StdDev", is incorrectly show as added.

@tomasfejfar
Copy link

image

var target = document.getElementById("view");
  target.innerHTML = "";
dv = CodeMirror.MergeView(target, {
    value: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    "Review_Count",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;`,
    origLeft: panes == 3 ? orig1 : null,
    orig: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
    HAVING
        COUNT("Airline_Reviews"."id") > 1
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;`,
    lineNumbers: true,
    mode: "text/html",
    highlightDifferences: highlight,
    connect: connect,
    collapseIdentical: collapse
  });

@marijnh
Copy link
Member

marijnh commented Sep 9, 2024

Oh, I see what's going on. If you look at the green underline, it matches the comma at the end of the line in the shorter document to a comma on the next line in the longer document (which is also a valid diff). There's multiple valid diffs for any given change like this, and this library doesn't really try to influence the diffing library to pick a particular one. As such, the change spans two lines when displayed, even though another diff could fit on a single line. I don't think this is going to change in this version of the library (@codemirror/merge for CM6 should behave better).

@tomasfejfar
Copy link

Oh, I see! So the root cause is "misleading highlighting" in our template where the actual diff is not highlighted, but only lines spanning the diff are highlighted. Thanks!

@cjayyy
Copy link
Author

cjayyy commented Sep 9, 2024

You are right. Thank you very much for the clarity!

@cjayyy cjayyy closed this as completed Sep 9, 2024
@cjayyy
Copy link
Author

cjayyy commented Sep 9, 2024

Well, in the end I think the issue is still somewhere else. When trying it in CM6 there is still the same problem. But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff. Please see the video - in the beginning the comma on line 25 is not highlighted, but after adding some change to lines 20-21, the comma on line 25 is highlighted now.

Screen.Recording.2024-09-09.at.17.05.05.mov

@cjayyy cjayyy reopened this Sep 9, 2024
@tomasfejfar
Copy link

tomasfejfar commented Sep 9, 2024

But what constitutes the diff is still the diff library responsibility, not he code editor's, isn't it?

@cjayyy
Copy link
Author

cjayyy commented Sep 9, 2024

I don't know as the diff library has lots of options which CodeMirror doesn't propagate. I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.

marijnh added a commit to codemirror/merge that referenced this issue Sep 10, 2024
FIX: Improve the way `presentableDiff` aligns changes to line boundaries when possible.

Issue codemirror/codemirror5#7106
@marijnh
Copy link
Member

marijnh commented Sep 10, 2024

But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff.

Again, diffs are not deterministic based on input. There's often multiple correct diffs for a given set of documents.

But it seems the line boundary alignment in @codemirror/merge wasn't working as well as I thought it was. Attached patch should help with that.

I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.

I don't think diff_match_patch gives any guarantees on where it locates changes relative to line breaks, so I'm not really sure what 'reproducing the issue in the diff library' would look like.

@codemirror codemirror deleted a comment from GIgako19929 Sep 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants