Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HexEditor diff #522

Merged
merged 18 commits into from
Aug 10, 2024
Merged

HexEditor diff #522

merged 18 commits into from
Aug 10, 2024

Conversation

tomilho
Copy link
Contributor

@tomilho tomilho commented May 16, 2024

MVP to receive some early feedback on the diff process. (Closes #47 )

As I was unsure if it was possible to do this since there isn't any explicit API to create custom diff, I rushed most components to figure out the diff process. There is still plenty of work to do, but hopefully this shows how the diff can look.

Simple Diff Algorithm

diffv1

Improved Diff Alg

image

There are some issues due to VSCode not supporting custom diff editors (I think), below is a list.

(Unfixable?) Issues

  • Switching Tabs - vscode.window.TabGroup does not have a class type for custom editor's diff, so .input is undefined.
  • No Status Bar Item display
  • Keybinds do not work - Insert does not work

Applies custom css styles on data cells by specifying a decorator style
and its offset range.
Opens two selected files into hex diff view. Works with any file
extension by changing the file scheme.
Introduces the O(d^2) space version and its decorators. With this
diff algorithm, the diffs should be more insightful compared to
using a simple per-offset comparison.
Inserted bytes are show in the original file by a stripe data cell.
@tomilho
Copy link
Contributor Author

tomilho commented Jul 3, 2024

I changed the diff algorithm, lmk what you think.

Finds the lower and upper bound of the decorators that fit in the
page, instead of using .filter(). In addition to that, it also
removes the need to re-initialize the decorator's ranges.
@connor4312
Copy link
Member

Thanks for the PR, looks super cool! I'll give this a look this week.

You're right in that there's no "proper" support for custom diff editors in VS Code. That's something I would like to get done in the future, but this is a nice start and should be reusable if/when support arrives.

@lramos15
Copy link
Member

lramos15 commented Jul 8, 2024

This is really neat. I've added it to my review queue when I get some free time.

cc @mjbvz if you have any feedback on the best way to do custom editor diffs since it's not very common there are not many examples of this

@connor4312
Copy link
Member

connor4312 commented Jul 8, 2024

also, regarding diff algorithms, I wonder if we can use existing implementations. I'm sure there are good libraries out there, though maybe ones that would need to be compiled to wasm. Doesn't need to happen in this PR.

@tomilho
Copy link
Contributor Author

tomilho commented Jul 9, 2024

Thank you for the comments!! I will for sure look into some libraries since I am not happy at all with my latest version (comparing 2MB files results in a crash). But please don't feel pressured to prioritize this as it is a WIP.

@tomilho
Copy link
Contributor Author

tomilho commented Jul 19, 2024

Regarding diff algorithms, I really liked the wfa2-lib diff used in biodiff. However, it does not work properly in wasm32 as it is coded to work for 64bit but since wasm64 is right around the corner, I think we can use diff in the meantime. And when wasm64 is released, I can bring wfa2 into the project.

Let me know if this is okay with you.

@connor4312
Copy link
Member

connor4312 commented Jul 19, 2024

Another thing we could do is make the hex editor a platform-specific extension and ship binaries that are subprocessed into on supported platforms (64 bit platforms minus web, presumably.) This is not super hard to do from an engineering perspective since Python recently paved the way with its native locator. I can try and carve out some time next iteration to get that in if you think it makes sense :)

We could use the JS diff library on platforms where that's not supported. We may want to do that on a worker_thread though to avoid blocking the extension host on very large diffs.

If you want to get that going in this PR I can follow up next month with a wfa2/biodiff version.

@tomilho
Copy link
Contributor Author

tomilho commented Jul 19, 2024

I had no idea vscode extensions could ship with binaries! I really like the idea to be honest, especially when wasm64 may still take some time even though there is only one task left. I will start with the JS diff and worker_thread. Let me know if I can help you with the wfa2 stuff.

Done to simplify diff, because recomputing diffs after a file change
is too complex to tackle at the moment.
Although still O(d^2), it improves memory usage in other ways,
provides some optimization for edge cases and has some useful
built-in options to exit on large diffs.
@tomilho
Copy link
Contributor Author

tomilho commented Jul 24, 2024

I've added the worker and the JS Diff. I did the worker such that it only works when running web vscode, but I wasn't entirely sure how to structure it, see these lines.

@connor4312
Copy link
Member

You can also do it in Node.js using the worker_threads module which implements a very similar interface to the webworker :)

@connor4312
Copy link
Member

Cool, thanks! I'll give this a more thorough review soon.

I also put in an issue for biodiff for avenues we could explore to leverage it here: 8051Enthusiast/biodiff#20

@tomilho
Copy link
Contributor Author

tomilho commented Jul 25, 2024

Thank you! Let me know how I can help

@tomilho tomilho marked this pull request as ready for review July 25, 2024 22:07
Copy link
Member

@connor4312 connor4312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work as usual, some comments :)

media/editor/dataDisplay.tsx Outdated Show resolved Hide resolved
src/extension.ts Outdated
Comment on lines 116 to 146
let worker: Worker;
const workerFilePath = vscode.Uri.joinPath(
context.extensionUri,
"dist",
"diffWorker.js",
).toString();

try {
worker = new Worker(workerFilePath);
} catch {
// eslint-disable-next-line @typescript-eslint/no-var-requires
const { Worker } = require("worker_threads") as typeof import("worker_threads");
const nodeWorker = new Worker(new URL(workerFilePath));
// Web and node js have different worker interfaces, so we share a function
// to initialize both workers the same way.
const ref = nodeWorker.addListener;
(nodeWorker as any).addEventListener = ref;
worker = nodeWorker as any;
}

const workerMessageHandler = new MessageHandler<ToDiffWorkerMessage, FromDiffWorkerMessage>(
// Always return undefined as the diff worker
// does not request anything from extension host
async () => undefined,
message => worker.postMessage(message),
);

worker.addEventListener("message", e =>
// e.data is used in web worker and e is used in node js worker
e.data ? workerMessageHandler.handleMessage(e.data) : workerMessageHandler.handleMessage(e as any),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code could be moved into its own file and the Worker instantiated lazily rather than at activation time. Most usages of the hex editor won't be using it for diff operations, so this may be wasted work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done both, but I still kept the dispose to be done when the extension is deactivated since I am not entirely sure when to dispose it. Should I keep it like this?

src/extension.ts Outdated Show resolved Hide resolved
src/extension.ts Outdated Show resolved Hide resolved
shared/hexDiffModel.ts Outdated Show resolved Hide resolved
@@ -68,6 +70,20 @@ export class HexEditorRegistry extends Disposable {
};
}

/** adds a diff model */
public addDiff(diffModelBuilder: HexDiffModelBuilder) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lifecycle of this is kind of weird and I'm not sure all cases are handled, such as reopening an editor without re-running the "compare selected". It seems like the main point of this is sharing the model builder between the two URIs.

Instead I would suggest stashing some query parameters when you build the URIs like ?token={uuid}&side={modified|original} having the getDiff be something like

getDiff(uri: vscode.Uri): { builder: HexDiffModelBuilder, dispose: () => void } {
    const { token, side } = parseQueryStringFrom(uri);
    if (!this.diffsBuilder.get(token)) {
        this.diffsBuilder.set(token, { refCount: 0, value: new DiffsBuilder() });
    }
    const builder = this.diffsBuilder.get(token);
    builder.refCount++;
    builder.value.set(side, uri);

    return { builder: builder.value, dispose: () => { /* refCount--, remove if 0 */ }}
}

...and have HexDiffModelBuilder.build() await both sides being set similarly to how it works for the documents today. That approach should allow things to work and the editor to be reopened/moved/duplicated without issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great suggestion, it made the lifecycle a lot simpler! I did some small tweaks, but overall it should be the same.

@tomilho
Copy link
Contributor Author

tomilho commented Aug 4, 2024

Thank you for the review and sorry for the late changes.

Copy link
Member

@connor4312 connor4312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, nice work!

@connor4312 connor4312 enabled auto-merge (squash) August 9, 2024 23:04
@vs-code-engineering vs-code-engineering bot added this to the August 2024 milestone Aug 9, 2024
@connor4312 connor4312 merged commit dce2189 into microsoft:main Aug 10, 2024
3 checks passed
@tomilho
Copy link
Contributor Author

tomilho commented Aug 10, 2024

Thank you for the opportunity to work on this and taking your time to review everything! I learnt so much about diffing and vscode.

@tomilho tomilho deleted the fr-diff branch August 12, 2024 21:57
@heartacker
Copy link

is this release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hex diff - Hex compare
5 participants