-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HexEditor diff #522
HexEditor diff #522
Conversation
Applies custom css styles on data cells by specifying a decorator style and its offset range.
Opens two selected files into hex diff view. Works with any file extension by changing the file scheme.
Introduces the O(d^2) space version and its decorators. With this diff algorithm, the diffs should be more insightful compared to using a simple per-offset comparison.
Inserted bytes are show in the original file by a stripe data cell.
I changed the diff algorithm, lmk what you think. |
Finds the lower and upper bound of the decorators that fit in the page, instead of using .filter(). In addition to that, it also removes the need to re-initialize the decorator's ranges.
Thanks for the PR, looks super cool! I'll give this a look this week. You're right in that there's no "proper" support for custom diff editors in VS Code. That's something I would like to get done in the future, but this is a nice start and should be reusable if/when support arrives. |
This is really neat. I've added it to my review queue when I get some free time. cc @mjbvz if you have any feedback on the best way to do custom editor diffs since it's not very common there are not many examples of this |
also, regarding diff algorithms, I wonder if we can use existing implementations. I'm sure there are good libraries out there, though maybe ones that would need to be compiled to wasm. Doesn't need to happen in this PR. |
Thank you for the comments!! I will for sure look into some libraries since I am not happy at all with my latest version (comparing 2MB files results in a crash). But please don't feel pressured to prioritize this as it is a WIP. |
Regarding diff algorithms, I really liked the wfa2-lib diff used in biodiff. However, it does not work properly in wasm32 as it is coded to work for 64bit but since wasm64 is right around the corner, I think we can use diff in the meantime. And when wasm64 is released, I can bring wfa2 into the project. Let me know if this is okay with you. |
Another thing we could do is make the hex editor a platform-specific extension and ship binaries that are subprocessed into on supported platforms (64 bit platforms minus web, presumably.) This is not super hard to do from an engineering perspective since Python recently paved the way with its native locator. I can try and carve out some time next iteration to get that in if you think it makes sense :) We could use the JS diff library on platforms where that's not supported. We may want to do that on a worker_thread though to avoid blocking the extension host on very large diffs. If you want to get that going in this PR I can follow up next month with a wfa2/biodiff version. |
I had no idea vscode extensions could ship with binaries! I really like the idea to be honest, especially when wasm64 may still take some time even though there is only one task left. I will start with the JS diff and worker_thread. Let me know if I can help you with the wfa2 stuff. |
Done to simplify diff, because recomputing diffs after a file change is too complex to tackle at the moment.
Although still O(d^2), it improves memory usage in other ways, provides some optimization for edge cases and has some useful built-in options to exit on large diffs.
I've added the worker and the JS Diff. I did the worker such that it only works when running web vscode, but I wasn't entirely sure how to structure it, see these lines. |
You can also do it in Node.js using the worker_threads module which implements a very similar interface to the webworker :) |
Cool, thanks! I'll give this a more thorough review soon. I also put in an issue for biodiff for avenues we could explore to leverage it here: 8051Enthusiast/biodiff#20 |
Thank you! Let me know how I can help |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work as usual, some comments :)
src/extension.ts
Outdated
let worker: Worker; | ||
const workerFilePath = vscode.Uri.joinPath( | ||
context.extensionUri, | ||
"dist", | ||
"diffWorker.js", | ||
).toString(); | ||
|
||
try { | ||
worker = new Worker(workerFilePath); | ||
} catch { | ||
// eslint-disable-next-line @typescript-eslint/no-var-requires | ||
const { Worker } = require("worker_threads") as typeof import("worker_threads"); | ||
const nodeWorker = new Worker(new URL(workerFilePath)); | ||
// Web and node js have different worker interfaces, so we share a function | ||
// to initialize both workers the same way. | ||
const ref = nodeWorker.addListener; | ||
(nodeWorker as any).addEventListener = ref; | ||
worker = nodeWorker as any; | ||
} | ||
|
||
const workerMessageHandler = new MessageHandler<ToDiffWorkerMessage, FromDiffWorkerMessage>( | ||
// Always return undefined as the diff worker | ||
// does not request anything from extension host | ||
async () => undefined, | ||
message => worker.postMessage(message), | ||
); | ||
|
||
worker.addEventListener("message", e => | ||
// e.data is used in web worker and e is used in node js worker | ||
e.data ? workerMessageHandler.handleMessage(e.data) : workerMessageHandler.handleMessage(e as any), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code could be moved into its own file and the Worker instantiated lazily rather than at activation time. Most usages of the hex editor won't be using it for diff operations, so this may be wasted work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done both, but I still kept the dispose to be done when the extension is deactivated since I am not entirely sure when to dispose it. Should I keep it like this?
src/hexEditorRegistry.ts
Outdated
@@ -68,6 +70,20 @@ export class HexEditorRegistry extends Disposable { | |||
}; | |||
} | |||
|
|||
/** adds a diff model */ | |||
public addDiff(diffModelBuilder: HexDiffModelBuilder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lifecycle of this is kind of weird and I'm not sure all cases are handled, such as reopening an editor without re-running the "compare selected". It seems like the main point of this is sharing the model builder between the two URIs.
Instead I would suggest stashing some query parameters when you build the URIs like ?token={uuid}&side={modified|original}
having the getDiff be something like
getDiff(uri: vscode.Uri): { builder: HexDiffModelBuilder, dispose: () => void } {
const { token, side } = parseQueryStringFrom(uri);
if (!this.diffsBuilder.get(token)) {
this.diffsBuilder.set(token, { refCount: 0, value: new DiffsBuilder() });
}
const builder = this.diffsBuilder.get(token);
builder.refCount++;
builder.value.set(side, uri);
return { builder: builder.value, dispose: () => { /* refCount--, remove if 0 */ }}
}
...and have HexDiffModelBuilder.build()
await both sides being set similarly to how it works for the documents today. That approach should allow things to work and the editor to be reopened/moved/duplicated without issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great suggestion, it made the lifecycle a lot simpler! I did some small tweaks, but overall it should be the same.
Thank you for the review and sorry for the late changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, nice work!
Thank you for the opportunity to work on this and taking your time to review everything! I learnt so much about diffing and vscode. |
is this release |
MVP to receive some early feedback on the diff process. (Closes #47 )
As I was unsure if it was possible to do this since there isn't any explicit API to create custom diff, I rushed most components to figure out the diff process. There is still plenty of work to do, but hopefully this shows how the diff can look.
Simple Diff Algorithm
Improved Diff Alg
There are some issues due to VSCode not supporting custom diff editors (I think), below is a list.
(Unfixable?) Issues
vscode.window.TabGroup
does not have a class type for custom editor's diff, so.input
is undefined.Insert
does not work