Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DiffMatchPatch diff handler #16483

Merged
merged 8 commits into from
May 9, 2024
Merged

Conversation

Danstiv
Copy link
Contributor

@Danstiv Danstiv commented May 5, 2024

Link to issue number:

Follow up to #16027
Fixes #15850

Summary of the issue:

Diff Match Patch proxy crashes and the calling thread deadlocks

Description of user facing changes

Diff Match Patch proxy will become more stable

Description of development approach

Refactored DiffMatchPatch diff handler.

Now, when reading from stdout of a proxy process, if not enough bytes are read, the return code is checked.

If a return code was received, an exception is raised and a fallback to difflib occurs.

Testing strategy:

Changes were made to the dmp code that caused it to crash when receiving certain characters.
Next, a test sequence of characters was typed in the terminal, which caused dmp to crash.
Exception messages in dmp and fallback to difflib were logged in the NVDA log.
An attempt to restart NVDA was successful, nvda restarted without delay, and the exit sound was played. This did not happen with deadlocked dmp diff handler.

Known issues with pull request:

None yet

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@Danstiv
Copy link
Contributor Author

Danstiv commented May 5, 2024

besides this, some changes and fixes were made to the fast_diff_match_patch (thanks to @beqabeqa473).

The fast_diff_match_patch currently can process emoji, and also does not crash when processing large amounts of text.

Now all known issues have been fixed and I think nvda could switch back to the fast_diff_match_patch

I also refactored the nvda dmp application which could potentially improve performance, @codeofdusk could you review it please?

@Danstiv Danstiv marked this pull request as ready for review May 5, 2024 16:13
@Danstiv Danstiv requested a review from a team as a code owner May 5, 2024 16:13
@Danstiv Danstiv requested a review from michaelDCurran May 5, 2024 16:13
@seanbudd seanbudd added this to the 2024.3 milestone May 6, 2024
@seanbudd seanbudd added the merge-early Merge Early in a developer cycle label May 6, 2024
@seanbudd
Copy link
Member

seanbudd commented May 6, 2024

Do you know if this effects #15841 #15850 or #15786

source/diffHandler.py Outdated Show resolved Hide resolved
source/diffHandler.py Show resolved Hide resolved
@Danstiv
Copy link
Contributor Author

Danstiv commented May 6, 2024

Do you know if this effects #15841 #15850 or #15786

#15850 only

Co-authored-by: Sean Budd <seanbudd123@gmail.com>
@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label May 7, 2024
@seanbudd
Copy link
Member

seanbudd commented May 7, 2024

Could you please add a change log entry to changes.md, under bug fixes? Please reference the two issues addressed

@Danstiv
Copy link
Contributor Author

Danstiv commented May 7, 2024

@seanbudd, @gerald-hartig
What are we going to do with diff match patch proxy?
@codeofdusk hasn't answered me since february, and the changes, in my opinion, are quite important.
Firstly, @beqabeqa473 made changes to dmp-cpp-stl, which were accepted in the fast diff match patch repository.
These changes solve the problem of fast diff match patch crashing when calculating large diffs.
The problem with unicode characters has also been resolved, both on the fast diff match patch side and on the pr side in the diff match patch proxy (bytes used instead of strings).
Secondly, according to this comment, the performance of pure python diff match patch library is 4 to 10 times lower, which probably affects users who actively use terminal applications on Windows 10.

Copy link
Contributor

@codeofdusk codeofdusk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this!

source/diffHandler.py Outdated Show resolved Hide resolved
source/diffHandler.py Show resolved Hide resolved
user_docs/en/changes.md Outdated Show resolved Hide resolved
Copy link
Contributor

@codeofdusk codeofdusk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Danstiv
Copy link
Contributor Author

Danstiv commented May 8, 2024

@codeofdusk
Thanks for merge!

Can I update submodule and requirements.txt in this pr?

@codeofdusk
Copy link
Contributor

See #16508.

@seanbudd seanbudd merged commit 1791eb9 into nvaccess:master May 9, 2024
1 check passed
@LeonarddeR
Copy link
Collaborator

On a linux machine, I'm getting several errors like this after a docker compose pull:

Exception in DMP, falling back to difflib
Traceback (most recent call last):
File "diffHandler.pyc", line 104, in diff
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 2: invalid start byte

XLTechie pushed a commit to XLTechie/xlnvda that referenced this pull request May 10, 2024
Follow up to nvaccess#16027
Fixes nvaccess#15850

Summary of the issue:
Diff Match Patch proxy crashes and the calling thread deadlocks

Description of user facing changes
Diff Match Patch proxy will become more stable

Description of development approach
Refactored DiffMatchPatch diff handler.

Now, when reading from stdout of a proxy process, if not enough bytes are read, the return code is checked.

If a return code was received, an exception is raised and a fallback to difflib occurs.
@Danstiv
Copy link
Contributor Author

Danstiv commented May 10, 2024

I was able to reproduce.
It seems that this is another problem with unicode in the fast diff match patch.
Now I'll try to investigate.

@LeonarddeR
Copy link
Collaborator

LeonarddeR commented May 10, 2024 via email

@Danstiv
Copy link
Contributor Author

Danstiv commented May 10, 2024

This can probably be fixed by a partial revert of dmp changes.
When strings instead of byte sequences used, the problem with docker compose pull is not reproduced.
@codeofdusk, what do you think?

@Danstiv
Copy link
Contributor Author

Danstiv commented May 10, 2024

The problem described here does not occur with the new fast diff match patch, even when passing strings instead of bytes.

@Danstiv
Copy link
Contributor Author

Danstiv commented May 11, 2024

I created a pr just in case, I will make additional changes if necessary.

@codeofdusk codeofdusk mentioned this pull request May 11, 2024
5 tasks
seanbudd pushed a commit that referenced this pull request May 13, 2024
#16483 (comment)

Summary of the issue:
The current nvda_dmp is out of date.

Description of how this pull request fixes the issue:
Update nvda_dmp to latest commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. merge-early Merge Early in a developer cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UIA instabilities mostly in console windows
5 participants