Performance optimization of tree edit distance #102

Merged on May 16, 2024 (17 commits)

@stefanhahmann (Collaborator) commented May 16, 2024

Resolves #59

This PR adds four main optimizations:

  • Simplify the min-cost max-flow operation when comparing binary trees
    • ~40-60% speed-up
  • Avoid using the Stream API in the tree edit distance computation
    • ~25% speed-up
  • Replace Maps in the ZhangUnorderedTreeEditDistance class with arrays and CachedTrees
    • ~60-70% speed-up
  • Cache the attribute value in BranchSpotTree
    • ~20% speed-up
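
To illustrate the first point: when both nodes have at most two children, the child matching that a min-cost max-flow solver would compute can be found by enumerating a handful of cases. The following is only a sketch of that idea, not the code in this PR. The class name `BinaryChildMatching`, the method `minMatchingCost`, and the cost arrays are hypothetical; it assumes `dist[i][j]` is the already computed distance between child subtree `i` of the first node and child subtree `j` of the second, and `del`/`ins` hold the costs of deleting or inserting a whole child subtree.

```java
/** Sketch only: brute-force child matching for nodes with at most two children. */
final class BinaryChildMatching {

    private BinaryChildMatching() {}

    /**
     * Returns the cheapest way to match the children of two nodes, where
     * unmatched children of the first node are deleted and unmatched
     * children of the second node are inserted.
     */
    static double minMatchingCost(double[][] dist, double[] del, double[] ins) {
        int n = del.length;
        int m = ins.length;
        if (n > 2 || m > 2) {
            throw new IllegalArgumentException("at most two children per node expected");
        }
        double allDel = 0;
        for (double d : del) {
            allDel += d;
        }
        double allIns = 0;
        for (double c : ins) {
            allIns += c;
        }
        // Case 1: nothing is matched, everything is deleted and inserted.
        double best = allDel + allIns;
        // Case 2: exactly one pair (i, j) is matched, the rest is deleted/inserted.
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double cost = dist[i][j] + (allDel - del[i]) + (allIns - ins[j]);
                best = Math.min(best, cost);
            }
        }
        // Case 3: both pairs are matched, either straight or crossed.
        if (n == 2 && m == 2) {
            best = Math.min(best, dist[0][0] + dist[1][1]);
            best = Math.min(best, dist[0][1] + dist[1][0]);
        }
        return best;
    }
}
```

With at most two children on each side there are at most seven candidate matchings, so no flow network needs to be built for this case.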

In total, this PR achieves a speed-up of up to ~90%.
The main benchmark was this demo:

or this Mastodon file:
flatSim2_be_2aba_1bab.zip

with these settings:
[screenshot of the settings]

stefanhahmann and others added 16 commits May 8, 2024 16:39
…flow operation can be simplified when comparing binary trees
…g time of the dense tree comparison example

* The time measurement does not seem to work inside the actual test (the test produces an out-of-memory error)
… of using stream API

* Profiling showed that this would speed up tree edit distance computation by ~20%
* Using these methods in ZhangUnorderedTreeEditDistance speeds up the distance computation by 20 to 40%, depending on the test case
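
As a rough illustration of the kind of change meant by avoiding the Stream API, the sketch below shows the same reduction written once as a stream pipeline and once as a plain loop. The class and method names are hypothetical and not taken from this PR; whether the loop is actually faster depends on the JVM and the call site, so it should be confirmed by profiling as was done here.

```java
import java.util.List;

/** Sketch only: a stream pipeline and an equivalent plain loop. */
final class LoopVersusStream {

    private LoopVersusStream() {}

    /** Stream version: concise, but sets up a pipeline on every call. */
    static double sumWithStream(List<Double> costs) {
        return costs.stream().mapToDouble(Double::doubleValue).sum();
    }

    /** Loop version: same result, without creating a stream pipeline. */
    static double sumWithLoop(List<Double> costs) {
        double sum = 0;
        for (double cost : costs) {
            sum += cost;
        }
        return sum;
    }
}
```

In a method that is called very often, such as the inner recursion of a tree edit distance, the per-call overhead of the stream version can add up.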
…idimap objects in ZhangUnorderedTreeEditDistance
The idea is that the Zhang algorithm makes a copy of each input tree as a
"CachedTree". The CachedTree can store the tree cost, forest cost, attribute, index etc., which makes the algorithm faster.
@stefanhahmann stefanhahmann changed the title Optimize zhang further Performance optimization of tree edit distance May 16, 2024
Instead, introduce a new class Node that can be implemented by CachedTree
This allows TreeUtils.getAllAttributes() to be reused in the constructor of ZhangUnorderedTreeEditDistance
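
A minimal sketch of how a cached copy and a shared node type could fit together is shown below. It is not the code of this PR: `NodeSketch`, `SourceNode`, `CachedTreeSketch` and `TreeUtilsSketch` are hypothetical stand-ins, `NodeSketch` is assumed to be an interface, attributes are assumed to be plain `double` values, and the cost of deleting a node is assumed to equal its attribute.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical common view of a tree node. */
interface NodeSketch {
    double getAttribute();
    List<? extends NodeSketch> getChildren();
}

/** Hypothetical input tree. */
final class SourceNode implements NodeSketch {
    private final double attribute;
    private final List<SourceNode> children = new ArrayList<>();

    SourceNode(double attribute) { this.attribute = attribute; }

    SourceNode add(SourceNode child) { children.add(child); return this; }

    @Override public double getAttribute() { return attribute; }
    @Override public List<SourceNode> getChildren() { return children; }
}

/** Copy of an input tree that stores index, attribute and subtree cost per node. */
final class CachedTreeSketch implements NodeSketch {
    final int index;          // post-order index, usable as an array position
    final double attribute;   // read once from the source node
    final double subtreeCost; // assumed cost of deleting the whole subtree
    private final List<CachedTreeSketch> children = new ArrayList<>();

    private CachedTreeSketch(NodeSketch source, int[] nextIndex) {
        double cost = source.getAttribute();
        for (NodeSketch child : source.getChildren()) {
            CachedTreeSketch cached = new CachedTreeSketch(child, nextIndex);
            children.add(cached);
            cost += cached.subtreeCost;
        }
        this.attribute = source.getAttribute();
        this.subtreeCost = cost;
        this.index = nextIndex[0]++; // children are numbered before their parent
    }

    static CachedTreeSketch copyOf(NodeSketch root) {
        return new CachedTreeSketch(root, new int[] { 0 });
    }

    @Override public double getAttribute() { return attribute; }
    @Override public List<CachedTreeSketch> getChildren() { return children; }
}

/** One traversal that works for input trees and cached copies alike. */
final class TreeUtilsSketch {
    private TreeUtilsSketch() {}

    static List<Double> getAllAttributes(NodeSketch root) {
        List<Double> result = new ArrayList<>();
        Deque<NodeSketch> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            NodeSketch node = stack.pop();
            result.add(node.getAttribute());
            for (NodeSketch child : node.getChildren()) {
                stack.push(child);
            }
        }
        return result;
    }
}
```

Because the root of a copy receives the largest post-order index, a table such as `new double[rootA.index + 1][rootB.index + 1]` can hold pairwise distances by node index instead of a Map keyed by node objects.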
sonarcloud bot commented May 16, 2024

@stefanhahmann stefanhahmann merged commit 11a0c76 into master May 16, 2024
3 checks passed
@stefanhahmann stefanhahmann deleted the optimize-zhang-further branch May 22, 2024 12:09
Development

Successfully merging this pull request may close these issues.

Performance of Zhang Tree Edit Distance