Redundant sourceWordsBeingConsidered #2

Open
maali-mnasri opened this issue Dec 3, 2015 · 4 comments

Comments

@maali-mnasri

In aligner.py, lines 1267 and 1268, each source/target word may be appended many times to the sourceWordsBeingConsidered/targetWordsBeingConsidered lists, which makes these lists far larger than necessary because of the redundant elements. I do not see the point of including the same word indices many times, as this makes the next loop (line 1285) very time-consuming.
To speed up execution, I converted the sourceWordsBeingConsidered and targetWordsBeingConsidered lists to sets to remove duplicates. It is far faster now and I get the same alignment in testalign.py; however, I want to be sure that this does not degrade the alignment quality in other cases. Can you please confirm that removing the redundancy is safe?
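
To illustrate why the duplicates are costly, here is a minimal sketch that is not taken from aligner.py. It assumes the loop at line 1285 visits every (source, target) pair of the two lists; the index values and the helper name `count_pairs` are hypothetical.

```python
from itertools import product


def count_pairs(source_indices, target_indices):
    """Count the (source, target) pairs a nested loop over both inputs would visit."""
    return sum(1 for _ in product(source_indices, target_indices))


# Hypothetical index lists in which every index was appended 50 times.
source_indices = [1, 2, 3] * 50   # 150 entries, 3 distinct
target_indices = [4, 5] * 50      # 100 entries, 2 distinct

pairs_with_duplicates = count_pairs(source_indices, target_indices)
pairs_without_duplicates = count_pairs(set(source_indices), set(target_indices))

print(pairs_with_duplicates)     # 15000
print(pairs_without_duplicates)  # 6
```

Under that assumption, the work grows with the product of the two list lengths, so repeating each index k times multiplies the pair count by roughly k squared.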

@ma-sultan
Owner

Thanks for catching this; what you have done is what was originally intended. The alignments should still be the same, because of the two continue statements on lines 1293 and 1297. I will update the source soon.
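
The thread does not show lines 1293 and 1297, so the following is only a sketch of the general idea, assuming the two `continue` statements skip any pair whose source or target word is already aligned. The function `greedy_align` and the toy `score` function are hypothetical stand-ins, not the aligner's real logic.

```python
def greedy_align(source_indices, target_indices, score):
    """Greedily align the best-scoring pairs, skipping words that are already aligned."""
    candidates = sorted(
        ((score(s, t), s, t) for s in source_indices for t in target_indices),
        key=lambda c: (-c[0], c[1], c[2]),  # best score first, ties broken by index
    )
    aligned_source, aligned_target = set(), set()
    alignments = []
    for _, s, t in candidates:
        if s in aligned_source:
            continue  # a duplicate of an already aligned source index is a no-op
        if t in aligned_target:
            continue  # likewise for the target side
        alignments.append((s, t))
        aligned_source.add(s)
        aligned_target.add(t)
    return alignments


def score(s, t):
    """Toy similarity: indices that are closer together score higher."""
    return -abs(s - t)


source, target = [1, 2, 3], [2, 3, 5]
with_duplicates = greedy_align(source * 4, target * 4, score)
without_duplicates = greedy_align(sorted(set(source)), sorted(set(target)), score)
print(with_duplicates == without_duplicates)  # True
```

In this sketch, a repeated index only produces extra candidate pairs that the guards reject, which is why removing the duplicates changes the running time but not the result.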

@maali-mnasri
Author

Great! Thank you.

@eoehri

eoehri commented Mar 9, 2017

Hi, I'm also running into performance issues. Could you please provide your adjusted code? Many thanks.

@maali-mnasri
Author

@eoehri
Hi, I just added these two lines to aligner.py:
sourceWordIndicesBeingConsidered = list(set(sourceWordIndicesBeingConsidered))
targetWordIndicesBeingConsidered = list(set(targetWordIndicesBeingConsidered))
between line 1282 and line 1285 (just before the loop). I hope this helps.
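
One caveat about `list(set(...))`: a set does not promise any particular iteration order. The thread does not say whether the order of these lists matters in aligner.py, so the sketch below simply shows two order-stable alternatives in case it does. The helper names are hypothetical, and `dict.fromkeys` preserving insertion order assumes Python 3.7 or later.

```python
def dedupe_keep_order(indices):
    """Remove duplicates while keeping the first occurrence of each index in place."""
    return list(dict.fromkeys(indices))  # dicts preserve insertion order (Python 3.7+)


def dedupe_sorted(indices):
    """Remove duplicates and return the indices in ascending order."""
    return sorted(set(indices))


indices = [3, 1, 3, 2, 1]
print(dedupe_keep_order(indices))  # [3, 1, 2]
print(dedupe_sorted(indices))      # [1, 2, 3]
```

Either helper could stand in for `list(set(...))` in the two lines above if a deterministic order is wanted.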


3 participants