Redundant sourceWordsBeingConsidered #2

maali-mnasri · 2015-12-03T13:28:53Z

In aligner.py (lines 1267 and 1268), each source/target word index may be appended many times to the sourceWordsBeingConsidered/targetWordsBeingConsidered lists, which makes these lists very large because of redundant elements. I do not see the point of including the same word indices many times, as this makes the next loop (line 1285) very time-consuming.
To speed up execution, I converted the sourceWordsBeingConsidered and targetWordsBeingConsidered lists to sets to remove duplicates. It is far faster now and I get the same alignment in testalign.py. However, I want to be sure that this does not degrade the alignment quality in other cases. Can you please confirm that removing the redundancy is safe?

ma-sultan · 2015-12-04T07:29:36Z

Thanks for catching this; what you have done is what was originally intended. The alignments should still be the same, because of the two continue statements on lines 1293 and 1297. I will update the source soon.

maali-mnasri · 2015-12-04T08:42:50Z

Great! Thank you.

eoehri · 2017-03-09T09:07:33Z

Hi, I'm also running into performance issues. Could you please provide your adjusted code? Many thanks.

maali-mnasri · 2017-03-09T11:23:04Z

@eoehri
Hi, I just added these two lines to aligner.py:
    sourceWordIndicesBeingConsidered = list(set(sourceWordIndicesBeingConsidered))
    targetWordIndicesBeingConsidered = list(set(targetWordIndicesBeingConsidered))
between lines 1282 and 1285 (just before the loop). I hope this helps.
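
For anyone applying the same change, here is a minimal standalone sketch of the deduplication step. It is not taken from aligner.py: the helper name `dedupe_preserving_order` and the sample index lists are made up for illustration, and it assumes Python 3.7+ (where `dict` keeps insertion order). Note that `list(set(...))` does not guarantee any particular order, so this variant keeps the first-seen order in case that matters for your use.

```python
def dedupe_preserving_order(indices):
    """Return indices with duplicates removed, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct value, in insertion order.
    return list(dict.fromkeys(indices))


# Hypothetical index lists standing in for the ones built in aligner.py.
source_indices = [3, 1, 3, 2, 1, 3]
target_indices = [5, 5, 4, 5]

unique_source = dedupe_preserving_order(source_indices)
unique_target = dedupe_preserving_order(target_indices)

# A nested loop over both lists visits len(source) * len(target) pairs,
# so removing duplicates shrinks the work multiplicatively.
pairs_before = len(source_indices) * len(target_indices)
pairs_after = len(unique_source) * len(unique_target)
```

With these sample lists the nested loop would visit 6 pairs instead of 24, while still covering every distinct (source, target) combination.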
