Looking forward to a multi-thread RemoveDuplicateLine Functions! #5

henryjj99 · 2018-12-11T21:21:15Z

No description provided.

henryjj99 · 2018-12-11T21:23:06Z

In my recent project there are around 300,000 lines to run through 'RemoveDuplicateLine', but the single-threaded function in Kangaroo2 is very slow. I hope a multi-threaded version will be possible some day!

dcascaval · 2018-12-12T13:43:25Z

That seems doable - I'll see if I can prototype it soon, and will let you know. Thanks for the request!

henryjj99 · 2019-01-05T16:52:16Z

Really appreciate it! Thanks

dcascaval · 2019-01-15T18:26:39Z

Hi Henry -

I've attached a prototype as part of a new Impala GHA, along with an example/test file to go with it. The component is called 'ParRemDupLns'. It seems to perform relatively well: a 'random' test processes 300,000 lines in under a second on my machine, though results may vary depending on the system. In any case, it should be better tuned than the sequential version.

A couple of things to note: it reorders the lines coming through (similarly to other RemoveDuplicateLine implementations). A smaller (i.e. more precise) tolerance will result in faster computation. The 'granularity' parameter only affects how the work is batched into parallel portions, so it should ONLY affect runtime, not the computed result. The optimal value depends on the system, so feel free to adjust it to whichever works fastest on your machine (as a rule of thumb, I've found values of 500-1000 to be decent).

Additionally, it behaves (slightly) differently than the original version in cases where there are many (non-duplicate) lines of very similar length, in that it chooses to cull differently - the result should still be usable.
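For readers wondering how 'tolerance' and 'granularity' might interact, here is a minimal Python sketch of batched duplicate-line removal. It is an illustration only, not the Impala or Kangaroo2 implementation: the function names, the grid-rounding approach, and the default values are all assumptions made for this example. Endpoints are snapped to a tolerance-sized grid, so two points that are within tolerance but fall on opposite sides of a cell boundary would not be matched. Unlike the component described above, this sketch keeps lines in first-seen order.

```python
from concurrent.futures import ThreadPoolExecutor


def line_key(line, tolerance):
    """Hypothetical helper: snap both endpoints to a tolerance-sized grid.

    The two snapped endpoints are sorted so that a line and its reverse
    produce the same key.
    """
    start, end = line
    q_start = tuple(round(c / tolerance) for c in start)
    q_end = tuple(round(c / tolerance) for c in end)
    return (q_start, q_end) if q_start <= q_end else (q_end, q_start)


def remove_duplicate_lines(lines, tolerance=1e-3, granularity=500):
    """Hypothetical sketch: dedupe lines in batches of `granularity`.

    `granularity` only controls how the input is split into batches;
    the merged result is the same for any batch size.
    """
    batches = [lines[i:i + granularity]
               for i in range(0, len(lines), granularity)]

    def dedupe_batch(batch):
        # Keep the first line seen for each key within this batch.
        seen = {}
        for line in batch:
            seen.setdefault(line_key(line, tolerance), line)
        return seen

    # pool.map returns results in batch order, so merging is deterministic.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(dedupe_batch, batches))

    # Merge per-batch results, again keeping the first line per key.
    merged = {}
    for partial in partials:
        for key, line in partial.items():
            merged.setdefault(key, line)
    return list(merged.values())


sample = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),      # same line, reversed
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0001)),   # same line within tolerance
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),      # a distinct line
]
unique = remove_duplicate_lines(sample, tolerance=0.01, granularity=2)
print(len(unique))
```

In CPython, threads will not speed up this kind of pure-Python work because of the global interpreter lock; the sketch is only meant to show the batching structure and why the batch size need not change the result.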

Just replace your current Impala GHA (or download from Food4Rhino - you'll still need the other .dll dependencies) with this one to get it rolling. If you could let me know how it works for you, I'd really appreciate any feedback!

remove_duplicates.zip

henryjj99 · 2019-01-16T03:07:07Z

Hi there:

Thank you for your work! Fantastic! It works, and it runs really fast in my case. A screenshot is attached below. I am using a New Surface Pro with an i5-7300U (2 cores, 4 threads).

Best

dcascaval · 2019-01-16T04:48:38Z

Glad to hear it! I'll refine and test it a bit more and likely add it to the next version of Impala. Thanks for the suggestion!

henryjj99 changed the title ~~Looking forward for a multi-thread RemoveDuplicateLine Functions~~ Looking forward for a multi-thread RemoveDuplicateLine Functions! Dec 11, 2018

henryjj99 changed the title ~~Looking forward for a multi-thread RemoveDuplicateLine Functions!~~ Looking forward to a multi-thread RemoveDuplicateLine Functions! Dec 11, 2018

dcascaval added the "enhancement" (New feature or request) label Jan 15, 2019

dcascaval mentioned this issue Oct 12, 2019

Tag 1.0 and move towards a 1.1 release #6

Open
