Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm-packages] buildGroups is unnecessarily expensive #3412

Closed
a-johnston opened this issue Jun 26, 2018 · 2 comments
Closed

[jvm-packages] buildGroups is unnecessarily expensive #3412

a-johnston opened this issue Jun 26, 2018 · 2 comments

Comments

@a-johnston
Copy link
Contributor

Added in #3387, the method relies on a loop followed by two Seq.apply calls:

while (i < groups.length) {
    if (groups(i) != groups(i - 1)) {
...

which for lazily generated sequences (when I checked, groups was an instance of scala.collection.immutable.Stream$Cons) is quadratic. This behavior can be seen by comparing the running times of the following accesses:

val s = Stream.continually(0).toSeq
s(0)
s(1000000)
s(100000000)

(I ctrl-C'd before that last one finished on my machine)

The method buildGroups should do a single pass and not rely on index access.

@CodingCat
Copy link
Member

@a-johnston are you interested in filing a PR?

@a-johnston
Copy link
Contributor Author

Yep, in progress

@lock lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants