Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with HList's Filter #596

Closed
nhap69 opened this issue May 19, 2016 · 19 comments
Closed

Performance issue with HList's Filter #596

nhap69 opened this issue May 19, 2016 · 19 comments

Comments

@nhap69
Copy link

nhap69 commented May 19, 2016

It seems there is a problem with shapeless.ops.hlist.Filter where the compiler takes too much time to filter on a very short HList e.g. on an HList of 10 Nat it takes almost 3 mins to compile.

Reproduced code:

import shapeless.{::, HNil}
import shapeless.nat._

type NatHList = _1 :: _2 :: _3 :: _4 :: _5 :: // 5 seconds
  _6 :: // 6s
  _7 :: // 9s
  _8 :: // 18s
  _9 :: // 50s
  _10 :: // 174s
  HNil

shapeless.ops.hlist.Filter[NatHList, _0]
@milessabin
Copy link
Owner

Would you like to take a swing at trying to speed it up?

@nhap69
Copy link
Author

nhap69 commented May 19, 2016

Thanks for your response, I'll try to do dig it down a bit.

@fthomas
Copy link
Contributor

fthomas commented May 21, 2016

I don't know why this happens or what to do about it but I just want to mention that the exponential growth in compilation time can be easily measured with the newly added compileTime macro, e.g.:

val times = List(
  compileTime("Filter[HNil, _0]"),
  compileTime("Filter[_1 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: _3 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: _3 :: _4 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: _3 :: _4 :: _5 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: HNil, _0]"),
  compileTime("Filter[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: HNil, _0]")
)

scala> times.map(_.toMillis).zipWithIndex.map(t => t._2 + "  " + t._1).mkString("\n")
res0: String =
0  9
1  16
2  65
3  164
4  497
5  1594
6  4287
7  13989
8  55662

@fthomas
Copy link
Contributor

fthomas commented May 24, 2016

I played a little bit more with compileTime and other operations and noticed that this issue also applies to hlist.Partition (which is used to implement hlist.Filter) and coproduct.Partition.

@wilaszekg
Copy link

In your examples this must be because of using typed numbers. For non-recursive types I get more reasonable and non-exponential results:

0  17
1  7
2  13
3  15
4  31
5  109
6  148
7  327
8  631
9  1296
10  2821
11  5546
12  11601

while for recursive numbers my times are similar to yours:

0  1
1  9
2  28
3  174
4  373
5  1303
6  3261
7  9943
8  35320

@nhap69
Copy link
Author

nhap69 commented May 28, 2016

The problem persists even on non-recursive types e.g. _0 but on a bit larger HList:

type L9  = _0 :: _0 :: _0 ::
  _0 :: _0 :: _0 ::
  _0 :: _0 :: _0 ::
  HNil
type L10 = _0 :: L9
type L11 = _0 :: L10
type L12 = _0 :: L11
type L13 = _0 :: L12

val times = List(
  compileTime("Filter[L9, _0]"),
  compileTime("Filter[L10, _0]"),
  compileTime("Filter[L11, _0]"),
  compileTime("Filter[L12, _0]"),
  compileTime("Filter[L13, _0]")
)

still exponential time:

0  1593
1  2979
2  5849
3  12709
4  27687

@mdedetrich
Copy link

I am not sure if this is related specifically to Filter, but I am getting huge compile times when using slickless (https://github.com/underscoreio/slickless) with shapeless 2.3.1 to compile a HList of 39 elements (40 if including HNil) to a case class.

So far its been compiling for over 5 minutes

@milessabin
Copy link
Owner

These sorts of compile times aren't completely out of line with my expectations for filter. Is this really an operation you would expect to be performing on a DB row though? If you can say what you're trying to achieve at a higher level I might be able to suggest an alternative.

@chrisbenincasa
Copy link

chrisbenincasa commented Oct 23, 2018

I just hit this issue today - I'm still pretty new to using this stuff but I had my hand at a simpler reimplementation of Filter that doesn't use Partition under-the-hood. The main difference is that this implementation uses an implicit =:= rather than Shapeless's =:!= plus prioritization. I've also verified that the new implementation passes the current HList filter tests.

I've run comparisons for the various tests above (with Scala 2.12.5, shapeless 2.3.3): #596 (comment)

Existing Filter
0  7
1  10
2  40
3  174
4  540
5  1433
6  3489
7  11571
8  47236

New Filter
0  0
1  3
2  9
3  26
4  50
5  183
6  219
7  504
8  1075

Non-recursive types
#596 (comment)

Existing Filter
0  2326
1  4412
2  8518
3  17908
4  30146

New Filter
0  65
1  14
2  11
3  12
4  15

If this looks good, I'd be happy to also implement FilterNot and open up a PR!

EDIT: I'm going to try reimplementing Partition to use =:=; perhaps we can fix the compilation time for that operation was well.

@chrisbenincasa
Copy link

chrisbenincasa commented Oct 23, 2018

cc: @milessabin Quick follow-up: the change to Partition worked as well.

Here are the results when rerunning the tests from #596 (comment)

Existing
0  7
1  18
2  98
3  200
4  443
5  1359
6  3624
7  11845
8  54332

New
0  2
1  7
2  11
3  23
4  53
5  106
6  320
7  481
8  1098

Additionally, I did notice the stale PR #682 doing something similar. I can piggyback off of that and complete the work. Luckily, it seems that the reimplementation of Partition would allow us to leave Filter/FilterNot alone, for now. However, I'm still running some tests on the Filter derived from Partition with the new implementation.

For completeness, the test:

{
  import shapeless.ops.hlist.Partition
  import shapeless.nat._

  val times = List(
    compileTime("Partition[HNil, _0]"),
    compileTime("Partition[_1 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: _3 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: _3 :: _4 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: HNil, _0]"),
    compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: HNil, _0]")
  )

  println(times.map(_.toMillis).zipWithIndex.map(t => t._2 + "  " + t._1).mkString("\n"))

  println()

  val times2 = List(
    compileTime("FasterPartition[HNil, _0]"),
    compileTime("FasterPartition[_1 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: _3 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: _3 :: _4 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: _3 :: _4 :: _5 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: HNil, _0]"),
    compileTime("FasterPartition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: HNil, _0]")
  )

  println(times2.map(_.toMillis).zipWithIndex.map(t => t._2 + "  " + t._1).mkString("\n"))
}

@milessabin
Copy link
Owner

This is great stuff ... thanks!

What Scala version are these benchmarks relative to? I'd be really interested to see if there's a significant difference in compile time between 2.12.6 and 2.12.7 in this case.

@milessabin
Copy link
Owner

Ahh ... just seen that you mention 2.12.5.

Please try the benchmark with 2.12.7 ... I anticipate a significant speedup due to scala/scala#7067. It's likely that this reworking of filter and partition will still be worth doing, but it'd be good to have up to date numbers.

@chrisbenincasa
Copy link

Of course! I should've bumped it initially; had it on 2.12.5 to get parity with the application in which I saw slowness. The 2.12.7 numbers for the existing implementation are much more encouraging (and much less so for the version using =:=)

I've included a few more test targets this time so we can get a feel.

compileTime("Partition[HNil, _0]"),
compileTime("Partition[_1 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: _0 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: _0 :: _0 :: HNil, _0]"),
compileTime("Partition[_1 :: _2 :: _3 :: _4 :: _5 :: _6 :: _7 :: _8 :: _0 :: _0 :: _0 :: HNil, _9]")
Existing

0  6
1  9
2  13
3  11
4  9
5  10
6  12
7  15
8  22
9  19
10  33
11  89

New (using neither =:= nor =:!=)
0  0
1  2
2  5
3  8
4  13
5  19
6  14
7  16
8  26
9  27
10  41
11  83

New (using =:=)
0  0
1  4
2  9
3  87
4  57
5  92
6  171
7  407
8  835
9  1377
10  2839
11  6146

Interestingly, the new Partition that uses neither =:= nor =:!= still does quite well in the non-recursive, long list test:

import shapeless.nat._

type L9  = _0 :: _0 :: _0 ::
  _0 :: _0 :: _0 ::
  _0 :: _0 :: _0 ::
  HNil
type L10 = _0 :: L9
type L11 = _0 :: L10
type L12 = _0 :: L11
type L13 = _0 :: L12

val times = List(
  compileTime("Partition[L9, _0]"),
  compileTime("Partition[L10, _0]"),
  compileTime("Partition[L11, _0]"),
  compileTime("Partition[L12, _0]"),
  compileTime("Partition[L13, _0]")
)

println(times.map(_.toMillis).zipWithIndex.map(t => t._2 + "  " + t._1).mkString("\n"))

val times2 = List(
  compileTime("FasterPartition[L9, _0]"),
  compileTime("FasterPartition[L10, _0]"),
  compileTime("FasterPartition[L11, _0]"),
  compileTime("FasterPartition[L12, _0]"),
  compileTime("FasterPartition[L13, _0]")
)

println(times2.map(_.toMillis).zipWithIndex.map(t => t._2 + "  " + t._1).mkString("\n"))
Existing
0  1528
1  3462
2  6153
3  13106
4  25932

New
0  25
1  10
2  136
3  12
4  14

@milessabin
Copy link
Owner

That looks good :-)

I've lost track a bit ... are things good with 2.12.7 as they are now, or are there any improvements which still need to be made?

@chrisbenincasa
Copy link

So things look mostly good with 2.12.7 to me. The one concerning thing I found was that final test. For some reason, the new implementation is still performing much better. It's a bit unclear to me what characteristics about that final test are making the existing implementation still perform so slowly. It's interesting because that behavior isn't exhibited when the target list of the same length but using another type, but this could be separate issue entirely.

@macalinao
Copy link

Confirmed that this issue is fixed on 2.12.7 for me. I was previously on Typelevel scala 2.12.4-bin-typelevel-4.

@milessabin
Copy link
Owner

@chrisbenincasa do you think there's still value in PR'ing any or all of your changes? If not I'll close this.

@chrisbenincasa
Copy link

@milessabin happy to push my branch up for the diff, but from what I can tell the compilation time issue was resolved in 2.12.7 like you mentioned.

@joroKr21
Copy link
Collaborator

Fixed by #682

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants