Add `QueryPrinter` for getting a string representation of a Query #106

valencik · 2023-12-19T16:04:53Z

Adds QueryPrinter to print out a string representation of a Query.

QueryPrinter.print(Proximity("cats jumped", 2))

"cats jumped"~2

Resolves #107
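
For readers who want to see the shape of such a printer, here is a minimal, self-contained sketch. The `MiniQuery` types and the `MiniPrinterSketch` object are hypothetical stand-ins, not lucille's actual `Query` or `QueryPrinter` definitions; only the `"cats jumped"~2` output format is taken from the example above.

```scala
// Hypothetical stand-ins for illustration; not lucille's real Query ADT.
object MiniPrinterSketch {
  sealed trait MiniQuery
  final case class Term(str: String) extends MiniQuery
  final case class Phrase(str: String) extends MiniQuery
  final case class Proximity(str: String, num: Int) extends MiniQuery

  // Render each node in Lucene-style query syntax.
  def print(q: MiniQuery): String = q match {
    case Term(s)         => s
    case Phrase(s)       => "\"" + s + "\""
    case Proximity(s, n) => "\"" + s + "\"~" + n
  }

  def main(args: Array[String]): Unit =
    println(print(Proximity("cats jumped", 2))) // prints "cats jumped"~2
}
```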

valencik · 2023-12-20T02:52:40Z

With the simple string concat printer:

[info] Result "pink.cozydev.lucille.benchmarks.QueryPrinterBenchmark.termQueriesPrint":
[info]   1243.056 ±(99.9%) 72.279 ops/ms [Average]
[info]   (min, avg, max) = (1154.453, 1243.056, 1339.269), stdev = 83.236
[info]   CI (99.9%): [1170.778, 1315.335] (assumes normal distribution)
[info] # Run complete. Total time: 00:40:03
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                               (size)   Mode  Cnt     Score    Error   Units
[info] QueryPrinterBenchmark.orQueriesPrint     10  thrpt   20  3211.063 ± 96.092  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint    100  thrpt   20   430.307 ±  1.827  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint   1000  thrpt   20    41.614 ±  0.097  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint   10  thrpt   20  1285.718 ±  7.069  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint  100  thrpt   20  1308.197 ± 11.859  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint 1000  thrpt   20  1243.056 ± 72.279  ops/ms
[success] Total time: 2406 s (40:06), completed Dec 19, 2023, 8:30:50 PM

valencik · 2023-12-21T20:44:52Z

Benchmarks with the StringBuilder approach and also just calling .toString() on the Query case class as a sort of baseline:

[info] Benchmark                                       (size)   Mode  Cnt     Score     Error   Units
[info] QueryPrinterBenchmark.orQueriesPrint                10  thrpt   20  4913.727 ±  94.479  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint               100  thrpt   20   555.244 ±  19.572  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint              1000  thrpt   20    58.698 ±   0.279  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintOld             10  thrpt   20  3176.977 ±  85.412  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintOld            100  thrpt   20   430.566 ±   3.422  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintOld           1000  thrpt   20    41.384 ±   0.404  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintToString        10  thrpt   20  1077.642 ± 102.904  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintToString       100  thrpt   20   192.621 ±  28.387  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintToString      1000  thrpt   20    19.669 ±   2.172  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint              10  thrpt   20  1280.688 ±   6.515  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint             100  thrpt   20  1260.382 ±   9.133  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint            1000  thrpt   20  1255.772 ±  10.110  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintOld           10  thrpt   20  1243.494 ±  15.798  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintOld          100  thrpt   20  1222.824 ±  34.379  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintOld         1000  thrpt   20  1263.705 ±  17.696  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintToString      10  thrpt   20   680.583 ±   9.988  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintToString     100  thrpt   20   634.705 ±   9.397  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintToString    1000  thrpt   20   674.105 ±  17.697  ops/ms

These are in operations per millisecond (ops/ms), so higher is better.
On the OR queries, the StringBuilder approach seems about 1.3-1.5 times faster than the string concatenation approach, and roughly 3-4.5x faster than toString().
That the performance gap isn't larger makes sense: the string concatenation approach still leverages StringBuilders inside all the mkString calls.
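
The difference between the two approaches can be sketched roughly as follows. This is an illustrative reduction, not the code from this PR: the `Q`, `Term`, and `Or` types and both function names are hypothetical, and nested grouping is ignored to keep it short.

```scala
// Hypothetical mini ADT for illustration; not lucille's real Query types.
object PrinterApproaches {
  sealed trait Q
  final case class Term(str: String) extends Q
  final case class Or(qs: List[Q]) extends Q

  // Approach 1: build intermediate Strings and join them with mkString.
  // Each mkString call uses its own StringBuilder internally.
  def printConcat(q: Q): String = q match {
    case Term(s) => s
    case Or(qs)  => qs.map(printConcat).mkString(" OR ")
  }

  // Approach 2: thread one StringBuilder through the whole traversal,
  // so no intermediate Strings or Lists are allocated per node.
  def printBuilder(q: Q): String = {
    val sb = new StringBuilder
    def go(node: Q): Unit = node match {
      case Term(s) => sb.append(s)
      case Or(qs) =>
        var first = true
        qs.foreach { sub =>
          if (!first) sb.append(" OR ")
          first = false
          go(sub)
        }
    }
    go(q)
    sb.toString
  }
}
```

Both functions return `cat OR hat` for `Or(List(Term("cat"), Term("hat")))`. The second avoids the intermediate `List[String]` from `map` and the per-node builders, which is one plausible source of the gap measured above.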

valencik · 2024-01-03T15:21:53Z

I had been hesitant to call this "done" because it doesn't expose any configuration, in particular whether or not the printer should force in OR delimiters.

For example, we parse the query `cat hat` and then print it back out as `cat OR hat`.

However, we also don't expose any configuration on the parsing side to explicitly say the default boolean (OR vs AND). So adding this in this PR would be a bit of work.

We should offer this configuration at some point, but perhaps it's better in another PR.
I think this PR is ready to go in its current scope.
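
To make the configuration question concrete, here is one hypothetical shape it could take. Nothing in this sketch exists in lucille: `PrintConfig`, `explicitOr`, and the mini ADT are invented purely for illustration.

```scala
// Entirely hypothetical: neither PrintConfig nor explicitOr exists in lucille.
object OrDelimiterSketch {
  sealed trait Q
  final case class Term(str: String) extends Q
  final case class Or(qs: List[Q]) extends Q

  // explicitOr = true writes "cat OR hat";
  // explicitOr = false writes "cat hat" and relies on an implicit default OR.
  final case class PrintConfig(explicitOr: Boolean)

  def print(q: Q, cfg: PrintConfig): String = q match {
    case Term(s) => s
    case Or(qs) =>
      val sep = if (cfg.explicitOr) " OR " else " "
      qs.map(print(_, cfg)).mkString(sep)
  }
}
```

With `PrintConfig(explicitOr = false)` the query `Or(List(Term("cat"), Term("hat")))` prints as `cat hat`. That output only round-trips if the parser's default boolean is also OR, which is why the parser-side setting would probably need to land alongside any printer option.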

samspills · 2024-01-03T21:46:48Z

core/src/main/scala/pink/cozydev/lucille/QueryPrinter.scala

@@ -0,0 +1,95 @@
+/*
+ * Copyright 2022 CozyDev


unrelated aside: do we need to update this?

samspills

> However, we also don't expose any configuration on the parsing side to explicitly say the default boolean (OR vs AND). So adding this in this PR would be a bit of work.
> We should offer this configuration at some point, but perhaps it's better in another PR.

I agree about offering the configuration, and agree that the work is worth its own PR.

samspills · 2024-01-03T21:54:06Z

benchmarks/src/main/scala/pink/cozydev/lucille/QueryPrinterBenchmark.scala

+
+/** To run the benchmark from within sbt:
+  *
+  * jmh:run -i 10 -wi 10 -f 2 -t 1 pink.cozydev.lucille.benchmarks.QueryPrinterBenchmark


I did not run the benchmarks, just fyi

samspills · 2024-01-03T21:54:23Z

core/src/test/scala/pink/cozydev/lucille/QueryPrinterSuite.scala

+import pink.cozydev.lucille.Query._
+import cats.data.NonEmptyList
+
+class QueryPrinterSimpleQueriesSuite extends munit.FunSuite {


Add initial, very simple printer

b570b88

valencik self-assigned this Dec 19, 2023

valencik added 2 commits December 19, 2023 12:29

Add initial benchmarks

e364ccd

Rework benchmarks, no parsing

7a3543b

valencik added 4 commits December 21, 2023 15:45

Add StringBuild approach

15f28d9

Remove old printer

dd91a73

Simplify / reduce benchmarks

d50e98c

Ditch the toString benchmarks

397901f

valencik marked this pull request as ready for review December 30, 2023 12:31

valencik requested a review from samspills January 3, 2024 15:21

samspills reviewed Jan 3, 2024

samspills approved these changes Jan 3, 2024

valencik merged commit e9f6b6d into main Jan 9, 2024
25 checks passed

valencik deleted the printer branch January 9, 2024 02:18
