Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add QueryPrinter for getting a string representation of a Query #106

Merged
merged 7 commits into from
Jan 9, 2024

Conversation

valencik
Copy link
Collaborator

@valencik valencik commented Dec 19, 2023

Adds QueryPrinter to print out a string representation of a Query.

QueryPrinter.print(Proximity("cats jumped", 2))

"cats jumped"~2

Resolves #107

@valencik valencik self-assigned this Dec 19, 2023
@valencik
Copy link
Collaborator Author

With the simple string concat printer

[info] Result "pink.cozydev.lucille.benchmarks.QueryPrinterBenchmark.termQueriesPrint":
[info]   1243.056 ±(99.9%) 72.279 ops/ms [Average]
[info]   (min, avg, max) = (1154.453, 1243.056, 1339.269), stdev = 83.236
[info]   CI (99.9%): [1170.778, 1315.335] (assumes normal distribution)
[info] # Run complete. Total time: 00:40:03
[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                               (size)   Mode  Cnt     Score    Error   Units
[info] QueryPrinterBenchmark.orQueriesPrint     10  thrpt   20  3211.063 ± 96.092  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint    100  thrpt   20   430.307 ±  1.827  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint   1000  thrpt   20    41.614 ±  0.097  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint   10  thrpt   20  1285.718 ±  7.069  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint  100  thrpt   20  1308.197 ± 11.859  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint 1000  thrpt   20  1243.056 ± 72.279  ops/ms
[success] Total time: 2406 s (40:06), completed Dec 19, 2023, 8:30:50 PM

@valencik
Copy link
Collaborator Author

Benchmarks with the StringBuilder approach and also just calling .toString() on the Query case class as a sort of baseline:

[info] Benchmark                                       (size)   Mode  Cnt     Score     Error   Units
[info] QueryPrinterBenchmark.orQueriesPrint                10  thrpt   20  4913.727 ±  94.479  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint               100  thrpt   20   555.244 ±  19.572  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrint              1000  thrpt   20    58.698 ±   0.279  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintOld             10  thrpt   20  3176.977 ±  85.412  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintOld            100  thrpt   20   430.566 ±   3.422  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintOld           1000  thrpt   20    41.384 ±   0.404  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintToString        10  thrpt   20  1077.642 ± 102.904  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintToString       100  thrpt   20   192.621 ±  28.387  ops/ms
[info] QueryPrinterBenchmark.orQueriesPrintToString      1000  thrpt   20    19.669 ±   2.172  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint              10  thrpt   20  1280.688 ±   6.515  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint             100  thrpt   20  1260.382 ±   9.133  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrint            1000  thrpt   20  1255.772 ±  10.110  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintOld           10  thrpt   20  1243.494 ±  15.798  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintOld          100  thrpt   20  1222.824 ±  34.379  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintOld         1000  thrpt   20  1263.705 ±  17.696  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintToString      10  thrpt   20   680.583 ±   9.988  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintToString     100  thrpt   20   634.705 ±   9.397  ops/ms
[info] QueryPrinterBenchmark.termQueriesPrintToString    1000  thrpt   20   674.105 ±  17.697  ops/ms

These are in operations per second, so higher is better.
The Stringbuilder approach seems about 1.3-1.5 times faster than the string concatenation approach, an 3-4x fast that toString().
That the performance gap isn't larger makes sense, the string concatenation approach still leverages StringBuilders inside all the mkString calls.

@valencik valencik marked this pull request as ready for review December 30, 2023 12:31
@valencik
Copy link
Collaborator Author

valencik commented Jan 3, 2024

I had been hesitant to call this "done" because it doesn't expose any configuration, in particular whether or not the printer should force in OR delimiters.

For example we parse the following query: cat hat and then print it back out as cat OR hat.

However, we also don't expose any configuration on the parsing side to explicitly say the default boolean (OR vs AND). So adding this in this PR would be a bit of work.

We should offer this configuration at some point, but perhaps it's better in another PR.
I think this PR is ready to go in its current scope.

@valencik valencik requested a review from samspills January 3, 2024 15:21
@@ -0,0 +1,95 @@
/*
* Copyright 2022 CozyDev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated aside: do we need to update this?

Copy link
Contributor

@samspills samspills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we also don't expose any configuration on the parsing side to explicitly say the default boolean (OR vs AND). So adding this in this PR would be a bit of work.
We should offer this configuration at some point, but perhaps it's better in another PR.

I agree about offering the configuration, and agree that work is worth it's own PR


/** To run the benchmark from within sbt:
*
* jmh:run -i 10 -wi 10 -f 2 -t 1 pink.cozydev.lucille.benchmarks.QueryPrinterBenchmark
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not run the benchmarks, just fyi

import pink.cozydev.lucille.Query._
import cats.data.NonEmptyList

class QueryPrinterSimpleQueriesSuite extends munit.FunSuite {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😎

@valencik valencik merged commit e9f6b6d into main Jan 9, 2024
25 checks passed
@valencik valencik deleted the printer branch January 9, 2024 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A way to get a lucene query string back from a Query
2 participants