Testing out a different HashSet implementation #5

Open
wants to merge 4 commits into
base: main
@hmottestad hmottestad commented Oct 30, 2023

Before

Benchmark                                                            Mode  Cnt     Score     Error  Units
BasicProcessingAlgorithmsBenchmark.compactDatagovbeDcat              avgt    5  2136.886 ± 117.612  ms/op
BasicProcessingAlgorithmsBenchmark.compactDatagovbeDcatEmptyContext  avgt    5  1254.016 ±  45.229  ms/op
BasicProcessingAlgorithmsBenchmark.expandDatagovbeDcatFromCompact    avgt    5   376.505 ±  21.036  ms/op
BasicProcessingAlgorithmsBenchmark.expandDatagovbeDcatFromFlatten    avgt    5   382.821 ±  14.753  ms/op
BasicProcessingAlgorithmsBenchmark.flattenDatagovbeDcat              avgt    5   830.187 ±  74.280  ms/op
BasicProcessingAlgorithmsBenchmark.flattenDatagovbeDcatFromCompact   avgt    5  4268.813 ± 318.006  ms/op
ToRdfLargeFilesBenchmark.datagovbeDcat                               avgt    5  1432.005 ±  62.780  ms/op
ToRdfSmallFilesBenchmark.csiro                                       avgt    5     0.079 ±   0.002  ms/op
ToRdfSmallFilesBenchmark.difiDataset                                 avgt    5     1.042 ±   0.009  ms/op
ToRdfSmallFilesBenchmark.geonorge                                    avgt    5     0.008 ±   0.003  ms/op
ToRdfSmallFilesBenchmark.schemaExample1                              avgt    5     0.869 ±   0.020  ms/op
ToRdfSmallFilesBenchmark.schemaExample2                              avgt    5     0.854 ±   0.022  ms/op
ToRdfSmallFilesBenchmark.schemaExample3                              avgt    5     0.888 ±   0.042  ms/op
ToRdfSmallFilesBenchmark.schemaExample4                              avgt    5     0.890 ±   0.022  ms/op
ToRdfSmallFilesBenchmark.schemaExtBib                                avgt    5     1.059 ±   0.027  ms/op
ToRdfSmallFilesBenchmark.schemaExtHealthLifeSci                      avgt    5     2.724 ±   0.024  ms/op
ToRdfSmallFilesBenchmark.schemaExtMeta                               avgt    5     0.948 ±   0.196  ms/op
# Benchmark: no.hasmac.jsonld.benchmark.OOMBenchmark.datagovbeDcatToRdf

# Run progress: 0.59% complete, ETA 1 days, 06:39:54
# Fork: 1 of 1
Iteration   1: 1919.005 ms/op
Iteration   2: 822.951 ms/op
Iteration   3: 760.541 ms/op
Iteration   4: 896.373 ms/op
Iteration   5: 894.446 ms/op
Iteration   6: 757.024 ms/op
Iteration   7: 762.095 ms/op
Iteration   8: Terminating due to java.lang.OutOfMemoryError: Java heap space

After

Benchmark                                                            Mode  Cnt     Score     Error  Units
BasicProcessingAlgorithmsBenchmark.compactDatagovbeDcat              avgt    5  2121.251 ±  39.567  ms/op
BasicProcessingAlgorithmsBenchmark.compactDatagovbeDcatEmptyContext  avgt    5  1210.684 ± 122.861  ms/op
BasicProcessingAlgorithmsBenchmark.expandDatagovbeDcatFromCompact    avgt    5   385.342 ±  16.798  ms/op
BasicProcessingAlgorithmsBenchmark.expandDatagovbeDcatFromFlatten    avgt    5   406.715 ±  18.744  ms/op
BasicProcessingAlgorithmsBenchmark.flattenDatagovbeDcat              avgt    5   839.252 ±  22.937  ms/op
BasicProcessingAlgorithmsBenchmark.flattenDatagovbeDcatFromCompact   avgt    5  4554.356 ± 206.368  ms/op
ToRdfLargeFilesBenchmark.datagovbeDcat                               avgt    5  1136.119 ±  88.865  ms/op
ToRdfSmallFilesBenchmark.csiro                                       avgt    5     0.085 ±   0.001  ms/op
ToRdfSmallFilesBenchmark.difiDataset                                 avgt    5     1.210 ±   0.006  ms/op
ToRdfSmallFilesBenchmark.geonorge                                    avgt    5     0.009 ±   0.001  ms/op
ToRdfSmallFilesBenchmark.schemaExample1                              avgt    5     0.880 ±   0.008  ms/op
ToRdfSmallFilesBenchmark.schemaExample2                              avgt    5     0.867 ±   0.005  ms/op
ToRdfSmallFilesBenchmark.schemaExample3                              avgt    5     0.909 ±   0.004  ms/op
ToRdfSmallFilesBenchmark.schemaExample4                              avgt    5     0.894 ±   0.002  ms/op
ToRdfSmallFilesBenchmark.schemaExtBib                                avgt    5     1.075 ±   0.003  ms/op
ToRdfSmallFilesBenchmark.schemaExtHealthLifeSci                      avgt    5     2.923 ±   0.025  ms/op
ToRdfSmallFilesBenchmark.schemaExtMeta                               avgt    5     0.931 ±   0.002  ms/op
# Benchmark: no.hasmac.jsonld.benchmark.OOMBenchmark.datagovbeDcatToRdf

# Run progress: 0.59% complete, ETA 1 days, 06:38:39
# Fork: 1 of 1
Iteration   1: 1798.994 ms/op
Iteration   2: 1029.074 ms/op
Iteration   3: 920.315 ms/op
Iteration   4: 880.087 ms/op
Iteration   5: 843.231 ms/op
Iteration   6: 810.543 ms/op
Iteration   7: 801.672 ms/op
Iteration   8: 811.611 ms/op
Iteration   9: Terminating due to java.lang.OutOfMemoryError: Java heap space

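ToRdfLargeFilesBenchmark.datagovbeDcat takes about 21% less time per operation (1432 to 1136 ms/op, roughly 1.26x the throughput), but some of the ToRdfSmallFilesBenchmark cases are slower (for example difiDataset, 1.042 to 1.210 ms/op).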

@hmottestad hmottestad marked this pull request as ready for review October 30, 2023 11:45
@hmottestad

There is also a risk of making some other use cases dramatically slower because of the use of Object2ObjectArrayMap, which scales terribly (every lookup is a linear scan over its backing arrays) but is much more efficient when there are only 1 or 2 items.

@hmottestad

To continue with this branch we would need to create a hybrid map implementation that starts off using Object2ObjectArrayMap and swaps it out for a more scalable map if more than 2-3 items are inserted.
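
A rough sketch of what such a hybrid map could look like, not code from this branch. To keep it self-contained it uses plain arrays with a linear scan as a stand-in for Object2ObjectArrayMap and `java.util.HashMap` as the "more scalable" map. The class name `HybridMap`, the `THRESHOLD` of 3, and the `isHashed()` helper are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical hybrid map: linear-scan arrays while tiny, HashMap once it grows.
 * The array mode stands in for fastutil's Object2ObjectArrayMap.
 */
final class HybridMap<K, V> {
    // Assumed cut-over point; the discussion above only says "2-3 items".
    private static final int THRESHOLD = 3;

    private Object[] keys = new Object[THRESHOLD];
    private Object[] values = new Object[THRESHOLD];
    private int size;        // entry count while in array mode
    private Map<K, V> large; // non-null once the map has been swapped out

    @SuppressWarnings("unchecked")
    public V get(Object key) {
        if (large != null) {
            return large.get(key);
        }
        // Small mode: a linear scan is cheap for a handful of entries.
        for (int i = 0; i < size; i++) {
            if (Objects.equals(keys[i], key)) {
                return (V) values[i];
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        if (large != null) {
            return large.put(key, value);
        }
        // Overwrite in place if the key is already present.
        for (int i = 0; i < size; i++) {
            if (Objects.equals(keys[i], key)) {
                V old = (V) values[i];
                values[i] = value;
                return old;
            }
        }
        if (size == THRESHOLD) {
            // Arrays are full: copy everything into a hash map and stay there.
            large = new HashMap<>();
            for (int i = 0; i < size; i++) {
                large.put((K) keys[i], (V) values[i]);
            }
            keys = null;
            values = null;
            return large.put(key, value);
        }
        keys[size] = key;
        values[size] = value;
        size++;
        return null;
    }

    public int size() {
        return large != null ? large.size() : size;
    }

    /** True once the map has switched to the hash-based implementation. */
    public boolean isHashed() {
        return large != null;
    }
}
```

The swap here is one-way: once the map has grown past the threshold it never shrinks back to array mode, which keeps the logic simple. Whether that trade-off is right, and where the threshold should sit, would need to be checked against the benchmarks above.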
