Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contribute Rdf4j Spring Integration #3269

Merged
merged 19 commits into from
Oct 27, 2021
Merged

Contribute Rdf4j Spring Integration #3269

merged 19 commits into from
Oct 27, 2021

Conversation

fkleedorfer
Copy link
Contributor

@fkleedorfer fkleedorfer commented Aug 1, 2021

Addresses #3059

Note that this is work in progress, a number of things need to be done (help is greatly appreciated!)

  • review org.eclipse.rdf4j.model and org.eclipse.rdf4.sparqlbuilder.*; and possibly extract them to other rdf4j modules (may be done after the PR is merged)
  • add unit tests for individual components
  • add a reference project to illustrate usage (starting from the minimal example in spring-components/rdf4j-spring/src/main/java/org/eclipse/rdf4j/spring/package-info.java)
    • using one DAO
    • using an in-memory repo initialized from a TTL file
    • using all the optional subsystems

@fkleedorfer
Copy link
Contributor Author

@jeenbroekstra not sure how you want to proceed. Adding the tests could take some time, maybe you want to start reviewing the rest of the code?

@abrokenjester
Copy link
Contributor

Thanks for putting this up! I will try and find time to look at it soon (it's mostly weekends for me at the moment though).

Before you continue adding tests etc, can you make sure that you've signed the ECA and that your git commit author address is the same address that you used to sign? The verification step of the build is currently not finding your record.

Have a look at the contributor guidelines, in particular the points on how to sign the Eclipse ECA. Also, while you're fixing up your commits anyway, can you prefix each commit message with the issue number? Something like GH-3059 Initial commit of rdf4j-spring, etc.

Copy link
Contributor

@abrokenjester abrokenjester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a very quick first look through things, looking very good I think. Couple of inline comments, some editorial stuff. I think the biggest thing might be what to do with the SparqlBuilder extensions. They look very useful, but it's not really related to the spring stuff and getting it all tested and integrated might take some time. I'd suggest taking it out of this PR and submitting it in a separate PR.

import org.eclipse.rdf4j.sparqlbuilder.graphpattern.GraphPattern;
import org.eclipse.rdf4j.sparqlbuilder.util.SparqlBuilderUtils;

public class Bind implements GraphPattern {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting, there is actually a separate issue for that: GH-2767. This should probably be moved to the sparqlbuilder module though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're keeping this as part of this PR (which I'm fine with) we should move it to the sparqlbuilder module I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll gladly move the code. However, there is much more in the constraint package in this module, including the whole propertypath subpackage. I suggest we move all of that to the sparqlbuilder module, keeping the package structure as is. Ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I should have been clearer sorry, I meant this comment for the whole package, not just this particular class. Thanks!

@abrokenjester
Copy link
Contributor

One more thing I almost forgot, kinda important: I will likely need to pass this by the Eclipse IP team as well, as it's quite a large contribution. I will take care of that, but just be aware that at some point there may be some questions coming your way regarding code ownership etc.

@fkleedorfer
Copy link
Contributor Author

just be aware that at some point there may be some questions coming your way regarding code ownership etc.

Thanks for the heads-up. I have notified our legal people that this is coming when starting the whole process.

@fkleedorfer
Copy link
Contributor Author

can you prefix each commit message with the issue number? Something like GH-3059 Initial commit of rdf4j-spring, etc.

I thought we'd squash everything in the process of merging, no? If not, are you ok with the commits made so far being amended?

@abrokenjester
Copy link
Contributor

abrokenjester commented Aug 4, 2021

can you prefix each commit message with the issue number? Something like GH-3059 Initial commit of rdf4j-spring, etc.

I thought we'd squash everything in the process of merging, no? If not, are you ok with the commits made so far being amended?

We don't squash-merge anymore, no - we used to do that for a while but there are technical issues with the IP attribution on the squashed commit that is created that way. So we're back to regular merge-commits for now unfortunately.

If not, are you ok with the commits made so far being amended?

Absolutely! Or if you prefer you can also manually squash and fix it that way, that's fine too.

Due to the way this spring integration library was extracted from a bigger project, there are virtually no tests, as this library was tested via domain-specific integration tests.

### Acknowledgements

* The initial work on this was done by Gabriel Pickl (later taken up and continued by me).
* This code has been developed by members of the Research Studio Smart Applications Technologies (https://www.researchstudio.at/studio/smart-applications-technologies/) in the project 'BIM - interoperables Merkmalservice' led by the Österreichische Bautechnik Vereinigung (ÖBV, https://bautechnik.pro).
* This work was supported by Basisprogramm of the Austrian Research Promotian Agency (FFG, https://ffg.at).
* Adds license header
* Uses `RDF4J` instead of `Rdf4J` for class names
* Uses `RDF4JException` as the base class of all custom exceptions
@fkleedorfer
Copy link
Contributor Author

BTW, I don't know what is up with the Eclipse ECA. I signed the agreement using my committer email address, but now when I try to sign it again, just to be sure, I get Error, An error has occured and your Eclipse Contributor Agreement was not submitted. (ECA-04). Maybe that has something to do with the workflow failing?

@abrokenjester
Copy link
Contributor

abrokenjester commented Aug 5, 2021

BTW, I don't know what is up with the Eclipse ECA. I signed the agreement using my committer email address, but now when I try to sign it again, just to be sure, I get Error, An error has occured and your Eclipse Contributor Agreement was not submitted. (ECA-04). Maybe that has something to do with the workflow failing?

Looks like there is a known issue with the ECA database at the moment - see https://bugs.eclipse.org/bugs/show_bug.cgi?id=566859 . Fwiw I can manually check the ECA database and the address on your commits is not currently on file, but if I read that bug report correctly that might be a sync issue. Let's keep an eye on that bugzilla ticket and once that's resolved give it another go.

* fixes minor documentation bugs
* tests reveal that result caching did not work as expected because queries do not report identical hashCodes for identical queries. Fixed, but not perfectly
@abrokenjester abrokenjester linked an issue Aug 8, 2021 that may be closed by this pull request
@abrokenjester
Copy link
Contributor

I have submitted a zip of the current state of this PR as the initial contribution for IP review.

@abrokenjester
Copy link
Contributor

ECA check now succeeds!

@abrokenjester abrokenjester added the ✋ CQ-Pending requires a CQ to be approved label Aug 12, 2021
also renamed and refactored of TransactionData (now TransactionObject)
@fkleedorfer
Copy link
Contributor Author

I think I am done with the first round of the tests now.

A few notes:

I feel like the RDF4JCRUDDao class might need some refacoring still, it feels a bit weird to implement it: all those methods with default implementations that throw exceptions should probably just be removed so subclasses must implement them. However, I don't have the time to do that right now.

The 'reference project' mentioned in the PR description is more like a set of integration tests, but they fulfill the purpose.

A shortcoming of the current implementation is that RDF4JTemplate does not support ASK queries.

I haven't done any work with quads using this framework - we may need to refactor for everything to work with quads smoothly.

@fkleedorfer fkleedorfer marked this pull request as ready for review August 13, 2021 09:23
import org.eclipse.rdf4j.sparqlbuilder.graphpattern.GraphPattern;
import org.eclipse.rdf4j.sparqlbuilder.util.SparqlBuilderUtils;

public class Bind implements GraphPattern {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're keeping this as part of this PR (which I'm fine with) we should move it to the sparqlbuilder module I think.

BindingSet currentBindings = getDelegate().getBindings();
// TODO: this might be pretty slow due to the toString() call. Is there a better way to get
// a hash for a query with minmal risk of collision ?
Integer cacheKey = currentBindings.hashCode() + getDelegate().toString().hashCode();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeenbroekstra maybe you can help me here. I am trying to generate cacheKey in such a way that it is always the same if the query and the bindings are identical. Until today I thought that getDelegate().hashCode() was stable across multiple query instances, but it seems it isn't. (When running the unit tests, the delegate is a SailTupleQuery. getDelegate().toString() is stable, but I looked into the implementation and I think it could be quite slow. Is there a good way to do this? If not, could we devise one, like adding a getQueryHash method in the Query interface?

Copy link
Contributor

@abrokenjester abrokenjester Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkleedorfer I must not have been paying attention before, as I wasn't really aware this PR introduced query result caching in this fashion. Sorry about that, my personal circumstances at the moment make it hard for me to spend as much time on RDF4J as I'd like.

Before we dive into the cache key mechanics: can you elaborate a little on what problem this cache mechanism solves? What exactly are we caching (and why), and what is the scope/duration of that cache? How is this cache controlled (e.g. how do we take care it does not get out of sync with the underlying repository)? Is any of this specific to Spring, or should we consider making this a general, separate component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to explain what it does in org/eclipse/rdf4j/spring/resultcache/package-info.java. The problem it solves is answering multiple identical queries issued by the application over time. The cache is connection-specific and is cleared upon connection close or any writes. If global caching is enabled, a global cache is also kept that is cleared whenever the application (any connection) writes.
The cache is configured via org.eclipse.rdf4j.spring.resultcache.ResultCacheProperties, which provides a bit more insight as to how to control it.

To follow what is happening, look at org.eclipse.rdf4j.spring.resultcache.CachingRepositoryConnection.prepareTupleQuery().

It is not strongly spring-related, but to extract it to another package might uproot other parts of the rdf4j-spring module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The items that are cached are org.eclipse.rdf4j.spring.resultcache.CachedGraphQueryResult and org.eclipse.rdf4j.spring.resultcache.CachedTupleQueryResult. The whole result, basically.

Copy link
Contributor Author

@fkleedorfer fkleedorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move code to sparqlbuilder module?
... See code comments in Bind.java

import org.eclipse.rdf4j.sparqlbuilder.graphpattern.GraphPattern;
import org.eclipse.rdf4j.sparqlbuilder.util.SparqlBuilderUtils;

public class Bind implements GraphPattern {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll gladly move the code. However, there is much more in the constraint package in this module, including the whole propertypath subpackage. I suggest we move all of that to the sparqlbuilder module, keeping the package structure as is. Ok?

Adds
* Bind
* NotIn
* PSequence
* PZeroOrMore
* propertypath package with Property path builder
* Utility/Generator methods to Expressions class
* ExtendedVariable to core
* some tests for the above
@fkleedorfer
Copy link
Contributor Author

Just checking in to see if there is anything I can do at this point to help push this PR over the line...

@abrokenjester abrokenjester removed the ✋ CQ-Pending requires a CQ to be approved label Oct 20, 2021
@abrokenjester
Copy link
Contributor

Just checking in to see if there is anything I can do at this point to help push this PR over the line...

Unfortunately several of us (including myself) have had less time to spend on RDF4J than we'd like, for various reasons - so this PR has probably not got the attention it deserves. I'll try and get some eyes on your latest changes this weekend.

Generally speaking: I'm anxious to get this merged, assuming you've addressed the feedback sofar and we can get the build checks green we should be good.

I'm not sure why it's currently reporting the build checks as canceled/failed, and for some odd reason I seem to have no option to retry the build on my end. Could you push another commit (perhaps do a rebase on develop, or just an empty commit) to trigger another build and ping me with a comment as soon as you've done it? I may need to "unlock" the github action as github still considers you a "new' contributor.

@abrokenjester
Copy link
Contributor

Alternatively, if you give me write access to your fork, I can do a push and take care of this myself as well.

@fkleedorfer
Copy link
Contributor Author

@jeenbroekstra I just gave you write access to the fork. Thanks for taking the time, anyway!

@abrokenjester
Copy link
Contributor

Looks like we have some compilation failures in the Sparqlbuilder code.

@fkleedorfer
Copy link
Contributor Author

Oh? So sorry, will deal with it ASAP!

..forgot to delete their unnecessary usages
@fkleedorfer
Copy link
Contributor Author

The build should work now.

@abrokenjester
Copy link
Contributor

All good - that's what we have the build checks for :) Bit annoying that I still have to approve the build checks manually when it's your commit. As soon as we have something of yours merged in, that should hopefully become a bit easier.

Copy link
Contributor

@abrokenjester abrokenjester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this being merged in, great work @fkleedorfer !

There's probably some followup iterations that need to happen with some further code shifting/cleaning, and we'll need to tackle some user documentation as well - but those things can be taken care of in followup PRs.

@fkleedorfer
Copy link
Contributor Author

Thanks, happy to contribute!

@abrokenjester abrokenjester merged commit 09f313e into eclipse-rdf4j:develop Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Contribution of a Spring/RDF4J integration
2 participants