
KnarQL Route and Documentation #620

Merged
merged 65 commits into from
Oct 26, 2017

tobiasschweizer
Contributor

@tobiasschweizer tobiasschweizer commented Sep 29, 2017

  • refactor extended search (now called KnarQL):
    • permission checking for the full query path: suppress the whole resource if the user has insufficient permissions for at least one resource or value referred to in the WHERE clause
    • if necessary, suppress a resource in the answer by showing a proxy resource instead
    • specify in the CONSTRUCT clause which property values should be returned in the answer: not all values contained in the WHERE clause have to be returned
  • add ApacheLuceneSupport: a utility that pre-processes a search string so that it supports Apache Lucene query parser syntax. Use cases: search as you type (search for rdfs:label) and full-text search (combine terms with Boolean AND)
  • add documentation (RST and TypeScript)
  • add tests
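The ApacheLuceneSupport utility itself is not shown on this page. As a rough illustration only, the two use cases could be handled along the following lines. The object and method names are hypothetical, and the real utility may treat operators, phrases and escaping differently; the only assumption made here is that input already containing Lucene operators or quotes is passed through unchanged.

```scala
/** Hypothetical sketch of pre-processing a search string into Apache Lucene
  * query parser syntax. Not Knora's ApacheLuceneSupport; names are illustrative. */
object LuceneSearchStringSketch {

  // Operators of the classic Lucene query parser. If the user already wrote
  // Lucene syntax (operators or quoted phrases), the input is left untouched.
  private val LuceneOperators = Set("AND", "OR", "NOT")

  private def terms(searchString: String): Vector[String] =
    searchString.trim.split("\\s+").toVector.filter(_.nonEmpty)

  private def isLuceneSyntax(searchString: String): Boolean =
    searchString.contains("\"") || terms(searchString).exists(LuceneOperators.contains)

  /** Full-text search: every term must match, so terms are combined with AND. */
  def fulltext(searchString: String): String =
    if (isLuceneSyntax(searchString)) searchString
    else terms(searchString).mkString(" AND ")

  /** Search as you type (e.g. on rdfs:label): the last term is probably still
    * incomplete, so it becomes a prefix query with a trailing wildcard. */
  def searchAsYouType(searchString: String): String = {
    val ts = terms(searchString)
    if (ts.isEmpty || isLuceneSyntax(searchString)) searchString
    else (ts.init :+ (ts.last + "*")).mkString(" AND ")
  }

  def main(args: Array[String]): Unit = {
    println(fulltext("Lateinische Übersetzung")) // Lateinische AND Übersetzung
    println(searchAsYouType("Lateinische Über"))  // Lateinische AND Über*
  }
}
```

Appending the wildcard only to the last term reflects the assumption that, while typing, only the final word is incomplete.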

Attention: this PR may bore you.

TODO:

closes #598
closes #530

relates to #10
relates to #22

…s that pre-process a given search string in order to support Apache Lucene Parser Syntax
@tobiasschweizer tobiasschweizer self-assigned this Sep 29, 2017
@tobiasschweizer
Contributor Author

tobiasschweizer commented Oct 2, 2017

If a link property has more than one instance and a query variable is used as its object (representing the resource referred to), the current design repeats the row: several rows are returned for the same main resource. Consider the following example:

    PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
    PREFIX incunabula: <http://api.knora.org/ontology/incunabula/simple/v2#>
    
    CONSTRUCT {
        ?linkObj knora-api:isMainResource true .
        
    } WHERE {
        ?linkObj a knora-api:Resource .
        ?linkObj a knora-api:LinkObj .
        
        ?linkObj knora-api:hasLinkTo ?book .
        knora-api:hasLinkTo knora-api:objectType knora-api:Resource .
        
        ?book a knora-api:Resource .
        ?book a incunabula:book . 
        
    }

In Knora, there are three link objects that point to instances of incunabula:book. However, the answer returned by Knora lists six items (some link objects are repeated), although schema:numberOfItems is 3:

    {
      "@type" : "schema:ItemList",
      "schema:itemListElement" : [ {
        "@id" : "http://data.knora.org/881405205304",
        "@type" : "knora-api:LinkObj",
        "schema:name" : "Lateinische Übersetzung"
      }, {
        "@id" : "http://data.knora.org/881405205304",
        "@type" : "knora-api:LinkObj",
        "schema:name" : "Lateinische Übersetzung"
      }, {
        "@id" : "http://data.knora.org/ab79ffa43935",
        "@type" : "knora-api:LinkObj",
        "schema:name" : "DEMO_"
      }, {
        "@id" : "http://data.knora.org/cb1a74e3e2f6",
        "@type" : "knora-api:LinkObj",
        "schema:name" : "Diese drei Texte sind in einem Band zusammengebunden."
      }, {
        "@id" : "http://data.knora.org/cb1a74e3e2f6",
        "@type" : "knora-api:LinkObj",
        "schema:name" : "Diese drei Texte sind in einem Band zusammengebunden."
      }, {
        "@id" : "http://data.knora.org/cb1a74e3e2f6",
        "@type" : "knora-api:LinkObj",
        "schema:name" : "Diese drei Texte sind in einem Band zusammengebunden."
      } ],
      "schema:numberOfItems" : 3,
      "@context" : {
        "schema" : "http://schema.org/",
        "knora-api" : "http://api.knora.org/ontology/knora-api/v2#",
        "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
      }
    }

Looking at the prequery explains this behaviour:

 SELECT DISTINCT ?linkObj ?book
 WHERE {
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?linkObj <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#LinkObj> .
 ?linkObj <http://www.knora.org/ontology/knora-base#hasLinkTo> ?book .
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?book <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/incunabula#book> .
 }
 
 ORDER BY ASC(?linkObj) ASC(?book)
 OFFSET 0
 LIMIT 25

For a given ?linkObj, ?book may be bound to several dependent resources, not just one. Hence the prequery returns one row per (?linkObj, ?book) pair, repeating the main resource:

?linkObj	?book
<http://data.knora.org/881405205304>	<http://data.knora.org/5e77e98d2603>
<http://data.knora.org/881405205304>	<http://data.knora.org/8be1b7cf7103>
<http://data.knora.org/ab79ffa43935>	<http://data.knora.org/c5058f3a>
<http://data.knora.org/cb1a74e3e2f6>	<http://data.knora.org/21abac2162>
<http://data.knora.org/cb1a74e3e2f6>	<http://data.knora.org/c5058f3a>
<http://data.knora.org/cb1a74e3e2f6>	<http://data.knora.org/e41ab5695c>
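The six rows above belong to only three distinct link objects. Because LIMIT and OFFSET count solution rows rather than main resources, a page of 25 rows may contain fewer than 25 main resources, so de-duplicating afterwards would not fix paging. The following hypothetical Scala sketch (not Knora code; names are illustrative) takes the rows from the table and collapses them per main resource, which is what grouping in the triplestore is meant to achieve:

```scala
/** Hypothetical sketch: collapsing prequery rows per main resource in memory. */
object PrequeryRowsSketch {
  // One row of the prequery result: the main resource and one dependent resource.
  final case class Row(linkObj: String, book: String)

  // The rows from the table above.
  val rows: Vector[Row] = Vector(
    Row("http://data.knora.org/881405205304", "http://data.knora.org/5e77e98d2603"),
    Row("http://data.knora.org/881405205304", "http://data.knora.org/8be1b7cf7103"),
    Row("http://data.knora.org/ab79ffa43935", "http://data.knora.org/c5058f3a"),
    Row("http://data.knora.org/cb1a74e3e2f6", "http://data.knora.org/21abac2162"),
    Row("http://data.knora.org/cb1a74e3e2f6", "http://data.knora.org/c5058f3a"),
    Row("http://data.knora.org/cb1a74e3e2f6", "http://data.knora.org/e41ab5695c")
  )

  /** Groups rows by main resource, keeping main resources in order of first
    * appearance and collecting their dependent resources. */
  def groupByMainResource(rows: Seq[Row]): Vector[(String, Vector[String])] =
    rows.foldLeft(Vector.empty[(String, Vector[String])]) { (acc, row) =>
      acc.indexWhere(_._1 == row.linkObj) match {
        case -1 => acc :+ (row.linkObj -> Vector(row.book))
        case i  => acc.updated(i, acc(i)._1 -> (acc(i)._2 :+ row.book))
      }
    }

  def main(args: Array[String]): Unit = {
    val grouped = groupByMainResource(rows)
    println(s"${rows.size} rows, ${grouped.size} main resources") // 6 rows, 3 main resources
    grouped.foreach { case (linkObj, books) => println(s"$linkObj -> ${books.mkString(";")}") }
  }
}
```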

Proposed solution: GROUP BY the main resource and the sorting criteria (otherwise ORDER BY does not work, because a sort criterion that is not grouped on is not bound), and use GROUP_CONCAT to aggregate the IRIs of the dependent resources:

SELECT ?linkObj  (GROUP_CONCAT(?book; separator=';') as ?bookColl)
 WHERE {
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?linkObj <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#LinkObj> .
 ?linkObj <http://www.knora.org/ontology/knora-base#hasLinkTo> ?book .
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?book <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/incunabula#book> .
 }
 
 GROUP BY ?linkObj # append more sorting criteria here if given 
 ORDER BY ASC(?linkObj)
 OFFSET 0
 LIMIT 25
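Following the comment "append more sorting criteria here if given", the GROUP BY and ORDER BY clauses could be generated together so that every sort criterion is also a grouping variable. A minimal sketch with a hypothetical helper (not part of Knora); sorting by the main resource last, for a stable order, is an assumption modelled on the StillImageRepresentation prequery further down:

```scala
/** Hypothetical sketch: generating GROUP BY / ORDER BY clauses for a prequery. */
object GroupByClauseSketch {

  /** mainResourceVar: variable of the main resource, e.g. "linkObj".
    * orderByVars: sort criteria in order of priority.
    * Every sort criterion is also grouped on, so that it stays bound for ORDER BY. */
  def groupAndOrderClauses(mainResourceVar: String, orderByVars: Seq[String]): String = {
    val groupVars = (mainResourceVar +: orderByVars).distinct
    // Sort by the given criteria first, then by the main resource.
    val orderVars = (orderByVars :+ mainResourceVar).distinct
    val groupBy = groupVars.map("?" + _).mkString("GROUP BY ", " ", "")
    val orderBy = orderVars.map(v => s"ASC(?$v)").mkString("ORDER BY ", " ", "")
    s"$groupBy\n$orderBy"
  }

  def main(args: Array[String]): Unit = {
    println(groupAndOrderClauses("linkObj", Nil))
    println(groupAndOrderClauses("page", Seq("seqnum__valueHasInteger")))
  }
}
```

For `("linkObj", Nil)` this yields the two clauses of the query above; with a seqnum sort criterion it yields the clauses used for pages.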

Result:

?linkObj	?bookColl
<http://data.knora.org/881405205304>	http://data.knora.org/8be1b7cf7103;http://data.knora.org/5e77e98d2603
<http://data.knora.org/ab79ffa43935>	http://data.knora.org/c5058f3a
<http://data.knora.org/cb1a74e3e2f6>	http://data.knora.org/c5058f3a;http://data.knora.org/e41ab5695c;http://data.knora.org/21abac2162
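Since GROUP_CONCAT returns the dependent resource IRIs as a single string, they have to be split again when the prequery result is processed. Note that SPARQL does not specify the order of the concatenated values, which is consistent with 8be1b7cf7103 appearing before 5e77e98d2603 here. A minimal sketch, assuming that no IRI contains the separator (IRIs may in general contain ';', so another separator might be safer); the object name is hypothetical:

```scala
import java.util.regex.Pattern

/** Hypothetical sketch: splitting a GROUP_CONCAT value back into IRIs. */
object GroupConcatSketch {
  // Must match the separator in GROUP_CONCAT(?book; separator=';').
  val Separator = ";"

  /** Returns the individual dependent resource IRIs of an aggregated ?bookColl value. */
  def splitGroupConcat(aggregated: String): Vector[String] =
    if (aggregated.isEmpty) Vector.empty
    else aggregated.split(Pattern.quote(Separator)).toVector

  def main(args: Array[String]): Unit =
    splitGroupConcat("http://data.knora.org/8be1b7cf7103;http://data.knora.org/5e77e98d2603")
      .foreach(println)
}
```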

@tobiasschweizer
Contributor Author

tobiasschweizer commented Oct 2, 2017

The prequery generated for an incoming query for StillImageRepresentations (the pages of <http://data.knora.org/8be1b7cf7103>, sorted by seqnum) would look like this:

 SELECT DISTINCT ?page
 WHERE {
 ?page <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?page <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?page <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#StillImageRepresentation> .
 ?page <http://www.knora.org/ontology/knora-base#isPartOf> <http://data.knora.org/8be1b7cf7103> .
 <http://data.knora.org/8be1b7cf7103> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     <http://data.knora.org/8be1b7cf7103> <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?page <http://www.knora.org/ontology/knora-base#seqnum> ?seqnum .
 GRAPH <http://www.ontotext.com/explicit> {
     ?seqnum <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?page <http://www.knora.org/ontology/knora-base#hasStillImageFileValue> ?file .
 GRAPH <http://www.ontotext.com/explicit> {
     ?file <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 GRAPH <http://www.ontotext.com/explicit> {
     ?seqnum <http://www.knora.org/ontology/knora-base#valueHasInteger> ?seqnum__valueHasInteger .
 }
 }
 
 GROUP BY ?page ?seqnum__valueHasInteger
 ORDER BY ASC(?seqnum__valueHasInteger) ASC(?page)
 OFFSET 0
 LIMIT 25

Proposed enhancement: we could try to include the value object IRIs in the results (also using GROUP_CONCAT, since there may be several instances). This would make it possible to return in the answer exactly those instances of a property that match the given criteria, without having to repeat the matching logic in the main query.
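A minimal sketch of that restriction step, assuming a simplified, hypothetical `ValueObj` model (Knora's actual value classes differ) and made-up example IRIs: values of the properties asked for in the WHERE clause are kept only if their IRIs were returned by the prequery.

```scala
/** Hypothetical sketch: restricting values to those matched by the prequery. */
object ValueRestrictionSketch {
  // Simplified, illustrative model of a value object returned by the main query.
  final case class ValueObj(iri: String, propertyIri: String, content: String)

  /** Values of restricted properties are kept only if their IRIs were returned
    * by the prequery; values of other properties are left as they are. */
  def restrictToMatched(values: Seq[ValueObj],
                        restrictedProps: Set[String],
                        matchedValueIris: Set[String]): Seq[ValueObj] =
    values.filter { v =>
      !restrictedProps.contains(v.propertyIri) || matchedValueIris.contains(v.iri)
    }

  def main(args: Array[String]): Unit = {
    // All IRIs below are made up for illustration.
    val hasComment = "http://example.org/ontology#hasComment"
    val values = Seq(
      ValueObj("http://example.org/value/1", hasComment, "first comment"),
      ValueObj("http://example.org/value/2", hasComment, "second comment"),
      ValueObj("http://example.org/value/3", "http://example.org/ontology#hasTitle", "a title")
    )
    restrictToMatched(values, Set(hasComment), Set("http://example.org/value/2"))
      .foreach(v => println(v.iri))
  }
}
```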

The presence of these value object IRIs would also allow for permission checking: if the user lacks sufficient permissions for any of the returned value object IRIs, the whole resource should be suppressed, even if the user has permission to see the resource itself.
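The rule could be sketched as follows, assuming a hypothetical `canView` predicate that stands in for Knora's permission check. `ForbiddenResourceProxy` is an illustrative stand-in for the proxy resource (cf. the ForbiddenResource mentioned in the review below), not Knora's definition.

```scala
/** Hypothetical sketch: suppressing a resource if any matched item is not viewable. */
object PermissionSketch {
  sealed trait ResultItem
  final case class VisibleResource(iri: String) extends ResultItem
  // Illustrative proxy shown instead of a suppressed resource.
  case object ForbiddenResourceProxy extends ResultItem

  /** The resource is shown only if the user may see the resource itself and every
    * value object (or dependent resource) matched in the WHERE clause. */
  def checkResource(resourceIri: String,
                    matchedIris: Set[String],
                    canView: String => Boolean): ResultItem =
    if (canView(resourceIri) && matchedIris.forall(canView)) VisibleResource(resourceIri)
    else ForbiddenResourceProxy

  def main(args: Array[String]): Unit = {
    // The user may see everything except the value "v2".
    println(checkResource("r1", Set("v1", "v2"), _ != "v2")) // ForbiddenResourceProxy
    println(checkResource("r1", Set("v1"), _ != "v2"))       // VisibleResource(r1)
  }
}
```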

tobiasschweizer and others added 22 commits October 2, 2017 14:28
…ing a dependent resource may return more than one Iri, thus GROUP_CONCAT has to be used
…by myself in the very close future) restriction of values to value object Iris (returned by the prequery) in CONSTRUCT query
…oes want to be returned (query's Construct clause)

- this approach seems very promising to me at the moment and I feel like a great mind. I hope this makes the person who has to debug this code one day feel better, and I cannot exclude that this will be me.
@benjamingeer

We should also explain what a Knora property is.

@mrivoal

mrivoal commented Oct 24, 2017

May I ask why schema.org is supported rather than any other ontology? For referencing through search engines?
Otherwise, is it that widespread within the research data or humanities communities? To me, it looks huge (nearly 600 classes and 850 properties!!), but does that make it the most accurate ontology for us? Why schema.org rather than the DBpedia ontology, for example?

And are you not afraid that it will bring (unnecessary?) complexity to the API?

@tobiasschweizer
Contributor Author

This is surely a legitimate question. My idea was just that it would be nice to re-use some existing semantics, so we would not have to make up our own response format for a resource request or a query. Of course, we can discuss whether schema.org makes sense in our case or not.

We basically need something to represent a sequence of resources. Later, we are going to need something for values too (including the history of a value).

@tobiasschweizer
Contributor Author

We need to explain that rdfs:label is represented by schema:name in the API, and that we may, in the future, provide other options.

done in d6d8a96

@mrivoal

mrivoal commented Oct 24, 2017

This is surely a legitimate question. My idea was just that it would be nice to re-use some existing semantics, so we would not have to make up our own response format for a resource request or a query.

I see, now. And I can only agree.

Of course, we can discuss whether schema.org makes sense in our case or not.

What would bother me with schema.org (I don't know it well at all, I must confess) is that it is a very generic ontology and it covers a lot of things.

I am not sure what exactly we would need, but I guess I would be more comfortable with ontologies dedicated to describing data and datasets, such as DCAT or VoID. But I am not sure that either of these ontologies would fit all our needs, and I am not quite up to date on the subject, so I have no better suggestion to make. Besides, I am sure you have already delved further into schema.org.

@benjamingeer

ForbiddenResource needs to be added to the API ontologies.

@benjamingeer

@mrivoal I'm not sure that schema.org is the best option either, but in any case we're only using a little bit of it. The idea is that this could facilitate making Knora query results show up in search engines, which is something I could imagine a project wanting to do. For now we are just using two things from schema.org. Let's see how it goes and revisit this issue later.


@benjamingeer benjamingeer left a comment


I think this really needs a chapter in docs/design-documentation, giving an overview of how SearchResponderV2 works.

/**
* An abstract class with utility methods for v2 responders.
*/
abstract class ResponderV2 extends Responder {


Please rename ResponderV2 to something more specific, like ResourceTextResponder, and have only the responders that really need getMappingsFromQueryResultsSeparated subclass it.

Contributor Author

fixed in 0c13561

@@ -1159,15 +1939,12 @@ class SearchResponderV2 extends Responder {
transformer = triplestoreSpecificQueryPatternTransformerSelect
)

// _ = println(triplestoreSpecificPrequery.toSparql)
_ = println(triplestoreSpecificPrequery.toSparql)


Please comment this out, I forgot to do so.

Contributor Author

also fixed in 0c13561

@subotic
Collaborator

subotic commented Oct 24, 2017

@mrivoal schema.org (only a small fraction) is used as a description of the container holding the API V2 responses. The actual response content is using another ontology.

We could easily create our own ontology, by only taking the parts we need from schema.org or any other ontology for that matter.

The idea is that this could facilitate making Knora query results show up in search engines, which is something I could imagine a project wanting to do.

The question is whether Google and co. will actually use this data since, according to the Google Structured Data Testing Tool, our responses are not valid:

[Screenshot: Google Structured Data Testing Tool results for an API v2 response, 2017-10-24]

@mrivoal

mrivoal commented Oct 24, 2017

Thanks @benjamingeer and @subotic for the explanations.

I totally understand the point of reusing parts of an ontology that would fit our needs instead of creating a new one.

The idea is that this could facilitate making Knora query results show up in search engines, which is something I could imagine a project wanting to do.

Sure!

The question is whether Google and co. will actually use this data since, according to the Google Structured Data Testing Tool, our responses are not valid:

As for this point, I pass :)

@musicEnfanthen
Contributor

Regarding schema.org: from my experience with other DH projects, it could become the next "big thing" in terms of standardisation (it is already being discussed even in musicology projects; have a look at the IncipitSearch project of the Akademie der Wissenschaften Mainz). I agree with @mrivoal that it is very flat and not very specific. Another point to be aware of is that Google, Amazon and the like are the drivers behind schema.org. On the other hand, this is also its strength, because it leads to wide dissemination and reuse of the same ontology structures. So, conforming to schema.org (or at least being able to map to it) wouldn't be the worst thing one can do.

@subotic: The response fails in the Google Structured Data Testing Tool only in those parts where it does not follow the schema.org specification. As far as I understand, the failing properties will be ignored by search algorithms. If you want to get rid of the red crosses, this could be a possible way (but I don't know whether this would still conform to Knora):

[Screenshot: Google Structured Data Testing Tool (German UI: "Test-Tool für strukturierte Daten"), 2017-10-25]

There are also DataCatalog and Dataset types in the schema.org specification. The acknowledgement section says:

This class is based upon W3C DCAT work, and benefits from collaboration around the DCAT, ADMS and VoID vocabularies. See http://www.w3.org/wiki/WebSchemas/Datasets for full details and mappings.

Maybe one could use schema:DataCatalog or schema:Dataset as a generic entry point for the metadata description, which could then be filled in depth with the structures from DCAT or VoID.

@benjamingeer

@musicEnfanthen Thank you very much for these suggestions, we will have a look.

@subotic
Collaborator

subotic commented Oct 26, 2017

@musicEnfanthen Yes, I'm afraid that for Google and co. to actually use the data, it would need to conform completely to schema.org. This can probably be done, but not for the regular API that Salsah uses to communicate with Knora. We could have our routes return our data either entirely in schema.org or in our own schemas, depending on the client. We already have ApiV2WithValueObjects and ApiV2Simple as response schemas; we could add a third one, ApiV2SchemaOrg (or something like this).
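A per-client choice of response schema could be modelled as a sealed trait. This is only a sketch: the case objects below merely reuse the names from the comment (Knora's actual definitions may differ), ApiV2SchemaOrg is the hypothetical third schema, and the request values ("simple", "schema.org") as well as the fallback to ApiV2WithValueObjects are assumptions.

```scala
/** Hypothetical sketch: choosing an API v2 response schema per client. */
object ApiSchemaSketch {
  sealed trait ApiV2Schema
  case object ApiV2Simple extends ApiV2Schema
  case object ApiV2WithValueObjects extends ApiV2Schema
  // Hypothetical third schema discussed above.
  case object ApiV2SchemaOrg extends ApiV2Schema

  /** Maps a client-supplied (hypothetical) schema name to a response schema,
    * falling back to ApiV2WithValueObjects. */
  def schemaFor(requested: Option[String]): ApiV2Schema =
    requested.map(_.toLowerCase) match {
      case Some("simple")     => ApiV2Simple
      case Some("schema.org") => ApiV2SchemaOrg
      case _                  => ApiV2WithValueObjects
    }

  def main(args: Array[String]): Unit = {
    println(schemaFor(Some("schema.org"))) // ApiV2SchemaOrg
    println(schemaFor(None))               // ApiV2WithValueObjects
  }
}
```

A sealed trait lets the compiler warn when a serializer does not handle one of the schemas.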

@subotic
Collaborator

subotic commented Oct 26, 2017

There is currently an issue with Travis pulling images from Docker Hub: "We’re investigating reports of timeouts in builds while pulling images from Docker Hub."

That's why the integration tests on Travis are failing at the moment.

@tobiasschweizer
Contributor Author

@subotic I would like to merge this PR. How can I do this? It is blocked.

@tobiasschweizer tobiasschweizer merged commit b93a145 into develop Oct 26, 2017
@tobiasschweizer
Contributor Author

@subotic Thanks!

@tobiasschweizer tobiasschweizer deleted the wip/count_query branch October 26, 2017 11:30
SepidehAlassi added a commit that referenced this pull request Oct 31, 2017
* develop:
  fix (webapi): update lucene index on sparql update when using graphdb-free (#633)
  KnarQL Route and Documentation (#620)
  feature (extended search V1): support Boolean value in extended search V1 (#643)
  Make the hostname of project-specific API v2 ontologies configurable (#631)
  build (travis): deactivate browser tests (#640)
  test (salsah): add headless browser testing on Travis (#590)
  Serve an ontology when its IRI is dereferenced (#616)
  fix (webapi): When requested languages aren't available, take the first one in alphabetical order (#627). (#628)
  Use cardinalities to get referenced ontologies for XML import schemas. (#617)
  docs (webapi): add description (#622)
Development

Successfully merging this pull request may close these issues.

Refactor extended search v2
Implement a DSL for API v2 extended search
5 participants