KnarQL Route and Documentation #620

tobiasschweizer · 2017-09-29T07:21:58Z

refactor extended search (now called KnarQL):
- permission checking for the full query path: suppress whole resource in case of insufficient permissions for at least one resource or value asked for in the WHERE clause
- if necessary, suppress resource in answer by showing a proxy resource instead
- specify property values in the CONSTRUCT clause that should be returned in the answer: not all values contained in the WHERE clause have to be returned
add ApacheLuceneSupport: a utility that pre-processes a search string so that it supports Apache Lucene parser syntax. Different use cases: search as you type (search for rdfs:label), fulltext search (use Boolean AND)
add documentation (RST and TypeScript)
add tests
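
The two use cases of the Lucene pre-processing listed above can be illustrated roughly as follows. This is a minimal Python sketch, not the actual ApacheLuceneSupport utility; the function names are hypothetical, and appending a `*` wildcard to the last term for search-as-you-type is an assumption made for illustration. Escaping of Lucene special characters is left out.

```python
def fulltext_query(search_string: str) -> str:
    """Combine whitespace-separated terms with Lucene's Boolean AND."""
    terms = search_string.split()
    return " AND ".join(terms)


def search_as_you_type_query(search_string: str) -> str:
    """Like fulltext_query, but match the last term as a prefix (assumption)."""
    terms = search_string.split()
    if not terms:
        return ""
    # The user may still be typing the last term, so add a trailing wildcard.
    terms[-1] = terms[-1] + "*"
    return " AND ".join(terms)
```

For example, `fulltext_query("Lateinische Übersetzung")` yields a query requiring both terms, while `search_as_you_type_query("Lateinische Über")` also matches labels in which the last word merely starts with "Über".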

Attention: this PR may bore you.

TODO:

add support for phrases: Add support for Apache Lucene Phrases in Fulltext search #641
check that ORDER BY criterion has cardinality 1: check cardinality of ORDER BY Criterion in extended search #642

closes #598
closes #530

relates to #10
relates to #22

tobiasschweizer · 2017-10-02T12:08:24Z

If a link property has more than one instance and a query variable is used as its object (representing the resource referred to), the current design repeats the row: several rows are returned for the same main resource. Consider the following example:

    PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
    PREFIX incunabula: <http://api.knora.org/ontology/incunabula/simple/v2#>
    
    CONSTRUCT {
        ?linkObj knora-api:isMainResource true .
        
    } WHERE {
        ?linkObj a knora-api:Resource .
        ?linkObj a knora-api:LinkObj .
        
        ?linkObj knora-api:hasLinkTo ?book .
        knora-api:hasLinkTo knora-api:objectType knora-api:Resource .
        
        ?book a knora-api:Resource .
        ?book a incunabula:book . 
        
    }

In Knora, there are three link objects that point to resources of type incunabula:book. However, the answer returned by Knora lists six items, repeating two of the link objects:

{
  "@type" : "schema:ItemList",
  "schema:itemListElement" : [ {
    "@id" : "http://data.knora.org/881405205304",
    "@type" : "knora-api:LinkObj",
    "schema:name" : "Lateinische Übersetzung"
  }, {
    "@id" : "http://data.knora.org/881405205304",
    "@type" : "knora-api:LinkObj",
    "schema:name" : "Lateinische Übersetzung"
  }, {
    "@id" : "http://data.knora.org/ab79ffa43935",
    "@type" : "knora-api:LinkObj",
    "schema:name" : "DEMO_"
  }, {
    "@id" : "http://data.knora.org/cb1a74e3e2f6",
    "@type" : "knora-api:LinkObj",
    "schema:name" : "Diese drei Texte sind in einem Band zusammengebunden."
  }, {
    "@id" : "http://data.knora.org/cb1a74e3e2f6",
    "@type" : "knora-api:LinkObj",
    "schema:name" : "Diese drei Texte sind in einem Band zusammengebunden."
  }, {
    "@id" : "http://data.knora.org/cb1a74e3e2f6",
    "@type" : "knora-api:LinkObj",
    "schema:name" : "Diese drei Texte sind in einem Band zusammengebunden."
  } ],
  "schema:numberOfItems" : 3,
  "@context" : {
    "schema" : "http://schema.org/",
    "knora-api" : "http://api.knora.org/ontology/knora-api/v2#",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#"
  }
}

Looking at the prequery explains this behaviour:

 SELECT DISTINCT ?linkObj ?book
 WHERE {
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?linkObj <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#LinkObj> .
 ?linkObj <http://www.knora.org/ontology/knora-base#hasLinkTo> ?book .
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?book <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/incunabula#book> .
 }
 
 ORDER BY ASC(?linkObj) ASC(?book)
 OFFSET 0
 LIMIT 25

?book represents several dependent resources, not just one (unlike ?linkObj). Hence the result looks like this (repeating the main resource):

?linkObj	?book
<http://data.knora.org/881405205304>	<http://data.knora.org/5e77e98d2603>
<http://data.knora.org/881405205304>	<http://data.knora.org/8be1b7cf7103>
<http://data.knora.org/ab79ffa43935>	<http://data.knora.org/c5058f3a>
<http://data.knora.org/cb1a74e3e2f6>	<http://data.knora.org/21abac2162>
<http://data.knora.org/cb1a74e3e2f6>	<http://data.knora.org/c5058f3a>
<http://data.knora.org/cb1a74e3e2f6>	<http://data.knora.org/e41ab5695c>

Proposed solution: GROUP BY the main resource and the sorting criteria (otherwise ORDER BY does not work, because the ORDER BY criterion is not bound), and use GROUP_CONCAT to aggregate the dependent resource IRIs:

SELECT ?linkObj  (GROUP_CONCAT(?book; separator=';') as ?bookColl)
 WHERE {
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?linkObj <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?linkObj <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#LinkObj> .
 ?linkObj <http://www.knora.org/ontology/knora-base#hasLinkTo> ?book .
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?book <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?book <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/incunabula#book> .
 }
 
 GROUP BY ?linkObj # append more sorting criteria here if given 
 ORDER BY ASC(?linkObj)
 OFFSET 0
 LIMIT 25

Result:

?linkObj	?bookColl
<http://data.knora.org/881405205304>	http://data.knora.org/8be1b7cf7103;http://data.knora.org/5e77e98d2603
<http://data.knora.org/ab79ffa43935>	http://data.knora.org/c5058f3a
<http://data.knora.org/cb1a74e3e2f6>	http://data.knora.org/c5058f3a;http://data.knora.org/e41ab5695c;http://data.knora.org/21abac2162
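
The effect of GROUP BY and GROUP_CONCAT on the prequery result can be mimicked outside the triplestore. The following Python sketch is illustrative only (the function name `group_dependent_resources` is hypothetical): it takes the six rows returned by the original prequery and collapses them to one row per main resource. It keeps the input order of the dependent resources, whereas the triplestore may concatenate them in any order.

```python
# The six (?linkObj, ?book) rows returned by the original prequery.
ROWS = [
    ("http://data.knora.org/881405205304", "http://data.knora.org/5e77e98d2603"),
    ("http://data.knora.org/881405205304", "http://data.knora.org/8be1b7cf7103"),
    ("http://data.knora.org/ab79ffa43935", "http://data.knora.org/c5058f3a"),
    ("http://data.knora.org/cb1a74e3e2f6", "http://data.knora.org/21abac2162"),
    ("http://data.knora.org/cb1a74e3e2f6", "http://data.knora.org/c5058f3a"),
    ("http://data.knora.org/cb1a74e3e2f6", "http://data.knora.org/e41ab5695c"),
]


def group_dependent_resources(rows, separator=";"):
    """Return one (main IRI, concatenated dependent IRIs) pair per main resource."""
    grouped = {}
    for main_iri, dependent_iri in rows:
        grouped.setdefault(main_iri, []).append(dependent_iri)
    # Sorting by the main resource IRI corresponds to ORDER BY ASC(?linkObj).
    return [(main_iri, separator.join(deps)) for main_iri, deps in sorted(grouped.items())]


if __name__ == "__main__":
    for main_iri, book_coll in group_dependent_resources(ROWS):
        print(main_iri, book_coll, sep="\t")
```

With one row per main resource, OFFSET and LIMIT count main resources rather than combinations of main and dependent resources.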

tobiasschweizer · 2017-10-02T12:24:30Z

An incoming query for StillImageRepresentations would look like this:

 SELECT DISTINCT ?page
 WHERE {
 ?page <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     ?page <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?page <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#StillImageRepresentation> .
 ?page <http://www.knora.org/ontology/knora-base#isPartOf> <http://data.knora.org/8be1b7cf7103> .
 <http://data.knora.org/8be1b7cf7103> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
 GRAPH <http://www.ontotext.com/explicit> {
     <http://data.knora.org/8be1b7cf7103> <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?page <http://www.knora.org/ontology/knora-base#seqnum> ?seqnum .
 GRAPH <http://www.ontotext.com/explicit> {
     ?seqnum <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 ?page <http://www.knora.org/ontology/knora-base#hasStillImageFileValue> ?file .
 GRAPH <http://www.ontotext.com/explicit> {
     ?file <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
 }
 GRAPH <http://www.ontotext.com/explicit> {
     ?seqnum <http://www.knora.org/ontology/knora-base#valueHasInteger> ?seqnum__valueHasInteger .
 }
 }
 
 GROUP BY ?page ?seqnum__valueHasInteger
 ORDER BY ASC(?seqnum__valueHasInteger) ASC(?page)
 OFFSET 0
 LIMIT 25

Proposed enhancement: We could try to include the value object IRIs in the results (also using GROUP_CONCAT, since there may be several instances). This would make it possible to return in the answer exactly those instances of a property that match the given criteria, without having to repeat this logic.

In fact, the presence of these value object IRIs would also allow for permission checking. If the user lacks sufficient permission for at least one of the given value object IRIs, the whole resource should be suppressed, even though the user might have permission to see the resource itself.
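
The permission rule described above could look roughly like this. The sketch is written in Python for brevity and is not the SearchResponderV2 implementation: `PrequeryRow`, `apply_permissions`, and the `can_view` callback are hypothetical names, the `;` separator is the one used in the GROUP_CONCAT example, and representing a suppressed resource by the string `knora-api:ForbiddenResource` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Placeholder shown instead of a suppressed resource (representation is an assumption).
FORBIDDEN = "knora-api:ForbiddenResource"


@dataclass
class PrequeryRow:
    main_resource_iri: str
    value_object_iris_concat: str  # GROUP_CONCAT output, e.g. "iri1;iri2"


def apply_permissions(
    rows: List[PrequeryRow],
    can_view: Callable[[str], bool],
    separator: str = ";",
) -> List[str]:
    """Return each main resource IRI, or FORBIDDEN if any part of it is not viewable."""
    answer = []
    for row in rows:
        # An empty GROUP_CONCAT result means no value objects were asked for.
        value_iris = [iri for iri in row.value_object_iris_concat.split(separator) if iri]
        fully_visible = can_view(row.main_resource_iri) and all(
            can_view(iri) for iri in value_iris
        )
        answer.append(row.main_resource_iri if fully_visible else FORBIDDEN)
    return answer
```

Keeping a placeholder in the answer, rather than dropping the row, leaves the number of items per page unchanged.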

…oes want to be returned (query's Construct clause) - this approach seems very promising to me at the moment and I feel like a great mind. I hope this makes the person feel better that has to debug this code one day, and I cannot exclude that this will be me.

benjamingeer · 2017-10-24T10:27:20Z

We should also explain what a Knora property is.

mrivoal · 2017-10-24T12:06:55Z

May I ask the reason for supporting schema.org rather than any other ontology? Is it for indexing by search engines?
Otherwise, is it that widespread within the research data or humanities communities? To me, it looks like something huge (nearly 600 classes and 850 properties!), but does that mean it is the most appropriate ontology for us? Why schema.org over the DBpedia ontology, for example?

And are you not afraid that it will bring (unnecessary?) complexity to the API?

tobiasschweizer · 2017-10-24T12:32:16Z

This is surely a legitimate question. My idea was just that it would be nice to re-use some existing semantics, so we would not have to make up our own response format for a resource request or a query. Of course, we can discuss whether schema.org makes sense in our case or not.

We basically need something to represent a sequence of resources. Later, we are going to need something for values too (including the history of a value).

tobiasschweizer · 2017-10-24T12:46:57Z

We need to explain that rdfs:label is represented by schema:name in the API, and that we may, in the future, provide other options.

done in d6d8a96

mrivoal · 2017-10-24T13:19:24Z

This is surely a legitimate question. My idea was just that it would be nice to re-use some existing semantics, so we would not have to make up our own response format for a resource request or a query.

I see, now. And I can only agree.

Of course, we can discuss whether schema.org makes sense in our case or not.

What would bother me with schema.org (I don't know it well at all, I must confess) is that it is a very generic ontology and it covers a lot of things.

I am not sure what we would need exactly, but I guess I would be more comfortable with ontologies dedicated to describing data and datasets, such as DCAT or VoID. But I am not sure any one of these ontologies would fit all our needs, and I am not quite up to date on the subject, so I have no better suggestion to make. Besides, I am sure you have already delved further into schema.org.

benjamingeer · 2017-10-24T15:15:16Z

ForbiddenResource needs to be added to the API ontologies.

benjamingeer · 2017-10-24T15:23:26Z

@mrivoal I'm not sure that schema.org is the best option either, but in any case we're only using a little bit of it. The idea is that this could facilitate making Knora query results show up in search engines, which is something I could imagine a project wanting to do. For now we are just using two things from schema.org. Let's see how it goes and revisit this issue later.

benjamingeer

I think this really needs a chapter in docs/design-documentation, giving an overview of how SearchResponderV2 works.

benjamingeer · 2017-10-24T15:40:38Z

webapi/src/main/scala/org/knora/webapi/responders/ResponderV2.scala

+/**
+  * An abstract class with utility methods for v2 responders.
+  */
+abstract class ResponderV2 extends Responder {


Please rename ResponderV2 to something more specific, like ResourceTextResponder, and have only the responders that really need getMappingsFromQueryResultsSeparated subclass it.

fixed in 0c13561

benjamingeer · 2017-10-24T15:41:52Z

webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala

@@ -1159,15 +1939,12 @@ class SearchResponderV2 extends Responder {
                transformer = triplestoreSpecificQueryPatternTransformerSelect
            )

-            // _ = println(triplestoreSpecificPrequery.toSparql)
+            _ = println(triplestoreSpecificPrequery.toSparql)


Please comment this out, I forgot to do so.

also fixed in 0c13561

subotic · 2017-10-24T15:53:35Z

@mrivoal schema.org (only a small fraction of it) is used to describe the container holding the API V2 responses. The actual response content uses another ontology.

We could easily create our own ontology, by only taking the parts we need from schema.org or any other ontology for that matter.

The idea is that this could facilitate making Knora query results show up in search engines, which is something I could imagine a project wanting to do.

The question is if Google and co are going to actually use this data since according to the Google Structured Data Testing Tool our responses are not valid:

mrivoal · 2017-10-24T16:08:40Z

Thanks @benjamingeer and @subotic for the explanations.

I totally understand the point of reusing parts of an ontology that would fit our needs instead of creating a new one.

The idea is that this could facilitate making Knora query results show up in search engines, which is something I could imagine a project wanting to do.

Sure!

The question is if Google and co are going to actually use this data since according to the Google Structured Data Testing Tool our responses are not valid:

As for this point, I pass :)

musicEnfanthen · 2017-10-25T10:05:58Z

Regarding schema.org: From my experience with other DH projects, it could become the next "big thing" in terms of standardisation (it is already being discussed even in musicology projects; have a look at the IncipitSearch project of the Akademie der Wissenschaften Mainz). I agree with @mrivoal that it is very flat and not very specific. Another point to be aware of is that Google, Amazon and the like are the drivers behind schema.org. On the other hand, this is also its strength, because it leads to wide dissemination and reuse of the same ontology structures. So conforming to schema.org (or at least being able to map to it) wouldn't be the worst thing one can do.

@subotic: The response fails in the Google Structured Data Testing Tool only in those parts where it does not follow the schema.org specification. As far as I understand, the failing properties will be ignored by search algorithms. If you want to get rid of the red crosses, this could be a possible way (but I don't know whether it would still conform to Knora):

There is also a DataCatalog and a Dataset in the schema.org specification. The acknowledgement section says:

This class is based upon W3C DCAT work, and benefits from collaboration around the DCAT, ADMS and VoID vocabularies. See http://www.w3.org/wiki/WebSchemas/Datasets for full details and mappings.

Maybe one could use schema:DataCatalog or schema:Dataset as a generic entry point for the metadata description, which can then be filled in in depth with structures from DCAT or VoID.

benjamingeer · 2017-10-25T16:12:51Z

@musicEnfanthen Thank you very much for these suggestions, we will have a look.

subotic · 2017-10-26T09:42:31Z

@musicEnfanthen Yes, I'm afraid that for Google and Co. to actually use the data, it would need to completely conform to schema.org. This can probably be done, but not for the regular API we use Salsah to communicate with. We could have our routes return our data completely in schema.org or our own schemas, depending on the client. We already have ApiV2WithValueObjects and ApiV2Simple as response schemas. Now we could add a third one ApiV2SchemaOrg (or something like this).

subotic · 2017-10-26T09:46:21Z

There is currently an issue with Travis and pulling of images from Docker Hub: "We’re investigating reports of timeouts in builds while pulling images from Docker Hub."

That's why the integration tests on Travis are failing at the moment.

tobiasschweizer · 2017-10-26T11:21:04Z

@subotic I would like to merge this PR. How can I do this? (It is blocked.)

tobiasschweizer · 2017-10-26T11:30:35Z

@subotic Thanks!

* develop:
- fix (webapi): update lucene index on sparql update when using graphdb-free (#633)
- KnarQL Route and Documentation (#620)
- feature (extended search V1): support Boolean value in extended search V1 (#643)
- Make the hostname of project-specific API v2 ontologies configurable (#631)
- build (travis): deactivate browser tests (#640)
- test (salsah): add headless browser testing on Travis (#590)
- Serve an ontology when its IRI is dereferenced (#616)
- fix (webapi): When requested languages aren't available, take the first one in alphabetical order (#627). (#628)
- Use cardinalities to get referenced ontologies for XML import schemas. (#617)
- docs (webapi): add description (#622)

refactor (create utility for Apache Lucene support): provide function…

84788de

…s that pre-process a given search string in order to support Apache Lucene Parser Syntax

tobiasschweizer self-assigned this Sep 29, 2017

Tobias Schweizer added 4 commits September 29, 2017 10:53

refactor (add some TODOs)

17b8f93

refactor (do not query properties if not requested): only request val…

b2069a0

…ues for the main resource if the user asks for it

refactor (do not query dependent resources if possible): ignore depen…

ae77506

…dent resources if no props are requested for main resource

refactor (make function for property type info collector)

251b776

tobiasschweizer and others added 22 commits October 2, 2017 14:28

Merge branch 'develop' into wip/count_query

b90ae01

feature (test for properties with several instances): add test query

c766682

refactor (group concat dependent resource Iris): a variable represent…

a0479bf

…ing a dependent resource may return more than one Iri, thus GROUP_CONCAT has to be used

Merge branch 'develop' into wip/count_query

0fd7b90

refactor (transform Construct query to Select query): generate group …

bdd3a28

…by in transformer

refactor (access main and dependent resource vars from the transform)

92df517

feature (extended search v2): return Iris of value objects present in…

0e3f696

… input query's WHERE clause

reactor (extended search test queries beol specific)

a5408cb

feature (extended search v2): create variable to restrict link value …

cd87e0c

…prop from linking prop variable

refactor (extended search v2): collect dependent resource Iris per ma…

d2b29dc

…in resource

refactor (extended search v2): only query resources for which the pre…

5757078

…query returned results

refactor (extended search v2): check for the presence of the full que…

45415e0

…ry path

refactor (extended search v2): simplify since property Iris do not ma…

f82052f

…tter

refactor (extended search v2): presumably genius (until proven wrong …

d25e694

…by myself in the very close future) restriction of values to value object Iris (returned by the prequery) in CONSTRUCT query

feature (extended search v2): represent suppressed resources with the…

355a7f9

… forbidden resource

Merge branch 'develop' into wip/count_query

2bd815c

refactor (fulltext search v2): use prequery in order to support paging

e14f2de

refactor (search v2): return standoff with search results

07e415f

refactor (search v2): only include statements for value objects and s…

1b75c1c

…tandoff if necessary

refactor (extended search v2): only return the values present in the …

c7f3d9f

…CONSTRUCT clause of the query

test (extended search v2)

4fb5a8f

docs (rst): fix indentation problem

690c66f

docs (V2): schema.org support

d6d8a96

Benjamin Geer added 2 commits October 24, 2017 17:20

Merge branch 'develop' into wip/count_query

d85f8f5

docs (webapi): Revise KnarQL docs.

cf25783

benjamingeer suggested changes Oct 24, 2017

refactor (rename ResponderV2)

0c13561

Benjamin Geer and others added 3 commits October 25, 2017 16:13

docs (webapi): Correct relationship between knora-api:Resource and sc…

7b35152

…hema:Thing.

refactor (knora-api): make knora-api:Resource a subclass of schema.or…

196a6b0

…g/Thing

tests (adapt test data): add schema:Thing

93e45f1

benjamingeer approved these changes Oct 25, 2017

tobiasschweizer merged commit b93a145 into develop Oct 26, 2017

tobiasschweizer deleted the wip/count_query branch October 26, 2017 11:30

mrivoal mentioned this pull request Nov 10, 2017

Institution class in project #661

Open

benjamingeer mentioned this pull request Nov 17, 2017

Use rdfs:label instead of schema:name in JSON-LD representing resources #669

Open
