Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add REST endpoint for mappings [WIP] #803

Merged
merged 5 commits into from
Jan 30, 2019
Merged

Conversation

danmichaelo
Copy link
Contributor

@danmichaelo danmichaelo commented Oct 1, 2018

In the Skosmos UI, the mapping targets are enriched with labels and notations. It would be useful to also expose this data through the REST API, so I made an attempt of implementing this.

Since I think it would be good if the next revision of the API could align with JSKOS (#152), I tried to make a JSKOS-compatible output for this endpoint. It's a little bit verbose since JSKOS can encode more complex mapping relations than SKOS can. The "memberSet" level for instance is great for compound mappings, but a bit superfluous when such mappings are not supported, as is the case with SKOS. Still, it's good if we can align with some kind of standard, and JSKOS seems like a good one to me.

What do you think about adding this method to the v1-api?

At the moment, the endpoint can only return mappings for a single concept, but I guess we could also have it return all mappings when no concept is specified.

Btw. are you ok with adding type hinting? Since Skosmos is now PHP >= 7.0, it should be ok to add type hinting for both classes and primitives, right?

@danmichaelo danmichaelo force-pushed the mappings-api branch 3 times, most recently from 6432aae to 961b64f Compare October 1, 2018 16:39
@kinow
Copy link
Collaborator

kinow commented Oct 1, 2018

Ohh, sounds interesting! I will check out your branch to see what it looks like and give it some testing either today or this week, and then leave some feedback here. Probably on the same day I look at #802 (thanks for doing that btw)

@danmichaelo
Copy link
Contributor Author

One thing that annoys me a little bit is having to specify the vocabulary id in the URL -- it seems a bit redundant to specify both vocid and uri:

/<vocid>/mappings?uri=<uri>

But since the getConceptInfo() method is defined on Vocabulary, I didn't find an easy way to avoid it.

@osma
Copy link
Member

osma commented Oct 2, 2018

@danmichaelo Think of vocid as identifying the named graph where to look for information about the URI. It's possible to have several vocabularies that use the same URI, for example different versions of the same vocabulary. The concept URI alone is not always sufficient to determine the vocabulary, although Skosmos sometimes does that (e.g. guessVocabularyFromURI method).

@danmichaelo danmichaelo changed the title Add REST endpoint for mappings [WIP] Add REST endpoint for mappings Oct 25, 2018
@danmichaelo danmichaelo changed the title Add REST endpoint for mappings Add REST endpoint for mappings [WIP] Oct 25, 2018
@osma
Copy link
Member

osma commented Nov 20, 2018

Thanks for the PR. This is really interesting as we would also need this kind of REST API method to implement more intelligent, dynamic loading of mappings shown on the concept page (see #817).

Regarding JSKOS, I think it makes sense to build on that, even though the rest of the Skosmos REST API predates JSKOS and doesn't conform to it.

It should be no problem to extend the v1 API. The rule is that existing methods should not be changed in incompatible ways, but adding new methods is fine.

Type hinting is very welcome!

@kinow
Copy link
Collaborator

kinow commented Dec 12, 2018

Type hinting is very welcome!

Hooray! 🎉

@kinow
Copy link
Collaborator

kinow commented Dec 19, 2018

Just finished setting up YSO Places on my local environment. Using the concept Eurasia (http://finto.fi/yso-paikat/en/page/p125927), I get the following in my local environment from the current code when it retrieve mappings:

[
  'skos:closeMatch': { # ConceptProperty
    'prop': 'skos:closeMatch',
    'super': null,
    'label': null,
    'values': [
      { # ConceptMappingPropertyValue
        'type': 'skos:closeMatch',
        'clan': 'en',
        'labelcache': [ "http://id.loc.gov/.../sh12345" ],
        'vocab': { ... # Vocabulary },
        'order': [],
        'model': { ... # Model },
        'resource': { ... # EasyRdf/Resource }
      }
    ],
    'is_sorted': false,
    'sort_by_notation': false
  }
]

The twig template uses the following data to render the appendix, with the mapping properties.

ConceptProperty.type
ConceptProperty.description
ConceptProperty.label
ConceptProperty.values (iterate through ConceptMappingPropertyValue)
ConceptMappingPropertyValue.uri
ConceptMappingPropertyValue.label
ConceptMappingPropertyValue.exvocab
ConceptMappingPropertyValue.notation
ConceptMappingPropertyValue.vocabname

The code on this pull request, returns the following instead:

[
  {
    "type": [
      "skos:closeMatch"
    ],
    "from": {
      "memberSet": [
        {
          "uri": "http://www.yso.fi/onto/yso/p125927"
        }
      ]
    },
    "to": {
      "memberSet": [
        {
          "uri": "http://www.wikidata.org/entity/Q5401",
          "prefLabel": {
            "de": "Eurasien"
          }
        }
      ]
    },
    "fromScheme": {
      "uri": "http://www.yso.fi/onto/yso/places/"
    }
  },
  {
    "type": [
      "skos:exactMatch"
    ],
    "from": {
      "memberSet": [
        {
          "uri": "http://www.yso.fi/onto/yso/p125927"
        }
      ]
    },
    "to": {
      "memberSet": [
        {
          "uri": "http://www.yso.fi/onto/allars/Y31177",
          "prefLabel": {
            "sv": "Eurasien"
          }
        }
      ]
    },
    "fromScheme": {
      "uri": "http://www.yso.fi/onto/yso/places/"
    }
  },
  {
    "type": [
      "skos:exactMatch"
    ],
    "from": {
      "memberSet": [
        {
          "uri": "http://www.yso.fi/onto/yso/p125927"
        }
      ]
    },
    "to": {
      "memberSet": [
        {
          "uri": "http://www.yso.fi/onto/ysa/Y125927",
          "prefLabel": {
            "-": "ysa:Y125927"
          }
        }
      ]
    },
    "fromScheme": {
      "uri": "http://www.yso.fi/onto/yso/places/"
    },
    "toScheme": {
      "uri": ""
    }
  }
]

Which contains the exact structure of the mappings. The current page takes around 20 seconds to load in my workstation. Without mapping properties, it takes 1-2 seconds. And this new REST endpoint takes around 10 seconds.

Q/ should we add more information in this endpoint, and return the data necessary for displaying the mapping properties, or should we add this one as-is, and then add another one to retrieve (one by one, or in batch) more data about the concept mapping?

Cheers
Bruno

@osma
Copy link
Member

osma commented Dec 19, 2018

Thanks - especially for the numbers!

It would be good to analyze in a bit more detail why the REST query takes 10 seconds, or why the current page takes 20 seconds. It's probably due to these:

  1. the time it takes to retrieve the entity data from Wikidata
  2. the time it takes to parse the response from Wikidata
  3. the time it takes to serialize the graph (including the triples from Wikidata) into JSON-LD for embedding on the page

Are you using APCu caching? If you do (see InstallTutorial in the wiki for how to enable it), the response from Wikidata should be cached, possibly saving some time, perhaps even a lot, on subsequent requests. It is stored as an EasyRdf Graph object into the APCu cache.

Q/ should we add more information in this endpoint, and return the data necessary for displaying the mapping properties, or should we add this one as-is, and then add another one to retrieve (one by one, or in batch) more data about the concept mapping?

My current idea is that by default, this method should only return the labels (as it currently seems to do), but there should be either a boolean parameter, or a separate method, for returning also the full graph that was retrieved from Wikidata or another external source. This is necessary for example so that the Wikipedia widget can work, as it currently relies on the JSON-LD graph embedded in HTML concept pages.

@kinow
Copy link
Collaborator

kinow commented Dec 20, 2018

Hi @osma

It would be good to analyze in a bit more detail why the REST query takes 10 seconds, or why the current page takes 20 seconds.

Good idea. Enabled xdebug profiler, then analyzed its output with PhpStorm. Here's a short analysis plus screen shots.

  • The REST request to http://localhost:8000/rest/v1/ysopaikat/mappings?uri=http%3A%2F%2Fwww.yso.fi%2Fonto%2Fyso%2Fp125927 took 28.133 seconds.
  • 97.4% of this time was spent in Concept::getMappingProperties()
  • In Concept::getMappingProperties, Model::getResourceFromUri is consuming most of the time, around 77%
  • Half of the total time of this request was spent in EasyRdf\Parser\NTriples::parse, and the rest spread among other smaller methods, but the larger part of the remaining time is on EasyRdf\Http\Client::request

image

Without the getMappingProperties, the total time for the request would be under 2 seconds. With the profiler on, the request takes a bit more, but I believe that's expected.

This behaviour appears to be consistent with the current mapping properties rendered via Twig.

image

77.9% of the total time spent in Concept::getMappingProperties, with near 50% of the total time spent on EasyRdf\Graph::parse, and a considerable amount of time spent on the easyrdf client request, some PHP fgets, etc.

image

Are you using APCu caching?

Not yet. I am planning on tackling the issue without the cache first, measure the enhancements, then confirm that with cache it loads faster once the cache is created, and that everything else works OK.

My current idea is that by default, this method should only return the labels (as it currently seems to do), but there should be either a boolean parameter, or a separate method, for returning also the full graph that was retrieved from Wikidata or another external source. This is necessary for example so that the Wikipedia widget can work, as it currently relies on the JSON-LD graph embedded in HTML concept pages.

So I think there are three cases.

The current work on this pull request, with the relationships and labels.

The case of when the data is rendered with Twig, and Concept::getMappingProperties() returns the structure with label, description, type, and values (which brings more data, like its label, vocab, uri, etc).

And the data used by the Wikipedia widget, which I believe is a combination of the data above, plus the model/graph loaded on the server side (next task is exactly understand how the Wikipedia widget works).

@kinow
Copy link
Collaborator

kinow commented Dec 20, 2018

I think there are a few parts of the code missing update to match changes in constructors. Will use the IDE to see if I find more, but here's one that I got after I installed the wikipedia plugin, and browsed Central Asia concept from YSO Places, which contains Wikipedia data

image

Fixed locally with

diff --git a/model/Concept.php b/model/Concept.php
index 40a853f..5da0615 100644
--- a/model/Concept.php
+++ b/model/Concept.php
@@ -589,7 +589,7 @@ class Concept extends VocabularyDataObject
                         // checking if the property value is not in the current vocabulary
                         $exvoc = $this->model->guessVocabularyFromURI($val->getUri());
                         if ($exvoc && $exvoc->getId() !== $this->vocab->getId()) {
-                            $ret[$prop]->addValue(new ConceptMappingPropertyValue($this->model, $this->vocab, $val, $prop, $this->clang), $this->clang);
+                            $ret[$prop]->addValue(new ConceptMappingPropertyValue($this->model, $this->vocab, $val, null, $prop, $this->clang), $this->clang);
                             continue;
                         }
                         $ret[$prop]->addValue(new ConceptPropertyValue($this->model, $this->vocab, $val, $prop, $this->clang), $this->clang);
diff --git a/model/ConceptMappingPropertyValue.php b/model/ConceptMappingPropertyValue.php
index 133dbb4..46642dc 100644
--- a/model/ConceptMappingPropertyValue.php
+++ b/model/ConceptMappingPropertyValue.php
@@ -23,7 +23,7 @@ class ConceptMappingPropertyValue extends VocabularyDataObject
      * @param string $prop       Mapping property
      * @param ?string $clang     Preferred label language (nullable)
      */
-    public function __construct(Model $model, Vocabulary $vocab, Resource $target, Resource $source, string $prop, $clang = '')
+    public function __construct(Model $model, Vocabulary $vocab, Resource $target, Resource $source = null, string $prop, $clang = '')
     {
         parent::__construct($model, $vocab, $target);
         $this->source = $source;

@osma
Copy link
Member

osma commented Jan 28, 2019

The current work on this pull request, with the relationships and labels.

Right. This PR looks OK so I will merge it soon, but I will wait a bit so we can discuss the next steps.

The case of when the data is rendered with Twig, and Concept::getMappingProperties() returns the structure with label, description, type, and values (which brings more data, like its label, vocab, uri, etc).

I think it should be possible to accommodate the additional information needed for displaying the mappings on the page. I think that the only additional information required for the display of mappings (currently handled by Twig, but should be moved into client side JS) is the name of the target concept scheme, e.g. LCSH or Wikidata. That could easily be added to the toScheme concept scheme object in the JSKOS structure.

And the data used by the Wikipedia widget, which I believe is a combination of the data above, plus the model/graph loaded on the server side (next task is exactly understand how the Wikipedia widget works).

Yes, this is more complicated, but I think we can postpone this a little bit so we at least get the mappings endpoint merged first.

I think there are a few parts of the code missing update to match changes in constructors.

Can you make a separate PR for this?

@kinow
Copy link
Collaborator

kinow commented Jan 29, 2019

I think there are a few parts of the code missing update to match changes in constructors.

Can you make a separate PR for this?

Sure! Will probably prepare the PR after this one gets merged to avoid re-work in case thsi one changes.

@kinow
Copy link
Collaborator

kinow commented Jan 29, 2019

I think it should be possible to accommodate the additional information needed for displaying the mappings on the page. I think that the only additional information required for the display of mappings (currently handled by Twig, but should be moved into client side JS) is the name of the target concept scheme, e.g. LCSH or Wikidata. That could easily be added to the toScheme concept scheme object in the JSKOS structure.

That's done in a branch in my repository - or at least I'm getting the same UI features through JS from what I remember. The only issue I still have, is that some of the information required is not - as far as I could tell - in the JSKOS specification. So if one user/dev wanted to be pedantic, s/he could complain about the extra information polluting the response (I think?).

@osma
Copy link
Member

osma commented Jan 29, 2019

The only issue I still have, is that some of the information required is not - as far as I could tell - in the JSKOS specification. So if one user/dev wanted to be pedantic, s/he could complain about the extra information polluting the response (I think?).

Well JSKOS is just a JSON-LD profile and in JSON-LD, one can add custom fields, so I don't think this should be a problem. Can you open another PR from your branch with those changes? Or should I just merge this first and then let you do a subsequent PR with your additions?

@kinow
Copy link
Collaborator

kinow commented Jan 29, 2019

Or should I just merge this first and then let you do a subsequent PR with your additions?

I think if you and @danmichaelo are OK with the change in this PR, it might be easier to merge this one first. Even though I found one method signature that needs changing, everything else works, and it does not break any existing feature. So +1 from me for merging it.

@osma osma merged commit 38f056a into NatLibFi:master Jan 30, 2019
@osma
Copy link
Member

osma commented Jan 30, 2019

Big thanks to both @danmichaelo and @kinow. Now merged! Let's continue with more PRs as necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants