Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Externally controlled vocabulary support #7712

Closed
wants to merge 22 commits into from

Conversation

janvanmansum
Copy link
Contributor

@janvanmansum janvanmansum commented Mar 23, 2021

Contributors:

What this PR does / why we need it: adds support for externally controlled vocabularies to compound-fields with the following four subfields: Vocabulary, Vocabulary URL, Term, Term URL. The vocabulary server must provide a SKOSMOS
interface.

Which issue(s) this PR closes:

Closes #7711

Special notes for your reviewer:

Known problems/open issue:

  • JSON configuration lay-out should be cleaned up, bundling related settings into objects.
  • Add ability to configure a vocabulary URL per vocabulary.
  • If a vocabulary is pre-filled, then it is not possible to leave the entire compound field empty, because the vocabulary URL is always filled in.
  • If you select a term from the list, the Term URL is automatically filled in. After that you can change the term, resulting in a term that does not correspond with the term URL.

Suggestions on how to test this:

How to test this

  1. Clone, built and deploy this war file

  2. Apply sql statements to update Dataverse database:
    psql dvndb -f cv-update.sql
    Warning: dvndb is postgresql database, it can be different in your installation.
    cv-update.sql.zip

  3. Add this custom metadata block (cvocdemo.tsv) to Dataverse
    See: http://guides.dataverse.org/en/latest/admin/metadatacustomization.html for an explanation of the steps to perform.
    cvocdemo.tsv.zip

  4. Enable the 'cvocDemo Metadata' Metadata Block for the dataverse where you are creating the dataset to test with

  5. Configure Dataverse to use the external-cvoc feature on the metdata block. The configuration is in a json file (cvoc-conf.json). Run the following command to load the config into Dataverse

    curl -H "Content-Type: application/json" -X PUT \ 
              -d @cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf

    cvoc-conf.json.zip You can change the config and reload it to try different settings. For instance using only the unesco vocabulary with "vocabs":["unesco"], instead of "vocabs":["unesco","stw","agrovoc"],.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

The Dataset creation form will now contain the following extra block. Without the configuration to connect it to the controlled vocabulary service it looks like this:
Screenshot cvocDome-empty

With one vocabulary (unesco), it looks like this:
Screenshot cvocDemo-one-vocab

With several vocabularies:
Screenshot cvocDemo-multi-vocabs

Is there a release notes update needed for this change?: ?

Additional documentation:

Design

image

The cvoc-enabled compound field communicates via the SKOSMOS protocol with a vocabulary server to retrieve matching vocabularies, terms and their correspoding URIs.

Example config

[
   {   
     "vocab-name":"cvocDemo",
     "minChars": 1,
     "cvoc-url":"https://skosmos.dev.finto.fi/",
     "language":"en",
     "js-url":"/resources/js/cvoc-interface.js",
     "protocol": "skosmos",
     "vocab-uri" : "http://skos.um.es/unescothes/CS000", 
     "term-parent-uri": "",
     "vocabs":["unesco","stw","agrovoc"],
     "vocab-codes": ["cvocDemoVocabulary", "cvocDemoTerm", "cvocDemoTermURI", "cvocDemoVocabularyURI"]
   }
]

Description of the configuration settings

  • vocab-name
    Name of the compound metadata field (Need to check usage in the code)

  • cvoc-url
    URL of the external vocabulary service that will provide the terms and URI's in the autocomplete
    Must now be a SKOSMOS service, because that API is the only one supported by the cvoc-interface.js code; see "js-url" property.

  • language
    Specifies the language for the terms (prefLabel in SKOS) to be retrieved.
    By default it's using the language of the server (calling BundleUtil.getDefaultLocale().getLanguage())

  • js-url
    Optional URL of the JavaScript file that does the Ajax request. A default JavaScript is included in Dataverse.

  • protocol
    Note that only "skosmos" is supported in the cvoc-interface.js. The default is also "skosmos".

  • vocab-uri
    The URI of the vocab, this is now only a single value but should be an array corresponding with the "vocabs" array. (Note that it is an URL we want to put in that input field)

  • term-parent-uri
    The URI of the 'parent' concept acting as a filter for the terms to be retrieved allowing only narrower terms. This enables us to use a 'branch' of a large vocabulary without constructing separate vocabularies for subsets.

  • vocabs
    The list of vocabularies. This should be the 'identifying' names in SKOSMOS.

  • vocab-codes
    The input field names as specified in the metadata block. The order of the fields must be; vocabulary, term, termURI, VocabularyURI.

  • minChars
    Number of characters that must be typed in the 'term' field for the AJAX 'search' request to be sent. Use this with large vocabularies and set it to more characters (1 is usually enough) if the server does not respond fast enough. When it is 0 (default), the complete list will dropdown when the up or down arrow keys are typed.

  • readonly
    Specify if the URL fields should be readonly. Also the vocabulary field is readonly when there is only one vocab configured. The default is false, allowing to type anything into the fields.

  • hideReadonlyUrls
    When in 'readonly' the URL fields are now also hidden on the form. This makes the form more compact and simpler. The default is false, showing all the fields.

ekoi and others added 19 commits January 20, 2021 13:27
Merge branch 'develop' into external-cvoc
Fix problem where pressing tab skips some fields.
* - Add Vocabulary URL field
- Add parent term url
- Min search letter 1
- Add scrollbar for long list

* - Add Vocabulary URL field
- Add parent term url
- Min search letter 1
- Add scrollbar for long list

* Remove unused key setting.

Co-authored-by: Paul Boon <paul.boon@dans.knaw.nl>
DD-375 Disable editing of the cvoc URL fields
* Added minChars option to the cvoc configuration, default 0, ignoring negative numbers

* Added log info for minChars option

* Added option to hide cvoc URL fields when readonly and some javascript refactoring
* Uses the vocab-uri parameter from the cvoc config to store in the metadata

* Removed commented-out code fragment
Merge branch 'develop' into external-cvoc
Added the cvoc-interface.js file to the application resources
@janvanmansum janvanmansum marked this pull request as draft March 23, 2021 13:42
Copy link
Contributor

@poikilotherm poikilotherm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I overall enjoy the approach shown, let me add some thoughts:

  • It's good that we are only rendering the JS part during edits, as there is no caching etc.
  • We are relying on a new UI library here. IMHO we should try to reuse Primefaces components to stay consistent.
  • There seem to be some architectural drawbacks how the vocabulary configuration is getting into JSF, that could be easily optimized.
  • Instead of using database stored JSON once more, let's go for some MicroProfile Config API... (or leave this as an exercise for later, if we want to move faster)

import edu.harvard.iq.dataverse.engine.command.impl.PublishDataverseCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand;
import edu.harvard.iq.dataverse.engine.command.exception.IllegalCommandException;
import edu.harvard.iq.dataverse.engine.command.impl.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, globbing imports are disregarded - IQSS preferes the explicit style. Looks like this happened (just like the other rearrangements) by accident - many IDEs do this automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekoi, fyi

import edu.harvard.iq.dataverse.util.JsfHelper;
import static edu.harvard.iq.dataverse.util.JsfHelper.JH;
import static edu.harvard.iq.dataverse.util.StringUtil.isEmpty;
import edu.harvard.iq.dataverse.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, comment on globbing imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekoi, fyi

import javax.faces.component.UIInput;

import javax.faces.event.AjaxBehaviorEvent;
import javax.json.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, comment on globbing imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekoi, fyi

import java.io.*;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, comment on globbing imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekoi, fyi

import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletResponse;
import javax.validation.ConstraintViolation;
import java.io.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, comment on globbing imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekoi, fyi

if (cv.termParentUri != "")
termParentUri = "&parent=" + cv.termParentUri;
var result = [];
var tmp = $.ajax({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use standard Web API like https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API, which is cross browser supported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poikilotherm, the interface could be implemented using any popular Javascript framework, it should be placed preferably outside of the Dataverse core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@4tikhonov maybe I didn't make my point clear enough: the Fetch API is a browser builtin Javascript API around since ~2015. These days, it's supported everywhere and makes using a framework obsolete.

By using the W3C standardised web API implemented in any browser, you can completely avoid relying on a framework, which makes your JS scripts extremly portable and compatible with any framework you're using. There is no need to reimplement or change the code to reuse it when using different frameworks.

This is in no way using sth. like a Dataverse REST endpoint or anything like it - all of this happens client side in your browser. I hope this gets a bit clearer now. 😸

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cvoc-interface.js is just client side code included with a script element. When placed outside of the Dataverse code it is more like configuration of the connection to the external services (only SKOSMOS for now). We complied it in the war file just to have a default implementation that works without introducing the extra work of serving a json resource on the server, with the added apache conf settings changes. When externalising we would have to do a bit more explaining of the 'interface', which now has to be inferred from the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cvoc-interface.js is just client side code included with a script element.

Aye.

When placed outside of the Dataverse code it is more like configuration of the connection to the external services (only SKOSMOS for now).

Wait - "like configuration"? The configuration this code uses to interface with the external service is stored inside Dataverse (the database). I'm not sure what you mean here then? As far as I understood it, you are trying to follow the code against interface principle, which is great.

We complied it in the war file just to have a default implementation that works without introducing the extra work of serving a json resource on the server, with the added apache conf settings changes.

Yes please, it's good to have a default in place which serves the most usual use cases. Thank you for this!

When externalising we would have to do a bit more explaining of the 'interface', which now has to be inferred from the code.

That would be marvelous! Maybe you could add sth to the developer guide of Dataverse?

From your comment I still don't understand why relying on the jQuery UI framework is a necessity for this AJAX call, when the same functionality is provided by any (recent) browser out there, allowing better reuse and less bundled dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @poikilotherm, just for your information, we've originally placed the interface outside of Dataverse core to make it more flexible https://github.com/Dans-labs/semantic-gateway/blob/main/static/js/interface.js

})
};

function autoexample (request, response) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example code should be commented out IMHO.


// autocomplete calls this
function autointerface(request, response, cv, mapping) {
var protocol = cv.protocol;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this makes it extensible. Very nice.

var mapid = "#{cvoc.get(dsfv.datasetField.datasetFieldType.parentDatasetFieldType.name).getMapId()}";
var mapping = { query: mapquery, id: mapid };
// Using jQuery UI Autocomplete for the term input
$("#akmi_#{valCount.index+1}_#{cvoc.get(dsfv.datasetField.datasetFieldType.parentDatasetFieldType.name).getKeys()[1]}").find("input[name*='cv_term_']").autocomplete(
Copy link
Contributor

@poikilotherm poikilotherm Mar 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we reuse a Primefaces Autocomplete component, attaching to their completeEndpoint REST retrieval and/or their onSearch/onFocus/onKeypress/... JS hook?

https://primefaces.github.io/primefaces/10_0_0/#/components/autocomplete

This ensures that we use the same look and feel, do not rely on including another huge JS library and can upgrade more easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primefaces framework works only for JSF. The initial idea with jQuery to make it more generic, "widget like" and force the CV component integration into other popular data repositories and annotation tools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you create the interface method for this? During the call you mentioned we could use the terms lookup with React, Vue or other frameworks, too. So why not use the framework in place for Dataverse (Primefaces) and still have the ability to reuse in other places?

This method based on jQuery UI is introducing a different UI component, that needs it's own CSS etc, thus being unaligned with the rest. It's not me deciding if this technical debt is fine or not. If the UI/UX team and lead architect are OK with this for the sake of getting things done faster and improve in another step, that's perfectly fine for me. 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love jQuery, but pulling in the UI lib, while it conflicts with the look-and-feel of Primefaces UI is not optimal. We should definitely have a look at the alternative suggested by @poikilotherm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PaulBoon, sure, please consider how we can release jQuery in the next release, I don't have any objections about that.

if (cvocSetting == null || cvocSetting.isEmpty())
return cvocMap;

JsonReader jsonReader = Json.createReader(new StringReader(settingsService.getValueForKey(SettingsServiceBean.Key.CVocConf)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either make this live inside edu.harvard.iq.dataverse.util.json.JsonParser (meh) or use proper JSON-B. Or just don't use JSON, making use of MicroProfile Config API instead (which is much easier to use in non-classic deployment scenarios).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekoi, something to consider.

@4tikhonov
Copy link
Contributor

While I overall enjoy the approach shown, let me add some thoughts:

  • It's good that we are only rendering the JS part during edits, as there is no caching etc.
  • We are relying on a new UI library here. IMHO we should try to reuse Primefaces components to stay consistent.
  • There seem to be some architectural drawbacks how the vocabulary configuration is getting into JSF, that could be easily optimized.
  • Instead of using database stored JSON once more, let's go for some MicroProfile Config API... (or leave this as an exercise for later, if we want to move faster)

This design choice to store JSON is related to the external application we've developed to manage the configuration of external controlled vocabularies. I'm fine with having MicroProfile Config here but only as extra feature. I'm expecting administrators will use GUI to adjust config parameters.

@poikilotherm poikilotherm mentioned this pull request May 17, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Externally controlled vocabulary support
7 participants