8085 decoupling tsv #8086

Closed
wants to merge 12 commits into from
88 changes: 45 additions & 43 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -364,49 +364,51 @@ Each of the three main sections own sets of properties:
#controlledVocabulary (enumerated) properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-----------------------+-----------------------+-----------------------+
| **Property** | **Purpose** | **Allowed values and |
| | | restrictions** |
+-----------------------+-----------------------+-----------------------+
| DatasetField | Specifies the | Must reference an |
| | #datasetField to which| existing |
| | this entry applies. | #datasetField. |
| | | As a best practice, |
| | | the value should |
| | | reference a |
| | | #datasetField in the |
| | | current metadata |
| | | block definition. (It |
| | | is technically |
| | | possible to reference |
| | | an existing |
| | | #datasetField from |
| | | another metadata |
| | | block.) |
+-----------------------+-----------------------+-----------------------+
| Value | A short display | Free text |
| | string, representing | |
| | an enumerated value | |
| | for this field. If | |
| | the identifier | |
| | property is empty, | |
| | this value is used as | |
| | the identifier. | |
+-----------------------+-----------------------+-----------------------+
| identifier | A string used to | Free text |
| | encode the selected | |
| | enumerated value of a | |
| | field. If this | |
| | property is empty, | |
| | the value of the | |
| | “Value” field is used | |
| | as the identifier. | |
+-----------------------+-----------------------+-----------------------+
| displayOrder | Control the order in | Non-negative integer. |
| | which the enumerated | |
| | values are displayed | |
| | for selection. | |
+-----------------------+-----------------------+-----------------------+
.. list-table::
   :widths: 10 5 40 40 5
   :header-rows: 1
   :align: left

   * - | Property
       | (Column header)
     - Column index
     - Purpose
     - Allowed values and restrictions
     - Mandatory
   * - ``#controlledVocabulary``
     - 0
     - Intentionally left blank
     - (none)
     - Y
   * - ``DatasetField``
     - 1
     - References the ``#datasetField`` to which this entry applies.
     - Must reference an existing ``#datasetField``.

       As a best practice, the value should reference a ``#datasetField`` in the current metadata block definition.

       (It is technically possible to reference an existing ``#datasetField`` from another metadata block.)
     - Y
   * - ``Value``
     - 2
     - A short display string representing an enumerated value for this field. If the ``identifier`` property is empty, this value is used as the identifier.
     - Free text
     - Y
   * - ``identifier``
     - 3
     - A string used to encode the selected enumerated value of a field. If this property is empty, the value of the ``Value`` field is used as the ``identifier``.
     - Either a URL, a URI, or free text consisting only of ASCII letters, digits, and ``+``, ``-``, ``_``
     - N
   * - ``displayOrder``
     - 4
     - Controls the order in which the enumerated values are displayed for selection.
     - Non-negative integer
     - Y
   * - ``altValue``
     - 5..n
     - Provides an alternative value for this entry. The column may be repeated as often as necessary.
     - Free text
     - N

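The column layout documented in the table above can be illustrated with a minimal plain-Java sketch. It is not the parser used in this pull request (which relies on univocity-parsers annotations); the class `ControlledVocabularyRowSketch`, its nested `Row` holder, and the `AFGHANISTAN` alternative value are hypothetical names and data made up for illustration. The sketch also applies the documented "empty identifier falls back to Value" rule at parse time, which is an assumption about where that fallback happens.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits one #controlledVocabulary data row into the documented columns. */
class ControlledVocabularyRowSketch {

    /** Hypothetical holder for the parsed columns; not a Dataverse class. */
    static final class Row {
        final String datasetField;
        final String value;
        final String identifier;
        final int displayOrder;
        final List<String> altValues;

        Row(String datasetField, String value, String identifier, int displayOrder, List<String> altValues) {
            this.datasetField = datasetField;
            this.value = value;
            this.identifier = identifier;
            this.displayOrder = displayOrder;
            this.altValues = altValues;
        }
    }

    static Row parse(String line) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] cols = line.split("\t", -1);
        if (cols.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 columns, got " + cols.length);
        }
        // Column 0 is intentionally left blank in data rows, so it is skipped.
        String datasetField = cols[1];
        String value = cols[2];
        // Assumption: apply the documented fallback (empty identifier -> Value) eagerly.
        String identifier = cols[3].isEmpty() ? value : cols[3];
        int displayOrder = Integer.parseInt(cols[4]);
        // Columns 5..n are repeated altValue columns; blank cells are ignored.
        List<String> altValues = new ArrayList<>();
        for (int i = 5; i < cols.length; i++) {
            if (!cols[i].isBlank()) {
                altValues.add(cols[i]);
            }
        }
        return new Row(datasetField, value, identifier, displayOrder, altValues);
    }

    public static void main(String[] args) {
        Row row = parse("\tcountry\tAfghanistan\t\t0\tAFGHANISTAN");
        System.out.println(row.datasetField + " / " + row.identifier + " / " + row.altValues);
    }
}
```

Splitting with a limit of `-1` matters here because optional trailing `altValue` cells may be empty, and the default `split` would silently drop them.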
FieldType definitions
~~~~~~~~~~~~~~~~~~~~~
7 changes: 7 additions & 0 deletions pom.xml
@@ -404,6 +404,13 @@
<scope>provided</scope>
</dependency>

<!-- CSV, TSV, fixed width parsing & bean data binding -->
<dependency>
<groupId>com.univocity</groupId>
<artifactId>univocity-parsers</artifactId>
<version>2.9.1</version>
</dependency>

<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/citation.tsv
@@ -79,7 +79,7 @@
originOfSources Origin of Sources For historical materials, information about the origin of the sources and the rules followed in establishing the sources should be specified. textbox 75 FALSE FALSE FALSE FALSE FALSE FALSE citation
characteristicOfSources Characteristic of Sources Noted Assessment of characteristics and source material. textbox 76 FALSE FALSE FALSE FALSE FALSE FALSE citation
accessToSources Documentation and Access to Sources Level of documentation of the original sources. textbox 77 FALSE FALSE FALSE FALSE FALSE FALSE citation
#controlledVocabulary DatasetField Value identifier displayOrder
#controlledVocabulary DatasetField Value identifier displayOrder altValue
subject Agricultural Sciences D01 0
subject Arts and Humanities D0 1
subject Astronomy and Astrophysics D1 2
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/geospatial.tsv
@@ -12,7 +12,7 @@
eastLongitude East Longitude Easternmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -180,0 <= East Bounding Longitude Value <= 180,0. text 8 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
northLongitude North Latitude Northernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= North Bounding Latitude Value <= 90,0. text 9 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
southLongitude South Latitude Southernmost coordinate delimiting the geographic extent of the Dataset. A valid range of values, expressed in decimal degrees, is -90,0 <= South Bounding Latitude Value <= 90,0. text 10 FALSE FALSE FALSE FALSE FALSE FALSE geographicBoundingBox geospatial
#controlledVocabulary DatasetField Value identifier displayOrder
#controlledVocabulary DatasetField Value identifier displayOrder altValue altValue altValue altValue
country Afghanistan 0
country Albania 1
country Algeria 2
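The two header changes above add one or more trailing `altValue` columns after the five fixed columns. A rough sketch of a header check for that shape follows; it is an assumption about how such a check could look, not code from this pull request. The class `VocabularyHeaderSketch` is hypothetical, matching is exact and case-sensitive, and whether the actual parser also accepts a header with no `altValue` column at all is not shown in this diff (the sketch accepts it, since the documentation marks the column as not mandatory).

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: checks the shape of a #controlledVocabulary header line. */
class VocabularyHeaderSketch {

    // The five fixed columns, in the order used by the TSV files in this diff.
    static final List<String> FIXED_COLUMNS =
            List.of("#controlledVocabulary", "DatasetField", "Value", "identifier", "displayOrder");

    // The only column that may be repeated, and only after the fixed ones.
    static final String REPEATABLE_COLUMN = "altValue";

    static boolean isValidHeader(String headerLine) {
        List<String> cols = Arrays.asList(headerLine.split("\t", -1));
        if (cols.size() < FIXED_COLUMNS.size()) {
            return false;
        }
        if (!cols.subList(0, FIXED_COLUMNS.size()).equals(FIXED_COLUMNS)) {
            return false;
        }
        // Everything after the fixed columns must be an altValue column.
        return cols.subList(FIXED_COLUMNS.size(), cols.size())
                   .stream()
                   .allMatch(REPEATABLE_COLUMN::equals);
    }

    public static void main(String[] args) {
        String header = "#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder\taltValue\taltValue";
        System.out.println(isValidHeader(header));
    }
}
```

This check is deliberately strict: a trailing tab produces an empty last column, which the sketch rejects.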
@@ -1,22 +1,21 @@
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/

package edu.harvard.iq.dataverse;

import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.annotations.Validate;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.metadata.Placeholder;
import org.apache.commons.lang3.StringUtils;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.Locale;
import java.util.Objects;
import java.util.logging.Logger;
import java.util.MissingResourceException;
import java.util.stream.Collectors;
import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
@@ -38,11 +37,41 @@ public class ControlledVocabularyValue implements Serializable {

private static final Logger logger = Logger.getLogger(ControlledVocabularyValue.class.getCanonicalName());

public static final Comparator<ControlledVocabularyValue> DisplayOrder = new Comparator<ControlledVocabularyValue>() {
@Override
public int compare(ControlledVocabularyValue o1, ControlledVocabularyValue o2) {
return Integer.compare( o1.getDisplayOrder(), o2.getDisplayOrder() );
}};
/**
* Identifiers must be either a URL (term), a URI (PID), or a string containing only A-Z, a-z, 0-9, _, + and -.
* (If no identifier is set, the value is used instead, so the effective identifier may contain spaces.
* But if you provide an identifier, you do so for good reasons: real-world identifiers do not contain whitespace.)
*/
public static final String IDENTIFIER_MATCH_REGEX = "^(\\w+:(\\/\\/)?[\\w\\-+&@#/%?=~|!:,.;]*[\\w\\-+&@#/%=~|]|[\\w\\-\\+]+)$";
public static final Comparator<ControlledVocabularyValue> DisplayOrder = Comparator.comparingInt(ControlledVocabularyValue::getDisplayOrder);

public enum Headers {
DATASET_FIELD(Constants.DATASET_FIELD),
VALUE(Constants.VALUE),
IDENTIFIER(Constants.IDENTIFIER),
DISPLAY_ORDER(Constants.DISPLAY_ORDER),
ALT_VALUES(Constants.ALT_VALUES);

public static final class Constants {
public final static String DATASET_FIELD = "DatasetField";
public final static String VALUE = "Value";
public final static String IDENTIFIER = "identifier";
public final static String DISPLAY_ORDER = "displayOrder";
public final static String ALT_VALUES = "altValue";
}

private final String key;
Headers(String key) {
this.key = key;
}
public String key() {
return this.key;
}

public static String[] keys() {
return Arrays.stream(values()).map(Headers::key).toArray(String[]::new);
}
}

public ControlledVocabularyValue() {
}
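To get a feel for what `IDENTIFIER_MATCH_REGEX` accepts, the following standalone sketch compiles the pattern copied from the hunk above and tries it against a few sample strings. The class `IdentifierRegexSketch`, the helper `isValidIdentifier`, and the sample inputs are made up for illustration; how univocity-parsers applies the `matches` attribute internally is not shown in this diff, so the sketch simply assumes a whole-string match (the pattern is anchored with `^` and `$` in any case).

```java
import java.util.regex.Pattern;

/** Illustrative only: exercises the identifier pattern from this pull request. */
class IdentifierRegexSketch {

    // Copied verbatim from ControlledVocabularyValue.IDENTIFIER_MATCH_REGEX in the diff above.
    static final String IDENTIFIER_MATCH_REGEX =
            "^(\\w+:(\\/\\/)?[\\w\\-+&@#/%?=~|!:,.;]*[\\w\\-+&@#/%=~|]|[\\w\\-\\+]+)$";

    private static final Pattern PATTERN = Pattern.compile(IDENTIFIER_MATCH_REGEX);

    /** Hypothetical helper: true if the candidate is a URL, URI, or plain token. */
    static boolean isValidIdentifier(String candidate) {
        return candidate != null && PATTERN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "D01",                          // plain token
            "https://example.org/term/1",   // URL (example.org is a placeholder host)
            "doi:10.5072/FK2/ABC",          // URI-style PID (made-up value)
            "Arts and Humanities",          // contains spaces
            ""                              // empty string
        };
        for (String sample : samples) {
            System.out.println("'" + sample + "' -> " + isValidIdentifier(sample));
        }
    }
}
```

Note that a display value such as `Arts and Humanities` is rejected as an explicit identifier; it can only become the identifier through the fallback that applies when the `identifier` column is left empty.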
@@ -71,9 +100,11 @@ public void setId(Long id) {
public String getStrValue() {
return strValue;
}

@Parsed(field = Headers.Constants.VALUE)
@Validate
public void setStrValue(String strValue) {
this.strValue = strValue;

}

private String identifier;
@@ -82,15 +113,29 @@ public String getIdentifier() {
return identifier;
}

@Parsed(field = Headers.Constants.IDENTIFIER)
@Validate(nullable = true, matches = IDENTIFIER_MATCH_REGEX)
public void setIdentifier(String identifier) {
this.identifier = identifier;
}



private int displayOrder;
public int getDisplayOrder() { return this.displayOrder;}
public void setDisplayOrder(int displayOrder) {this.displayOrder = displayOrder;}
public int getDisplayOrder() {
return this.displayOrder;
}
public void setDisplayOrder(int displayOrder) {
this.displayOrder = displayOrder;
}
/**
* Set the display order from a String. Only non-negative integers (>= 0) are allowed.
* @param displayOrder the display order as a string of decimal digits
*/
@Parsed(field = Headers.Constants.DISPLAY_ORDER)
@Validate(matches = "^\\d+$")
public void setDisplayOrder(String displayOrder) {
this.displayOrder = Integer.parseInt(displayOrder);
}


@ManyToOne
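One detail of the `displayOrder` setter above is worth illustrating: the pattern `^\d+$` guarantees digits only, but not that the number fits into an `int`, so `Integer.parseInt` can still throw a `NumberFormatException` for very long digit strings. The sketch below shows one defensive way to handle both cases in plain Java. The class `DisplayOrderSketch` and the method `parseDisplayOrder` are hypothetical and not part of this pull request.

```java
/** Illustrative only: validates and parses a displayOrder cell. */
class DisplayOrderSketch {

    /**
     * Hypothetical helper: returns the display order, or throws
     * IllegalArgumentException if the text is not a non-negative int.
     */
    static int parseDisplayOrder(String raw) {
        // Same pattern as in the @Validate annotation: one or more digits, nothing else.
        if (raw == null || !raw.matches("^\\d+$")) {
            throw new IllegalArgumentException("displayOrder must be a non-negative integer: " + raw);
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // Digits only, but the value exceeds Integer.MAX_VALUE.
            throw new IllegalArgumentException("displayOrder is out of int range: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDisplayOrder("42"));
    }
}
```

Wrapping the overflow case in the same exception type as the format check lets a caller report both problems uniformly.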
@@ -102,6 +147,13 @@ public DatasetFieldType getDatasetFieldType() {
public void setDatasetFieldType(DatasetFieldType datasetFieldType) {
this.datasetFieldType = datasetFieldType;
}

@Parsed(field = Headers.Constants.DATASET_FIELD)
@Validate(matches = DatasetFieldType.FIELD_NAME_REGEX)
private void setDatasetFieldType(String datasetFieldType) {
this.datasetFieldType = new Placeholder.DatasetFieldType();
this.datasetFieldType.setName(datasetFieldType);
}

@OneToMany(mappedBy = "controlledVocabularyValue", cascade = {CascadeType.REMOVE, CascadeType.MERGE, CascadeType.PERSIST}, orphanRemoval=true)
private Collection<ControlledVocabAlternate> controlledVocabAlternates = new ArrayList<>();
@@ -113,6 +165,23 @@ public Collection<ControlledVocabAlternate> getControlledVocabAlternates() {
public void setControlledVocabAlternates(Collection<ControlledVocabAlternate> controlledVocabAlternates) {
this.controlledVocabAlternates = controlledVocabAlternates;
}

/**
* A hacky workaround to allow arbitrary numbers of "altValue" columns in the TSV file, providing
* alternative values for the controlled vocabulary value.
* @param alternative
*/
@Parsed(field = Headers.Constants.ALT_VALUES)
@Validate(nullable = true, allowBlanks = true)
private void addControlledVocabAlternates(String alternative) {
if (alternative == null || alternative.isBlank()) {
return;
}
ControlledVocabAlternate alt = new Placeholder.ControlledVocabAlternate();
alt.setControlledVocabularyValue(this);
alt.setStrValue(alternative);
this.controlledVocabAlternates.add(alt);
}

public String getLocaleStrValue() {
return getLocaleStrValue(null);