Improve identifier: sourceSystemId #54

hurni · 2024-11-19T14:14:20Z

identifierType contains id and sourceSystemId.
sourceSystemId accepts a token of length 50. Can we improve on this? For example: use a code list, demand a UID (for CH sources) or UID?

https://github.com/blw-ofag-ufag/eCH-0261/blob/2e70dda84d9971ca7fe699da64985de70d643e9a/src/eCH-0261-1-0.xsd#L373C1-L384C17

AFoletti · 2024-11-20T05:43:00Z

identifierType defines a really generic way to handle identifiers (which is good, since we use it across all our standards).

I am not sure I fully understand what you mean by "improve" in this context, but I can react to your proposals:

To me, a code list is not really feasible, since we want to point to objects and not categories. I would not go that way.
dc:source is semantically not ideal, since it has a kind of "derived from" meaning, which is not our use case.
dc:publisher can be a good pick but, due to the generic usage of our identifierType, it will only be correct in a few instances (where we actually point to a publisher, which is by far not always the case).

My take is that, given the extreme flexibility we need for this identifierType, the current implementation is an OK one. I am, however, fully in favour of improvements should we find some.

hurni · 2024-11-20T09:30:31Z

Sadly, I agree... I was hoping you had a magical solution.

identifierType contains id and sourceSystemId. Dream scenario: sourceSystemId is a URI (hence no need for a curated list). However, not every source system has a URI at the moment, nor will in the foreseeable future.
I will propose a lower-level angle directly linked to zoologicalAnimalType and botanicalPlantType.
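
For illustration only, here is a minimal Java sketch of what a consumer-side check for the "dream scenario" might look like: it tests whether a sourceSystemId parses as an absolute URI. The class and method names are hypothetical and the example values are made up; nothing here is part of eCH-0261.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SourceSystemUriCheck {

    // Returns true if the value parses as an absolute URI, i.e. one with a scheme
    // such as "https:" or "urn:". Plain tokens like "SYSTEM_A" parse as relative
    // references and are therefore rejected.
    static boolean isAbsoluteUri(String sourceSystemId) {
        if (sourceSystemId == null) {
            return false;
        }
        try {
            return new URI(sourceSystemId).isAbsolute();
        } catch (URISyntaxException e) {
            // Not a syntactically valid URI at all (e.g. contains spaces).
            return false;
        }
    }
}
```

A check like this only confirms the syntax. It says nothing about whether the URI actually identifies a source system.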

montanajava · 2024-11-26T05:02:18Z

Hi all. What do you think of this?

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Define the enum type for sourceSystemId -->
    <xs:simpleType name="sourceSystemEnumType">
        <xs:restriction base="xs:token">
            <xs:enumeration value="SYSTEM_A"/>
            <xs:enumeration value="SYSTEM_B"/>
            <xs:enumeration value="SYSTEM_C"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- Define an id token type with a length restriction -->
    <xs:simpleType name="idToken">
        <xs:restriction base="xs:token">
            <xs:maxLength value="50"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- Define the union type that combines enum and restricted-length token -->
    <xs:simpleType name="sourceSystemUnionType">
        <xs:union memberTypes="sourceSystemEnumType idToken"/>
    </xs:simpleType>

    <!-- Example element using the union type -->
    <xs:element name="record">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="id" type="xs:string"/>
                <xs:element name="sourceSystemId" type="sourceSystemUnionType"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

This way, you could define common known source systems in sourceSystemEnumType, while allowing freeform IDs for all other systems.

You would have to get consensus from the working group for the entries that would go into sourceSystemEnumType. Subsequent expansion of those entries would be a minor change to the spec. Anything else would be a major change.

The proposed solution is backwards-compatible with current systems, i.e., anything goes.
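
To make the "anything goes" point concrete, the following is a rough Java sketch of how a value would be classified under the proposed union type. It is an approximation written for this discussion, not a schema validator: the class name is hypothetical, the SYSTEM_* entries are the placeholders from the XSD above, and the whitespace handling only approximates what xs:token does. It assumes Java 9+ for Set.of.

```java
import java.util.Set;

final class SourceSystemUnionSketch {

    enum Kind { KNOWN_SYSTEM, FREEFORM, INVALID }

    // Placeholder entries copied from the proposed sourceSystemEnumType.
    static final Set<String> KNOWN_SYSTEMS = Set.of("SYSTEM_A", "SYSTEM_B", "SYSTEM_C");

    // Mirrors the maxLength facet of the proposed idToken type.
    static final int MAX_LENGTH = 50;

    static Kind classify(String value) {
        if (value == null) {
            return Kind.INVALID;
        }
        // Approximate xs:token whitespace collapsing: trim the ends and
        // squeeze internal runs of whitespace into a single space.
        String token = value.trim().replaceAll("\\s+", " ");
        if (KNOWN_SYSTEMS.contains(token)) {
            return Kind.KNOWN_SYSTEM;
        }
        // Count code points so that surrogate pairs count as one character.
        if (token.codePointCount(0, token.length()) <= MAX_LENGTH) {
            return Kind.FREEFORM;
        }
        return Kind.INVALID;
    }
}
```

Note that every enum entry is itself a valid idToken, so at the schema level the union accepts exactly what idToken alone would accept. The enum only adds information for consumers that choose to look at it.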

So where is the improvement?

The improvement comes for consuming systems that wish to constrain allowed values: There, you can create validators that insist that only the entries of the enumeration are used. In the Java world, a rudimentary example might look something like this:

Java Validator Example

import java.util.Arrays;
import java.util.List;

public class SourceSystemIdValidator {

    // The values would be read from the XSD ...
    private static final List<String> VALID_ENUMS = Arrays.asList("SYSTEM_A", "SYSTEM_B", "SYSTEM_C");

    // Record stands for the consuming system's own data class exposing getSourceSystemId().
    public static void validate(Record record) throws IllegalArgumentException {
        if (!VALID_ENUMS.contains(record.getSourceSystemId())) {
            throw new IllegalArgumentException("Invalid sourceSystemId value: " + record.getSourceSystemId());
        }
    }
}

An implementation at the DB level could look something like this:

SQL Implementation Example

CREATE TABLE my_agricultural_imported_data (
    id VARCHAR(255) NOT NULL,
    source_system_id VARCHAR(10) NOT NULL,
    CONSTRAINT chk_source_system_id CHECK (
        source_system_id IN ('SYSTEM_A', 'SYSTEM_B', 'SYSTEM_C')
    )
);

AFoletti · 2024-11-26T05:34:22Z

@montanajava
Hey! Thanks for the proposal.
I may be missing something, but I fail to understand how you can implement a real check if the sourceSystemId is both an enumType AND freeform at the same time.

<xs:simpleType name="sourceSystemEnumType">
    <xs:restriction base="xs:token">
        <xs:enumeration value="SYSTEM_A"/>
        <xs:enumeration value="SYSTEM_B"/>
        <xs:enumeration value="SYSTEM_C"/>
    </xs:restriction>
</xs:simpleType>

What if I enter "SYSTEM_a"? Is it a typo that should really be "SYSTEM_A", or is it actually another system and I am using the freeform flexibility given to me? There is no way to differentiate.
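
The ambiguity can be shown with a small Java sketch. It assumes a hypothetical consumer-side heuristic (not part of the proposal or of the standard) that flags values differing from an enum entry only by letter case. Such a heuristic can raise a warning, but it cannot decide whether the value is a typo or a genuinely different system. It assumes Java 9+ for List.of.

```java
import java.util.List;
import java.util.Optional;

final class NearMissCheck {

    // Placeholder entries copied from the proposed sourceSystemEnumType.
    static final List<String> KNOWN_SYSTEMS = List.of("SYSTEM_A", "SYSTEM_B", "SYSTEM_C");

    // Returns the enum entry that differs from the value only by letter case,
    // or an empty Optional if the value is an exact match or no entry is close.
    static Optional<String> caseInsensitiveNearMiss(String value) {
        if (value == null || KNOWN_SYSTEMS.contains(value)) {
            return Optional.empty();
        }
        return KNOWN_SYSTEMS.stream()
                .filter(known -> known.equalsIgnoreCase(value))
                .findFirst();
    }
}
```

For "SYSTEM_a" the method returns "SYSTEM_A" as a candidate, yet the schema would still accept "SYSTEM_a" as a valid freeform token.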

montanajava · 2024-11-28T09:29:50Z

You are exactly right. The proposal is a compromise. The check can be performed by the provisioning and/or the consuming system, but not with this XSD; it would have to be done with another XSD or another technology. The XSD here specifies what is _possible_. And what we are making possible is one of three things, depending on the need:

a. Anything goes. The enum buys you nothing here.
b. Restricted. Two parties can agree that only the entries in the enum are valid. This approach is predicated on having the requisite system IDs registered in the enum.
c. Restricted, with an option for freeform entries in exceptional cases. This will be few architects' favoured approach, but it is a viable option for situations where a "contract" between provisioning and consuming parties can only be partially agreed upon.
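
As a rough illustration, options a, b and c could be expressed on the consuming side roughly as follows. This is a sketch under assumptions: the class, enum and parameter names are hypothetical, the SYSTEM_* entries are placeholders, and modelling option c as an explicit set of exceptions agreed between the two parties is only one possible reading. It assumes Java 9+ for Set.of.

```java
import java.util.Set;

final class SourceSystemPolicy {

    // One constant per option: a = ANYTHING_GOES, b = RESTRICTED,
    // c = RESTRICTED_WITH_EXCEPTIONS.
    enum Mode { ANYTHING_GOES, RESTRICTED, RESTRICTED_WITH_EXCEPTIONS }

    // Placeholder entries copied from the proposed sourceSystemEnumType.
    static final Set<String> KNOWN_SYSTEMS = Set.of("SYSTEM_A", "SYSTEM_B", "SYSTEM_C");

    // Decides whether a value that already passed schema validation is acceptable
    // under the mode the two parties agreed on. agreedExceptions is only
    // consulted in RESTRICTED_WITH_EXCEPTIONS.
    static boolean accepts(Mode mode, String value, Set<String> agreedExceptions) {
        if (value == null) {
            return false;
        }
        switch (mode) {
            case ANYTHING_GOES:
                return true;
            case RESTRICTED:
                return KNOWN_SYSTEMS.contains(value);
            case RESTRICTED_WITH_EXCEPTIONS:
                return KNOWN_SYSTEMS.contains(value) || agreedExceptions.contains(value);
            default:
                throw new IllegalStateException("Unhandled mode: " + mode);
        }
    }
}
```

The XSD stays the same in all three cases. Only the agreement between the provisioning and consuming parties changes.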

AFoletti · 2024-11-28T09:48:56Z

Understood.
To me, this builds unnecessary complexity into the model. Either we enforce a code list (not possible...) or we leave it freeform, and if someone can/wants to define a set of valid values for a very specific use case, they are still free to do so.

This could of course result in data that conforms to the eCH standard but not to the tighter restrictions imposed by the specific use case, which is in my opinion perfectly acceptable.

hurni added the enhancement, v2, jira and major labels Nov 19, 2024

hurni closed this as completed Nov 20, 2024
