
Migrate FoLiA Set Definition scheme to RDF #14

Closed
proycon opened this issue Aug 5, 2016 · 6 comments
Labels: enhancement, ready (Implemented but not released yet)

Comments

@proycon
Owner

proycon commented Aug 5, 2016

The role of FoLiA Set Definitions is:

  • to define which classes are valid in a set
  • to define which subsets and classes are valid in "features" in a set
  • to constrain which subsets+classes may co-occur in an annotation of the set
  • to allow enumeration over classes and subsets
  • to assign human-readable labels to symbolic classes
  • to relate classes to external resources defining them (data category registries)
  • to define a hierarchy/taxonomy of classes

Using set definitions a FoLiA document can be validated on a deep level, i.e.
the validity of the used classes can be tested. Set definitions provide
semantics to the FoLiA documents that use them and are an integral part of FoLiA.
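As a rough illustration of what "deep" validation means, here is a minimal Python sketch that checks each (set, class) pair used in a document against the classes its set defines. The dictionary of set definitions, the short set names (real documents refer to sets by URL) and the `deep_validate` function are all hypothetical; this is not the foliapy validator.

```python
# Minimal sketch of "deep" validation: every class used in a document must be
# defined in the set definition it refers to.
# HYPOTHETICAL: these dict-based set definitions stand in for real, retrieved
# FoLiA set definitions (which are identified by URL, not by short names).
SET_DEFINITIONS = {
    "pos": {"N", "V", "ADJ"},
    "ner": {"per", "loc", "org"},
}


def deep_validate(annotations):
    """Return error messages for invalid (set, class) pairs; empty if all valid."""
    errors = []
    for setname, cls in annotations:
        if setname not in SET_DEFINITIONS:
            errors.append(f"unknown set: {setname}")
        elif cls not in SET_DEFINITIONS[setname]:
            errors.append(f"class {cls!r} is not valid in set {setname!r}")
    return errors


if __name__ == "__main__":
    doc = [("pos", "N"), ("pos", "XYZ"), ("ner", "loc")]
    for err in deep_validate(doc):
        print(err)  # prints: class 'XYZ' is not valid in set 'pos'
```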

Set definitions are not in widespread use yet; most people simply don't bother
with, or care for, such a level of abstraction and formality. One tool, FLAT,
does rely heavily on set definitions to populate the options in its selection fields.

Set definitions are currently described in a simple XML format, distinct from
FoLiA itself. The format is limited and not strongly established.

Considering the highly semantic nature of set definitions, the binding role
they play between the FoLiA document on one hand and data category registries
on the other hand, and the advent of linked open data, I propose describing
the set definitions themselves in RDF in future versions of FoLiA. I'm working
on a scheme for this.

The current set definitions will remain supported for backwards compatibility
of course, and may also act as an intermediate step in producing the RDF data.

@proycon proycon added this to the v1.4 milestone Aug 5, 2016
@proycon proycon self-assigned this Aug 5, 2016
proycon added a commit to proycon/pynlpl that referenced this issue Oct 12, 2016
…-based set definitions (just started, skeleton, not functional yet), issue #19 and issue proycon/folia#14
proycon added a commit to proycon/pynlpl that referenced this issue Oct 12, 2016
proycon added a commit that referenced this issue Oct 12, 2016
@proycon proycon added the ready Implemented but not released yet label Nov 8, 2016
@proycon
Owner Author

proycon commented Nov 10, 2016

See if we can derive our set definition model from SKOS, to facilitate interoperability.

@proycon proycon removed the ready Implemented but not released yet label Nov 10, 2016
proycon added a commit to proycon/pynlpl that referenced this issue Nov 14, 2016
@proycon
Owner Author

proycon commented Nov 15, 2016

Summary of the current solution; the aim was to use as much of SKOS as possible, with as few non-SKOS solutions as possible:

  • Sets and subsets are modelled as skos:Collection
  • Relation between (sub)sets and classes is modelled using skos:member
  • Relation between sets and subsets is modelled using skos:member as well (our model supports only one level of nesting, though SKOS has no such limit)
  • Classes are modelled as skos:Concept
  • Labels for (sub)sets and classes use skos:prefLabel
  • IDs for (sub)sets and classes use skos:notation; an ID is mandatory and there can be only one.
  • Hierarchy in classes is expressed through skos:broader.
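To make the mapping above concrete, the following Python sketch encodes a tiny hypothetical set as plain (subject, predicate, object) tuples instead of using an RDF library, and shows how classes, labels, subsets and the skos:broader hierarchy could be read back out. All `ex:` URIs, IDs and labels are invented for illustration, and the helper functions are not part of any FoLiA library.

```python
# Sketch of the SKOS-based model, as plain (subject, predicate, object) tuples.
# HYPOTHETICAL: all "ex:" URIs, IDs and labels are invented examples.
TRIPLES = [
    # The primary set: a skos:Collection with its mandatory skos:notation (ID)
    ("ex:pos", "rdf:type", "skos:Collection"),
    ("ex:pos", "skos:notation", "pos"),
    ("ex:pos", "skos:prefLabel", "Part-of-Speech"),
    # Classes: skos:Concepts, attached to the set through skos:member
    ("ex:pos", "skos:member", "ex:N"),
    ("ex:N", "rdf:type", "skos:Concept"),
    ("ex:N", "skos:notation", "N"),
    ("ex:N", "skos:prefLabel", "Noun"),
    ("ex:pos", "skos:member", "ex:N-prop"),
    ("ex:N-prop", "rdf:type", "skos:Concept"),
    ("ex:N-prop", "skos:notation", "N-prop"),
    ("ex:N-prop", "skos:prefLabel", "Proper noun"),
    ("ex:N-prop", "skos:broader", "ex:N"),  # class hierarchy
    # A subset: a nested skos:Collection, also attached via skos:member
    ("ex:pos", "skos:member", "ex:number"),
    ("ex:number", "rdf:type", "skos:Collection"),
    ("ex:number", "skos:notation", "number"),
    ("ex:number", "skos:prefLabel", "Number"),
    ("ex:number", "skos:member", "ex:sg"),
    ("ex:sg", "rdf:type", "skos:Concept"),
    ("ex:sg", "skos:notation", "sg"),
    ("ex:sg", "skos:prefLabel", "singular"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


def is_a(node, rdf_class):
    return rdf_class in objects(node, "rdf:type")


def classes(collection):
    """Map class ID (skos:notation) to label (skos:prefLabel) for the
    skos:Concept members of a (sub)set; nested subsets are skipped."""
    result = {}
    for member in objects(collection, "skos:member"):
        if is_a(member, "skos:Concept"):
            # Unpacking fails unless there is exactly one skos:notation
            (notation,) = objects(member, "skos:notation")
            labels = objects(member, "skos:prefLabel")
            result[notation] = labels[0] if labels else notation
    return result


def subsets(collection):
    """Members of a set that are themselves skos:Collections."""
    return [m for m in objects(collection, "skos:member") if is_a(m, "skos:Collection")]


def broader_chain(concept):
    """Follow skos:broader upwards (assumes a single parent per class)."""
    chain = []
    parents = objects(concept, "skos:broader")
    while parents:
        chain.append(parents[0])
        parents = objects(parents[0], "skos:broader")
    return chain
```

For this example set, `classes("ex:pos")` yields the two noun classes with their labels, `subsets("ex:pos")` yields `["ex:number"]`, and `broader_chain("ex:N-prop")` yields `["ex:N"]`.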

Minor custom-made extensions:

  • fsd:open and fsd:empty are boolean properties to explicitly indicate whether a set is open or empty (it is a closed set otherwise).
  • fsd:sequenceNumber is used for explicit ordering (skos:OrderedCollection is not supported yet). When no sequence number is specified, classes are ordered alphabetically by label.
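A minimal sketch of how a consumer might interpret these two extensions, assuming the relevant properties have already been extracted into Python dictionaries (a hypothetical layout). The summary does not say how classes with and without a sequence number are ordered relative to each other; the sketch assumes numbered classes come first, followed by the rest alphabetically (case-insensitively) by label.

```python
# Sketch of interpreting fsd:open / fsd:empty and fsd:sequenceNumber.
# HYPOTHETICAL: the dict layouts are invented; the rule for mixing numbered
# and unnumbered classes is an ASSUMPTION, not taken from the FoLiA spec.


def set_type(props):
    """Derive the set type from its optional fsd:open and fsd:empty flags."""
    if props.get("fsd:open", False):
        return "open"
    if props.get("fsd:empty", False):
        return "empty"
    return "closed"  # neither flag set: a closed set


def ordered(classes):
    """Classes with fsd:sequenceNumber first, in numeric order (assumed rule),
    then the remaining classes alphabetically by label, ignoring case."""
    def key(cls):
        if "fsd:sequenceNumber" in cls:
            return (0, cls["fsd:sequenceNumber"], "")
        return (1, 0, cls["label"].casefold())
    return sorted(classes, key=key)


if __name__ == "__main__":
    pos = [
        {"label": "Verb"},
        {"label": "adjective"},
        {"label": "Noun", "fsd:sequenceNumber": 2},
        {"label": "Pronoun", "fsd:sequenceNumber": 1},
    ]
    print([c["label"] for c in ordered(pos)])
    # prints ['Pronoun', 'Noun', 'adjective', 'Verb']
```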

Set definitions are completely agnostic about concept schemes. Relating concepts to external resources can be done through the usual SKOS mechanisms or other vocabularies, but FoLiA set definition implementations don't use this information yet. Nor has a constraint mechanism been defined or implemented yet that specifies which subsets may be used together with which classes.

FoLiA set definitions have to be complete and publicly retrievable from the web. A set definition should consist of one and only one SKOS collection that acts as the primary set (i.e. it is not a subset). Furthermore, all classes and subsets need to be defined as stated above; merely referring to foreign resources, e.g. using skos:member, is not sufficient.
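These requirements lend themselves to simple structural checks. The Python sketch below, again over plain tuples with hypothetical `ex:` URIs, looks for exactly one primary collection (one that is not a member of another collection), exactly one skos:notation per (sub)set and class, and members that are referenced but not defined in the set definition itself. It is an illustration only, not how the FoLiA tools implement these checks.

```python
# Sketch of structural checks on a set definition given as (s, p, o) tuples.
# HYPOTHETICAL: function names and the "ex:" URIs are invented for illustration.


def primary_sets(triples):
    """Collections that are not themselves a skos:member of another collection."""
    collections = {s for (s, p, o) in triples
                   if p == "rdf:type" and o == "skos:Collection"}
    members = {o for (s, p, o) in triples if p == "skos:member"}
    return collections - members


def check(triples):
    errors = []
    primaries = primary_sets(triples)
    if len(primaries) != 1:
        errors.append(f"expected exactly one primary set, found {len(primaries)}")
    # Everything typed as a (sub)set or class in this set definition
    defined = {s for (s, p, o) in triples
               if p == "rdf:type" and o in ("skos:Collection", "skos:Concept")}
    for node in sorted(defined):
        n = sum(1 for (s, p, o) in triples if s == node and p == "skos:notation")
        if n != 1:
            errors.append(f"{node} has {n} skos:notation values, expected 1")
    # Members must be defined locally; a bare reference to a foreign resource is not enough
    for member in sorted({o for (s, p, o) in triples if p == "skos:member"}):
        if member not in defined:
            errors.append(f"{member} is referenced but not defined")
    return errors


if __name__ == "__main__":
    good = [
        ("ex:pos", "rdf:type", "skos:Collection"),
        ("ex:pos", "skos:notation", "pos"),
        ("ex:pos", "skos:member", "ex:N"),
        ("ex:N", "rdf:type", "skos:Concept"),
        ("ex:N", "skos:notation", "N"),
    ]
    print(check(good))  # prints []
```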

Example sets:

More details and examples are in the FoLiA documentation.

@mhkuu

mhkuu commented Nov 15, 2016

I personally quite like the XML format for its simplicity, but understand the motivations for the move to RDF and I think you have done a great job (again ;-)). It would be good to add to your bullet list above (which I suppose will end up in the documentation) how things used to be done in XML to provide guidance for developers who would want to migrate.

One thing I noticed when working with set definitions is that linking to blobs on a GitHub repository is not a good idea, as those blobs seem to be cached (if you push a newer version and rerun the validation, the old blob is still fetched). Have you experienced the same or is this just me? :-)

@proycon
Owner Author

proycon commented Nov 17, 2016

Thanks for the feedback! The legacy XML set definition format will remain supported for backward compatibility in any case (at least in the Python library, which is the only library implementing FoLiA Set Definition support anyway). Legacy XML is translated on the fly to RDF. The foliasetdefinition tool can be used if you want to explicitly read old XML and output new RDF.
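For a feel of what the on-the-fly translation involves, here is a Python sketch that turns a legacy-style XML set definition into SKOS-style triples with `xml.etree.ElementTree`. Note the assumptions: the XML below (`set`, `subset` and `class` elements with `xml:id`, `label` and `type` attributes) is a simplified, hypothetical stand-in for the real legacy format, and the `ex:` URI scheme is made up; the actual conversion is done by the Python library and the foliasetdefinition tool.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# HYPOTHETICAL, simplified legacy-style input (not the exact legacy schema)
LEGACY = """<set xml:id="pos" type="closed">
  <class xml:id="N" label="Noun"/>
  <class xml:id="V" label="Verb"/>
  <subset xml:id="number" type="closed">
    <class xml:id="sg" label="singular"/>
  </subset>
</set>"""


def to_triples(xml_text, base="ex:"):
    """Convert the simplified XML into (s, p, o) tuples; the URI scheme is made up."""
    root = ET.fromstring(xml_text)
    triples = []

    def add_collection(elem, uri):
        triples.append((uri, "rdf:type", "skos:Collection"))
        triples.append((uri, "skos:notation", elem.get(XML_ID)))
        if elem.get("type") == "open":
            triples.append((uri, "fsd:open", True))
        for child in elem:
            if child.tag not in ("class", "subset"):
                continue  # ignore elements this sketch does not model
            child_id = child.get(XML_ID)
            child_uri = f"{uri}.{child_id}"
            triples.append((uri, "skos:member", child_uri))
            if child.tag == "subset":
                add_collection(child, child_uri)  # nested subset
            else:
                triples.append((child_uri, "rdf:type", "skos:Concept"))
                triples.append((child_uri, "skos:notation", child_id))
                if child.get("label"):
                    triples.append((child_uri, "skos:prefLabel", child.get("label")))

    add_collection(root, base + root.get(XML_ID))
    return triples
```

Calling `to_triples(LEGACY)` produces a collection `ex:pos` with the member classes `ex:pos.N` and `ex:pos.V` and the subset `ex:pos.number`, mirroring the skos:Collection / skos:member / skos:Concept mapping summarized earlier in this thread.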

The cache issue is an interesting point indeed, but I must say I haven't really encountered many problems with it in my tests yet. In practice it will probably not be much of an issue and perhaps only cause a small update delay.

@proycon
Owner Author

proycon commented Nov 17, 2016

For the sake of clarification and possible discussion, I'm adding an example screenshot of how complex FoLiA set definitions (with features/subsets) are used by FLAT (https://github.com/proycon/flat) to populate fields in the annotation editor:

screenshot

@proycon
Owner Author

proycon commented Nov 17, 2016

And an example of deep validator invocation and output:

$ foliavalidator -d example.deep.xml
Loaded set https://raw.githubusercontent.com/LanguageMachines/uctodata/folia1.4/setdefinitions/tokconfig-nld.foliaset.ttl (119 triples)
Loaded set https://raw.githubusercontent.com/proycon/folia/folia1.4/setdefinitions/frog-mbpos-cgn (1780 triples)
Loaded legacy set https://raw.githubusercontent.com/proycon/folia/folia1.4/setdefinitions/frog-mblem-nl (3 triples)
Loaded legacy set https://raw.githubusercontent.com/proycon/folia/folia1.4/setdefinitions/frog-chunker-nl (137 triples)
Loaded legacy set https://raw.githubusercontent.com/proycon/folia/folia1.4/setdefinitions/frog-ner-nl (32 triples)
Loaded legacy set https://raw.githubusercontent.com/proycon/folia/folia1.4/setdefinitions/frog-mwu-nl (3 triples)
Loaded legacy set https://raw.githubusercontent.com/proycon/folia/folia1.4/setdefinitions/frog-depparse-nl (177 triples)
Validated successfully: example.deep.xml

@proycon proycon closed this as completed Dec 9, 2016
proycon added a commit to proycon/foliapy that referenced this issue Sep 6, 2018
…-based set definitions (just started, skeleton, not functional yet), issue #19 and issue proycon/folia#14
proycon added a commit to proycon/foliapy that referenced this issue Sep 6, 2018
proycon added a commit to proycon/foliapy that referenced this issue Sep 6, 2018
2 participants