Use of Data Packages #31

Closed
nielsklazenga opened this issue Feb 1, 2019 · 8 comments

Comments

@nielsklazenga
Member

@mdoering mentioned the use of Data Packages in issue #1, which is now closed.

Just elevating this to a separate issue, to keep it on the radar.

@mdoering, still interested in this?

@mdoering

mdoering commented Feb 1, 2019

I am very much interested in an application of the standard that allows data to be exchanged in CSV files, as we can with DwC. Data Packages appear to be the strongest candidate for an existing standard in that area to jump on to. But if TCS NG becomes an RDF-only standard, I will frankly be rather disappointed.

@baskaufs

baskaufs commented Feb 1, 2019

@mdoering I just gave Data Packages a look and it's really interesting. Here are a few relevant points:

  • Section 4 of the Standards Documentation Specification is very careful to say that the machine-readable metadata for a standard MAY be expressed as RDF, but that other methods can be used as long as the same relationships are expressed in a machine-processable way. So RDF is not required.

  • I believe strongly that we should try to keep our standards definitions simple enough that they can be expressed as CSV tables. At this point, all of the existing TDWG vocabulary standards ARE simple enough to be expressed as CSV tables. I don't think that people have paid much attention to the rs.tdwg.org repo, but it contains all of the information required to describe TDWG vocabularies from CSV data and to turn those CSV data into machine-readable RDF. In each of the folders, there's one core CSV file (like this one for the dwc: terms) and other files that describe how to map the table columns to well-known properties (like this one). I just took a look at the Table Schema information for Data Package, and all of the information in the "other files" I just mentioned could be expressed as a Table Schema JSON file. So the Data Package system could be used to create machine-readable CSV files that are directly translatable to RDF and that would contain equivalent information.

  • Guid-O-Matic is the software that I wrote to turn CSV files into RDF serializations. I have been thinking of making a version 3.0 in Python, so maybe the Data Package specification would be the way to describe the CSV files. I see that they have a Python library, but didn't investigate what all it can do yet. One thing I don't know is how widely adopted Data Package is. Do you know?

  • I said earlier that all existing TDWG standards can be easily expressed in CSV tables. However, some of the models we are talking about so far in TNC are getting complicated to the point where that might be difficult. We should keep that in mind as we try to balance our desire to express complex ideas in the standard.
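
To make the Table Schema idea above concrete, here is a minimal sketch of a Data Package descriptor that carries a column-to-property mapping alongside the CSV file. It is not taken from the rs.tdwg.org repo: the file name `terms.csv`, the column names, the `example.org` URI, and in particular the `propertyUri` key are assumptions made for illustration (`propertyUri` is a custom extension key, not something defined by the Table Schema specification).

```python
import json


def build_descriptor(package_name, resource_name, csv_path, columns):
    """Build a Data Package descriptor (as a dict) for one CSV resource.

    `columns` is a list of (column name, Table Schema type, property URI).
    The "propertyUri" key is a hypothetical extension used here to record
    which RDF property each column maps to.
    """
    fields = [
        {"name": name, "type": field_type, "propertyUri": uri}
        for name, field_type, uri in columns
    ]
    return {
        "name": package_name,
        "resources": [
            {
                "name": resource_name,
                "path": csv_path,
                "schema": {"fields": fields},
            }
        ],
    }


# Made-up column names and mappings, for illustration only.
descriptor = build_descriptor(
    "example-vocabulary",
    "terms",
    "terms.csv",
    [
        ("term_localName", "string", "http://example.org/prop/localName"),
        ("label", "string", "http://www.w3.org/2000/01/rdf-schema#label"),
        ("definition", "string", "http://www.w3.org/2004/02/skos/core#definition"),
    ],
)

# This JSON is what would be saved as datapackage.json next to the CSV file.
print(json.dumps(descriptor, indent=2))
```

Keeping the mapping in the same JSON file as the column definitions would mean a single descriptor replaces the separate mapping files, though whether a custom key or some other mechanism is the right place for the URIs is an open design choice.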

@nielsklazenga
Member Author

nielsklazenga commented Feb 2, 2019

@mdoering, that is not at all the intention; we intend to make a specification that is broadly applicable, and not just because that's what the Vocabulary Maintenance Specification requires from TDWG standards. Your use case is definitely a very important one and is very much on our radar. If a lot of the examples were in Turtle, that is just because it is easy to read and write and useful for quickly getting an idea across. Speaking for myself, when I think about data models I think of tables in a database, and if I were to produce RDF with real data, it would be something else first (database tables) and there would be something else again (JSON) between the database table and the RDF.

The models we discussed so far may look complicated, but we really have been talking about only two classes/tables, so they aren't really (and I think everything you get into a database structure you can get into CSV). I am pretty sure I could shoehorn everything we discussed so far into the Darwin Core Taxon class if I had a good crack at it. I might just do that (not right now). The use I see for Label objects is in the interface between identifications and taxonomic names.

If I recall correctly, you were the one who suggested we should look at a domain model. We are definitely going to have a much closer look at serialization. I created this issue because I thought it would be good to have on the radar for the time when we really start looking at that (and because I had a look when you first mentioned it and it looked promising), but maybe it turns out to be timely to keep our eyes on the ball. I will try to show examples in more varied formats. To be fair, @baskaufs had CSV examples in the document in which he was spruiking the use of SKOS-XL.

@nielsklazenga
Member Author

nielsklazenga commented Feb 2, 2019

Sorry @baskaufs, I had a better look at your post just now and see that you had already addressed pretty much everything that I just did.

If I provide you with a set of CSV files with data from a taxonomic revision, perhaps even in the form of a Data Package, would you be interested in doing your Guid-O-Matic thing on it? I would be interested to see the result, both the CSV and the RDF.

@baskaufs

baskaufs commented Feb 4, 2019

Sure, I can give it a go. We just need to do some mapping of column headers to property URIs. The URIs can just be made-up; it doesn't matter if they are "real" or not.
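
As a rough illustration of that mapping step, the sketch below turns CSV rows into N-Triples using a dictionary from column headers to property URIs. It is not how Guid-O-Matic works internally; the function names, the column headers, the sample row, and every `example.org` URI are made up for the example.

```python
import csv
import io


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column, base_uri, column_to_property):
    """Emit one N-Triples line per non-empty mapped cell.

    The subject URI of each row is base_uri + the value of id_column.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base_uri}{row[id_column]}>"
        for column, prop in column_to_property.items():
            value = row.get(column, "")
            if value:
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


# Made-up sample data and made-up property URIs.
sample_csv = "id,scientificName,authorship\n1,Aus bus,Smith\n2,Aus cus,\n"
mapping = {
    "scientificName": "http://example.org/prop/scientificName",
    "authorship": "http://example.org/prop/authorship",
}

triples = csv_to_ntriples(sample_csv, "id", "http://example.org/name/", mapping)
for line in triples:
    print(line)
```

Every value is written as a plain string literal here; a real conversion would also need to decide which columns hold links to other resources and which need datatypes or language tags.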

@nielsklazenga
Member Author

Thanks @baskaufs. The time it took me to create my example for issue #30 made me realise it will take some time for me to get all the data together.

@baskaufs

baskaufs commented Feb 5, 2019

No problem. Just let me know...

@nielsklazenga
Member Author

I have added an example Data Package to the examples in this repository: /examples/datapackage.
