Question about generated meta.xml #5

Closed
cgendreau opened this issue Dec 1, 2014 · 12 comments

@cgendreau
Contributor

When writing a Dwc-A, the dwca-reader automatically generates the meta.xml file.

For a Taxon rowType, this meta.xml is generated:

<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files>
      <location>taxon.txt</location>
    </files>
    <id index="0" />
    <field index="1" term="http://purl.org/dc/terms/modified"/>
...

Why is no field entry created for taxonID?
e.g.

<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files>
      <location>taxon.txt</location>
    </files>
    <id index="0" />
    <field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID" />
    <field index="1" term="http://purl.org/dc/terms/modified"/>
...

Thanks

@mdoering
Member

mdoering commented Dec 1, 2014

We debated that (and occurrenceID) quite a bit, but decided that the taxonID/occurrenceID might in some cases differ from the (local) identifier used to link the data in the archive. If it were mapped automatically, you could not handle those cases. On the other hand, you can always manually map the taxonID to the same column.
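
To illustrate the case described above, here is a minimal sketch of a meta.xml in which the core id column and the taxonID column are different. The three-column layout (local key, taxonID, modified) is an assumption made for illustration and does not come from this thread.

```xml
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="utf-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files>
      <location>taxon.txt</location>
    </files>
    <!-- column 0: local key, used only to link rows within the archive (assumed layout) -->
    <id index="0" />
    <!-- column 1: the published taxonID, kept separate from the local key (assumed layout) -->
    <field index="1" term="http://rs.tdwg.org/dwc/terms/taxonID" />
    <!-- column 2: last modification date (assumed layout) -->
    <field index="2" term="http://purl.org/dc/terms/modified" />
  </core>
</archive>
```

With an automatic mapping of the id column to taxonID, a layout like this could not be expressed, which is the concern raised here.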

@timrobertson100
Member

IIRC, the key argument against it was that a row in a CSV file might not represent a single taxon "concept". For example, row 1 might detail the accepted name and the following rows the synonyms and where they were published, but together they make up the definition of the taxon.

@cgendreau
Contributor Author

@mdoering I agree it should not be set to taxonID by default. How can I map the taxonID to the same column? In other words, how do I tell the writer 'this is my id column'?
@timrobertson100 Agreed, even though we (Canadensys) have a taxonID for synonyms, which is not without issues: Canadensys/vascan#19

@cgendreau
Contributor Author

<id index="0" />
<field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID" />
<field index="1" term="http://purl.org/dc/terms/modified"/>

or

<id index="0" term="http://rs.tdwg.org/dwc/terms/taxonID" />
<field index="1" term="http://purl.org/dc/terms/modified"/>

@timrobertson100
Member

is certainly the correct way to say the first column is the internal ID and it is also the taxonID.

Does that answer the question though?


@cgendreau
Contributor Author

Yes, it does, and this is how I implemented it. So the next logical question is: when does it make sense to use

<id index="0" term="http://rs.tdwg.org/dwc/terms/taxonID" />

or, put differently, why can we set a 'term' on the id?

@mdoering
Member

mdoering commented Dec 3, 2014

The xsd schema does not allow the id field to have any other attribute than index:
https://github.com/tdwg/dwc/blob/master/text/tdwg_dwc_text.xsd#L82

@cgendreau
Contributor Author

True, that makes sense.
I was confused by the ArchiveField class, but all is fine.
A pull request is coming.

@timrobertson100
Member

I am not sure I understand what you are trying to do here, sorry. Can you please elaborate on what the problem is?

The following is the correct way to write a meta.xml where a column is the core id and is also used as a term, according to the standard. Any deviation would likely need an update to the standard itself:
http://rs.tdwg.org/dwc/terms/guides/text/.


@cgendreau
Contributor Author

The problem was that it was not possible to generate a meta.xml like this using the DwcaWriter:

<id index="0" />
<field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID" />

@timrobertson100
Member

Thanks.

I had assumed the IPT would use that, but I guess not. It can produce that kind of meta.xml:
https://code.google.com/p/gbif-providertoolkit/source/browse/trunk/gbif-ipt/src/main/java/org/gbif/ipt/task/GenerateDwca.java#9


@cgendreau
Contributor Author

No, the IPT doesn't use the DwcaWriter; it handles that case by itself.
