support structured funding information #266
Comments
While I like the idea of importing CrossRef's |
I think putting the funding information inside the EML like this is great, as I'm a fan of putting as much as possible inside a single EML doc. And making it look like, or be imported from, the fundref schema is great too. Is there another option here where we include a fundref XML doc in the package instead of this or in addition to this? |
Added the first version of a new `award` field to the eml-project schema to hold info that can be used to reference funding information for the project. This provides the first implementation of the `award` field, including the award title, identifier, and open funder registry identifier. These proposed additions are partial implementations for issue #266.
Draft |
Draft award sample has now been added to the eml-sample.xml document in sha 5ce01c9, and it validates. |
See related discussions for CodeMeta and schema.org on how to incorporate funding info: |
@mfenner outlines the schema.org proposal in codemeta/codemeta#160 describing the structure they will use as:
Thus, we could consider modifying the EML structure to be compatible, which mainly means shifting to camelCase to match both EML and schema.org conventions, and shifting field name conventions to match schema.org. The new EML structure for awards would then look like this:
<project>
...
<funding><para>General description goes here for backwards compatibility</para></funding>
<award>
<funderName>National Science Foundation</funderName>
<funderIdentifier>https://doi.org/10.13039/00000001</funderIdentifier>
<awardNumber>1546024</awardNumber>
<title>Scientia Arctica: A Knowledge Archive for Discovery and Reproducible Science in the Arctic</title>
<awardURL>https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546024</awardURL>
</award>
<!-- Note award is repeatable -->
</project>
Alternatively, we could also make a new
<project>
...
<funding><para>General description goes here for backwards compatibility</para></funding>
<award>
<funder>
<name>National Science Foundation</name>
<identifier>https://doi.org/10.13039/00000001</identifier>
</funder>
<awardNumber>1546024</awardNumber>
<title>Scientia Arctica: A Knowledge Archive for Discovery and Reproducible Science in the Arctic</title>
<awardURL>https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546024</awardURL>
</award>
<!-- Note award is repeatable -->
</project>
The advantage of the former is that it is shallower and easier to parse. The advantage of the latter is that it allows funder to become a type of Organization, and disambiguates the name and identifier fields. |
To fully match the schema.org structures, we could continue to deviate from FundingData field names, use schema.org names, and reorder fields to correspond with the DataCite proposal. This would look like:
<project>
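To make the parsing trade-off concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It reads the funder name out of both layouts discussed above. The helper names `funder_names_flat` and `funder_names_nested` are hypothetical, and the XML is trimmed from the examples in this comment (with the closing tags corrected), so treat it as an illustration rather than the final EML structure.

```python
import xml.etree.ElementTree as ET

# Flat layout: funder fields sit directly under <award>.
FLAT = """
<project>
  <award>
    <funderName>National Science Foundation</funderName>
    <funderIdentifier>https://doi.org/10.13039/00000001</funderIdentifier>
    <awardNumber>1546024</awardNumber>
  </award>
</project>
""".strip()

# Nested layout: funder fields are grouped under a <funder> child.
NESTED = """
<project>
  <award>
    <funder>
      <name>National Science Foundation</name>
      <identifier>https://doi.org/10.13039/00000001</identifier>
    </funder>
    <awardNumber>1546024</awardNumber>
  </award>
</project>
""".strip()


def funder_names_flat(xml_text):
    """Return one funder name per <award> in the flat layout."""
    root = ET.fromstring(xml_text)
    return [award.findtext("funderName") for award in root.iter("award")]


def funder_names_nested(xml_text):
    """Return one funder name per <award> in the nested layout."""
    root = ET.fromstring(xml_text)
    # One extra path step is needed, but <name> is unambiguous
    # because it is scoped to <funder>.
    return [award.findtext("funder/name") for award in root.iter("award")]
```

In this sketch the nested form costs one extra path step (`funder/name` instead of `funderName`), which is the whole of the "shallower" advantage for a simple reader like this one.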
...
<funding><para>General description goes here for backwards compatibility</para></funding>
<award>
<name>Scientia Arctica: A Knowledge Archive for Discovery and Reproducible Science in the Arctic</name>
<identifier>1546024</identifier>
<url>https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546024</url>
<funder>
<name>National Science Foundation</name>
<identifier>https://doi.org/10.13039/00000001</identifier>
</funder>
</award>
<!-- Note award is repeatable -->
</project>
This schema deviates from EML semantics, as it would be more consistent to type
So, it seems we have three possible options:
Option 1: Follow FundingData naming conventions
This is how things are currently implemented in the 2.2 branch. It is also consistent with how many journals are providing funding info for their journal articles.
Option 2: Follow schema.org conventions
This is what is represented in the DataCite examples. As schema.org doesn't have an award type yet, this is really making a totally new path. But schema.org is a nice, generic vocabulary that Google recognizes.
Option 3: Follow EML naming conventions
In this case, we re-use existing EML types when they exist (namely for
Comments please on these pros and cons so we can wrap this up and make a decision. @cboettig, @csjx, @mobb, @amoeba? |
Thanks for laying out the three options. I'm a fan of Option 3 at this point. I'd like to see us make as much use of existing EML types as possible when adding features to the spec in an effort to keep EML looking like EML. This has downstream benefits in terms of application development. I think this preference is different than my previous preference. With the other solutions, even though we'd be making use of names from other standards, it seems like a crosswalk between the metadata standards would still be needed, no matter how similar we are. If we really want to use another schema, I'd rather see us embed the entire FundRef metadata record in the EML (which is probably super gross). I think as long as we maintain a low-loss semantic equivalent between EML's funding information and the currently used and not-yet-invented schemas, we're good. |
How ugly is it to embed the full FundRef? Or a subset of FundRef properties (without altering the nesting structure). There seems to be significant momentum behind improving schema.org award support; since the crossref folks are already thinking along these lines I imagine they'll define a mapping between FundingData and whatever schema.org modifications for Award emerge, but not worth us hacking this now. |
It's pretty ugly in its normal form, and specific to CrossRef's approach to assertions. I can't see importing the fundref.xsd directly. Check it out. I think I agree with @amoeba that whatever we choose will require an EML conversion anyway, because the fields won't be in either the fundref or schema.org namespaces. So I think all we need is isomorphism; exact name matches are not critical. |
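The isomorphism point can be illustrated with a small crosswalk. This is a minimal Python sketch, not an official mapping: the camelCase field names come from the first example earlier in this thread, the nested name/identifier/url/funder layout comes from the schema.org-style example, and the function names `eml_to_schema_org` and `schema_org_to_eml` are hypothetical.

```python
def eml_to_schema_org(award):
    """Map a flat, EML-style award dict to the nested schema.org-style layout."""
    return {
        "name": award["title"],
        "identifier": award["awardNumber"],
        "url": award["awardURL"],
        "funder": {
            "name": award["funderName"],
            "identifier": award["funderIdentifier"],
        },
    }


def schema_org_to_eml(record):
    """Inverse mapping: nested schema.org-style layout back to the flat form."""
    return {
        "funderName": record["funder"]["name"],
        "funderIdentifier": record["funder"]["identifier"],
        "awardNumber": record["identifier"],
        "title": record["name"],
        "awardURL": record["url"],
    }


# Sample values copied from the examples in this thread.
award = {
    "funderName": "National Science Foundation",
    "funderIdentifier": "https://doi.org/10.13039/00000001",
    "awardNumber": "1546024",
    "title": "Scientia Arctica: A Knowledge Archive for Discovery "
             "and Reproducible Science in the Arctic",
    "awardURL": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546024",
}

# The round trip loses nothing, which is what isomorphism requires here.
assert schema_org_to_eml(eml_to_schema_org(award)) == award
```

As long as a round trip like this is lossless, the choice of field names on the EML side does not constrain interoperability.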
Sounds good to me. 👍 |
I've re-implemented this following the guidelines of Option 3, and checked it in SHA eb4ed60. Here's an example funding section that validates.
Unless there are further requests for modification, this enhancement is complete, and I will close this ticket. |
I agree with @amoeba on Option 3: follow EML naming and typing conventions. At present, though, I don't see the EML ResponsiblePartyType containing funderName and ID (in Branch_2_2). As was said, there would need to be a crosswalk anyway between these elements and other systems. The extra structure (of using that Type) isn't much of a hardship, since code for any responsibleParty can be reused to insert it, rather than having to create a function just for funderName and funderID. |
EML 2.0.1 supports a funding field, but it is completely unstructured, and prevents effective linking to awards due to lack of standardization. The current funding field in eml-project is of type txt:TextType, which allows it to have sections and paragraphs. Some groups use multiple paragraphs for different awards, but there is still no structure to identify which agency provided the funding, nor a well-delimited award number, nor links to web services, fundRefIDs, etc. I propose to extend EML to support this structured information to allow more parseable funding information, particularly:
I also considered adding principal investigators, but chose not to do so as they are in the containing project description. It's arguable that the award title should simply be in the containing project title. Given that 'funding' already exists as a description of the funding, I propose that a new optional and repeatable award element be added that can list the machine-parseable part of the funding info. Here's a proposed new structure for a funding section.
I chose the funder_name, funder_identifier, and award_number fields (despite not following EML naming conventions) to match the CrossRef fields from their Funding Data initiative, which includes their Funder Registry with formal identifiers for funding programs and a CrossRef API /funders endpoint, which can be linked to papers and resources following their overview for inclusion in CrossRef. I matched their particular fundref.xsd schema (see documentation), but we could also consider importing it directly and using their namespace.
Please add thoughts/feedback to this ticket and we'll work up a proposal.