Export/Harvesting: Leaving "Name" under "Contact" blank while creating a dataset results in no contact email address in the DDI export #3802

Closed
pdurbin opened this issue Apr 27, 2017 · 5 comments

Comments

@pdurbin
Member

pdurbin commented Apr 27, 2017

While investigating #3443 I noticed that if you don't fill in "Name" under "Contact" when creating a dataset, the email address of the dataset contact is not available in the DDI export. I expect to see <contact email="philip_durbin@harvard.edu"></contact> rather than nothing. I'm going to label this as a bug. The REST Assured XmlPath expression I test for is codeBook.stdyDscr.stdyInfo.contact.@email.
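For readers without REST Assured at hand, roughly the same check can be sketched with the JDK's built-in XPath support. This is only an illustration: the class name `ContactEmailCheck`, the helper `contactEmail`, and the stripped-down sample XML (no namespace, no other elements) are assumptions made for the example, not part of Dataverse or its test suite. A real export likely declares an XML namespace, so the expression would need adjusting there.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ContactEmailCheck {

    // Hypothetical helper: returns the contact email found in the given XML,
    // or an empty string when no matching attribute exists.
    public static String contactEmail(String ddiXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Plain XPath equivalent of codeBook.stdyDscr.stdyInfo.contact.@email
            return xpath.evaluate(
                    "/codeBook/stdyDscr/stdyInfo/contact/@email",
                    new InputSource(new StringReader(ddiXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Simplified sample (assumption): what the export is expected to contain.
        String expected = "<codeBook><stdyDscr><stdyInfo>"
                + "<contact email=\"philip_durbin@harvard.edu\"></contact>"
                + "</stdyInfo></stdyDscr></codeBook>";
        // Simplified sample (assumption): the contact element is missing entirely.
        String actual = "<codeBook><stdyDscr><stdyInfo>"
                + "</stdyInfo></stdyDscr></codeBook>";

        System.out.println("expected case: '" + contactEmail(expected) + "'");
        System.out.println("bug case:      '" + contactEmail(actual) + "'");
    }
}
```

An empty result from `contactEmail` corresponds to the bug described here: no `contact` element, so no email.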

Dataset with Contact Name left blank (email absent from DDI)

[Screenshots: 2017-04-27 at 4 02 02 pm, 4 03 02 pm, and 4 03 13 pm]

Dataset with Contact Name (email present in DDI)

Note that <contact affiliation="Harvard University" email="philip_durbin@harvard.edu">Durbin, Philip</contact> is highlighted below.

[Screenshot: 2017-04-27 at 4 04 40 pm]

I know what causes this, and I added a TODO that reads:

// TODO: Since datasetContactEmail is a required field but datasetContactName is not consider not checking if datasetContactName is empty so we can write out datasetContactEmail.
if (!datasetContactName.isEmpty()){
    xmlw.writeStartElement("contact"); 
    if(!datasetContactAffiliation.isEmpty()){
       writeAttribute(xmlw,"affiliation",datasetContactAffiliation); 
    } 
    if(!datasetContactEmail.isEmpty()){
       writeAttribute(xmlw,"email",datasetContactEmail); 
    } 
    xmlw.writeCharacters(datasetContactName);
    xmlw.writeEndElement(); //AuthEnty
}

I showed this to @landreev and @sekmiller and we agreed that it's out of scope for #3443. We think the DDI spec will allow an empty element with just an attribute, like <contact email="philip_durbin@harvard.edu"></contact>, but whether we can harvest this is an open question.
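One way the TODO could be addressed is sketched below. It is a minimal, standalone illustration and not the actual Dataverse change: the class `ContactExportSketch` and the method `writeContact` are hypothetical, and the real exporter uses its own `writeAttribute` helper rather than calling `XMLStreamWriter.writeAttribute` directly. The idea is to write the `contact` element whenever either the name or the email is present, and to write the name as text only when there is one.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ContactExportSketch {

    // Hypothetical helper: renders a single <contact> element as a string.
    // Returns an empty string when neither a name nor an email is given.
    public static String writeContact(String name, String affiliation, String email) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xmlw =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);

            // Relaxed guard: an email alone is enough to emit the element.
            if (!name.isEmpty() || !email.isEmpty()) {
                xmlw.writeStartElement("contact");
                if (!affiliation.isEmpty()) {
                    xmlw.writeAttribute("affiliation", affiliation);
                }
                if (!email.isEmpty()) {
                    xmlw.writeAttribute("email", email);
                }
                if (!name.isEmpty()) {
                    xmlw.writeCharacters(name);
                }
                xmlw.writeEndElement(); // contact
            }
            xmlw.flush();
            xmlw.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write contact element", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Name left blank: the email should still be exported.
        System.out.println(writeContact("", "", "philip_durbin@harvard.edu"));
        // Name, affiliation, and email all present.
        System.out.println(writeContact(
                "Durbin, Philip", "Harvard University", "philip_durbin@harvard.edu"));
    }
}
```

Whether a harvesting installation accepts a `contact` element with no text content is exactly the open question above, so a change like this would need to be tested against harvesting before being relied on.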

@pdurbin
Member Author

pdurbin commented May 17, 2017

@philippconzett just noted at https://groups.google.com/d/msg/dataverse-community/v6jgyewGqlI/f7RKzoIdBAAJ the following:

I just got the same DataCite related error message again when trying to publish another dataset. After a lot of trial and error I found out that the metadata field "Contact Name" has to be filled in for the dataset to be published. I don't know the reason for this. I couldn't find anything saying that this field is obligatory in the DataCite Metadata Schema 4.0. By default, the contact email address is required by Dataverse, but not the contact name. We have now made this field obligatory in our Dataverses. Maybe the field should be required by Dataverse.

In the testing above, I was using EZID, not DataCite, as my :DoiProvider: http://guides.dataverse.org/en/4.6.1/installation/config.html#doiprovider

Maybe he's right. Maybe contactName should be a required field.

@philippconzett
Contributor

Thanks Phil. I just created an issue for the contact name problem:
#3839

Best,
Philipp

@pdurbin
Member Author

pdurbin commented Jun 28, 2017

This issue is better tracked at #3839. Closing.

@pdurbin pdurbin closed this as completed Jun 28, 2017
@jggautier
Contributor

jggautier commented Jul 14, 2017

The two issues, this one and #3839, are related but different enough to be tracked in separate GitHub issues. Each could be solved with a different solution. And from what I understand, one solution can solve both issues only if we decide that contactName is always filled out. In #3839, we talked about two ways to do this, either by:

  • Making sure that when the contactName field is blank, Dataverse fills it in with one of DataCite's "values for unknown information" (which I mentioned in #3839, "Contact Name should be required metadata field in Dataverse?"). (I just noticed that DataCite recommends using these values when required information isn't available, and DataCite doesn't require contactName, so this might be a misuse.)
  • Or making the contactName field mandatory

I don't think either of those solutions solves what I think are the larger issues, the first of which is unique to #3839:

  • There might be something wrong with the metadata Dataverse is sending to DataCite (when I asked on the DataCite forums if EZID might be doing something differently with dataset metadata when registering DOIs, I was told no and they recommended we look more deeply at what Dataverse might be doing.)
  • And some depositors don't want to include a contact name and we don't know why. Is forcing them to fill in that field better than making it easier for them to fill in that field with more appropriate metadata?

What I'm hoping to say is that solving the remaining open issue, #3839, may not solve this one. And details about this closed issue aren't in #3839.

Solving this issue should involve making sure that contactEmail is included in all datasets' DDI XML even when a contactName is missing.

@pdurbin
Member Author

pdurbin commented Jun 28, 2018

Let's work on this when we get around to working on #3839. Meanwhile, I'm closing it.

@pdurbin pdurbin closed this as completed Jun 28, 2018
3 participants