sbJSON provenance object should map to metadataInfo #244

Open
dkarthur opened this issue Mar 29, 2023 · 6 comments

Comments

@dkarthur

dkarthur commented Mar 29, 2023

The sbJSON reader currently maps the sbJSON provenance object to the resource citation object of the translator's internal data format.

Dates and contacts associated with the metadata record itself, rather than with the ScienceBase item being referenced, are being translated incorrectly. Dates are being mapped from the sbJSON “provenance” to the mdJSON “resourceInfo,” while contacts are not being translated at all. Addressing this issue is a critical need for NGGDPP and ReSciColl developers in order to provide appropriate metrics for USGS and external ReSciColl users and stakeholders.

sbJSON Example:

"provenance": {
		"dateCreated": "2023-01-10T17:39:42Z",
		"lastUpdated": "2023-01-10T19:41:42Z",
		"lastUpdatedBy": "vcrystal@usgs.gov",
		"createdBy": "vcrystal@usgs.gov"
}

Current mdJSON translation:

"resourceInfo": {
	"citation": {
		"title": "Carlsbad Cores Collection",
		"date": [
			{
				"date": "2023-01-10T17:39:42+00:00",
				"dateType": "creation"
			},
			{
				"date": "2023-01-10T19:41:42+00:00",
				"dateType": "lastUpdate"
			},
			{
				"date": "2023-01-10",
				"dateType": "creation",
				"description": "Creation"
			}
		],
		"responsibleParty": [
			{
				"role": "owner",
				"party": [
					{
						"contactId": "40fff240-e50e-49a7-9e6a-259326e5e866"
					}
				]
			}
		]
	}
...

Desired translation:

metadataInfo > metadataDate

  • sbJSON “dateCreated” -> mdJSON dateType = "creation"
  • sbJSON "lastUpdated" -> mdJSON dateType = "lastUpdate"

metadataInfo > metadataContact

  • sbJSON "createdBy" -> mdJSON role = "author"
  • sbJSON "lastUpdatedBy" -> mdJSON role = "editor"

The “createdBy” and “lastUpdatedBy” properties in the sbJSON “provenance” section are currently not found anywhere in the mdJSON output from mdTranslator. They should be mapped to “metadataInfo” > “metadataContact” with a “role” of “author” (or "curator") and “editor”, respectively.
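
For illustration, the desired mapping above could be sketched in Ruby roughly as follows. This is not mdTranslator's actual reader code: the method name `provenance_to_metadata_info` and the plain-hash output are assumptions made for the example. In particular, mdJSON contacts are referenced by `contactId`, and this sketch simply uses the sbJSON email string as a placeholder ID rather than resolving a real contact.

```ruby
require 'date'

# Hypothetical helper (not part of mdTranslator): maps an sbJSON
# "provenance" hash to an mdJSON-style "metadataInfo" hash.
def provenance_to_metadata_info(provenance)
  date_map = { 'dateCreated' => 'creation', 'lastUpdated' => 'lastUpdate' }
  role_map = { 'createdBy' => 'author', 'lastUpdatedBy' => 'editor' }

  dates = date_map.each_with_object([]) do |(sb_key, date_type), out|
    value = provenance[sb_key]
    next if value.nil? || value.empty?
    out << { 'date' => DateTime.iso8601(value).iso8601, 'dateType' => date_type }
  end

  contacts = role_map.each_with_object([]) do |(sb_key, role), out|
    value = provenance[sb_key]
    next if value.nil? || value.empty?
    # Assumption: the email stands in for a real contactId.
    out << { 'role' => role, 'party' => [{ 'contactId' => value }] }
  end

  { 'metadataDate' => dates, 'metadataContact' => contacts }
end

# Sample input copied from the sbJSON example in this issue.
provenance = {
  'dateCreated' => '2023-01-10T17:39:42Z',
  'lastUpdated' => '2023-01-10T19:41:42Z',
  'lastUpdatedBy' => 'vcrystal@usgs.gov',
  'createdBy' => 'vcrystal@usgs.gov'
}
info = provenance_to_metadata_info(provenance)
```

Fields that are missing or empty in the provenance object are skipped rather than emitted as empty entries.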

@hmaier-fws

hmaier-fws commented Apr 1, 2023

@dkarthur what do you mean by "contacts are not being translated at all"? The code snippet you provided seems to display a responsibleParty.

Regarding "...with “role” of “author” (or "curator")", I would recommend "author", as this is an ISO code described as "party who authored the resource". "Curator" is an ADIwg extended code defined as "party who serves as curator for specimens deposited in a repository". There is also "originator" (party that created the resource), which might be applicable if the metadata was "authored" by one party but then uploaded to the system by a second party.

@hmaier-fws hmaier-fws changed the title mdTranslator Mapping Changes Required for sbJSON-mdJSON Translation sbJSON reader should map provenance object to metadataInfo Apr 1, 2023
@hmaier-fws hmaier-fws changed the title sbJSON reader should map provenance object to metadataInfo sbJSON provenance object should map to metadataInfo Apr 1, 2023
@dkarthur
Author

dkarthur commented Apr 3, 2023

@hmaier-fws Perhaps I should've phrased it as: Contacts from sbJSON provenance object are not being mapped to mdJSON. When not using mdEditor, I don't know how to resolve the mdJSON responsibleParty code. It doesn't appear to map to the ScienceBase user who created the metadata record there, and it's that information that doesn't appear to be coming through the translator at all.

Also, to be sure I understand your comment on the second part, when you refer to "resource," are you referring to whatever it is to which the metadata refers, not the metadata record itself, or are you referring to the metadata?

@chris-macdermaid
Collaborator

chris-macdermaid commented Apr 3, 2023

The module_provenance.rb only handles the "dateCreated" and "lastUpdated" fields. The "lastUpdatedBy" and "createdBy" fields are dropped by the sbJson reader.

In addition to the above, the sbJson "dates" field is also added to the resourceInfo section in module_date.rb. As a result, there can be two creation dates in the resourceInfo section.

mdJson
image

sbJson
Screenshot from 2023-04-03 11-19-32

Screenshot from 2023-04-03 11-20-11
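
As a rough way to spot the duplicate creation dates described in this comment, a check like the following could be run against the citation dates of a translated record. It is a sketch only: `duplicate_date_types` is a hypothetical helper, not something in mdTranslator, and the sample data is copied from the "Current mdJSON translation" example earlier in this issue.

```ruby
# Hypothetical check: returns the dateType values that occur more than
# once in an array of mdJSON-style date hashes.
def duplicate_date_types(dates)
  dates.group_by { |d| d['dateType'] }
       .select { |_type, entries| entries.size > 1 }
       .keys
end

# Citation dates from the example output in this issue.
citation_dates = [
  { 'date' => '2023-01-10T17:39:42+00:00', 'dateType' => 'creation' },
  { 'date' => '2023-01-10T19:41:42+00:00', 'dateType' => 'lastUpdate' },
  { 'date' => '2023-01-10', 'dateType' => 'creation', 'description' => 'Creation' }
]

duplicates = duplicate_date_types(citation_dates)
puts "Duplicate dateTypes: #{duplicates.inspect}"
```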

@chris-macdermaid
Collaborator

A snapshot from ScienceBase's documentation.

Provenance:

Datatype: Provenance object
The ScienceBase Provenance attribute is an open text field that is used to describe the origin of an item, especially in terms of how the item came to be introduced to ScienceBase. It can be used to describe the full provenance of some form of data that may have been through a number of derivations.

provenance Object
annotation
Datatype: String
The text of the provenance.

dataSource
Datatype: String
Where the item came from. If this item was created by a person in ScienceBase it will be "Input Directly". If it was harvested from an external source this will show that instead.

dateCreated
Datatype: DateTime
The date and time the item was created.

createdBy
Datatype: String
The person or organization who created the item.

lastUpdated
Datatype: DateTime
The date and time the item was last updated.

lastUpdatedBy
Datatype: String
The last person or organization to update the item.

"provenance":
{
"annotation":"Provenance1",
"dataSource":"Input directly",
"dateCreated":"2015-11-09T19:02:45Z",
"lastUpdated":"2015-11-09T19:02:45Z",
"lastUpdatedBy":"abc@usgs.gov",
"createdBy":"abc@usgs.gov",
"fileProcess": ???,
"linkProcess": ???
}

@dwalt
Collaborator

dwalt commented Apr 5, 2023

Verified that createdBy and lastUpdatedBy are not being populated in ScienceBase; this is not a result of the sbJSON-mdJSON translation. dataSource is not populated either. How, when, or whether it is currently used by ScienceBase is unknown. @dkarthur will run use case tests to help us understand how and when provenance is created and updated, as follows:

  1. Created using the ReSciColl Dashboard app
  2. Created using ScienceBase
  3. Created using mdEditor

For each create example, test update in mdEditor and re-publish to ScienceBase (update item) to help us determine if update processes have different logic than create processes regarding writes to provenance.

Test update in ScienceBase regardless of create method, update in mdEditor and re-publish to ScienceBase.

Request to ScienceBase team:

  1. ScienceBase API scripts will need to be updated to populate createdBy, lastUpdatedBy
  2. Relative to test findings, ScienceBase API scripts may need additional changes

Agreement with @dkarthur to:

  1. Map createdBy and lastUpdatedBy to: [schema{ } > metadata{ } > metadataInfo{ } > metadataDate[ ] > object{ } > description]
  2. Accept the proposal to remap sbJSON > provenance dateCreated and lastUpdated to metadataDate > date, with "creation" and "lastUpdate" dateType as appropriate
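
A minimal Ruby sketch of the agreed mapping might look like the following. It is illustrative only: the helper name `provenance_to_metadata_dates` is hypothetical, and placing the raw createdBy / lastUpdatedBy value directly in `description` is an assumption, since the exact description text was not specified here. Because those fields are currently not populated by ScienceBase, the sketch omits `description` when the value is missing.

```ruby
require 'date'

# Hypothetical helper (not part of mdTranslator): builds mdJSON-style
# metadataDate entries from an sbJSON "provenance" hash.
def provenance_to_metadata_dates(provenance)
  # [sbJSON date field, sbJSON user field, mdJSON dateType]
  pairs = [
    %w[dateCreated createdBy creation],
    %w[lastUpdated lastUpdatedBy lastUpdate]
  ]

  pairs.each_with_object([]) do |(date_key, user_key, date_type), out|
    raw = provenance[date_key]
    next if raw.nil? || raw.empty?

    entry = { 'date' => DateTime.iso8601(raw).iso8601, 'dateType' => date_type }
    user = provenance[user_key]
    # Assumption: the user value is stored as-is in "description".
    entry['description'] = user unless user.nil? || user.empty?
    out << entry
  end
end

# Provenance with users populated (values from the example in this issue).
full = provenance_to_metadata_dates(
  {
    'dateCreated' => '2023-01-10T17:39:42Z',
    'lastUpdated' => '2023-01-10T19:41:42Z',
    'lastUpdatedBy' => 'vcrystal@usgs.gov',
    'createdBy' => 'vcrystal@usgs.gov'
  }
)

# Provenance where ScienceBase did not populate the user fields.
sparse = provenance_to_metadata_dates({ 'dateCreated' => '2023-01-10T17:39:42Z' })
```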

@dwalt
Collaborator

dwalt commented Jun 24, 2023

I think we have agreed on a different proposal. Can this issue be closed?
