"Compact"/Make complete the native API output #3068

Closed
raprasad opened this issue Apr 7, 2016 · 13 comments

@raprasad
Contributor

raprasad commented Apr 7, 2016

This is probably worth a separate FRD and includes (certainly not limited to) items such as:

This is not a "massive" project but modifications on the existing code.

@raprasad raprasad added the Type: Feature, Status: Dev, Feature: API, Type: Bug, and Component: Code Infrastructure labels Apr 7, 2016
@raprasad
Contributor Author

raprasad commented Apr 7, 2016

Example of metadata blocks for this dataset: http://hdl.handle.net/1902.29/10220

Current version (12,172 bytes)

"metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata", 
                "fields": [
                    {
                        "typeName": "title", 
                        "multiple": false, 
                        "typeClass": "primitive", 
                        "value": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976"
                    }, 
                    {
                        "typeName": "author", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "authorName": {
                                    "typeName": "authorName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "State Center for Health Statistics"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "datasetContact", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "datasetContactName": {
                                    "typeName": "datasetContactName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "David Sheaves"
                                }, 
                                "datasetContactEmail": {
                                    "typeName": "datasetContactEmail", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "david_sheaves@unc.edu"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "dsDescription", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "dsDescriptionValue": {
                                    "typeName": "dsDescriptionValue", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "<p>The North Carolina State Center for Health Services (SCHS) collects yearly vital statistics. The Odum Institute holds vital statistics beginning in 1968 for births, fetal deaths, deaths, birth/infant deaths, marriages and divorce. Public marriage and divorce data are available through 1999 only.</p><p>We have created a consolidated birth/infant death file that contains records of deaths occurring during the first year of life. Each such death record has been matched with a corresponding birth\nrecord creating a composite record containing information about both events. Users of these consolidated files should be aware that the file year of these data sets refers to the year of birth, not the year of death. For example, the 1970 consolidated birth/infant death file contains records of births occurring during 1970 that ended in an infant death either in 1970 or 1971. For this reason, the number of infant deaths for a particular year as obtained from the consolidated file will not be the same as the number obtained\nfrom the death file for that same year. This difference should especially be kept in mind when using this file in conjunction with the publication Vital Statistics, volume 1. This study focuses on North Carolina birth/infant deaths for 1976. It includes data on the age, education level and marital status of the parents; sex and race of the child; prenatal medical care received; county and hospital of birth; information on the mother's reproductive history including number of previous pregnancies and live births; as well as statistics on the newborn and autopsy information.\n</p> <p>The data is strictly numerical, there is no identifying information given about the parents or child.</p>"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "keyword", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "keywordValue": {
                                    "typeName": "keywordValue", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "Births"
                                }, 
                                "keywordVocabulary": {
                                    "typeName": "keywordVocabulary", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "ODUM:INDEX.TERMS"
                                }
                            }, 
                            {
                                "keywordValue": {
                                    "typeName": "keywordValue", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "Infant death"
                                }, 
                                "keywordVocabulary": {
                                    "typeName": "keywordVocabulary", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "ODUM:INDEX.TERMS"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "notesText", 
                        "multiple": false, 
                        "typeClass": "primitive", 
                        "value": "Version Date: 1976Version Text: Birth/Infant Death"
                    }, 
                    {
                        "typeName": "producer", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "producerName": {
                                    "typeName": "producerName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "State Center for Health Statistics"
                                }, 
                                "producerAbbreviation": {
                                    "typeName": "producerAbbreviation", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "SCHS"
                                }, 
                                "producerURL": {
                                    "typeName": "producerURL", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "http://www.schs.state.nc.us/SCHS/"
                                }, 
                                "producerLogoURL": {
                                    "typeName": "producerLogoURL", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "http://www.schs.state.nc.us/SCHS/images/schslogo2.gif"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "productionDate", 
                        "multiple": false, 
                        "typeClass": "primitive", 
                        "value": "1977"
                    }, 
                    {
                        "typeName": "distributor", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "distributorName": {
                                    "typeName": "distributorName", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "Odum Institute for Research in Social Science"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "timePeriodCovered", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "timePeriodCoveredStart": {
                                    "typeName": "timePeriodCoveredStart", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "1976-01-01"
                                }, 
                                "timePeriodCoveredEnd": {
                                    "typeName": "timePeriodCoveredEnd", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "1976-12-31"
                                }
                            }
                        ]
                    }, 
                    {
                        "typeName": "kindOfData", 
                        "multiple": true, 
                        "typeClass": "primitive", 
                        "value": [
                            "Numeric"
                        ]
                    }, 
                    {
                        "typeName": "series", 
                        "multiple": false, 
                        "typeClass": "compound", 
                        "value": {
                            "seriesName": {
                                "typeName": "seriesName", 
                                "multiple": false, 
                                "typeClass": "primitive", 
                                "value": "North Carolina Vital Statistics"
                            }
                        }
                    }
                ]
            }, 
            "geospatial": {
                "displayName": "Geospatial Metadata", 
                "fields": [
                    {
                        "typeName": "geographicCoverage", 
                        "multiple": true, 
                        "typeClass": "compound", 
                        "value": [
                            {
                                "country": {
                                    "typeName": "country", 
                                    "multiple": false, 
                                    "typeClass": "controlledVocabulary", 
                                    "value": "United States"
                                }
                            }, 
                            {
                                "otherGeographicCoverage": {
                                    "typeName": "otherGeographicCoverage", 
                                    "multiple": false, 
                                    "typeClass": "primitive", 
                                    "value": "North Carolina"
                                }
                            }
                        ]
                    }
                ]
            }
        }

Clean version (3,446 bytes)

{
    "citation": {
        "title": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976", 
        "author": [
            {
                "authorName": "State Center for Health Statistics"
            }
        ], 
        "datasetContact": [
            {
                "datasetContactName": "David Sheaves", 
                "datasetContactEmail": "david_sheaves@unc.edu"
            }
        ], 
        "dsDescription": [
            {
                "dsDescriptionValue": "<p>The North Carolina State Center for Health Services (SCHS) collects yearly vital statistics. The Odum Institute holds vital statistics beginning in 1968 for births, fetal deaths, deaths, birth/infant deaths, marriages and divorce. Public marriage and divorce data are available through 1999 only.</p><p>We have created a consolidated birth/infant death file that contains records of deaths occurring during the first year of life. Each such death record has been matched with a corresponding birth\nrecord creating a composite record containing information about both events. Users of these consolidated files should be aware that the file year of these data sets refers to the year of birth, not the year of death. For example, the 1970 consolidated birth/infant death file contains records of births occurring during 1970 that ended in an infant death either in 1970 or 1971. For this reason, the number of infant deaths for a particular year as obtained from the consolidated file will not be the same as the number obtained\nfrom the death file for that same year. This difference should especially be kept in mind when using this file in conjunction with the publication Vital Statistics, volume 1. This study focuses on North Carolina birth/infant deaths for 1976. It includes data on the age, education level and marital status of the parents; sex and race of the child; prenatal medical care received; county and hospital of birth; information on the mother's reproductive history including number of previous pregnancies and live births; as well as statistics on the newborn and autopsy information.\n</p> <p>The data is strictly numerical, there is no identifying information given about the parents or child.</p>"
            }
        ], 
        "keyword": [
            {
                "keywordValue": "Births", 
                "keywordVocabulary": "ODUM:INDEX.TERMS"
            }, 
            {
                "keywordValue": "Infant death", 
                "keywordVocabulary": "ODUM:INDEX.TERMS"
            }
        ], 
        "notesText": "Version Date: 1976Version Text: Birth/Infant Death", 
        "producer": [
            {
                "producerName": "State Center for Health Statistics", 
                "producerAbbreviation": "SCHS", 
                "producerURL": "http://www.schs.state.nc.us/SCHS/", 
                "producerLogoURL": "http://www.schs.state.nc.us/SCHS/images/schslogo2.gif"
            }
        ], 
        "productionDate": "1977", 
        "distributor": [
            {
                "distributorName": "Odum Institute for Research in Social Science"
            }
        ], 
        "timePeriodCovered": [
            {
                "timePeriodCoveredStart": "1976-01-01", 
                "timePeriodCoveredEnd": "1976-12-31"
            }
        ], 
        "kindOfData": [
            "Numeric"
        ]
    }, 
    "geospatial": {
        "geographicCoverage": [
            {
                "country": "United States"
            }, 
            {
                "otherGeographicCoverage": "North Carolina"
            }
        ]
    }
}
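
Going from the current version to the clean version is mechanical: drop `displayName`, key each field by its `typeName`, and replace each field object with its `value`, recursing into compound subfields. Below is a minimal Python sketch. It assumes the input is the `metadataBlocks` object shown above, and the function names are hypothetical, not part of Dataverse or pyDataverse. Unlike the clean version above, it also keeps the single-valued compound field `series`.

```python
import json


def compact_value(field):
    """Collapse one verbose field object into its bare value.

    Assumes each field has "typeName", "multiple", "typeClass" and "value",
    as in the verbose metadataBlocks output above.
    """
    value = field["value"]
    if field["typeClass"] != "compound":
        # primitive / controlledVocabulary: a string, or a list of strings
        return value
    if field["multiple"]:
        # multiple compound: a list of {subfieldName: verboseSubfield} dicts
        return [{name: compact_value(sub) for name, sub in item.items()}
                for item in value]
    # single compound (e.g. "series"): one {subfieldName: verboseSubfield} dict
    return {name: compact_value(sub) for name, sub in value.items()}


def compact_metadata_blocks(blocks):
    """Map each block to {typeName: compactValue}, dropping displayName."""
    return {
        block_name: {f["typeName"]: compact_value(f) for f in block["fields"]}
        for block_name, block in blocks.items()
    }


# Abridged excerpt of the verbose output above.
verbose = {
    "citation": {
        "displayName": "Citation Metadata",
        "fields": [
            {"typeName": "title", "multiple": False, "typeClass": "primitive",
             "value": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976"},
            {"typeName": "author", "multiple": True, "typeClass": "compound",
             "value": [{"authorName": {"typeName": "authorName", "multiple": False,
                                       "typeClass": "primitive",
                                       "value": "State Center for Health Statistics"}}]},
            {"typeName": "kindOfData", "multiple": True, "typeClass": "primitive",
             "value": ["Numeric"]},
            {"typeName": "series", "multiple": False, "typeClass": "compound",
             "value": {"seriesName": {"typeName": "seriesName", "multiple": False,
                                      "typeClass": "primitive",
                                      "value": "North Carolina Vital Statistics"}}},
        ],
    },
    "geospatial": {
        "displayName": "Geospatial Metadata",
        "fields": [
            {"typeName": "geographicCoverage", "multiple": True, "typeClass": "compound",
             "value": [{"country": {"typeName": "country", "multiple": False,
                                    "typeClass": "controlledVocabulary",
                                    "value": "United States"}}]},
        ],
    },
}

print(json.dumps(compact_metadata_blocks(verbose), indent=4))
```

The same three cases (plain value, list of compound items, single compound item) cover every field type that appears in the example.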

@bencomp
Contributor

bencomp commented Apr 8, 2016

To me this looks related to #2357, as separating the field definitions from the API output would allow the API output to contain only values.
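
In the verbose output, the field definitions (`typeClass`, `multiple`) repeat for every dataset that uses a block. That repetition is what would let them live apart from the values. The sketch below illustrates the split under the verbose layout above: a hypothetical helper pulls the definitions into one lookup table keyed by `typeName`. Paired with a compaction step, the per-dataset output would then carry only values, and a client could fetch the definitions once.

```python
def collect_definitions(fields, defs=None):
    """Build {typeName: {"multiple": ..., "typeClass": ...}} from verbose fields.

    Recurses into compound fields so subfields (e.g. "authorName") are included.
    """
    if defs is None:
        defs = {}
    for field in fields:
        defs[field["typeName"]] = {
            "multiple": field["multiple"],
            "typeClass": field["typeClass"],
        }
        if field["typeClass"] == "compound":
            # A multiple compound holds a list of subfield dicts; a single one holds one dict.
            items = field["value"] if field["multiple"] else [field["value"]]
            for item in items:
                collect_definitions(item.values(), defs)
    return defs


# The datasetContact field from the verbose example in this issue.
fields = [
    {"typeName": "datasetContact", "multiple": True, "typeClass": "compound",
     "value": [{
         "datasetContactName": {"typeName": "datasetContactName", "multiple": False,
                                "typeClass": "primitive", "value": "David Sheaves"},
         "datasetContactEmail": {"typeName": "datasetContactEmail", "multiple": False,
                                 "typeClass": "primitive",
                                 "value": "david_sheaves@unc.edu"},
     }]},
]

defs = collect_definitions(fields)
for name, definition in defs.items():
    print(name, definition)
```

Because the definitions are inferred here from one dataset's output, they are only as complete as that dataset. An authoritative table would come from the metadata block definitions themselves.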

@pdurbin
Member

pdurbin commented Apr 8, 2016

Making sure the API is complete.

@evelynPM pointed out in #2794 that license and terms of access information are not appearing in the JSON.

@pdurbin
Member

pdurbin commented Nov 9, 2017

I just mentioned this issue at https://groups.google.com/d/msg/dataverse-community/4XsA0Px2H8Q/CgO9OmkMAgAJ, and now I realize that it seems to be about JSON output rather than input. Presumably we'd want to support the same format in and out.

#3599 is related, having to do with simple edits.

#3859 is also related because people struggle so much with the complex JSON currently needed to create a dataset with rich metadata. That issue is about at least providing a full example in the API Guide.
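
Using the same format in and out requires the reverse mapping. That mapping is only possible with the field definitions at hand, because a compact value alone does not say, for example, whether `"United States"` is a primitive or a controlled vocabulary term. The sketch below assumes a definitions table shaped like `{typeName: {"multiple", "typeClass"}}`. The `DEFS` table is hand-written for illustration, not fetched from Dataverse, and the function names are hypothetical.

```python
import json

# Hypothetical, hand-written definitions for the fields used below; in practice
# they would come from the metadata block definitions, not from this table.
DEFS = {
    "title": {"multiple": False, "typeClass": "primitive"},
    "author": {"multiple": True, "typeClass": "compound"},
    "authorName": {"multiple": False, "typeClass": "primitive"},
    "geographicCoverage": {"multiple": True, "typeClass": "compound"},
    "country": {"multiple": False, "typeClass": "controlledVocabulary"},
}


def expand_value(name, value, defs):
    """Rebuild one verbose field object from a compact value."""
    d = defs[name]
    field = {"typeName": name, "multiple": d["multiple"], "typeClass": d["typeClass"]}
    if d["typeClass"] == "compound":
        def expand_item(item):
            return {sub: expand_value(sub, v, defs) for sub, v in item.items()}
        field["value"] = ([expand_item(i) for i in value] if d["multiple"]
                          else expand_item(value))
    else:
        # primitive / controlledVocabulary values are stored unchanged
        field["value"] = value
    return field


def expand_block(display_name, compact_block, defs):
    """Rebuild {"displayName": ..., "fields": [...]} from a compact block.

    displayName is not part of the compact format, so the caller supplies it.
    """
    return {
        "displayName": display_name,
        "fields": [expand_value(n, v, defs) for n, v in compact_block.items()],
    }


compact_geo = {"geographicCoverage": [{"country": "United States"}]}
print(json.dumps(expand_block("Geospatial Metadata", compact_geo, DEFS), indent=4))
```

An unknown `typeName` raises `KeyError` here. That is arguably the right behavior for input, since the server could not store a field it has no definition for.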

@raprasad
Contributor Author

raprasad commented Nov 9, 2017

@pdurbin: tangentially related

A simplified JSON output is available through miniverse and is fairly fast*. The original goal of that experiment was for the same format to work as input as well. (* Queries are minimized and results are cached.)

Example

The following API endpoints give JSON for the dataset here:

If you have a DOI or dataset id, the JSON should be available for any published dataset.

JSON for that dataset

swagger info

(Caveat: code 2+ years old so somewhat incomplete)

@pdurbin
Member

pdurbin commented Nov 9, 2017

@raprasad cool. Did you ever figure out how to validate your JSON format with JSON Schema or similar in either Python or Java?

@raprasad
Contributor Author

raprasad commented Nov 9, 2017

I never did it for dataset JSON format in particular but there are many tools around to do it: http://json-schema.org/implementations.html#validators

The repo for converting Dataverse TSV metadata into JSON schemas with validation is here:

The Jeremy Dorn links at the top are useful for getting started in creating a schema of your choice.
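
A real check would use one of the validators on that list (for Python, for example, the `jsonschema` package). To keep the example dependency-free, the sketch below instead hand-rolls only four JSON Schema keywords (`type`, `required`, `properties`, `items`) and applies them to the compact citation format. It is not a conforming JSON Schema validator. The schema itself, including which fields are required, is an illustrative assumption rather than Dataverse's actual rules.

```python
# Illustrative schema for part of the compact "citation" block (assumed, not official).
SCHEMA = {
    "type": "object",
    "required": ["title", "author"],
    "properties": {
        "title": {"type": "string"},
        "author": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["authorName"],
                "properties": {"authorName": {"type": "string"}},
            },
        },
        "kindOfData": {"type": "array", "items": {"type": "string"}},
    },
}

# Only the JSON types this sketch needs.
TYPES = {"object": dict, "array": list, "string": str}


def check(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance passed."""
    if not isinstance(instance, TYPES[schema["type"]]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if schema["type"] == "object":
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], sub, f"{path}.{key}"))
    elif schema["type"] == "array" and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors


print(check({"title": "T", "author": [{}]}, SCHEMA))
```

Collecting every error, instead of stopping at the first, mirrors what most schema validators offer and is more useful when fixing a hand-written dataset file.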

@pdurbin
Member

pdurbin commented Nov 9, 2017

@raprasad thanks, I just opened this issue: IQSS/json-schema-test#1

@pdurbin
Member

pdurbin commented Jul 12, 2018

Great idea. This issue doesn't have a champion. Closing.

@pdurbin pdurbin closed this as completed Jul 12, 2018
@scolapasta scolapasta reopened this Jul 12, 2018
@scolapasta
Contributor

This is something that we still want, and in my opinion we should find a way to prioritize it: it would make our APIs significantly more useful, and we have the related new work in #3083 for importing datasets.

@raprasad
Contributor Author

raprasad commented Jun 26, 2019

Note: a cleaned-up version of the sample code at the top of this ticket* could be dropped into pyDataverse as an option. Ticket added here: https://github.com/AUSSDA/pyDataverse/issues/23

This could be especially helpful for programmatic data discovery, Jupyter notebook users, etc.

@pdurbin
Member

pdurbin commented Oct 3, 2022

Now that we have the semantic API, should we close this?

https://guides.dataverse.org/en/5.11.1/developers/dataset-semantic-metadata-api.html

@pdurbin
Member

pdurbin commented Oct 23, 2024
