Better detection test for whether a file is ingested #113
My first thought is to get https://demo.dataverse.org/api/search?q=fileId:1734017 and find a …
@pdurbin, this looks promising. Our function … Here is the query confirming that at least the … Even though this is a search, for the purpose of this issue I'd need it to be a strict match on the file ID: return a single entry if the file ID exists, and 0 results if it does not.
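A small sketch of that strict-match rule in base R. It assumes the search response has already been parsed into nested R lists (for example with `jsonlite::fromJSON(txt, simplifyVector = FALSE)`) and that it carries the Search API's `data$total_count` and `data$items` fields. `strict_file_match()` is a hypothetical helper, not a function in the dataverse package.

```r
# Hypothetical helper (not part of the dataverse package).
# `response` is assumed to be a Search API JSON body parsed into nested lists,
# e.g. jsonlite::fromJSON(txt, simplifyVector = FALSE).
strict_file_match <- function(response) {
  n <- response$data$total_count
  if (is.null(n)) stop("response has no data$total_count field")
  if (n == 0) return(NULL)                        # file id does not exist
  if (n > 1) stop("expected at most one result, got ", n)
  response$data$items[[1]]                        # the single matching entry
}
```

Returning `NULL` for "not found" and raising an error for "more than one hit" keeps the two failure modes distinct, so a fuzzy search that happens to match several files cannot silently pass as a hit.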
Add-on, re: …

We would also need a method that can get the dataset JSON with only the file DOI (persistent ID) in hand (to use in …
@kuriwaki Huh. https://dataverse.harvard.edu/api/search?q=id:datafile_3123547
I don't think that "MHDB0O" file is indexed. https://dataverse.harvard.edu/api/search?q=id:datafile_1734017 should find it, but it doesn't. Can you please open an issue about this in https://github.com/IQSS/dataverse.harvard.edu/issues ?

For a file that is properly indexed, like the CCES file we've been talking about ( https://dataverse.harvard.edu/api/search?q=id:datafile_3123547 ), you should be able to search for it by DOI like this (note the quotes around the DOI): https://dataverse.harvard.edu/api/search?q=filePersistentId:%22doi:10.7910/DVN/GDF6Z0/JPMOZZ%22
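For reference, the two lookups above could be wrapped in a pair of URL builders. This is a minimal base-R sketch: `search_url_by_file_id()` and `search_url_by_file_doi()` are hypothetical names, and all they encode is the query syntax quoted above (`id:datafile_<fileid>` and a double-quoted `filePersistentId`).

```r
# Hypothetical helpers: build Search API URLs for a single file.

# Look a file up by its numeric database id, e.g. id:datafile_3123547.
# sprintf("%d", ...) avoids scientific notation such as "1e+05".
search_url_by_file_id <- function(server, file_id) {
  sprintf("https://%s/api/search?q=id:datafile_%d", server, as.integer(file_id))
}

# Look a file up by its DOI. The DOI is wrapped in double quotes so the whole
# string is matched; utils::URLencode() (reserved = FALSE by default) turns
# each quote into %22 but leaves ':' and '/' as they are.
search_url_by_file_doi <- function(server, file_doi) {
  query <- sprintf('filePersistentId:"%s"', file_doi)
  sprintf("https://%s/api/search?q=%s", server, utils::URLencode(query))
}

search_url_by_file_id("dataverse.harvard.edu", 3123547)
search_url_by_file_doi("dataverse.harvard.edu", "doi:10.7910/DVN/GDF6Z0/JPMOZZ")
```

The second call reproduces the DOI search URL given above, with the quotes percent-encoded as `%22`.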
(for numeric IDs)

This is great. The following three examples work as intended: each gives me the single entry. I will try implementing it on …

```r
library(dataverse)
# rds
dataverse_search(id = "datafile_1734017", server = "demo.dataverse.org", type = "file")$name
# CCES problematic dta
dataverse_search(id = "datafile_3123547", server = "dataverse.harvard.edu", type = "file")$name
# other dataverse
dataverse_search(id = "datafile_204446", server = "dataverse.nl", type = "file")$name
```
That file actually came from the demo Dataverse, not Harvard Dataverse. This one works great: https://demo.dataverse.org/api/search?q=id:datafile_1734017
Thank you. This seems to work in the two examples below, with the quotes escaped:

```r
# CCES
dataverse_search(filePersistentId = "\"doi:10.7910/DVN/GDF6Z0/JPMOZZ\"", server = "dataverse.harvard.edu")$name
# demo.dataverse
dataverse_search(filePersistentId = "\"doi:10.70122/FK2/HXJVJU/SA3Z2V\"", server = "demo.dataverse.org")$name
```
The current method to detect whether something is ingested, `is_ingested`, introduced in v0.3.0, is problematic: it only checks whether there is a metadata file associated with the file ID. But I guess some files, e.g. those that have ingestion warnings, don't have a metadata file. This can cause the wrong download format, as in #80.

If I have a dataset ID or name, I now know how to check whether something is ingested: check whether the entry `originalFileFormat` exists (e.g. this JSON).

However, at this particular stage of the client, I sometimes don't have a dataset identifier, only the numeric file ID and the server. This happens, for example, with `get_*_by_doi`, where the user only provides a file DOI. @landreev pointed out that the Dataverse `api/files` API apparently does not contain info like `originalFileFormat`, perhaps for legacy reasons.

For now, what is the best way to access the parent dataset JSON with only the numeric file ID in hand? (@pdurbin?) In the above example, how would I obtain the dataset ID `doi:10.70122/FK2/PPIAXE` knowing only `fileid = 1734017` and `server = demo.dataverse.org`?
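One possible route, sketched under two assumptions that should be checked against a real server: (1) file hits from the Search API carry a `dataset_persistent_id` field naming the parent dataset, and (2) in the dataset JSON from `api/datasets/:persistentId/?persistentId=<doi>`, each entry of `data$latestVersion$files` holds a `dataFile` object whose `originalFileFormat` is present only for ingested files (the check proposed above). Both functions are hypothetical, take already-parsed JSON (nested lists, as from `jsonlite::fromJSON(txt, simplifyVector = FALSE)`), and make no network calls.

```r
# Step 1. ASSUMPTION: Search API file hits expose `dataset_persistent_id`.
# Turn a strict, single-result search for id:datafile_<fileid> into the URL
# of the parent dataset's JSON.
dataset_json_url <- function(server, search_response) {
  items <- search_response$data$items
  if (length(items) != 1) {
    stop("expected exactly one file in the search result, got ", length(items))
  }
  doi <- items[[1]]$dataset_persistent_id
  if (is.null(doi)) stop("search result has no dataset_persistent_id field")
  sprintf("https://%s/api/datasets/:persistentId/?persistentId=%s", server, doi)
}

# Step 2. ASSUMPTION: `originalFileFormat` appears under `dataFile` only for
# ingested files. Report whether the given file id was ingested.
is_ingested_in_dataset <- function(dataset_json, file_id) {
  for (f in dataset_json$data$latestVersion$files) {
    # isTRUE() guards against entries that lack a dataFile id
    if (isTRUE(as.numeric(f$dataFile$id) == as.numeric(file_id))) {
      return(!is.null(f$dataFile$originalFileFormat))
    }
  }
  stop("file id ", file_id, " not found in the latest dataset version")
}
```

In practice, the search response would come from https://demo.dataverse.org/api/search?q=id:datafile_1734017 and the dataset JSON from the URL that step 1 returns. Going through the dataset JSON, rather than `api/files`, follows the observation above that `api/files` may not report `originalFileFormat`.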