bulkload publication citations #1565
Comments
Agree! I have a similar list that I have been plowing through as time permits (Ha! Ha!), which means they never get done... A bulkloader would be fabulous! |
I need two things:
Here's the target.
Code tables are
The NOT NULL things are required. Agents are optional (so NOT NULL is "required if there's an agent"). We do have a "require at least one" rule in the UI, I believe implemented to facilitate search. (I can't find the Issue - maybe it's in some AWG notes?) DOIs are critical in detecting duplicates and linking to funding and all that jazz. I'm not sure how successfully I could extract them from those example data - there is a LOT of variation in formatting. I checked a few entries that don't list DOIs, and the publications all seem to have DOIs. Can someone run this through some bibliography tool and see if there's any magic there? @mkoo Dealing with duplicate publications is a huge mess; they almost inevitably each end up holding part of the citations, the authors never quite line up, etc. I'm not quite sure how to avoid that with a bulkloader. Given DOIs I can pull publication details from CrossRef as well. It's also a mess, but it's the mess everyone uses. In the data above:
(Note "Ôwhite-headedÕ" - there's some sort of character set conversion failure in these data.) vs. from CrossRef via DOI:
I think the first step is probably experimenting with bibliography tools - can anything deal with these data, and how does it format the output from that? |
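As a starting point for that experiment, here is a minimal sketch (not Arctos code) of two things discussed above: pulling DOI-like strings out of free-text citations, and undoing the "Ôwhite-headedÕ" style of garbling. The function names are hypothetical. The regular expression is a commonly used pattern for modern DOIs and will miss some unusual legacy ones. The repair step assumes the text was originally Mac Roman and was misread as Latin-1, which is consistent with the Ô/Õ characters but is only a guess about these data; it should not be applied to text that is already correctly decoded.

```python
import re

# Commonly used pattern for modern DOIs: "10.", a 4-9 digit registrant
# prefix, a slash, then a suffix. Unusual legacy DOIs may not match.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def repair_mac_roman(text: str) -> str:
    """Undo Mac Roman text misread as Latin-1 (ASSUMED cause of the garbling)."""
    try:
        return text.encode("latin-1").decode("mac_roman")
    except UnicodeEncodeError:
        # Text contains characters outside Latin-1, so this guess does not apply.
        return text


def _trim(doi: str) -> str:
    """Strip sentence punctuation and unbalanced ')' captured after a DOI."""
    while doi:
        if doi[-1] in ".,;:":
            doi = doi[:-1]
        elif doi[-1] == ")" and doi.count(")") > doi.count("("):
            doi = doi[:-1]
        else:
            break
    return doi


def extract_dois(citation: str) -> list[str]:
    """Return DOI-like strings found in a citation, de-duplicated case-insensitively."""
    found: list[str] = []
    for match in DOI_PATTERN.findall(citation):
        doi = _trim(match)
        if doi and doi.lower() not in (d.lower() for d in found):
            found.append(doi)
    return found
```

Running `extract_dois` over each line of the attached file would give a rough count of how many citations carry a recognizable DOI, which is useful to know before comparing bibliography tools.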
One way to potentially reduce duplicates would be to parse out some of the citation. We could leave the full citation field, but if we added something like: PUBLICATION_TITLE Maybe it would be easier to pick out when someone is attempting to add a duplicate? |
We had that structure WAY back in the day and got rid of it for simplicity. I don't think it ever did anything very useful (there are still about 800 ways to represent most titles, especially those with formatting), it was constant work to add to the code tables (journal name etc.), and even that got duplicates fairly often, there were long discussions about publishers changing names and what to do with "gray literature," putting it back together into a "citation" that might be found outside of Arctos was nearly impossible, etc. I'm less than enthusiastic about reintroducing any of that. Those kinds of data from CrossRef are often a mess too, but with DOIs that's also (mostly) irrelevant - the DOI itself gets you where you need to be. I REALLY like DOIs, and I really dislike dealing with publications without them. Maybe "we" (whoever that is!) should explore a partnership with BHL. They obviously have some relationship with CrossRef, and they're assigning DOIs to old publications (https://doi.org/10.5962/bhl.title.327). Perhaps we could enter publications without DOIs there (maybe via webservice - I have no idea what's possible) and require DOIs on the Arctos side? |
I like this idea a lot, unfortunately "we" are already overtaxed! This seems like a natural partnership though. I will see if I can find a BHL contact and just ask the question... |
It's a good idea, but I've spoken to BHL representatives twice over the
past year about help with various projects, and their answer has always
been "great, but we can't provide any funding or support." I would contact
them, but not have that be our only option.
…On Thu, Jun 14, 2018 at 8:59 AM, Teresa Mayfield ***@***.***> wrote:
Maybe "we" (whoever that is!) should explore a partnership with BHL
I like this idea a lot, unfortunately "we" are already overtaxed! This
seems like a natural partnership though. I will see if I can find a BHL
contact and just ask the question...
|
We could do a joint grant application to IMLS.... Email sent today: Thank you, Teresa Mayfield-Meyer |
A data entry form (at BHL - could be webservice, API, form, WHATEVER - I don't think that matters at all) that returns a DOI would be amazing. I'd absolutely push for that to be our only option. (See below - I'd push harder now!) Duplicates can still happen, but from there it's CrossRef's problem (we'd just have multiple DOIs that point to the same publication - not ideal, but still works). Some of those ~6K DOIless publications are field notes and such, but I think we could offer them at least a couple thousand publications. All we'd need from them is access to whatever they use to apply DOIs to old publications, and as above I think there's a great deal of flexibility in how we do that. Recent publications are a complete mess. DOIs aren't being entered - the proportion with DOIs keeps shrinking.
I found https://arctos.database.museum/publication/10006284 yesterday - I have no idea what's going on there, but it shouldn't have happened. https://arctos.database.museum/publication/10007115 has no DOI, but a (flaky) URL to a page with a DOI in "storage location." Annnd there are hundreds more, including malformed DOIs, good DOIs, things that lead to but are not DOIs, and even some actual storage location data! Can we talk about who has access to publications, or training, or something? This should not happen. I'm scared to look in remarks. I'm not sure if a bulkloader might make it better or worse. (Unless BHL saves the day, maybe we should route everything through a bulkloader - like specimen records - so we can check this sort of thing before we let it in??)
More problem pubs at #1570 |
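The "check this sort of thing before we let it in" idea could be sketched roughly as below. This is an illustration only: the function names are hypothetical and nothing here reflects the Arctos schema or any existing bulkloader. It accepts a value only if it reduces to a bare, well-formed DOI, so URLs that merely lead to a page with a DOI on it are rejected. The CrossRef REST API does expose works at `https://api.crossref.org/works/{doi}`; the sketch only builds that URL and leaves the actual lookup out.

```python
import re
from typing import Optional
from urllib.parse import quote

# Prefixes people commonly paste in front of a DOI.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Loose shape check: "10.", a 4-9 digit prefix, a slash, a non-empty suffix.
_DOI = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(value: str) -> Optional[str]:
    """Return a bare lowercase DOI, or None if the value is not a DOI."""
    candidate = _PREFIX.sub("", value.strip())
    if _DOI.fullmatch(candidate):
        # DOIs are case-insensitive, so lowercase them for duplicate checks.
        return candidate.lower()
    return None


def crossref_works_url(doi: str) -> str:
    """Build the CrossRef REST API URL for a DOI (no request is made here)."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


def check_values(values: list[str]) -> tuple[list[str], list[str]]:
    """Split submitted values into (accepted DOIs, rejected raw values)."""
    accepted: list[str] = []
    rejected: list[str] = []
    for value in values:
        doi = normalize_doi(value)
        if doi is None:
            rejected.append(value)
        elif doi not in accepted:  # same DOI submitted twice
            accepted.append(doi)
    return accepted, rejected
```

For example, `check_values(["https://doi.org/10.5962/bhl.title.327", "https://arctos.database.museum/publication/10007115"])` accepts the first value as a DOI and rejects the second, which is a publication URL and not a DOI.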
Is there still interest in this? If so, I still need what's requested in #1565 (comment); if not, please close. |
If this gets revived, it should not proceed before ArctosDB/dev#41. |
It would be great if we could bulkload publications.
And import publications from a bibliography.
Attached is an example of the publications we need to get entered. Some have a DOI listed and some do not.
Papers using UAM birds by date working Apr 2018.txt