rOpenGov/estc?? #15
Comments
Hi! Yes, the repository was permanently moved to http://github.com/COMHIS/estc very recently, and we are still updating all the cross-links. Apologies for the hassle. Let us know if we can provide support.
However, note that this code and these analyses rely on data that is not public. We obtained the data under a confidential collaboration agreement. Therefore, the estc repository itself is mostly of informational value and does not allow reproducing the analysis in the paper unless you have your own copy of the data.
Thanks,
I have the complete ESTC already--UC Riverside was kind enough to provide
me with it. I appreciate you sending me the new link to the repo. I've
written some pretty kludgey Perl scripts (
https://github.com/bhughesshelton/ESTC/blob/master/ESTCmeta.pl) to handle
the data, but am looking forward to seeing what your code can do. Any
suggestions for normalizing historical name data?
Cheers,
Barry
On Mon, Nov 13, 2017 at 11:31 AM, Leo Lahti wrote:
> However, note that this code and analyses relies on data that is not
> public. We got the data via confidential collaboration agreement.
> Therefore, the estc repository itself has mostly information value but does
> not allow reproducing the analysis in the paper, unless you have your own
> copy of the data.
Thanks for your interest! We are now reorganizing the code, and the complete workflow is at the moment not replicable for various technical reasons. The aim is to get this set up for the complete data cleaning process, and we are working on it. If you are interested in specific fields, I can see what we could do. Do you mean historical person names, place names, or something else? We would kindly ask you to cite the work where appropriate.
I've been trying to find some way to deal with personal names, specifically
normalizing the spelling of the proper names that I'm grepping out of the
MARC 260 fields in the ESTC. I just spent a few hours doing it by hand in
an Excel sheet, and finished the period up to 1641--everything covered by
the original STC. Even though I work mostly in Perl, I'm pretty good with R
as well, so let me know if you guys ever need any help. I'd be glad to
contribute in any way I can.
On Mon, Nov 13, 2017 at 6:03 PM, Leo Lahti wrote:
> Thanks for your interest! We are now reorganizing the code and the
> complete workflow is at the moment not replicable for various technical
> reasons. The aim is to really get this set up for the complete data
> cleaning process and we are working on it.
> If you are interested in specific fields, I can see what we could do. Do
> you refer to historical person names, place names, or something else ?
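The extraction step described above (pulling proper names out of the MARC 260 field) might look roughly like the following Python sketch. It assumes the 260 field is available as a plain string with `$a`/`$b`/`$c` subfield markers and that names follow the words "by" or "for". The sample imprint, both regular expressions, and the function name `extract_names` are illustrative assumptions; none of them come from the Perl script linked earlier or from the COMHIS code.

```python
import re
from typing import List

# Assumed pattern: text of the $b subfield, up to the next subfield marker or the end.
SUBFIELD_B = re.compile(r"\$b\s*(.*?)(?=\$[a-z]|$)", re.S)
# Assumed pattern: "by" or "for" followed by one or more capitalized words.
NAME_AFTER_ROLE = re.compile(r"\b(?:by|for)\s+((?:[A-Z][\w.]*)(?:\s+[A-Z][\w.]*)*)")

def extract_names(field_260: str) -> List[str]:
    """Return the raw (unnormalized) names found in the $b subfield."""
    names = []
    for chunk in SUBFIELD_B.findall(field_260):
        for match in NAME_AFTER_ROLE.finditer(chunk):
            # Drop trailing punctuation left over from the imprint statement.
            names.append(match.group(1).rstrip(".,;:"))
    return names

# Made-up imprint in the assumed serialization.
sample = "$a London : $b Printed by Iohn Daye for Richard Tottell, $c 1570."
print(extract_names(sample))
```

Real imprints are far messier than this sample (Latin forms, "the assignes of", initials only), so a pattern like this would only be a first pass before the manual checking discussed in the thread.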
I'm actually working on this aspect of the ESTC right now. Out of curiosity, what is your goal in normalizing the spelling? Having unique identifiers for each author?
Great to hear! It might be useful to compare the matchings up to 1641 at least, as our procedure is largely automated whereas yours seems to be manual. This would provide some quality control. It would also be helpful to check through our lists to spot possible mistakes. This is now ongoing and should presumably be ready rather soon.
Well, the names of authors are already (mostly) standardized in the
ESTC--what I've done is extract and standardize the names of the other
people associated with each text: printers, publishers and booksellers. And
yes, I then assigned everyone a UUID, finally tying that number back to the
UUID for each text, i.e. the ESTC number.
On Wed, Nov 15, 2017 at 10:44 AM, drmhill wrote:
> I'm actually working on this aspect of the ESTC right now. Out of
> curiosity, what is your goal of normalizing spelling? Having unique
> identifiers for each author?
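The identifier scheme described above (one UUID per standardized person, tied back to the ESTC number of each text) could be sketched in Python as follows. The rows, the ESTC-style numbers, and the variable names are made up for illustration; this is one possible layout, not the schema actually used by either party in the thread.

```python
import uuid

# Made-up (ESTC number, role, standardized name) rows; real rows would come
# from the extraction and normalization steps.
rows = [
    ("S000001", "printer", "John Day"),
    ("S000002", "printer", "John Day"),
    ("S000002", "bookseller", "Richard Tottell"),
]

person_ids = {}  # standardized name -> UUID string
links = []       # (ESTC number, role, person UUID)

for estc_number, role, name in rows:
    # Mint a new random UUID the first time a name is seen, then reuse it.
    if name not in person_ids:
        person_ids[name] = str(uuid.uuid4())
    links.append((estc_number, role, person_ids[name]))

for link in links:
    print(link)
```

Keying on the name string alone would merge two different people who share a standardized name, so a real table would likely need extra evidence (dates, places) before assigning an identifier.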
Right, developing an automated process for historical name disambiguation
would be almost impossible (and you'd end up with loads of mistakes). What
I've done is run some fancy pattern matching routines across the MARC
records to extract the information I was interested in, and drive it into a
relational db. Then I was able to open a JDBC connection between that db
and a spreadsheet where I could sort the names and copy/paste the most
frequent spelling over the less frequent ones.
On Wed, Nov 15, 2017 at 10:46 AM, Leo Lahti wrote:
> Great to hear! Might be useful to compare the matchings up to 1641 at
> least as our procedure is largely automated whereas yours seems to be
> manual. This would provide some quality control. It would also be helpful
> to check through our lists to spot possible mistakes. This is now ongoing
> and presumably ready rather soon.
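The spreadsheet step described above, where the most frequent spelling is copied over the less frequent ones, can be approximated in a few lines of Python. The sketch assumes the grouping of variants has already been decided by hand, as in the thread; the spellings and groups below are invented examples, and the database and JDBC parts of the workflow are left out.

```python
from collections import Counter

# Made-up spellings as they might come out of the extraction step.
observed = [
    "John Day", "Iohn Daye", "John Day", "Iohn Day", "John Day",
    "Richard Tottell", "Richard Tottel", "Richard Tottell",
]

# Hand-curated groups of spellings judged to refer to the same person.
groups = [
    {"John Day", "Iohn Daye", "Iohn Day"},
    {"Richard Tottell", "Richard Tottel"},
]

counts = Counter(observed)

canonical = {}
for group in groups:
    # The most frequent spelling wins; ties are broken by string order.
    best = max(group, key=lambda s: (counts[s], s))
    for spelling in group:
        canonical[spelling] = best

# Spellings outside every group are left unchanged.
normalized = [canonical.get(name, name) for name in observed]
print(normalized)
```

Only the final replacement is automated here. Deciding which spellings belong together remains the manual (or semi-automated) part that the thread goes on to discuss.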
Yes, that's the key, and it's what we do as well: automate as much as possible and do the rest by hand. But some degree of automation is crucial here.
Hi,
I was just wondering what happened to your repo over at http://github.com/rOpenGov/estc. I'm doing some computational bibliography and found your article, "A Quantitative Study of History in the English Short-Title Catalogue (ESTC), 1470-1800." I had a look at the source code a few weeks ago, but the repo seems to be gone now. Is there any way you can send me the src or allow me to fork the repo? Apologies if this isn't the right venue for this kind of question.