Add further tests for sn, sl, nd abbreviations #7767

Merged
merged 4 commits into from
Apr 12, 2023

Conversation

hornc
Collaborator

@hornc hornc commented Apr 6, 2023

Closes #

Using test data supplied by @tfmorris in #7652 (comment)

Technical

The original test-publish-sn-sl.mrc test record has the place ($a) and publisher name ($b) swapped. I "corrected" this in the test expectations. I should make the abbreviation matching case-insensitive, but I don't think we should keep the swapped fields in the test example, so I plan to swap them back.

Original MARC:

00764cam a2200241Ia 4500
001 5415173
005 20221110033004.0
008 050902s190u    xx            000 0 eng d
035    $a (OCoLC)ocm61406084
035    $a (NNC)5415173
035    $a 5415173
040    $a ZCU $c ZCU
043    $a a-tu---
050  4 $a BV3170 $b .B65 1900
100 1  $a Bliss, E. E.
245 10 $a Indirect results of missionary labor in northern Turkey / $c by E.E. Bliss, D.D., of Constantinople.
260    $a [S.n.] : $b [s.l.], $c [between 1900 and 1909]
300    $a 7 pages ; $c 24 cm
336    $a text $b txt $2 rdacontent
337    $a unmediated $b n $2 rdamedia
500    $a Caption title.
650  0 $a Missions $z Turkey. $0 http://id.loc.gov/authorities/subjects/sh2010102169
852 80 $b uts,mrlxxp $h 1884

I plan to change the 260 line to:
260 $a [s.l.] : $b [S.n.], $c [between 1900 and 1909]

So it becomes a correctly ordered field input, while still serving as a case-insensitive test case.
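A minimal sketch of what case-insensitive matching of these abbreviations could look like. The pattern names and helper functions here are hypothetical and for illustration only; they are not taken from the Open Library codebase.

```python
import re

# Hypothetical patterns: match "s.l." / "s.n." in any letter case,
# with or without surrounding square brackets.
UNKNOWN_PLACE = re.compile(r"^\[?\s*s\.\s*l\.?\s*\]?$", re.IGNORECASE)
UNKNOWN_PUBLISHER = re.compile(r"^\[?\s*s\.\s*n\.?\s*\]?$", re.IGNORECASE)


def strip_isbd_punctuation(value: str) -> str:
    """Drop the trailing ' :', ',', ';' or '/' that MARC subfields often carry."""
    return value.strip().rstrip(" :;,/").strip()


def is_unknown_place(value: str) -> bool:
    """True if a 260$a value is just the 'sine loco' abbreviation."""
    return bool(UNKNOWN_PLACE.match(strip_isbd_punctuation(value)))


def is_unknown_publisher(value: str) -> bool:
    """True if a 260$b value is just the 'sine nomine' abbreviation."""
    return bool(UNKNOWN_PUBLISHER.match(strip_isbd_punctuation(value)))
```

With the corrected 260 field, `is_unknown_place("[s.l.] :")` and `is_unknown_publisher("[S.n.],")` would both be true regardless of capitalisation.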

The other test case skips the date entirely, but that seems to be existing behaviour. I think it's checking for a date in 008, but I'll need to figure out exactly how this works. The current behaviour of not adding a date at all (because 008 is empty) doesn't seem entirely wrong. Maybe no date is ok in this case?
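For reference, a small sketch of where the date sits in the 008 of the record above. In MARC 21, positions 07-10 of the 008 field hold "Date 1". The function name is hypothetical, and this is not necessarily how the importer reads the field.

```python
def date1_from_008(field_008: str) -> str:
    """Return character positions 07-10 (Date 1) of a MARC 21 008 field."""
    return field_008[7:11]


# 008 value copied from the original MARC record in this PR.
field_008 = "050902s190u    xx            000 0 eng d"

date1 = date1_from_008(field_008)
print(date1)            # 190u
print(date1.isdigit())  # False
```

So for this record the 008 is not actually empty: it carries the partially known year `190u`, which a digits-only check would not accept.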

Testing

Screenshot

Stakeholders

@tfmorris

@tfmorris
Contributor

tfmorris commented Apr 6, 2023

Thanks!

> I plan to change the 260 line to:
> 260 $a [s.l.] : $b [S.n.], $c [between 1900 and 1909]

Oops, sorry about that! I was just scanning MARC files for my search strings and didn't cast a critical eye over the results. I agree with your correction.

I think this is the offending code for the 008:

if publish_date.isdigit() and publish_date != '0000':
    edition["publish_date"] = publish_date

It is rejecting the value "190u", and the corresponding 260$c of "[between 1900 and 1909]" isn't being picked up either. One or the other should be included.

In particular, .isdigit() should be replaced with the regex r"^[0-9]+u*$"
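A minimal sketch of the suggested check, assuming the surrounding logic stays the same. The function name `accept_publish_date` is hypothetical; only the regex and the `'0000'` comparison come from this thread.

```python
import re

# Regex proposed above: one or more digits, then zero or more 'u'
# placeholders for unknown trailing digits (e.g. "190u", "19uu").
PARTIAL_YEAR = re.compile(r"^[0-9]+u*$")


def accept_publish_date(publish_date: str) -> bool:
    """Accept fully or partially known years, but still reject '0000'."""
    return bool(PARTIAL_YEAR.match(publish_date)) and publish_date != "0000"
```

Under this check `"190u"`, `"19uu"` and `"1905"` are accepted, while `"0000"`, `"uuuu"` and a blank value are still rejected, because the pattern requires at least one leading digit.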

@hornc
Collaborator Author

hornc commented Apr 11, 2023

@tfmorris Thanks for taking a look -- I've implemented your regex to extract the 190u date and the abbreviations are now case insensitive. This is ready for final review.

Contributor

@tfmorris tfmorris left a comment

LGTM. Thanks!

@hornc hornc merged commit 757ee73 into internetarchive:master Apr 12, 2023
@hornc hornc deleted the sn-sl-nd branch April 12, 2023 03:27