880 alternate script handling #7652

Merged
merged 66 commits into from
Apr 6, 2023

Conversation

hornc
Collaborator

@hornc hornc commented Mar 15, 2023

Closes #7264

  • Adds a test for #7617 (Don't import MARC 250$6 as part of edition name)
  • Extracts alternate script titles to the Edition title, and their transliterations to other_titles (justification: the edition record is for the item as printed -- the alternate script is likely what is printed on the book, so it is the most unambiguous identifier)
  • Extracts alternate script Author names to alternate_names, with the transliteration as name (justification: the author record is an abstraction representing an individual who may have used multiple names and forms over time, which stands independently of how their name was presented on a specific book. We need to pick an identifier, and the catalog language is a reasonable choice)
  • Uses standalone 880 fields to obtain a publisher name if no transliterated one exists (part of #7264, Alternate script fields (880) not extracted from MARC imports)
  • De-duplicates series after stripping punctuation
  • Refactoring around the MarcBinary and MarcXml classes
  • Adds functionality, tests, and documentation, while deleting lots of code ;)
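
For readers unfamiliar with 880 linkage, here is a minimal sketch of the title handling described in the list above. It is not the code from this PR: the record layout (a list of `(tag, subfields)` pairs) and the helpers `subfield`, `linked_880`, and `extract_title` are hypothetical and only illustrate the idea. The one thing taken from the MARC 21 format is the `$6` linkage convention, where a regular field carries `880-NN` and its paired 880 field carries `TAG-NN/script`.

```python
import re

# Hypothetical, simplified record: (tag, [(subfield_code, value), ...]).
# This is NOT the MarcBinary/MarcXml interface, only an illustration.
RECORD = [
    ("245", [("6", "880-01"), ("a", "Voĭna i mir")]),
    ("880", [("6", "245-01/(N"), ("a", "Война и мир")]),
]

# $6 starts with "<linking tag>-<occurrence number>", e.g. "880-01" or "245-01/(N".
LINK_RE = re.compile(r"^(\d{3})-(\d{2})")


def subfield(subfields, code):
    """Return the first value for a subfield code, or None."""
    return next((value for c, value in subfields if c == code), None)


def linked_880(record, tag, subfields):
    """Find the 880 field whose $6 points back at this tag and occurrence."""
    link = LINK_RE.match(subfield(subfields, "6") or "")
    if not link or link.group(1) != "880":
        return None
    occurrence = link.group(2)
    for other_tag, other_subfields in record:
        if other_tag != "880":
            continue
        back = LINK_RE.match(subfield(other_subfields, "6") or "")
        if back and back.group(1) == tag and back.group(2) == occurrence:
            return other_subfields
    return None


def extract_title(record):
    """Alternate script goes to title; the transliteration goes to other_titles."""
    edition = {}
    for tag, subfields in record:
        if tag != "245":
            continue
        transliterated = subfield(subfields, "a")
        alternate = linked_880(record, tag, subfields)
        if alternate:
            edition["title"] = subfield(alternate, "a")
            edition["other_titles"] = [transliterated]
        else:
            edition["title"] = transliterated
        break
    return edition


print(extract_title(RECORD))
```

A record with no linked 880 falls through to the plain 245 title, so existing imports would be unaffected under this sketch.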

To add:

Examples from #7264

Technical

Testing

Screenshot

Stakeholders

@tfmorris

@hornc hornc force-pushed the 880_alternate_scripts branch from 5fc32b2 to 3a51ad8 Compare March 15, 2023 04:56
@hornc
Collaborator Author

hornc commented Mar 15, 2023

RESOLVED

@cclauss I'm not sure about these Ruff PLC1901 messages, e.g.

PLC1901 `i.attrib['code'] == ''` can be simplified to `not i.attrib['code']` as an empty string is falsey

I don't want to change those blindly: 0, False, and None are all falsey too, but they aren't empty strings.
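
A small sketch of the difference being described, using a hypothetical helper `compare` that is not part of the PR. It shows that `value == ''` and `not value` agree for strings but diverge for other falsey values:

```python
def compare(values):
    """Return (value, value == '', not value) for each value."""
    return [(v, v == "", not v) for v in values]


for value, equals_empty, is_falsey in compare(["", 0, False, None, "a"]):
    print(f"{value!r:>6}  == '': {equals_empty!s:<5}  not value: {is_falsey}")
```

If the attribute value is always a string, the two checks are equivalent and the Ruff suggestion is safe; the concern only applies when other falsey types can reach the comparison.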

@hornc hornc force-pushed the 880_alternate_scripts branch from 225755c to e14a259 Compare March 15, 2023 07:37
@hornc hornc changed the title 880 alternate script handling WIP: 880 alternate script handling Mar 15, 2023
@tfmorris
Contributor

@mekarpeles @hornc This PR shouldn't have been closed when the other PR mentioning it was merged.

@cclauss cclauss reopened this Mar 15, 2023
@hornc hornc force-pushed the 880_alternate_scripts branch from e14a259 to 23317e2 Compare March 15, 2023 21:54
@hornc hornc force-pushed the 880_alternate_scripts branch from 8d262f0 to f8878e4 Compare March 16, 2023 02:16
@hornc hornc force-pushed the 880_alternate_scripts branch from 628964c to ceb26ff Compare March 16, 2023 04:13
Co-authored-by: Christian Clauss <cclauss@me.com>
@hornc hornc changed the title WIP: 880 alternate script handling 880 alternate script handling Mar 22, 2023
@hornc hornc marked this pull request as ready for review March 22, 2023 03:47
Contributor

@tfmorris tfmorris left a comment

This looks great. I'm sorry I don't have time to do the review justice, but I tried to comment on a few things that caught my eye. I'll try to circle back later and do a more thorough review of the new test cases (which look awesome!)

Contributor

@cclauss cclauss left a comment

Placate mypy...

hornc and others added 4 commits March 24, 2023 19:56
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
@mekarpeles mekarpeles self-assigned this Mar 27, 2023
@hornc hornc requested a review from cclauss March 29, 2023 20:29
Contributor

@cclauss cclauss left a comment

LGTM

@hornc
Collaborator Author

hornc commented Mar 30, 2023

@tfmorris, thanks for your reviews so far. I've tried to address everything, and have split off separate pieces into their own issues to be worked on as follow-ups. Is there anything else I should address immediately in this PR?

Contributor

@tfmorris tfmorris left a comment

I was going to dig up some additional test data to cover s.l., etc., but I think it's fine to merge as is and handle that later. It's a great improvement and I really appreciate all the work.

@tfmorris
Contributor

Here are a couple of test cases for the latinized publish notation, in case they're still useful:
Archive.zip

Development

Successfully merging this pull request may close these issues.

Alternate script fields (880) not extracted from MARC imports
5 participants