feat: use author identifiers in import API #10110
for more information, see https://pre-commit.ci
…ver hardcoded IDs
Open questions:
- `key` vs `ol_id` in author import record
- `remote_ids` vs `identifiers` in author import record
  - ^ For both of these, since there are subtle differences between e.g. `remote_ids` (authors, `Dict[str, str]`) and `identifiers` (works/editions, `Dict[str, list[str]]`), I think it might be easiest if we re-use the shape of our existing Open Library records. So `remote_ids: dict[str, str]` for authors, and `key` to hold the Open Library key.
- Should any identifier conflicts cause import error?
  - As a first stab, let's err on precaution, and error on any identifier conflicts.
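To make the proposed shape concrete, here is a minimal sketch of an author import record that uses `key` and `remote_ids: dict[str, str]`, together with a strict conflict check. The helper names `find_identifier_conflicts` and `assert_no_conflicts`, the sample identifier values, and the choice of `ValueError` are assumptions for illustration, not code from this PR.

```python
def find_identifier_conflicts(
    incoming: dict[str, str], existing: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return identifiers present in both dicts whose values differ."""
    return {
        name: (incoming[name], existing[name])
        for name in incoming.keys() & existing.keys()
        if incoming[name] != existing[name]
    }


def assert_no_conflicts(incoming: dict[str, str], existing: dict[str, str]) -> None:
    """Hypothetical strict policy: any conflicting identifier is an error."""
    conflicts = find_identifier_conflicts(incoming, existing)
    if conflicts:
        raise ValueError(f"Conflicting author identifiers: {sorted(conflicts)}")


# Hypothetical import record mirroring the shape of an existing author record.
author_record = {
    "name": "Example Author",
    "key": "/authors/OL1A",
    "remote_ids": {"viaf": "12345", "wikidata": "Q1"},
}

# Hypothetical identifiers already stored on the matched author.
existing_remote_ids = {"viaf": "12345", "wikidata": "Q2"}

conflicts = find_identifier_conflicts(author_record["remote_ids"], existing_remote_ids)

try:
    assert_no_conflicts(author_record["remote_ids"], existing_remote_ids)
except ValueError as exc:
    print(exc)
```

Identifiers that appear on only one side are not treated as conflicts in this sketch; only differing values for the same identifier name are.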
return authors

# Look for OL ID first.
if (key := author.get("ol_id")) and (
We might want to name this one `key` to be consistent with our book/thing records. Having the import endpoint mirror the shape of our core book records is convenient.
Suggested change:
- if (key := author.get("ol_id")) and (
+ if (key := author.get("key")) and (
)
):
# Always match on OL ID, even if remote identifiers don't match.
return get_redirected_authors([web.ctx.site.get(k) for k in reply])
We can use `get_many` here to get them all in one db request. You might have to wrap the result in `list()`.
Suggested change:
- return get_redirected_authors([web.ctx.site.get(k) for k in reply])
+ return get_redirected_authors(list(web.ctx.site.get_many(reply)))
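As a rough illustration of why the batched call helps, the sketch below counts round trips against a stand-in object. `FakeSite` is hypothetical and only mimics the two methods discussed here; it is assumed, not confirmed, that the real `get_many` may return a lazy iterable, which is why the result is wrapped in `list()`.

```python
class FakeSite:
    """Hypothetical stand-in for web.ctx.site that counts round trips."""

    def __init__(self, docs):
        self.docs = docs
        self.requests = 0

    def get(self, key):
        # One round trip per key.
        self.requests += 1
        return self.docs.get(key)

    def get_many(self, keys):
        # One round trip for the whole batch; returns a lazy iterable.
        self.requests += 1
        return (self.docs[k] for k in keys if k in self.docs)


keys = ["/authors/OL1A", "/authors/OL2A", "/authors/OL3A"]
docs = {k: {"key": k} for k in keys}

per_key_site = FakeSite(docs)
one_by_one = [per_key_site.get(k) for k in keys]

batched_site = FakeSite(docs)
batched = list(batched_site.get_many(keys))

print(per_key_site.requests, batched_site.requests)
```

Both approaches yield the same documents here; only the number of requests differs.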
matched_authors = []
# Get all the authors that match any incoming identifier.
for identifier, val in identifiers.items():
queries.append({"type": "/type/author", f"remote_ids.{identifier}~": val})
For this piece we want the exact match for the identifier; the `~` will parse things like `*` as wildcards. E.g. `identifier~=foo*` will find all authors with IDs starting with `foo`.
Suggested change:
- queries.append({"type": "/type/author", f"remote_ids.{identifier}~": val})
+ queries.append({"type": "/type/author", f"remote_ids.{identifier}": val})
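The difference between the two query forms can be simulated locally. The sketch below uses Python's `fnmatch` as a stand-in for the `~` wildcard behavior described above; the in-memory `authors` dict and the `match` helper are hypothetical, and the real matching happens in the database layer, not in this code.

```python
from fnmatch import fnmatchcase

# Hypothetical author records keyed by Open Library key.
authors = {
    "/authors/OL1A": {"remote_ids": {"viaf": "foo"}},
    "/authors/OL2A": {"remote_ids": {"viaf": "foobar"}},
}


def match(identifier: str, value: str, wildcard: bool) -> list[str]:
    """Return keys of authors whose identifier matches value.

    wildcard=True imitates the `~` suffix (glob-style); False is exact equality.
    """
    found = []
    for key, doc in authors.items():
        stored = doc["remote_ids"].get(identifier)
        if stored is None:
            continue
        is_match = fnmatchcase(stored, value) if wildcard else stored == value
        if is_match:
            found.append(key)
    return found


wildcard_hits = match("viaf", "foo*", wildcard=True)
exact_hits_for_pattern = match("viaf", "foo*", wildcard=False)
exact_hits = match("viaf", "foo", wildcard=False)
```

With wildcard matching, an incoming identifier containing `*` could match unrelated authors, which is the reason for preferring the exact form when matching on identifiers.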
for query in queries:
if reply := list(web.ctx.site.things(query)):
matched_authors.extend(
get_redirected_authors([web.ctx.site.get(k) for k in reply])
Another spot for `get_many`:
Suggested change:
- get_redirected_authors([web.ctx.site.get(k) for k in reply])
+ get_redirected_authors(list(web.ctx.site.get_many(reply)))
matched_authors.extend(
get_redirected_authors([web.ctx.site.get(k) for k in reply])
)
matched_authors = uniq(matched_authors)
I'm not sure how `Thing` implements hashable (since `uniq` puts the items in a `set`), but this might be slightly more performant.
Suggested change:
- matched_authors = uniq(matched_authors)
+ matched_authors = uniq(matched_authors, key=lambda thing: thing.key)
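The effect of the `key` argument can be sketched with a simplified, order-preserving `uniq`. Both the `uniq` below and `FakeThing` are stand-ins written for this example; how the real `Thing` hashes is exactly the open question in the comment, so the class here deliberately uses Python's default identity-based hashing.

```python
def uniq(values, key=None):
    """Simplified stand-in: drop duplicates, keeping first occurrences in order."""
    key = key or (lambda x: x)
    seen = set()
    result = []
    for value in values:
        k = key(value)
        if k not in seen:
            seen.add(k)
            result.append(value)
    return result


class FakeThing:
    """Hypothetical record object with default (identity-based) hashing."""

    def __init__(self, key, name):
        self.key = key
        self.name = name


a = FakeThing("/authors/OL1A", "Example Author")
a_again = FakeThing("/authors/OL1A", "Example Author")  # same author, fetched twice
b = FakeThing("/authors/OL2A", "Another Author")

by_identity = uniq([a, a_again, b])
by_key = uniq([a, a_again, b], key=lambda thing: thing.key)
```

Under these assumptions, deduplicating by `thing.key` collapses the two copies of the same author, while the default comparison keeps both.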
"death_date": author.death_date,
**(
{"birth_date": author.birth_date}
if author.birth_date is not None
Suggested change:
- if author.birth_date is not None
+ if author.birth_date
),
**(
{"death_date": author.death_date}
if author.death_date is not None
Suggested change:
- if author.death_date is not None
+ if author.death_date
{
"identifiers": author.identifiers,
}
if len(author.identifiers.keys()) > 0
Suggested change:
- if len(author.identifiers.keys()) > 0
+ if author.identifiers
if len(author.identifiers.keys()) > 0
else {}
),
**({"ol_id": author.ol_id} if author.ol_id is not None else {}),
Suggested change:
- **({"ol_id": author.ol_id} if author.ol_id is not None else {}),
+ **({"ol_id": author.ol_id} if author.ol_id else {}),
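The truthiness suggestions in this and the preceding comments change behavior slightly compared with `is not None`: empty strings and empty dicts are also left out. A small sketch of the difference follows; the helper names and sample values are made up for illustration.

```python
def include_if_not_none(name, value):
    """Keep the field unless the value is exactly None."""
    return {name: value} if value is not None else {}


def include_if_truthy(name, value):
    """Keep the field only when the value is truthy (non-empty)."""
    return {name: value} if value else {}


# Hypothetical author fields, including empty-but-not-None values.
fields = {
    "birth_date": "1900",
    "death_date": "",
    "identifiers": {},
    "ol_id": None,
}

strict = {}
loose = {}
for name, value in fields.items():
    strict.update(include_if_not_none(name, value))
    loose.update(include_if_truthy(name, value))
```

For import output this is usually the desired result, since an empty `death_date` or an empty `identifiers` dict carries no information.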
val = obj[id]["value"]
if id == "youtube" and val[0] != "@":
val = f'@{val}'
contributor.identifiers[id] = obj[id]["value"]
Suggested change:
- contributor.identifiers[id] = obj[id]["value"]
+ contributor.identifiers[id] = val
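The bug addressed by this suggestion is that the normalized `val` was computed and then discarded. A self-contained sketch of the intended behavior is below; `normalize_identifier` and `collect_identifiers` are hypothetical helpers, and the input shape (`{"name": {"value": ...}}`) is inferred from the snippet above. Using `startswith` instead of indexing `val[0]` is a small deviation chosen here so that an empty value does not raise `IndexError`.

```python
def normalize_identifier(name: str, value: str) -> str:
    """Prefix YouTube handles with '@' when it is missing."""
    if name == "youtube" and value and not value.startswith("@"):
        return f"@{value}"
    return value


def collect_identifiers(obj: dict[str, dict[str, str]]) -> dict[str, str]:
    """Build an identifiers dict that stores the normalized value."""
    return {
        name: normalize_identifier(name, entry["value"])
        for name, entry in obj.items()
    }


# Hypothetical input values.
raw = {
    "youtube": {"value": "examplechannel"},
    "viaf": {"value": "12345"},
}
identifiers = collect_identifiers(raw)
```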
This should be squash merged
Corresponding model update pr: internetarchive/openlibrary-client#419
This strictly expands the import schema.
It is not a breaking change.
Import records that don't include author IDs will continue to work as they currently do.
Closes #9448
Closes #9411
Technical
Issues:
Importing books is successful and matching authors are being found and used as expected; however, navigating to the author's page from the new book's page does not show that new book on the author's page. Update: this was Solr updater delay, it appeared after a while!
Testing
I put the entire output of the wikisource script into /import/batch/new.
Stakeholders
@cdrini @Freso
Attribution Disclaimer: By proposing this pull request, I affirm to have made a best-effort and exercised my discretion to make sure relevant sections of this code which substantially leverage code suggestions, code generation, or code snippets from sources (e.g. Stack Overflow, GitHub) have been annotated with basic attribution so reviewers & contributors may have confidence and access to the correct context to evaluate and use this code.