Update OpenAlex translators #3379

thebluepotato · 2024-10-24T16:34:38Z

In OpenAlex JSON.js:

Fixes the JSON translator assuming that keys which are sometimes null (such as title or primary_location) are always present
Saves the OpenAlex identifier in the Extra field instead of the whole URL
Saves the item's URL as well
Adds a DOI fallback in case the item's data was seriously lacking
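For the null-key fix, here is a minimal sketch of the defensive pattern. It assumes the OpenAlex work JSON fields `title`, `display_name`, `primary_location.source.display_name`, `primary_location.landing_page_url`, and `doi` (returned as a `https://doi.org/...` URL). `extractBasicFields` is a hypothetical helper, not code from the translator, and using `landing_page_url` as the item URL is an assumption:

```javascript
// Hypothetical helper: reads fields that OpenAlex may return as null
// without throwing a TypeError. Field names follow the OpenAlex work schema.
function extractBasicFields(work) {
  return {
    // `title` can be null; fall back to `display_name`, then to ""
    title: work.title ?? work.display_name ?? "",
    // `primary_location` and its `source` can each be null
    publicationTitle: work.primary_location?.source?.display_name ?? "",
    // Landing page URL, if OpenAlex recorded one (assumed target for the URL field)
    url: work.primary_location?.landing_page_url ?? "",
    // OpenAlex returns DOIs as https://doi.org/... URLs; keep only the bare DOI
    DOI: work.doi ? work.doi.replace(/^https?:\/\/(?:dx\.)?doi\.org\//i, "") : "",
  };
}

// A sparse record yields empty strings instead of crashing the translator
console.log(extractBasicFields({ title: null, primary_location: null, doi: null }));
```

Optional chaining (`?.`) plus nullish coalescing (`??`) keeps each lookup to one line while still distinguishing "missing" from legitimate falsy values.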

In OpenAlex.js:

Adds support for detecting and parsing multiple identifiers at once
Improves OpenAlex ID matching and cleaning
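A sketch of what "ID matching and cleaning" could look like. It borrows the `cleanOpenAlexID` name from the OpenAlex.js diff, but the body is an illustrative assumption rather than the PR's implementation, and it only recognizes work keys (the `W` prefix):

```javascript
// Illustrative cleaner: accepts a bare work key ("W2741809807", any case) or an
// openalex.org URL (optionally api.openalex.org/works/...) and returns "W" + digits,
// or null when the input is not an OpenAlex work ID.
function cleanOpenAlexID(input) {
  if (typeof input !== "string") return null;
  const match = input
    .trim()
    .match(/^(?:https?:\/\/(?:api\.)?openalex\.org\/(?:works\/)?)?(w\d+)$/i);
  return match ? match[1].toUpperCase() : null;
}

for (const s of ["w2741809807", "https://openalex.org/W2741809807", "10.1000/xyz"]) {
  console.log(s, "->", cleanOpenAlexID(s));
}
```

Anchoring the regex at both ends rejects strings that merely contain a W-number somewhere, such as a DOI.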

AbeJellinek · 2024-11-19T18:11:09Z

.ci/eslint-plugin-zotero-translator/lib/rules/test-cases.js

@@ -114,7 +114,7 @@ module.exports = {
 					}
 					else if (testCase.type === 'search') {
 						// console.log(JSON.stringify(testCase.input))
-						const expected = ['DOI', 'ISBN', 'PMID', 'arXiv', 'identifiers', 'contextObject', 'adsBibcode', 'ericNumber', 'openAlex'];
+						const expected = ['DOI', 'ISBN', 'PMID', 'arXiv', 'identifiers', 'contextObject', 'adsBibcode', 'ericNumber', 'OpenAlex'];


Why are we changing this back?

I thought I'd align with the way it's usually written and displayed. For instance, arXiv is not lowercase, DOI is uppercase, etc. I'm fine with anything in the end, but wouldn't it be better if the label were case-insensitive?

The non-camelCase key names for DOI, ISBN, and PMID have caused us some grief, which is why we haven't done that with adsBibcode (ADS Bibcode) or ericNumber (ERIC number).

AbeJellinek · 2024-11-19T18:12:12Z

OpenAlex.js

+		if (items[i].OpenAlex && (oaID = cleanOpenAlexID(items[i].OpenAlex))) {
+			oaIDs.push(oaID);
+		}
+		else if (items[i].openAlex && (oaID = cleanOpenAlexID(items[i].openAlex))) {
+			oaIDs.push(oaID);
+		}
+		else if (typeof items[i] == 'string' && (oaID = cleanOpenAlexID(items[i]))) {
+			oaIDs.push(oaID);
+		}


There's absolutely no need to support multiple capitalizations or string inputs. Callers just need to pass valid input, which is of the form { openAlex: '...' }.

I'm happy to support only one capitalisation (preferably OpenAlex). However for just the string, I believe the arXiv translator at least also supports strings in addition to objects.

However for just the string, I believe the arXiv translator at least also supports strings in addition to objects.

Some search translators that have been around for a very long time do, but arXiv doesn't, at least as of the current version. Zotero will always pass a single object, so there's no need to handle anything but single objects anymore.

(The only exception is if we want to allow another translator to call the search translator in some specific nonstandard way, but I think there's usually a way to design around that.)
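To illustrate the single-object contract, here is a sketch of a `detectSearch` that reads only `item.openAlex`, assuming Zotero calls it with one object such as `{ openAlex: 'W2741809807' }`. The regex-based `cleanOpenAlexID` below is a minimal stand-in, not the PR's version:

```javascript
// Minimal stand-in cleaner: bare key or https://openalex.org/ URL.
function cleanOpenAlexID(input) {
  const match = String(input).trim().match(/^(?:https?:\/\/openalex\.org\/)?(w\d+)$/i);
  return match ? match[1].toUpperCase() : null;
}

// Zotero passes a single search object; only the camelCase `openAlex` key is read.
function detectSearch(item) {
  return Boolean(item && typeof item.openAlex === "string" && cleanOpenAlexID(item.openAlex));
}

console.log(detectSearch({ openAlex: "W2741809807" })); // true
console.log(detectSearch({ OpenAlex: "W2741809807" })); // false: other capitalizations are ignored
console.log(detectSearch("W2741809807"));               // false: bare strings are ignored
```

Keeping the detector this narrow means there is exactly one input shape to test and document.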


AbeJellinek · 2024-11-19T18:14:32Z

OpenAlex JSON.js

+	let openAlexID = data.ids.openalex.match(/W\d+$/i)[0];
+	item.setExtra("OpenAlex", openAlexID);
+	item.libraryCatalog = "OpenAlex";


Reasoning for this? I think we settled on using openalex.org/ URLs as canonical identifiers.

I don't know what you settled on in the past, but my suggestion is to focus on the identifier. Sure, OpenAlex stores them with the URL, but even their documentation shows that the actual identifier for works is just the part that starts with W. In the end, it's the same reason why we canonically store just the identifier for arXiv or DOI, even though you can build the URLs with them. Lastly, it's just more terse and legible IMO. In any event, the translator supports searching with both.

The reason we used the version with domain is that that's defined as the OpenAlex ID by OpenAlex:
https://docs.openalex.org/how-to-use-the-api/get-single-entities#the-openalex-id

The OpenAlex ID is the primary key for all entities. It's a URL shaped like this: https://openalex.org/<OpenAlex_key>. Here's a real-world example:

https://openalex.org/W2741809807

I'm not super opposed to handling this differently, and I can see this looks inconsistent, but for DOIs, PMIDs, and arXiv IDs, the ID explicitly does not include the URL prefix, so this isn't the same.

Would it help if we internally agreed that we're not storing the OpenAlex ID per se but the OpenAlex key?

That's what we would be doing, but I'm not convinced going with the key is the right choice. Legibility is obviously a matter of taste, but storing identifiers in a self-identifying format seems like a good idea: DOIs and ISBNs are self-identifying by virtue of their format; arXiv IDs canonically include the arxiv: namespace (yes, we do store the openAlex: label in Extra, but as part of a key:value pair, not as part of the identifier). PMIDs are a constant problem. OpenAlex keys are marginally better because of the W, but only marginally.

Let's stick with the URLs for backwards compatibility and disambiguation. We can allow searching by bare W[...] IDs, of course.
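A sketch of "store URLs, allow searching by bare W IDs": normalize either form to the `https://openalex.org/<key>` ID that OpenAlex documents. `toCanonicalOpenAlexID` is a hypothetical name; unlike the `.match(/W\d+$/i)[0]` call in the OpenAlex JSON.js diff, it returns `null` instead of throwing when no key is found. It does not verify the host:

```javascript
const OPENALEX_BASE = "https://openalex.org/";

// Normalizes a bare "W..." key or an OpenAlex URL to the canonical OpenAlex ID URL.
// Returns null (rather than throwing) when no work key is present.
function toCanonicalOpenAlexID(input) {
  if (typeof input !== "string") return null;
  // The key must be the whole string or follow a "/", so e.g. "XW123" is rejected
  const match = input.trim().match(/(?:^|\/)(w\d+)$/i);
  return match ? OPENALEX_BASE + match[1].toUpperCase() : null;
}

console.log(toCanonicalOpenAlexID("W2741809807"));
// -> https://openalex.org/W2741809807
```

Storing the full URL keeps existing items compatible, while the normalization step lets users type whichever form they have at hand.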

AbeJellinek · 2024-11-19T18:15:12Z

Thanks! Most looks good, but I'm confused about a few changes (above).

thebluepotato · 2024-11-22T08:48:57Z

Regarding the label, some more context: I contributed to the zotero-cita extension (diegodlh/zotero-cita#300), where we fetch and store identifiers and references from multiple sources, including OpenAlex. There, we display the identifiers (both "officially" supported ones like arXiv, DOI, and ISBN, and unofficial ones like OpenAlex and Semantic Scholar's CorpusID) in editable fields, each with a button to open the corresponding URL (just like the standard DOI field). These extra identifiers are stored in the Extra field, where capitalization should be reasonably uniform. ~~My eyes bleed~~ I'm slightly put off seeing openAlex: xxx there instead of OpenAlex: xxx.

Regarding storing the URL: the documentation is sometimes a little vague about the key/ID distinction. Since they do describe the key as the "unique primary key that identifies a given resource in our database", I think the key is the value that should be stored. As for having a usable URL, in our use case the extension constructs the link automatically (just as Zotero does for the DOI).

As to the "recognizability" of the identifier, I see your point, but I'm not sure I agree with disregarding the field label. If I knew nothing of English literature, I might take the bare value "Jane Eyre" for an author rather than a book title. Without getting too hermeneutical: context is not merely helpful for meaning, it is necessary. In other words, I believe the label is essential to the meaning of most field values, not just some internal key in a dictionary. Lastly, while W isn't much of a distinguishing feature, it is one, and once you've seen one or two (alongside the field label), you associate it with OpenAlex.

Sorry if I'm sounding/being a bit anal; in the end, whatever you/we decide won't change the world. I just wanted to clarify my reasoning. I'll make the other requested changes ASAP.

AbeJellinek · 2024-11-22T19:23:25Z

These extra identifiers are stored in the Extra field, and there, capitalization should be somewhat uniform. ~~My eyes bleed~~ I'm slightly put off seeing openAlex: xxx there instead of OpenAlex: xxx.

Totally fine - we can store the ID in Extra as OpenAlex: [...] while still using openAlex as the search field. We use Title Case in Extra and camelCase in the field name itself.
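A sketch of that split, assuming a hypothetical `EXTRA_LABELS` table that maps the camelCase search field to its Title Case Extra label. The case-insensitive read is an added design choice (in the spirit of the earlier case-insensitivity question), not documented Zotero behavior:

```javascript
// Hypothetical mapping from camelCase search-field names to Title Case Extra labels.
const EXTRA_LABELS = { openAlex: "OpenAlex" };

// Writes "Label: value" for storage in Extra.
function toExtraLine(fieldName, value) {
  const label = EXTRA_LABELS[fieldName];
  if (!label) throw new Error(`No Extra label defined for ${fieldName}`);
  return `${label}: ${value}`;
}

// Reads the value back, matching the label case-insensitively so lines written
// as "openAlex: ..." are still found. Splits on the first colon only, so URL values survive.
function readExtraValue(extra, fieldName) {
  const label = EXTRA_LABELS[fieldName];
  if (!label) return null;
  for (const line of extra.split("\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    if (line.slice(0, colon).trim().toLowerCase() === label.toLowerCase()) {
      return line.slice(colon + 1).trim();
    }
  }
  return null;
}

const search = { openAlex: "https://openalex.org/W2741809807" };
console.log(toExtraLine("openAlex", search.openAlex));
// -> OpenAlex: https://openalex.org/W2741809807
```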

thebluepotato added 4 commits October 24, 2024 18:28

Update OpenAlex translators

628b570

Merge remote-tracking branch 'upstream/master' into openalex

5b855de

Cleanup stray comments

3b440ae

Remove DOI fallback due to lack of benefit

f7e71be

AbeJellinek requested changes Nov 19, 2024


Cleanup

b95bc2d



Review thread timeline:

AbeJellinek (Nov 19, 2024), thebluepotato (Nov 19, 2024), AbeJellinek (Nov 22, 2024)

AbeJellinek (Nov 19, 2024), thebluepotato (Nov 19, 2024), AbeJellinek (Nov 22, 2024)

AbeJellinek (Nov 19, 2024), thebluepotato (Nov 19, 2024), adam3smith (Nov 19, 2024), thebluepotato (Nov 19, 2024), adam3smith (Nov 19, 2024), AbeJellinek (Nov 22, 2024)
