Add TIND Repository translator #3387
Thanks! A little hard to evaluate this without test cases. Could you use Scaffold (Tools -> Developer -> Translator Editor) and add some?
@AbeJellinek done!
TIND Repository.js
Outdated
@@ -42,7 +42,7 @@ const SCHEMA_ORG_ZOTERO_TYPE_MAPPING = {
* @param {string} url The page url
*/
function detectWeb(doc, url) {
const schemaOrgElement = /** @type {HTMLScriptElement | null} */ (
const schemaOrgElement = /** @type {HTMLScriptElement | null} */ (
We use tabs for indentation, and Scaffold should be saving with tabs.
TIND Repository.js
Outdated
@@ -79,10 +79,58 @@ function doWeb(doc, url) {
return;
}

const translator = Zotero.loadTranslator("import");
translator.setTranslator("802aa72e-80dd-459d-8712-131f6eeccd4c");
This shouldn't be an import translator. Right now this would break all JSON imports (e.g. CSL JSON). Import translators are for data formats shared between multiple translators, like Primo Normalized XML, or data interchange formats that sites might provide exports in or that people might want to import from disk, like BibTeX.
I'm not sure if you turned this into an import translator in order to add tests, but web translators can already have tests.
Can just undo these changes.
"I'm not sure if you turned this into an import translator in order to add tests"
Yes, this is exactly the reason. But I don't have any URLs available to add for tests, so how would I go about testing the web translator?
@AbeJellinek I have revised this with the following changes:
Sorry, I think there was a misunderstanding. We don't need an import translator here at all, separate or otherwise. We do want a Schema.org translator eventually, and we're working on one (#1092), but it's going to have to be much more complicated than what we have here. This translator should just parse the JSON-LD inline.
Aren't https://cicero.tind.io/record/71 and https://cicero.tind.io/record/128 (from your import tests) URLs we could use for web tests?
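To illustrate what "parse the JSON-LD inline" could look like, here is a minimal sketch rather than the PR's actual code. The helper name `findSchemaOrgRecord` is hypothetical, and it assumes the record page embeds its metadata in a `<script type="application/ld+json">` element as described in this thread. It takes the script texts as plain strings so that it can run outside a browser.

```javascript
// Hypothetical helper (not from the PR): given the text contents of the
// page's <script type="application/ld+json"> elements, return the first
// parsed object whose @context is schema.org, or null if there is none.
function findSchemaOrgRecord(scriptTexts) {
	for (const text of scriptTexts) {
		let data;
		try {
			data = JSON.parse(text);
		}
		catch (e) {
			// Skip malformed JSON-LD instead of aborting detection
			continue;
		}
		const context = data && data["@context"];
		if (typeof context === "string" && /^https?:\/\/schema\.org\/?$/.test(context)) {
			return data;
		}
	}
	return null;
}

// Sample input modelled on the snippet quoted in this conversation
const sample = '{"@context": "https://schema.org", "@id": "https://digitallibrary.un.org/record/4068194"}';
const record = findSchemaOrgRecord(["not json", sample]);
console.log(record["@id"]);
```

Inside a web translator, the input would presumably be collected from the page with something like `[...doc.querySelectorAll('script[type="application/ld+json"]')].map(el => el.textContent)`, with no separate import translator involved.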
I see, that makes sense.
Yes, these currently work, but there is no guarantee of their persistence: they might change content or stop working. I will investigate whether we can set up some persistent static pages that could be used for tests, and then consolidate the import translator and the web translator into a single web translator as you request!
Is there no public-facing production instance anywhere on the internet that we could use as a test case? You said that "There may be many institutions running TIND" - who are they? And which of the four products (ILS, DA, IR, RDM) listed on TIND's homepage does this target?
If it's IR ("Repository"), that page links to a bunch of instances, like the UN Digital Library. This translator's URL target matches catalog pages on that site, but there's no script element with the id that the translator looks for. I'm thoroughly confused!
@AbeJellinek I can see why this is confusing, sorry about that! This translator will be interesting for TIND sites with Schema.org data on the detailed record page, which is the case for the ILS, IR and RDM flavors of the TIND product. At the moment, the Schema.org metadata is embedded in a script element that looks like this:
<script type="application/ld+json">
{
	"@context": "https://schema.org",
	"@id": "https://digitallibrary.un.org/record/4068194"
}
</script>
The script element does not have any id attribute, as you point out, but we are in the process of rolling out a new release which adds the id attribute that this translator is using. If you go e.g. here: https://digitallibrary.un.org/record/406819, open the browser dev console, and search for an element that matches the css selector
Here is another example record that has more elaborate embedded Schema.org metadata: https://knowledge.uchicago.edu/record/14212
I can certainly come up with some production instance pages that could be used - I just assumed that it might be problematic from a stability point of view. They should be fairly stable, so this concern is more hypothetical, but it could happen that there is downtime, that the customer site deletes a record or makes it restricted, that the customer changes the metadata in such a way that the test no longer passes, etc. As an alternative, if it works for you, we could also set up some records on an instance that is not in use by a customer, so that we would have more control over the persistence of the metadata.
Sorry, I'm just not convinced that this is the right approach. We'll have a Schema.org translator eventually, which will handle the fallback case that this is designed for. We already have a TIND ILS translator (which actually matches other TIND products), and we can trivially extend it to use other export formats (MARC, RIS, ...) depending on each repository's most native representation, and to use bits of Schema.org metadata to do things like detect the item type better. (Re instances, we want to test on production instances, not a test instance. The point of the tests is to test a bunch of actual pages on sites that people will want to add from. We want imperfect metadata and edge cases. It's OK if some tests eventually break - nothing happens automatically when a test starts failing, we'll just replace it.)
@AbeJellinek what you're saying makes a lot of sense. I will try to describe where we are coming from: There are a number of cases where the current translator for TIND sites ends up with an unexpected item type.
Most of the time, the reason is probably that the MARC data does not fully correspond to what the MARC translator expects. This is a fact we have to accept to some degree, while in other cases we might be able to adjust the MARC data model. But the idea in this PR was to use the Schema.org metadata we have, which can more flexibly map from various MARC fields on an instance-by-instance basis. I am on board with this approach not being the right one; however, I am still interested in improving the accuracy of item types if possible. You mention:
Would it be of interest to see an addition to the current Library Catalog (TIND ILS) so that it can e.g. use the embedded Schema.org metadata to capture the item type? Essentially, to use everything as-is but have an additional step that looks for Schema.org to discover the item type.
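The additional step proposed here could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `refineItemType` is hypothetical, and the mapping entries are examples, since the contents of the PR's `SCHEMA_ORG_ZOTERO_TYPE_MAPPING` are not shown in this thread. The item type from the existing MARC-based translation is kept unless the Schema.org `@type` maps to something known.

```javascript
// Illustrative mapping only (an assumption): Schema.org type -> Zotero item type
const TYPE_MAPPING = {
	Book: "book",
	ScholarlyArticle: "journalArticle",
	Thesis: "thesis",
	Report: "report"
};

// Hypothetical helper: keep the item type produced by the existing
// translation unless the Schema.org record names a type we can map.
function refineItemType(marcItemType, schemaOrgRecord) {
	if (!schemaOrgRecord) return marcItemType;
	// @type may be a single string or an array of strings
	const types = [].concat(schemaOrgRecord["@type"] || []);
	for (const type of types) {
		if (Object.prototype.hasOwnProperty.call(TYPE_MAPPING, type)) {
			return TYPE_MAPPING[type];
		}
	}
	return marcItemType;
}

console.log(refineItemType("book", { "@type": "Thesis" })); // prints "thesis"
console.log(refineItemType("book", { "@type": "UnknownThing" })); // prints "book"
console.log(refineItemType("book", null)); // prints "book"
```

Falling back to the MARC-derived type means that pages without Schema.org metadata, or with an unmapped type, would behave exactly as they do today.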
Hi @AbeJellinek, did you have time to consider my previous comment?
TIND provides a SaaS solution for libraries and digital repositories to host various materials. There may be many institutions running TIND. This PR introduces a simple translator that should work for most institutions.
There is already an existing translator for the ILS flavor of TIND:
https://github.com/zotero/translators/blob/bc846072a55a679866608b0d16009d2e4cfb8f0b/Library%20Catalog%20(TIND%20ILS).js
That translator uses the MARCXML content directly. The downside to that is that it makes it harder to customize the mapping to the Zotero data model. TIND provides a Schema.org representation of records, which is much more customizable by the institution, and which I think is a more suitable target for translating into the Zotero data model.
Disclaimer: I work for TIND.