Add TIND Repository translator #3387

thms-rmb · 2024-11-15T10:09:56Z

TIND provides a SaaS solution for libraries and digital repositories to host various materials. There may be many institutions running TIND. This PR introduces a simple translator that should work for most institutions.

There is already an existing translator for the ILS flavor of TIND:

https://github.com/zotero/translators/blob/bc846072a55a679866608b0d16009d2e4cfb8f0b/Library%20Catalog%20(TIND%20ILS).js

That translator uses the MARCXML content directly. The downside to that is that it makes it harder to customize the mapping to the Zotero data model. TIND provides a Schema.org representation of records, which is much more customizable by the institution, and which I think is a more suitable target for translating into the Zotero data model.

Disclaimer: I work for TIND.

AbeJellinek · 2024-11-19T18:20:34Z

Thanks! A little hard to evaluate this without test cases. Could you use Scaffold (Tools -> Developer -> Translator Editor) and add some?

thms-rmb · 2024-11-20T09:55:32Z

> Thanks! A little hard to evaluate this without test cases. Could you use Scaffold (Tools -> Developer -> Translator Editor) and add some?

@AbeJellinek done!

AbeJellinek · 2024-11-21T16:57:27Z

TIND Repository.js

@@ -42,7 +42,7 @@ const SCHEMA_ORG_ZOTERO_TYPE_MAPPING = {
 * @param {string} url The page url
 */
 function detectWeb(doc, url) {
-	const schemaOrgElement = /** @type {HTMLScriptElement | null} */ (
+    const schemaOrgElement = /** @type {HTMLScriptElement | null} */ (


We use tabs for indentation, and Scaffold should be saving with tabs.

AbeJellinek · 2024-11-21T17:03:11Z

TIND Repository.js

@@ -79,10 +79,58 @@ function doWeb(doc, url) {
        return;
    }

+    const translator = Zotero.loadTranslator("import");
+    translator.setTranslator("802aa72e-80dd-459d-8712-131f6eeccd4c");


This shouldn't be an import translator. Right now this would break all JSON imports (e.g. CSL JSON). Import translators are for data formats shared between multiple translators, like Primo Normalized XML, or data interchange formats that sites might provide exports in or that people might want to import from disk, like BibTeX.

I'm not sure if you turned this into an import translator in order to add tests, but web translators can already have tests.

Can just undo these changes.

thms-rmb · 2024-11-27

> I'm not sure if you turned this into an import translator in order to add tests

Yes, this is exactly the reason. But I don't have any urls available to add for tests, so how would I go about testing the web translators?

thms-rmb · 2024-12-04T10:19:52Z

@AbeJellinek I have revised this with the following changes:

- Use tabs for indentation
- TIND Repository is now purely a web translator, no longer an import translator
- Added a separate Schema.org import translator with the actual translation logic and tests
- The Schema.org import translator looks specifically for the Schema.org @type property and specific values in it, so it should not break other JSON imports
- TIND Repository is now a small wrapper that, apart from detectWeb, simply uses the Schema.org import translator
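To illustrate the @type check described in the list above, here is a minimal sketch of such a guard. It is not the code from this PR: the function name `looksLikeSchemaOrg` and the contents of `ACCEPTED_TYPES` are assumptions made for the example.

```javascript
// Hypothetical set of Schema.org types an importer might accept.
// The real list used in the PR is not shown in this conversation.
const ACCEPTED_TYPES = new Set(["Book", "Dataset", "ScholarlyArticle"]);

/**
 * Returns true only for a JSON object that declares a schema.org
 * context and at least one accepted @type value.
 * @param {string} text Raw text offered to the importer
 * @returns {boolean}
 */
function looksLikeSchemaOrg(text) {
	let data;
	try {
		data = JSON.parse(text);
	}
	catch (e) {
		// Not JSON at all, so some other importer should handle it
		return false;
	}
	if (data === null || typeof data !== "object" || Array.isArray(data)) {
		return false;
	}
	const context = typeof data["@context"] === "string" ? data["@context"] : "";
	if (!/^https?:\/\/schema\.org\/?$/.test(context)) {
		return false;
	}
	// @type may be a single string or an array of strings
	const types = [].concat(data["@type"] || []);
	return types.some(type => ACCEPTED_TYPES.has(type));
}
```

With this guard, a CSL JSON payload such as `[{"type": "book"}]` is rejected because it is an array and has no `@context`, so other JSON importers would still get a chance to claim it.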

AbeJellinek · 2024-12-04T18:05:24Z

Sorry, I think there was a misunderstanding. We don't need an import translator here at all, separate or otherwise. We do want a Schema.org translator eventually, and we're working on one (#1092), but it's going to have to be much more complicated than what we have here. This translator should just parse the JSON-LD inline.
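As a rough sketch of what parsing the JSON-LD inline could look like, the helpers below read the text of a script element and pull out a few fields. The names `readInlineJsonLd` and `toBasicFields` are hypothetical, and treating `@id` as the record URL is an assumption based on the example record quoted elsewhere in this thread. A real web translator would copy these values onto a `Zotero.Item` inside `doWeb`.

```javascript
/**
 * Parses the text of a JSON-LD script element and returns the first
 * object whose @context mentions schema.org, or null if there is none.
 * @param {{textContent: string} | null} scriptElement
 * @returns {object | null}
 */
function readInlineJsonLd(scriptElement) {
	if (!scriptElement || !scriptElement.textContent) {
		return null;
	}
	let parsed;
	try {
		parsed = JSON.parse(scriptElement.textContent);
	}
	catch (e) {
		// Malformed JSON on the page: treat as "no metadata"
		return null;
	}
	const candidates = Array.isArray(parsed) ? parsed : [parsed];
	for (const candidate of candidates) {
		if (candidate && typeof candidate === "object"
				&& String(candidate["@context"] || "").includes("schema.org")) {
			return candidate;
		}
	}
	return null;
}

/**
 * Picks a few string fields from a Schema.org record.
 * name and datePublished are standard Schema.org properties;
 * using @id as the URL is an assumption for this example.
 * @param {object} record
 * @returns {{title: string, date: string, url: string}}
 */
function toBasicFields(record) {
	const asString = value => (typeof value === "string" ? value : "");
	return {
		title: asString(record.name),
		date: asString(record.datePublished),
		url: asString(record["@id"])
	};
}
```

Keeping the parsing in plain functions like these would let the web translator stay self-contained, without registering a separate import translator.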

> But I don't have any urls available to add for tests

Aren't https://cicero.tind.io/record/71 and https://cicero.tind.io/record/128 (from your import tests) URLs we could use for web tests?

thms-rmb · 2024-12-04T18:42:40Z

I see, that makes sense.

> Aren't https://cicero.tind.io/record/71 and https://cicero.tind.io/record/128 (from your import tests) URLs we could use for web tests?

Yes, these currently work but there is no guarantee of their persistence - they might change content or stop working.

I will investigate if we can set up some persistent static pages that could be used for tests, and then consolidate the import translator and the web translator into a single web translator as you request!

AbeJellinek · 2024-12-04T20:50:03Z

Is there no public-facing production instance anywhere on the internet that we could use as a test case? You said that "There may be many institutions running TIND" - who are they?

And which of the four products (ILS, DA, IR, RDM) listed on TIND's homepage does this target?

AbeJellinek · 2024-12-04T20:54:36Z

If it's IR ("Repository"), that page links to a bunch of instances, like the UN Digital Library. This translator's URL target matches catalog pages on that site, but there's no #detailed-schema-org, so detectWeb() fails.

I'm thoroughly confused!

thms-rmb · 2024-12-05T14:24:00Z

@AbeJellinek I can see why this is confusing, sorry about that!

This translator will be interesting for TIND sites with Schema.org data on the detailed record page, which is the case for the ILS, IR and RDM flavors of the TIND product.

At the moment, the Schema.org metadata is embedded in a script element that looks like this:

<script type="application/ld+json">
    {
        "@context": "https://schema.org",
        "@id": "https://digitallibrary.un.org/record/4068194"
    }
</script>

As you point out, the script element does not have an id attribute yet, but we are in the process of rolling out a new release that adds the id attribute this translator relies on. If you go to, for example, https://digitallibrary.un.org/record/406819 and open the browser dev console, you can search for an element that matches the CSS selector [type='application/ld+json'] and see the Schema.org data even now.
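Until the id attribute is available on every instance, a lookup could try the id first and then fall back to the type selector mentioned above. This is only a sketch: `findSchemaOrgScript` and `SCHEMA_ORG_SELECTORS` are names invented for the example, and the id value is taken from the `#detailed-schema-org` selector mentioned earlier in this thread.

```javascript
// Selectors tried in order: the id being rolled out, then the generic
// JSON-LD script type that is already present on current instances.
const SCHEMA_ORG_SELECTORS = [
	"script#detailed-schema-org",
	"script[type='application/ld+json']"
];

/**
 * Returns the first element matched by any of the selectors, or null.
 * @param {{querySelector: function(string): (object|null)}} doc
 * @returns {object | null}
 */
function findSchemaOrgScript(doc) {
	for (const selector of SCHEMA_ORG_SELECTORS) {
		const element = doc.querySelector(selector);
		if (element) {
			return element;
		}
	}
	return null;
}
```

In a translator, `detectWeb(doc, url)` could call `findSchemaOrgScript(doc)` and return false when nothing is found, so pages without embedded metadata are skipped.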

Here is another example record that has a more elaborate Schema.org embedded metadata: https://knowledge.uchicago.edu/record/14212

> Is there no public-facing production instance anywhere on the internet that we could use as a test case

I can certainly come up with some production instance pages that could be used - I just assumed that it might be problematic from a stability point of view. Although they should be fairly stable, and so this concern is more hypothetical: it could happen that there is downtime, the customer site deletes a record or makes it restricted, the customer changes the metadata in such a way that the test no longer passes, etc.

As an alternative, if it works for you, we could also set up some records on an instance that is not in use by a customer, and so we would have more control over the persistence of the metadata.

AbeJellinek · 2024-12-05T17:29:24Z

Sorry, I'm just not convinced that this is the right approach. We'll have a Schema.org translator eventually, which will handle the fallback case that this is designed for. We already have a TIND ILS translator (which actually matches other TIND products), and we can trivially extend it to use other export formats (MARC, RIS, ...) depending on each repository's most native representation, and to use bits of Schema.org metadata to do things like detect the item type better.

(Re instances, we want to test on production instances, not a test instance. The point of the tests is to test a bunch of actual pages on sites that people will want to add from. We want imperfect metadata and edge cases. It's OK if some tests eventually break - nothing happens automatically when a test starts failing, we'll just replace it.)

thms-rmb · 2024-12-06T14:10:35Z

@AbeJellinek what you're saying makes a lot of sense. I will try to describe where we are coming from:

There are a number of cases where the current translator for TIND sites ends up with an unexpected item type.

For example:

- https://socialmediaarchive.org/record/60 should be a dataset but ends up being a book
- https://arodes.hes-so.ch/record/14994 should be a conference paper but ends up being a journal article
- https://knowledge.uchicago.edu/record/13629 should be a presentation but ends up being a book

Most of the time, the reason is probably that the MARC data does not fully correspond to what the MARC translator expects. This is a fact we have to accept to some degree, while in other cases we might be able to adjust the MARC data model. But the idea in this PR was to use the Schema.org metadata we have, which can more flexibly map from various MARC fields on an instance-by-instance basis.

I am on board with this approach not being the right one; however, I am still interested in improving the accuracy of item types if possible. You mention:

> depending on each repository's most native representation [...] use bits of Schema.org metadata to do things like detect the item type better.

Would it be of interest to see an addition to the current Library Catalog (TIND ILS) so that it can e.g. use the embedded Schema.org metadata to capture the item type? Essentially, to use everything as-is but have an additional step that looks for Schema.org to discover the item type.
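To make the proposed additional step concrete, here is a small sketch of how an item type derived from MARC could be overridden when the embedded Schema.org data names a recognized type. The function `refineItemType` and the entries in `ILLUSTRATIVE_TYPE_MAPPING` are assumptions for illustration and are not taken from the existing TIND ILS translator or from this PR.

```javascript
// Illustrative mapping from Schema.org types to Zotero item types.
// The entries are examples only, not the mapping used in the PR.
const ILLUSTRATIVE_TYPE_MAPPING = {
	Dataset: "dataset",
	PresentationDigitalDocument: "presentation",
	ScholarlyArticle: "journalArticle",
	Thesis: "thesis",
	Book: "book"
};

/**
 * Keeps the MARC-derived item type unless the Schema.org record
 * carries a @type that appears in the mapping.
 * @param {string} marcItemType Item type produced by the MARC step
 * @param {object | null} schemaOrgRecord Parsed JSON-LD, if any
 * @returns {string}
 */
function refineItemType(marcItemType, schemaOrgRecord) {
	if (!schemaOrgRecord || typeof schemaOrgRecord !== "object") {
		return marcItemType;
	}
	// @type may be a single string or an array of strings
	const types = [].concat(schemaOrgRecord["@type"] || []);
	for (const type of types) {
		if (Object.prototype.hasOwnProperty.call(ILLUSTRATIVE_TYPE_MAPPING, type)) {
			return ILLUSTRATIVE_TYPE_MAPPING[type];
		}
	}
	return marcItemType;
}
```

Under these assumptions, `refineItemType("book", {"@type": "Dataset"})` yields `"dataset"`, while a record with no recognized type leaves the MARC result untouched.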

thms-rmb · 2025-01-08T07:50:24Z

Hi @AbeJellinek, did you have time to consider my previous comment?

thms-rmb added 2 commits November 15, 2024 11:00

Add TIND Repository translator

5f73bb7

Whitespace, adjust minVersion

905c633

Reformat, split importer/web, add tests

7e278c5

AbeJellinek requested changes Nov 21, 2024


Tabbed, split Schema.org importer, fix detectImport

870a8e7
