Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add translator for Jewish Currents #3182

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AbeJellinek
Copy link
Member

@AbeJellinek AbeJellinek commented Nov 6, 2023

Tries to distinguish magazine articles from online-only articles, and makes a rudimentary attempt at handling podcast episodes (without creators).


let item = new Zotero.Item(itemType);
item.title = doc.title;
item.date = ZU.strToISO(text(doc, '.feature .absolute')); // Bad!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand what you mean here and I see you. The logically cleaner solution seems to be parsing the Schema.org JSON-LD <script> element and extract the datePublished property.

item.runningTime = text(doc, '.podcasts .total');
}

let authors = doc.querySelectorAll('.lockup a[href*="/author/"]');
Copy link
Collaborator

@zoe-translates zoe-translates Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this selector, I presume. The class name lockup is a bit cryptic, and it seems to be a design jargon referring to an exception to responsive design for certain elements. But it's not actually used in the CSS (so not really locked up), and in your words it may not be "very stable".

This one may be better semantically: .bioblock strong a (locating the author links after the article's end).

else if (doc.querySelector('audio#podcast')) {
return 'podcast';
}
else if (url.includes('/results') && getSearchResults(doc, true)) {
Copy link
Collaborator

@zoe-translates zoe-translates Nov 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I think we can also URL-match the issue pages (example issue) and category pages (example category). The selectors in getSearchResults() will still work. The following tests the paths /results (with query), /archive ("All Articles", with query or hash but maybe unnecessary), and paths with /issue/ or /category/ as parent

Suggested change
else if (url.includes('/results') && getSearchResults(doc, true)) {
else if (/\/(results\?|archive($|[#?])|issue\/.+|category\/.+)/.test(url) && getSearchResults(doc, true)) {

The selectors don't work for the list of podcast episodes https://jewishcurrents.org/podcast though.

@zoe-translates
Copy link
Collaborator

Hello @AbeJellinek,

In addition to the inline comments, here's a few more observations about the idiosyncrasies -

  • Creator names in podcasts -- I did this for The Atlantic translator and it was rather painful. I'm not very sure about its usefulness now.
  • Components of creator names -- we have names like Reda Abu Assi (article by Abu Assi) and Dana El Kurd (article by El Kurd), and they'll be rather poorly understood by ZU.cleanAuthor(). I wonder if it's worth the effort to treat a few common patterns as special cases (patronymics, nisba, kunya, etc.)?
  • Language identification -- I haven't thought of a nice way. They do have articles in German but the LD-JSON data gives the wrong language ID.
  • Review articles -- example. We have the creator type reviewedAuthor but again, the page content may be too 'free-form' to allow their straightforward identification.

I don't think these are high-priority ones while it would be nice to have some way to deal with a few of them (the author-name component maybe?)

And thanks for the patience.

@zoe-translates
Copy link
Collaborator

And possibly letter itemType for letters from readers? https://jewishcurrents.org/letters/on-the-hindu-nationalists-using-the-pro-israel-playbook

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants