-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite ACM translator for new site #2101
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much! This looks fine in general and it is the most important that it works in the broswer extension. However, the cookies should usually been sended also in Scaffold and I would assume that it should work there also. What I have seen is that you are using a GET call where in the browser I see a POST call. Usually, it can help to make the call in Scaffold to look as much the same as in your browser.
I have some smaller comments and suggestions. But they all should be easy to adress.
ACM Digital Library.js
Outdated
"items": "multiple" | ||
}, | ||
{ | ||
"type": "web", | ||
"url": "http://dl.acm.org/citation.cfm?id=3029062", | ||
"url": "https://dl.acm.org/toc/interactions/2016/24/1", | ||
"items": [ | ||
{ | ||
"itemType": "book", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is recognized as multiple
which looks also better.
|
||
// attr()/text() v2 | ||
function attr(docOrElem,selector,attr,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.getAttribute(attr):null;}function text(docOrElem,selector,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.textContent:null;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you delete these functions and the instead used doc.querySelector
in your code? It is common in Zotero translators to use the text
and attr
functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misremembered the discussion in another issue re. those functions as getting rid of them entirely rather than incorporating them as globals. Thanks for the heads up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem and you remember correctly, that this is the goal. However, AFAIK we are not yet there to drop these lines. Anyways it should be easy to do this then for a lot of translators at once, when it is ready.
ACM Digital Library.js
Outdated
} else if (url.includes("/citation.cfm")) { | ||
return getArticleType(doc); | ||
if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) { | ||
let extractedContext = doc.querySelector('meta[name=pbContext]').content; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attr('meta[name=pbContext]', 'content')
ACM Digital Library.js
Outdated
if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) { | ||
let extractedContext = doc.querySelector('meta[name=pbContext]').content; | ||
let subtypeRegex = /csubtype:string:(\w+)/; | ||
let subtype = extractedContext.match(subtypeRegex)[1].toLowerCase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us first test that it matches and only in this case continue with a value.
ACM Digital Library.js
Outdated
if (subtype == 'conference') { | ||
return 'conferencePaper'; | ||
} | ||
else if (subtype == 'journal' || subtype == 'magazine' || subtype == 'newsletter') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to check this a little further. Can you provide examples classified here as magazine or newsletter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a magazine (corresponding test); here is a newsletter (corresponding test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, they don't have an submit area for authors and are maybe otherwise a little different. However, I see that dblp is classify them also as journals. Thus, I am okay with this now.
ACM Digital Library.js
Outdated
// e.g. https://dl.acm.org/doi/abs/10.5555/3336323.C5474411 | ||
let authorElements = doc.querySelectorAll('div.citation span.loa__author-name'); | ||
authorElements.forEach(function (element) { | ||
item.creators.push(ZU.cleanAuthor(element.textContent)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 This function cleanAuthor
needs a second argument for the creatorType!
ACM Digital Library.js
Outdated
} | ||
|
||
let pagesElement = doc.querySelector('div.pages-info span'); | ||
if (pagesElement) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (pagesElement) { | |
if (pagesElement && !item.numPages) { |
ACM Digital Library.js
Outdated
return getSearchResults(doc, true) ? 'multiple' : false; | ||
} else if (url.includes("/citation.cfm")) { | ||
return getArticleType(doc); | ||
if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also give an example/test case for a /do/
url?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/do/
includes computer programs and datasets. However, it looks like the CSL endpoint currently gives a 500 when trying to retrieve information for the DOIs corresponding to these types of entry (I checked a few). This also breaks the 'export citation' functionality on the website itself, though I recall it working before. Not sure if it's worth disabling /do/
in the mean time (I'm not sure how long it's been broken, but have just emailed about it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the examples. No, we don't need to disable this here. As far as we know this should work on the Zotero side. Thank you also for reaching out to them (metadata matters!).
@GuyAglionby Let me know if you can work on this within the next days. Otherwise, I can maybe try to work a little further here. |
Sorry about the delay here -- I should have time tomorrow to have a look |
Thanks for the comments -- have made changes as requested. The tests started working for me in Scaffold (without any changes, not sure what happened) so I updated the already-present ones, and added new ones to cover both new item types and some of the additional scraping that is done. One thing |
Also save the journalAbbreviation.
Okay, great! Thank you very much! I have only some small nits pushed directly to your branch. I was not able to use Scaffold for testing. The additional field for the creators should be okay. |
Merged now! 🚀 Thank you very much! 🙇♂️ |
Thanks! |
Supports the new ACM site (#1811), which has changed over already. The tests don't run (see below), but I've manually tested that it all works.
A couple of notes: I return that magazines and newsletters are
journalArticle
as this is the type exported by their CSL. I also wasn't sure what the best way of dealing with ISBN was in cases where it is provided but the appropriate item type doesn't have a field for it, so put it in the extras.With regard to the conversation in #2097, a conference proceedings page will bring up a dialog for users to select papers in that proceedings rather than download the proceedings information. This is easily changed depending on what we decide to be most reasonable.
There is a slight problem with the new site in that it does not work without the cookie
cookiePolicy=accept
set: it just redirects to https://dl.acm.org/action/cookieAbsent. As a result I've not been able to run the translator in Scaffold, either manually or with the tests. Manually setting the headers indoGet
to include the cookie doesn't work either, as it looks like XMLHttpRequest doesn't allow that. In any case I don't think that would address the problem with the tests.