Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite ACM translator for new site #2101

Merged
merged 4 commits into from
Jan 12, 2020
Merged

Conversation

GuyAglionby
Copy link
Contributor

@GuyAglionby GuyAglionby commented Dec 31, 2019

Supports the new ACM site (#1811), which has changed over already. The tests don't run (see below), but I've manually tested that it all works.

A couple of notes: I return that magazines and newsletters are journalArticle as this is the type exported by their CSL. I also wasn't sure what the best way of dealing with ISBN was in cases where it is provided but the appropriate item type doesn't have a field for it, so put it in the extras.

With regard to the conversation in #2097, a conference proceedings page will bring up a dialog for users to select papers in that proceedings rather than download the proceedings information. This is easily changed depending on what we decide to be most reasonable.

There is a slight problem with the new site in that it does not work without the cookie cookiePolicy=accept set: it just redirects to https://dl.acm.org/action/cookieAbsent. As a result I've not been able to run the translator in Scaffold, either manually or with the tests. Manually setting the headers in doGet to include the cookie doesn't work either, as it looks like XMLHttpRequest doesn't allow that. In any case I don't think that would address the problem with the tests.

@zuphilip zuphilip added the Improvements Pull requests that are improving existing translators label Dec 31, 2019
Copy link
Contributor

@zuphilip zuphilip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! This looks fine in general and it is the most important that it works in the broswer extension. However, the cookies should usually been sended also in Scaffold and I would assume that it should work there also. What I have seen is that you are using a GET call where in the browser I see a POST call. Usually, it can help to make the call in Scaffold to look as much the same as in your browser.

I have some smaller comments and suggestions. But they all should be easy to adress.

"items": "multiple"
},
{
"type": "web",
"url": "http://dl.acm.org/citation.cfm?id=3029062",
"url": "https://dl.acm.org/toc/interactions/2016/24/1",
"items": [
{
"itemType": "book",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is recognized as multiple which looks also better.


// attr()/text() v2
function attr(docOrElem,selector,attr,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.getAttribute(attr):null;}function text(docOrElem,selector,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.textContent:null;}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you delete these functions and the instead used doc.querySelector in your code? It is common in Zotero translators to use the text and attr functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misremembered the discussion in another issue re. those functions as getting rid of them entirely rather than incorporating them as globals. Thanks for the heads up!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem and you remember correctly, that this is the goal. However, AFAIK we are not yet there to drop these lines. Anyways it should be easy to do this then for a lot of translators at once, when it is ready.

} else if (url.includes("/citation.cfm")) {
return getArticleType(doc);
if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) {
let extractedContext = doc.querySelector('meta[name=pbContext]').content;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attr('meta[name=pbContext]', 'content')

if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) {
let extractedContext = doc.querySelector('meta[name=pbContext]').content;
let subtypeRegex = /csubtype:string:(\w+)/;
let subtype = extractedContext.match(subtypeRegex)[1].toLowerCase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us first test that it matches and only in this case continue with a value.

if (subtype == 'conference') {
return 'conferencePaper';
}
else if (subtype == 'journal' || subtype == 'magazine' || subtype == 'newsletter') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to check this a little further. Can you provide examples classified here as magazine or newsletter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a magazine (corresponding test); here is a newsletter (corresponding test).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, they don't have an submit area for authors and are maybe otherwise a little different. However, I see that dblp is classify them also as journals. Thus, I am okay with this now.

ACM Digital Library.js Show resolved Hide resolved
// e.g. https://dl.acm.org/doi/abs/10.5555/3336323.C5474411
let authorElements = doc.querySelectorAll('div.citation span.loa__author-name');
authorElements.forEach(function (element) {
item.creators.push(ZU.cleanAuthor(element.textContent));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 This function cleanAuthor needs a second argument for the creatorType!

ACM Digital Library.js Show resolved Hide resolved
}

let pagesElement = doc.querySelector('div.pages-info span');
if (pagesElement) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (pagesElement) {
if (pagesElement && !item.numPages) {

return getSearchResults(doc, true) ? 'multiple' : false;
} else if (url.includes("/citation.cfm")) {
return getArticleType(doc);
if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also give an example/test case for a /do/ url?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/do/ includes computer programs and datasets. However, it looks like the CSL endpoint currently gives a 500 when trying to retrieve information for the DOIs corresponding to these types of entry (I checked a few). This also breaks the 'export citation' functionality on the website itself, though I recall it working before. Not sure if it's worth disabling /do/ in the mean time (I'm not sure how long it's been broken, but have just emailed about it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the examples. No, we don't need to disable this here. As far as we know this should work on the Zotero side. Thank you also for reaching out to them (metadata matters!).

@zuphilip
Copy link
Contributor

zuphilip commented Jan 8, 2020

@GuyAglionby Let me know if you can work on this within the next days. Otherwise, I can maybe try to work a little further here.

@GuyAglionby
Copy link
Contributor Author

Sorry about the delay here -- I should have time tomorrow to have a look

@GuyAglionby
Copy link
Contributor Author

Thanks for the comments -- have made changes as requested. The tests started working for me in Scaffold (without any changes, not sure what happened) so I updated the already-present ones, and added new ones to cover both new item types and some of the additional scraping that is done.

One thing ZU.itemFromCSLJSON does is return creators with a creatorTypeID, which doesn't seem necessary although I'm not sure it's doing any harm. Can remove that field if necessary.

Also save the journalAbbreviation.
@zuphilip
Copy link
Contributor

Okay, great! Thank you very much! I have only some small nits pushed directly to your branch. I was not able to use Scaffold for testing. The additional field for the creators should be okay.

@zuphilip zuphilip merged commit 20bf8fd into zotero:master Jan 12, 2020
@zuphilip
Copy link
Contributor

Merged now! 🚀 Thank you very much! 🙇‍♂️

@GuyAglionby
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvements Pull requests that are improving existing translators
Development

Successfully merging this pull request may close these issues.

2 participants