Rewrite ACM translator for new site #2101

GuyAglionby · 2019-12-31T14:54:39Z

Supports the new ACM site (#1811), which has changed over already. The tests don't run (see below), but I've manually tested that it all works.

A couple of notes: I return that magazines and newsletters are journalArticle as this is the type exported by their CSL. I also wasn't sure what the best way of dealing with ISBN was in cases where it is provided but the appropriate item type doesn't have a field for it, so put it in the extras.

With regard to the conversation in #2097, a conference proceedings page will bring up a dialog for users to select papers in that proceedings rather than download the proceedings information. This is easily changed depending on what we decide to be most reasonable.

There is a slight problem with the new site in that it does not work without the cookie cookiePolicy=accept set: it just redirects to https://dl.acm.org/action/cookieAbsent. As a result I've not been able to run the translator in Scaffold, either manually or with the tests. Manually setting the headers in doGet to include the cookie doesn't work either, as it looks like XMLHttpRequest doesn't allow that. In any case I don't think that would address the problem with the tests.

zuphilip

Thank you very much! This looks fine in general and it is the most important that it works in the broswer extension. However, the cookies should usually been sended also in Scaffold and I would assume that it should work there also. What I have seen is that you are using a GET call where in the browser I see a POST call. Usually, it can help to make the call in Scaffold to look as much the same as in your browser.

I have some smaller comments and suggestions. But they all should be easy to adress.

zuphilip · 2020-01-01T12:54:44Z

ACM Digital Library.js

 		"items": "multiple"
 	},
 	{
 		"type": "web",
-		"url": "http://dl.acm.org/citation.cfm?id=3029062",
+		"url": "https://dl.acm.org/toc/interactions/2016/24/1",
 		"items": [
 			{
 				"itemType": "book",


That is recognized as multiple which looks also better.

zuphilip · 2020-01-01T13:39:52Z

ACM Digital Library.js


-// attr()/text() v2
-function attr(docOrElem,selector,attr,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.getAttribute(attr):null;}function text(docOrElem,selector,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.textContent:null;}


Why did you delete these functions and the instead used doc.querySelector in your code? It is common in Zotero translators to use the text and attr functions.

I misremembered the discussion in another issue re. those functions as getting rid of them entirely rather than incorporating them as globals. Thanks for the heads up!

No problem and you remember correctly, that this is the goal. However, AFAIK we are not yet there to drop these lines. Anyways it should be easy to do this then for a lot of translators at once, when it is ready.

zuphilip · 2020-01-01T13:42:53Z

ACM Digital Library.js

-	} else if (url.includes("/citation.cfm")) {
-		return getArticleType(doc);
+	if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) {
+		let extractedContext = doc.querySelector('meta[name=pbContext]').content;


attr('meta[name=pbContext]', 'content')

zuphilip · 2020-01-01T13:46:06Z

ACM Digital Library.js

+	if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) {
+		let extractedContext = doc.querySelector('meta[name=pbContext]').content;
+		let subtypeRegex = /csubtype:string:(\w+)/;
+		let subtype = extractedContext.match(subtypeRegex)[1].toLowerCase();


Let us first test that it matches and only in this case continue with a value.

zuphilip · 2020-01-01T13:49:35Z

ACM Digital Library.js

+		if (subtype == 'conference') {
+			return 'conferencePaper';
+		}
+		else if (subtype == 'journal' || subtype == 'magazine' || subtype == 'newsletter') {


I would like to check this a little further. Can you provide examples classified here as magazine or newsletter?

Here is a magazine (corresponding test); here is a newsletter (corresponding test).

So, they don't have an submit area for authors and are maybe otherwise a little different. However, I see that dblp is classify them also as journals. Thus, I am okay with this now.

ACM Digital Library.js

zuphilip · 2020-01-01T20:10:54Z

ACM Digital Library.js

+			// e.g. https://dl.acm.org/doi/abs/10.5555/3336323.C5474411
+			let authorElements = doc.querySelectorAll('div.citation span.loa__author-name');
+			authorElements.forEach(function (element) {
+				item.creators.push(ZU.cleanAuthor(element.textContent));


🐛 This function cleanAuthor needs a second argument for the creatorType!

ACM Digital Library.js

zuphilip · 2020-01-01T20:14:42Z

ACM Digital Library.js

+		}
+
+		let pagesElement = doc.querySelector('div.pages-info span');
+		if (pagesElement) {


Suggested change

if (pagesElement) {

if (pagesElement && !item.numPages) {

zuphilip · 2020-01-01T20:20:43Z

ACM Digital Library.js

-		return getSearchResults(doc, true) ? 'multiple' : false;
-	} else if (url.includes("/citation.cfm")) {
-		return getArticleType(doc);
+	if ((url.includes('/doi/') || url.includes('/do/')) && !url.includes('/doi/proceedings')) {


Can you also give an example/test case for a /do/ url?

/do/ includes computer programs and datasets. However, it looks like the CSL endpoint currently gives a 500 when trying to retrieve information for the DOIs corresponding to these types of entry (I checked a few). This also breaks the 'export citation' functionality on the website itself, though I recall it working before. Not sure if it's worth disabling /do/ in the mean time (I'm not sure how long it's been broken, but have just emailed about it).

Thank you for the examples. No, we don't need to disable this here. As far as we know this should work on the Zotero side. Thank you also for reaching out to them (metadata matters!).

zuphilip · 2020-01-08T20:43:00Z

@GuyAglionby Let me know if you can work on this within the next days. Otherwise, I can maybe try to work a little further here.

GuyAglionby · 2020-01-08T20:45:19Z

Sorry about the delay here -- I should have time tomorrow to have a look

GuyAglionby · 2020-01-09T22:28:17Z

Thanks for the comments -- have made changes as requested. The tests started working for me in Scaffold (without any changes, not sure what happened) so I updated the already-present ones, and added new ones to cover both new item types and some of the additional scraping that is done.

One thing ZU.itemFromCSLJSON does is return creators with a creatorTypeID, which doesn't seem necessary although I'm not sure it's doing any harm. Can remove that field if necessary.

Also save the journalAbbreviation.

zuphilip · 2020-01-12T23:23:56Z

Okay, great! Thank you very much! I have only some small nits pushed directly to your branch. I was not able to use Scaffold for testing. The additional field for the creators should be okay.

zuphilip · 2020-01-12T23:26:19Z

Merged now! 🚀 Thank you very much! 🙇‍♂️

GuyAglionby · 2020-01-12T23:32:34Z

Thanks!

Rewrite ACM translator for new site

19205e0

zuphilip added the Improvements Pull requests that are improving existing translators label Dec 31, 2019

zuphilip reviewed Jan 1, 2020

View reviewed changes

koppor mentioned this pull request Jan 2, 2020

Rewrite ACM fetcher, because of ACM web site changes JabRef/jabref#5804

Closed

GuyAglionby added 2 commits January 9, 2020 16:52

Fix existing tests

b71d91d

ACM translator fixes

a853a19

ACM: Handle article numbers + delete callNumbers

4df1f26

Also save the journalAbbreviation.

zuphilip merged commit 20bf8fd into zotero:master Jan 12, 2020

dstillman mentioned this pull request Apr 15, 2020

ACM: Journal article saved as document #2162

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite ACM translator for new site #2101

Rewrite ACM translator for new site #2101

GuyAglionby commented Dec 31, 2019 •

edited

Loading

zuphilip left a comment

zuphilip Jan 1, 2020

zuphilip Jan 1, 2020

GuyAglionby Jan 9, 2020

zuphilip Jan 12, 2020

zuphilip Jan 1, 2020

zuphilip Jan 1, 2020

zuphilip Jan 1, 2020

GuyAglionby Jan 9, 2020

zuphilip Jan 12, 2020

zuphilip Jan 1, 2020

zuphilip Jan 1, 2020

zuphilip Jan 1, 2020

GuyAglionby Jan 9, 2020

zuphilip Jan 12, 2020

zuphilip commented Jan 8, 2020

GuyAglionby commented Jan 8, 2020

GuyAglionby commented Jan 9, 2020

zuphilip commented Jan 12, 2020

zuphilip commented Jan 12, 2020

GuyAglionby commented Jan 12, 2020


		// attr()/text() v2
		function attr(docOrElem,selector,attr,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.getAttribute(attr):null;}function text(docOrElem,selector,index){var elem=index?docOrElem.querySelectorAll(selector).item(index):docOrElem.querySelector(selector);return elem?elem.textContent:null;}

Rewrite ACM translator for new site #2101

Rewrite ACM translator for new site #2101

Conversation

GuyAglionby commented Dec 31, 2019 • edited Loading

zuphilip left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zuphilip commented Jan 8, 2020

GuyAglionby commented Jan 8, 2020

GuyAglionby commented Jan 9, 2020

zuphilip commented Jan 12, 2020

zuphilip commented Jan 12, 2020

GuyAglionby commented Jan 12, 2020

GuyAglionby commented Dec 31, 2019 •

edited

Loading