-
Notifications
You must be signed in to change notification settings - Fork 772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Update CNKI translator for new CNKI #3191
Conversation
It looks like you added Windows newlines to all the of the test lines. Can you revert those? You should be editing the translator with Scaffold ("Translator Editor") from within Zotero, and you should be making sure these tests are updated and passing. What did you determine about the Referer header? Did you need to install the Zotero Connector beta for Firefox for this to work? I can't get CNKI to fully load in Firefox at all, and I still get a 404 for ShowExport in Chrome with this updated translator, but it's possible that foreign visitors always get 404 for that. |
CNKI.js
Outdated
{ | ||
method: "POST", | ||
body: postData, | ||
responseType: "application/x-www-form-urlencoded", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a valid value for an XHR response. What are you trying to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for that. I added this code because it just works. I tested the API in tools like Postman. I used to set Content-Type: application/x-www-form-urlencoded
, but it did not work. So I searched the Code in Zotero translator to find some solutions. I tried this responseType: "application/x-www-form-urlencoded"
and it works.
But I update the code, the below is more reasonable
let reftext = await request(
'https://kns.cnki.net/dm/api/ShowExport',
{
method: "POST",
body: postData,
headers: {
Referer: refer
}
}
);
``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
responseType
doesn't have anything to do with a Content-Type
request header, though. application/x-www-form-urlencoded
is just invalid for responseType
. That's not used in any translators and definitely wasn't doing anything for you.
Hi @dstillman ,
I used VScode to remove some debug code from the translator. Sorry for the mistake. I will convert the line breaks.
It seems like CNKI has updated its URL schema. The old URLs contain can open out of China. The webpage can be converted from Chinese to English automatically. The new URLs containing long mystical strings can not open out of China. Maybe the foreign visitors should use CNKI oversea. I did not install or test in Zotero Connector beta. The installed version in Firefox is 5.0.114. The new CNKI URL schema will trigger a capture when doing test. I hope I can make myself clear😀 |
Then is passing Referer even doing anything? I think we determined yesterday that a custom Referer from the translator wasn't actually being passed by browsers without the fix we just rolled out in the beta, but maybe it was. Did you test the new translator in Chrome? If the Referer line isn't necessary, we should just take it out. Can you update the translator with current URLs that work? |
I just check the request, the passing Referer can not work. The referer in the request headers is the current URL. I think the default request headers can work. I figure out what is the problem. When testing in Scaffold you shoud pass Referer in request headers. But in real browser the connector requests have the default headers coming from current page. So I think we should keep this headers in the translator code. |
No, now there are a huge number of whitespace changes. You should do a git reset to the commit before your changes and use |
Or undo the last commit and fix from there. I'm afraid this patch is no longer usable at all as is — you can see from the diff what a mess it is. |
… to find CNKI database keyword for saving multiple items. Update the matching relationship between dbcode and item type
Hi @dstillman , sorry for the late reply. I just forch push a new commit. I hope this will make the code modification more clear. I found this translator can not work in CNKI search page, but worked in Scaffold. I think the key to this difference is that: CNKI search page will take time to load the search result table. And the translator code can not detect the result table. Using URL matching might be a good idea. |
Hi, is there anything I can do to improve this translator? I see quite a lot of posts about "Error saving CNKI items" in Zotero Forums. |
Merged so that we get this out to people, but I'd like to resolve the issue of the tests. If the current tests aren't all passing for you, they should be updated with current data or new tests. Particularly since we can't seem to test this from outside China, it's very helpful to see the sample data being saved, since that lets us identify metadata issues that should possibly be changed, and it obviously helps future maintainers make sure they're not breaking things. Would you mind updating those in a new PR? Thank you for the quick fix! |
Because of the anti-crawler, some tests can not run in the test tab. But the test can be added by As I mentioned, this translator can not work on the CNKI search page. Do I need to update the code in |
Can you say more? What exactly is happening? Note that, if a page is generated via JS, you may need to add For search results, you might be looking for |
Thank you for your advice, I will learn how to use |
Oh, a CAPTCHA? If that happens randomly it could be a problem. But you might be able to solve it once from the Browser tab and then run the tests. Depends how zealous it is. |
@dstillman Yes, that's a CAPTCHA.
I've tried this before, but the tests still failed. This captcha makes it hard to pass the test. I gave up in fighting with the captcha, because of it will take a lot of time to investigate and test. |
I made some changes to pass the ESLint test and take some time updating the test cases.
|
Another scraping method, based on another API (provided by the "Reference" button on the single item page instead of bulk exporting page), with less code, may be the preferred way to scrape individual item. see my repository for details. |
@jiaojiaodubai Nice work 👍. Can you make a new pull request for these test updates? |
All done, PR#3193. |
Hi,
A Few days ago, CNKI updated its URL schema, reference export API, and HTML page structure. In this PR, I updated the code for exporting RefWorks text and batch-saving items on the search page.
It’s getting late now in China, and the code may have some minor issues. I need your assistance in fixing them.