Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with Unicode Support in script Summary #678

Closed
TimidScript opened this issue Jul 26, 2015 · 25 comments
Closed

Issue with Unicode Support in script Summary #678

TimidScript opened this issue Jul 26, 2015 · 25 comments
Labels
bug You've guessed it... this means a bug is reported. CODE Some other Code related issue and it should clearly describe what it is affecting in a comment.

Comments

@TimidScript
Copy link

Some Unicode characters appear as gibberish in the "Summary" on the scripts page and listing. The summary is extracted from the script "description" metadata.

Example here: Pixiv++

If you look at the summary you get gibberish (���) both in FF and Chrome. If you extract the text from the metadata and insert it manually in the script description ("Script Info") it works fine.

@Martii
Copy link
Member

Martii commented Jul 26, 2015

This is currently a dependency of #285... will keep it in mind in my testing hopefully later this week when the code is migrated. There are some decodeURI/encodeURI issues in current node as well but I haven't rechecked that recently.

Thanks for the report.


Possible font issue as well won't know till I am at dev station to test more.

@Martii Martii added this to the #285 milestone Jul 26, 2015
@Martii
Copy link
Member

Martii commented Jul 26, 2015

Let's see how GH with this browser handles the raw stored data of Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|ポケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク...

Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|ポケベ���|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク


See some question marks in the same place as your "gibberish".... so this may be leaning towards a font issue which is browser and operating system dependent... will check some things at dev station later.


How odd GH shows the "gibberish" exactly as you reported in Write/Preview but not on final Update of comment where it turns to those weird question marks. ;)

@Martii Martii added the needs testing Anyone can add this but it is primarily there for the Assignee indicating that Testers are wanted. label Jul 27, 2015
@Martii
Copy link
Member

Martii commented Jul 27, 2015

You know I'm not really sure what is doing this yet... I (OUJS) forked your script temporarily and it worked fine... then I temporarily removed your @description on OUJS and of course put it back in... and now it seems fixered... it could have been a node hiccup at any point.

Anyhow... I'll add the mitigation to this, which includes you following up with this issue next time you sync from GH to OUJS, and reopen if it reappears with steps to reproduce... but direct on OUJS seems to work fine... I'll still recheck it out in #285.

Doesn't appear to be a font issue either.

@Martii Martii closed this as completed Jul 27, 2015
@Martii Martii added needs mitigation Needs additional followup. and removed needs testing Anyone can add this but it is primarily there for the Assignee indicating that Testers are wanted. labels Jul 27, 2015
@Martii
Copy link
Member

Martii commented Jul 27, 2015

@TimidScript
I've also forked your repo temporarily on GH and imported your script into my account here and no error with the @description Unicode ... and blanked out the source and still no error with @description Unicode... my best educated guess is a node issue that was fixed at some point... above comment still pertains for steps to reproduce if you see it again.

@TimidScript
Copy link
Author

Thank you

How odd GH shows the "gibberish" exactly as you reported in Write/Preview but not on final Update of comment where it turns to those weird question marks. ;)

I rarely use preview, that's why I love the edit button ^_^

Doesn't appear to be a font issue either.

No I didn't think that was the case.

Anyhow... I'll add the mitigation to this, which includes you following up with this issue next time you sync from GH to OUJS, and reopen if it reappears with steps to reproduce...

Just updated for testing and it seems to work fine.

@Martii
Copy link
Member

Martii commented Oct 5, 2015

@TimidScript
This appears to be back again... what editor are you using? I'm going to comment out your @description and uncomment it.

@Martii Martii reopened this Oct 5, 2015
@Martii
Copy link
Member

Martii commented Oct 5, 2015

@TimidScript
That refixed it editing it on the site itself... I would sure like to know what's going on here. :\

@Martii Martii added the needs testing Anyone can add this but it is primarily there for the Assignee indicating that Testers are wanted. label Oct 5, 2015
@Martii Martii removed this from the #285 milestone Oct 5, 2015
@TimidScript
Copy link
Author

I use Visual Studio and then copy and paste into GitHub. Works fine on GreasyFork

@Martii
Copy link
Member

Martii commented Oct 5, 2015

and paste into GitHub

So you use their online editor? I'll try to track using that methodology with what's going on... if OUJS site edit works fine then it's not #285 at all. I've gone over that routine with a fine toothed comb.

Things to recheck (this section will change during testing):

  • Any dependencies we use that may be doing this.
    • Using native node https dep
  • What we import
    Checked with reimport... works fine on your account with [TS] Pixiv++.
  • What GH is sending in exactly Content-Type wise.
    • reject if not correct type, if possible, in the webhook .... already rejects
  • Test buffer conversion to String
  • Upstream node since this is something we don't coerce (yet)

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 5, 2015
* Webhook is setup for production environments and not dev so I get to do this one step at a time... blech
* Some STYLEGUIDE.md conformance in these functions
* Using `isDbg` for these tests ... production will go up an down a little bit during these times... shouldn't be for more than 2 minutes or less though
* Change `console.log` to `console.warn` in `https.request`

Test for OpenUserJS#678
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 5, 2015
``` sh-session
< POST /github/hook 200 6.583 ms - -
< /home/oujs/OpenUserJS.org1/libs/repoManager.js:42
<         console.log(JSON.stringify(aRes, null, ' '));
<                          ^
< TypeError: Converting circular structure to JSON
<     at Object.stringify (native)
<     at ClientRequest.<anonymous> (/home/oujs/OpenUserJS.org1/libs/repoManager.js:42:26)
<     at ClientRequest.g (events.js:199:16)
<     at ClientRequest.emit (events.js:107:17)
<     at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:426:21)
<     at HTTPParser.parserOnHeadersComplete (_http_common.js:111:23)
<     at TLSSocket.socketOnData (_http_client.js:317:20)
<     at TLSSocket.emit (events.js:107:17)
<     at readableAddChunk (_stream_readable.js:163:16)
<     at TLSSocket.Readable.push (_stream_readable.js:126:10)
<     at TCP.onread (net.js:538:20)

```

Applies to OpenUserJS#678
@TimidScript
Copy link
Author

So you use their online editor?

Yes I do. Sometimes I change the code in the editor for small updates, like I did for the icon fix, and other times I just paste the whole code from my offline editor.

My GreasyFork account now resync with GitHub but through manual update, and I update the files over there every so often. I've never noticed any character errors, if that's any help.

@Martii
Copy link
Member

Martii commented Oct 5, 2015

I've been able to reproduce this now with GH's editor... it's somewhere near the webhook... still analyzing the snapshot output... so far it appears we are receiving UTF-8 from GH but we also use buffers and then convert it to a string... this might be the issue with node but I'll need to run some more tests after analyzing. The object I'm analyzing is huge too so this might take a little bit of time.

@TimidScript
Copy link
Author

Glad to hear you could reproduce it.

@Martii
Copy link
Member

Martii commented Oct 5, 2015

Dumped output

This is what the request response currently shows on production (snipped):

<   httpVersionMajor: 1,
<   httpVersionMinor: 1,
<   httpVersion: '1.1',
<   complete: false,
<   headers: 
<    { 'content-security-policy': 'default-src \'none\'',
<      'x-xss-protection': '1; mode=block',
<      'x-frame-options': 'deny',
<      'x-content-type-options': 'nosniff',
<      'strict-transport-security': 'max-age=31536000',
<      etag: '"a27bc31a7cf263b94de07bfb2e15b7f1d45b61a2"',
<      'content-type': 'text/plain; charset=utf-8',
<      'cache-control': 'max-age=300',
<      'content-length': '8087',
<      'accept-ranges': 'bytes',
<      date: 'Mon, 05 Oct 2015 18:52:23 GMT',
<      via: '1.1 varnish',
<      connection: 'close',
<      'x-served-by': 'cache-jfk1020-JFK',
<      'x-cache': 'MISS',
<      'x-cache-hits': '0',
<      vary: 'Authorization,Accept-Encoding',
<      'access-control-allow-origin': '*',
<      expires: 'Mon, 05 Oct 2015 18:57:23 GMT',
<      'source-age': '0' },

This is what parseMeta currently shows on production (snipped and notice the three "commas" in edit on GH which yields hexagon question marks on GH display):

<   "description": [
<    {
<     "value": "Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク"
<    }

@sizzlemctwizzle
Copy link
Member

We might want to use https://nodejs.org/api/string_decoder.html on line 344 of scriptStorage.js
On Oct 5, 2015 2:37 PM, "Marti Martz" notifications@github.com wrote:

Dumped output

This is what the request response currently shows on production
(snipped):

< httpVersionMajor: 1,< httpVersionMinor: 1,< httpVersion: '1.1',< complete: false,< headers: < { 'content-security-policy': 'default-src 'none'',< 'x-xss-protection': '1; mode=block',< 'x-frame-options': 'deny',< 'x-content-type-options': 'nosniff',< 'strict-transport-security': 'max-age=31536000',< etag: '"a27bc31a7cf263b94de07bfb2e15b7f1d45b61a2"',< 'content-type': 'text/plain; charset=utf-8',< 'cache-control': 'max-age=300',< 'content-length': '8087',< 'accept-ranges': 'bytes',< date: 'Mon, 05 Oct 2015 18:52:23 GMT',< via: '1.1 varnish',< connection: 'close',< 'x-served-by': 'cache-jfk1020-JFK',< 'x-cache': 'MISS',< 'x-cache-hits': '0',< vary: 'Authorization,Accept-Encoding',< 'access-control-allow-origin': '*',< expires: 'Mon, 05 Oct 2015 18:57:23 GMT',< 'source-age': '0' },

This is what parseMeta currently shows on production (snipped and notice
the three "commas" in edit on GH which yields hexagon question marks on GH
display)
:

< "description": [
< {
< "value": "Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク"
< }


Reply to this email directly or view it on GitHub
#678 (comment)
.

@Martii
Copy link
Member

Martii commented Oct 5, 2015

@sizzlemctwizzle
I'm thinking since node strings were detected as UTF-16 for BOM checking we can try sending the Accept-Charset and see what GH does with that (reading up on this now). The buffer might be trying to convert UTF-8 then with using toString and keeps the string in UTF-8 garbled. GH isn't sending the uppercase UTF-8 either as the output shows... that could mess things up if there isn't a toLowerCase in node and our deps.

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 5, 2015
…keeps things consistent *hopefully*

* Using uppercase as mentioned at OpenUserJS#198 (comment)

Applies to OpenUserJS#678

Related to:
* OpenUserJS#348 discovery
* OpenUserJS#200
* OpenUserJS#198
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 5, 2015
@Martii
Copy link
Member

Martii commented Oct 5, 2015

Newest dump (snipped and separated)

< > getMeta()
< >> Chunks loop 0
< // ==UserScript==
< // @name                [TS] Pixiv++
< // @namespace           TimidScript
< // @version             3.3.80a.010 Beta
< // @description         Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|��

... already bugged here with hexagonal question marks.

Four total Chunks loop... all bugged

Final blocksContent:

< >> blocksContent
< { UserScript: '\n// @name                [TS] Pixiv++\n// @namespace           TimidScript\n// @version             3.3.80a.010 Beta\n// @description         Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク\n// @author              TimidScript\n// @homepageURL         https://openuserjs.org/users/TimidScript\n// @copyright           © 2014 TimidScript, All Rights Reserved.\n// @license             Creative Commons BY-NC-SA + Read Copyright inside the script\n// @include             http://www.pixiv.net/*\n// @exclude             http://www.pixiv.net/*mode=manga&illust_id*\n// @exclude             http://www.pixiv.net/*mode=big&illust_id*\n// @exclude             http://www.pixiv.net/*mode=manga_big*\n// @require\t\t        https://openuserjs.org/src/libs/TimidScript/TSL_-_Generic.js\n// @require             https://openuserjs.org/src/libs/TimidScript/TSL_-_GM_Update.js\n// @homeURL             https://openuserjs.org/scripts/TimidScript/[TS]_Pixiv++\n// @grant               GM_info\n// @grant               GM_getMetadata\n// @grant               GM_registerMenuCommand\n// @grant               GM_getValue\n// @grant               GM_setValue\n// @grant               GM_listValues\n// @grant               GM_deleteValue\n// @grant               GM_xmlhttpRequest\n// @icon                \n' }

blocks

< >> blocks
< {
<  "UserScript": {
<   "name": [
<    {
<     "value": "[TS] Pixiv++"
<    }
<   ],
<   "namespace": [
<    {
<     "value": "TimidScript"
<    }
<   ],
<   "version": [
<    {
<     "value": "3.3.80a.010 Beta"
<    }
<   ],
<   "description": [
<    {
<     "value": "Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with \"Pixiv++ Manga Viewer\" and \"Generic Image Viewer\". 自動ページング|���ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク"
<    }
<   ],
<   "homepageURL": [
<    {
<     "value": "https://openuserjs.org/users/TimidScript"
<    }
<   ],
<   "copyright": [
<    {
<     "value": "© 2014 TimidScript, All Rights Reserved."
<    }
<   ],
<   "license": [
<    {
<     "value": "Creative Commons BY-NC-SA + Read Copyright inside the script"
<    }
<   ],
<   "include": [
<    {
<     "value": "http://www.pixiv.net/*"
<    }
<   ],
<   "exclude": [
<    {
<     "value": "http://www.pixiv.net/*mode=manga&illust_id*"
<    },
<    {
<     "value": "http://www.pixiv.net/*mode=big&illust_id*"
<    },
<    {
<     "value": "http://www.pixiv.net/*mode=manga_big*"
<    }
<   ],
<   "require": [
<    {
<     "value": "https://openuserjs.org/src/libs/TimidScript/TSL_-_Generic.js"
<    },
<    {
<     "value": "https://openuserjs.org/src/libs/TimidScript/TSL_-_GM_Update.js"
<    }
<   ],
<   "grant": [
<    {
<     "value": "GM_info"
<    },
<    {
<     "value": "GM_getMetadata"
<    },
<    {
<     "value": "GM_registerMenuCommand"
<    },
<    {
<     "value": "GM_getValue"
<    },
<    {
<     "value": "GM_setValue"
<    },
<    {
<     "value": "GM_listValues"
<    },
<    {
<     "value": "GM_deleteValue"
<    },
<    {
<     "value": "GM_xmlhttpRequest"
<    }
<   ],
<   "icon": [
<    {
<     "value": ""
<    }
<   ]
<  }
< }

... bug propagated.

I think for grins I'll try the StringDecoder with utf16 and see what that does.

@Martii
Copy link
Member

Martii commented Oct 5, 2015

heh tried utf16 direct edit on production and it didn't like that:

< > getMeta()
< string_decoder.js:24
<     throw new Error('Unknown encoding: ' + encoding);
<           ^
< Error: Unknown encoding: utf16
<     at assertEncoding (string_decoder.js:24:11)
<     at new exports.StringDecoder (string_decoder.js:38:3)
<     at Object.exports.getMeta (/home/oujs/OpenUserJS.org1/controllers/scriptStorage.js:351:17)
<     at /home/oujs/OpenUserJS.org1/libs/repoManager.js:122:23
<     at IncomingMessage.<anonymous> (/home/oujs/OpenUserJS.org1/libs/repoManager.js:55:11)
<     at IncomingMessage.emit (events.js:129:20)
<     at _stream_readable.js:908:16
<     at process._tickDomainCallback (node.js:381:11)

Will need to rewrite getMeta a bit I think to not split and concat the blocksContent with our code... this is probably why StringDecoder accepts an Array of Buffers. With Buffers I assume it is Byte transfer and maybe @TimidScript has hit it right on the nose between chunks based off the previous dumps... then String creates the last Unicode character as junk... then the next Unicode character is junk too... thus a bugged String.

Will need a bit to do this again. Apologies for a lot of noise and ups/downs.

@Martii Martii added the bug You've guessed it... this means a bug is reported. label Oct 5, 2015
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
* Possible destruction of Unicode characters... rare use case but should be covered
* Added a watchpoint to this in case it ever bombs
* Simplified up the debug comments to earliest detection point

Applies to OpenUserJS#678
@Martii Martii mentioned this issue Oct 6, 2015
@Martii
Copy link
Member

Martii commented Oct 6, 2015

Uggh

< >> decoded str
< // ==UserScript==
< // @name                [TS] Pixiv++
< // @namespace           TimidScript
< // @version             3.3.80a.020 Beta
< // @description         Ultimate Pixiv Script: Direct Links, Auto-Paging, Preview, IQDB/Danbooru, Filter/Sort using Bookmark,views,rating,total score. | Safe Search | plus more. Works best with "Pixiv++ Manga Viewer" and "Generic Image Viewer". 自動ページング|��,�ケベル|ロード次ページ|フィルター|並べ替え|注文|ダイレクトリンク
< // @author              TimidScript

plus the webhook bombed on this... GRR!

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
* A lot of stuff not working with this... not sure this is the correct route.
* Still reading a bit and reverting to keep dev from blocking something

Applies to OpenUserJS#678
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
* Since it appears that `Chunks` is an array of `Buffers` still have to cycle through.
* This may not cover the junk Unicode... but trying

Applies to OpenUserJS#678
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
* Works on dev with write/edit, upload, and import... trying on pro for webhook

Applies to OpenUserJS#678
@Martii
Copy link
Member

Martii commented Oct 6, 2015

Holy smokes... it finally worked in all methods with just the metadata block Unit Test of the target script.

@TimidScript
Mind bumping your project version to confirm please?

@Martii Martii added the CODE Some other Code related issue and it should clearly describe what it is affecting in a comment. label Oct 6, 2015
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
@Martii
Copy link
Member

Martii commented Oct 6, 2015

@sizzlemctwizzle
This methodology fixes the issue however it looks like it may have busted the chunks... e.g. I only get one console message saying that it is using all 9 chunks e.g. aChunks.length... I'm a bit tired from all of this and will reexamine this a bit more later with the OpenUserJS metadata block present. If you have any ideas on this please let me know.

@Martii
Copy link
Member

Martii commented Oct 6, 2015

Some notes out loud (comment may be subject to change):

  • Could readd the loop back in... recalculate the buffer size (aChunks[i].length)... accumulate that value to the next iteration... recombine using .concat but use the 2nd parameter to set the accumulated value of the buffers up to that point... once the blocks are attained return the object... not sure this would work as I was getting initially 4 function iterations and now it's 1 (with 9 for the array of Buffers length) e.g. full source for every publishing method. Need to ponder some more.

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
* Rename `aChunks` to `aBufs` to match *node* docs and indicate that is what we are expecting
* `.concat` is vague on `totalLength` parameter but in tests with *node* it is `String` length e.g. `1` returns one character currently ... added Watchpoint here
* Remove the debug messages for now
* Tested on ~112.9KiB User Script... took ~10ms-11ms versus 7ms in debug mode on full Buffers conversion... slower **however** less memory intensive in some use cases especially with larger scripts

Applies to OpenUserJS#678

Refs:
* https://nodejs.org/api/buffer.html#buffer_class_method_buffer_concat_list_totallength
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 6, 2015
* Thanks to *node* maintainer for the clarification at nodejs/node#3219 of Buffer length versus character length
* Retested on a User Script that contained a lot of Unicode characters in a comment at the beginning then the metadata blocks at the EOF... String.length fails to upload... Buffer length passes on upload with two "chunks"

Applies to OpenUserJS#678
Martii pushed a commit to Martii/UserScripts that referenced this issue Oct 6, 2015
* Contains ordered ... metadata blocks first ... should use less chunks to initialize
* Contains inverted ... metadata blocks last ... should use maximum chunks to initialize
* Should **PASS** on publishing to OUJS and not be rejected in the "Chunking" buffered input e.g. metadata blocks should all be found
* Initial commit... Zero based versioning since it probably won't change much

Applies to OpenUserJS/OpenUserJS.org#678
@Martii Martii removed the needs testing Anyone can add this but it is primarily there for the Assignee indicating that Testers are wanted. label Oct 6, 2015
@Martii
Copy link
Member

Martii commented Oct 6, 2015

I'm relatively sure this issue can be closed now... haven't gotten external confirmation yet but tons of testing done. Reopen if needed or just leave a comment and we'll hear ya. :)

Leaving needs mitigation on for a little while as that's how I do some followup.

@Martii Martii closed this as completed Oct 6, 2015
Martii pushed a commit to Martii/UserScripts that referenced this issue Oct 6, 2015
* Fix OpenUserJS block closing ... ooops

Applies to OpenUserJS/OpenUserJS.org#678
@Martii Martii removed their assignment Oct 6, 2015
@Martii Martii removed the needs mitigation Needs additional followup. label Nov 19, 2015
@OpenUserJS OpenUserJS locked as resolved and limited conversation to collaborators Apr 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug You've guessed it... this means a bug is reported. CODE Some other Code related issue and it should clearly describe what it is affecting in a comment.
Development

No branches or pull requests

3 participants