Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UTF-8 Support #200

Open
sizzlemctwizzle opened this issue Jun 24, 2014 · 24 comments
Open

Fix UTF-8 Support #200

sizzlemctwizzle opened this issue Jun 24, 2014 · 24 comments
Labels
bug You've guessed it... this means a bug is reported. needs testing Anyone can add this but it is primarily there for the Assignee indicating that Testers are wanted.

Comments

@sizzlemctwizzle
Copy link
Member

@OpenUserJs/admin @OpenUserJs/backend @OpenUserJs/frontend
Related to #198 and a few other bugs, I just wanted to consolidate identifying and fixing a type of bug that might be common inside of OUJS. UTF-8 characters sometimes break things in production. It doesn't happen in development for some reason. It mostly has to do with encoding going wrong (or the wrong encoding function being used) during a redirect, but it might happen it other areas that have yet to be uncovered.

So far the errors I've been able to verify in production are:

  • Import using GitHub (UTF-8 in GitHub path)
  • Edit script description (UTF-8 in scriptname)

@Martii
Were there any others? Any other script manipulation I tried in production seems to work.

This is a blocking bug. I will not accept any unrelated PRs, except bug fixes, until this issue is resolved. Update: All issues that been discovered have been fixed, but there may be lingering undiscovered bugs so I'm leaving this issue open until we're confident this has been resolved.

Your mission (should you choose to accept it): Use UTF-8 characters in production in any way you can possibly imagine, and report back here with your results in the comments. This way we don't waste effort testing something twice and can identify the problem areas that need to be fixed.

sizzlemctwizzle referenced this issue Jun 24, 2014
This and the following commits should be PR ready, and should not
break existing routes. This means refactored code that affect more
than one route will be duplicated and renamed. We can easily cleanup
extra code after implementing the entire refactor.
sizzlemctwizzle referenced this issue Jun 24, 2014
If a githubRepoPage has multiple javascriptBlobs, open the import page in a new tab.
Accepts POST requests on the import page as well.
@Martii
Copy link
Member

Martii commented Jun 24, 2014

Were there any others?

Not that I have seen yet. I could delete my affected scripts but not show the source or edit the source on production.

  • Import using GitHub (UTF-8 in GitHub path)
  • Edit script description (UTF-8 in scriptname)

Most notably on the POST routines... e.g. they do change the data store on production but come back with ISO 8859-1 string redirects on dev (which means they work as expected but maybe not correctly) and UTF-8 string redirects on production... which is very peculiar.

See also:

@sizzlemctwizzle
Copy link
Member Author

but not show the source or edit the source on production.

Could you give me a url? I haven't been able to reproduce.

Most notably on the POST routines...

Wait a minute... Since they're redirections and not views, we must not be setting the character set correctly.

@Martii
Copy link
Member

Martii commented Jun 24, 2014

I haven't been able to reproduce.

I've subsequently deleted my affected scripts... on resync the issue of being non-editable isn't present but the 400 is... I didn't want orphaned scripts on OUJS currently if a fix shows up... I can infer that we were/are storing in ISO 8859-1 instead of UTF-8 in the DB I think. If you pick my Hello World script (only one in UserScripts repo) it should do it as of testing earlier this morning. I haven't retried it with your recent commits and possible deployment yet.

EDIT: You might try in jesus2099's account with his scripts (specifically this one that had source before DB Migration) that have Unicode in them too. I can't view Source Code on a few of them which is how I detected which ones of mine to delete.

EDIT: CONFIRMED pro importing my Hello World script causes the 400 with new tab on current, as of just a moment ago, still.

@Martii
Copy link
Member

Martii commented Jun 24, 2014

btw production appears to be goofed up right now... with CSS at least. :\ This is challenging to find where things should be vs where they are. ;)

@Zren
Copy link
Contributor

Zren commented Jun 24, 2014

btw production appears to be goofed up right now... with CSS at least. :\

Getting

Resource interpreted as Stylesheet but transferred with MIME type text/plain: "https://openuserjs.org/css/common.css". openuserjs.org/:15
Resource interpreted as Stylesheet but transferred with MIME type text/plain: "https://openuserjs.org/css/bootstrap-custom.css". openuserjs.org/:14
Resource interpreted as Stylesheet but transferred with MIME type text/plain: "https://openuserjs.org/css/github.css".

I left out type="text/css" in the <link> tags, but it was working without them so I've got no idea why it just broke.

@sizzlemctwizzle
Copy link
Member Author

so I've got no idea why it just broke.

I was trying something out in production. Reverted. Should be fixed now.

@Martii
Copy link
Member

Martii commented Jun 25, 2014

I left out type="text/css" in the <link> tags

Later those should probably go back in from where they left. MIME types are important to maintain with client side.

Reverted. Should be fixed now.

It is. :) Thanks.

@Martii
Copy link
Member

Martii commented Jun 25, 2014

There's also may be some on dev that I saw last night in my account too. Those are totally orphaned from what I can tell... e.g. can't delete them because no homepage access. I haven't check today with current HEAD yet.

from my dev script listings page. These were orphaned during some of the past commit fixes including dev DB migration from your discussion.

@sizzlemctwizzle
Copy link
Member Author

You're pretty much on your own in dev. I might just wipe out all existing script data.

@Martii
Copy link
Member

Martii commented Jun 25, 2014

I might just wipe out all existing script data.

LOL well that works too. +1

Until I get a little more involved with how to tweak the dev DB any help is always appreciated. :P :) Btw I sort of expected this to happen and was wondering if pro would do the same thing but didn't try it during migration assuming that was possible. ;)

@sizzlemctwizzle
Copy link
Member Author

Well manually fixed these above scripts I identified along with the first three the migration broke. Please let me know if anyone runs across any others.

LOL well that works too. +1

Okay, I'll do that soon.

if pro would do the same thing but didn't try it during migration assuming that was possible.

It was. There was a small amount of window between the data migration (preformed locally) and the deployment of the new code that could handle it, where everything would have been broken. But I was fast, and it happened late at night if I remember correctly.

@Martii
Copy link
Member

Martii commented Jun 25, 2014

Now this is unusual:

currently resolves to:

  • https://openuserjs.org/#2664057237963052706

@Martii
Copy link
Member

Martii commented Jun 25, 2014

Use UTF-8 characters in production in any way you can possibly imagine, and report back here with your results in the comments.

and "needs testing" label:

The redirect issue still present for Unicode of course.

@sizzlemctwizzle
Copy link
Member Author

Okay my latest commit seems to have fixed the redirect issue, as far as I can tell.

currently resolves to:

I can't confirm. The link works correctly for me.

  • Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0
  • Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.114 Safari/537.36

@Martii
Copy link
Member

Martii commented Jun 25, 2014

I can't confirm.

  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_%28DE%29%28no_longer_PayPal%29 is the "real" URI (encoded from a Fx addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofortÜberweisung_(DE)(no_longer_PayPal) (decoded)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofortÜberweisung_(DE)(no_longer_PayPal) is what you posted above as a plain URL.
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_(DE)(no_longer_PayPal) Firebug grab of the GH href
    .... appears that GH may have some issues too with their automatic linking or perhaps Fx via address bar. hmmm.

  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_%28DE%29%28no_longer_PayPal%29 (encoded from a Fx addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofortÜberweisung_(DE)(no_longer_PayPal) (Not encoded at all from Opera addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_(DE)(no_longer_PayPal) (encoded from a Chromium addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_%28DE%29%28no_longer_PayPal%29 (encoded from a SM addy bar copy)

Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 - Firefox
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 - Chromium
Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.16 - Opera
Mozilla/5.0 (X11; Linux x86_64; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26 - SeaMonkey


  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_%28DE%29%28no_longer_PayPal%29 (encoded from a Fx addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofortÜberweisung_(DE)(no_longer_PayPal) (Not encoded at all from Opera addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_(DE)(no_longer_PayPal) (encoded from a Chrome addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_(DE)(no_longer_PayPal) (encoded from a IE addy bar copy)
  • https://openuserjs.org/scripts/Kelseydehaan/iBOOD_Autosubmit_to_sofort%C3%9Cberweisung_%28DE%29%28no_longer_PayPal%29 (encoded from a SM addy bar copy)

However Win7 does appear to work with your link... reconfirming on that machine directly... CONFIRMED however works with mine too... UTF-8 and encodings can be so much fun! :\

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0 - Firefox
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 - Chrome
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko - IE
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 OPR/22.0.1471.70 - Opera
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26 - SeaMonkey


redirect issue

Will check momentarily... Seems to be working. Awesome!

Since you are using encodeURI and I have submitted at least one pr using encodeURIComponent we might want to recheck these... encodeURI has these earmarked as reserved ; , / ? : @ & = + $ according to MDN. I'm fairly confident with usernames it should be as much as possible for @collaborator... the autolinking for @supportURL and the @homepageURL I'm only decoding with decodeURI on the text value and url value remains unchanged from script source.

@Martii
Copy link
Member

Martii commented Jun 25, 2014

Okay @sizzlemctwizzle the full set of my browsers is in for address bar copying testing... there doesn't seem to be any pattern to the browser madness other than symmetry with who doesn't do what... Think I need to look up the ES specs (vague as usual) and see what it is supposed to be. (See also IETF) There's a definite difference on address bar copying between all of them... and some issues between Nix browsers with accessing and not accessing your originally linked url... I think this one might be a "wait on all of them to sort it out"...kind of thing.

EDIT: And now my url nor yours doesn't work with Fx here (Linux), even on a paste, even though I just tested this out on 9 different browsers ... GRR! I don't like bugs like this... impedes testing here.

EDIT: Tried in a squeeky clean Fx profile and now my link works again and so does yours... bleh Add-On compatibility issue... scratch that report for now. I'll retest after a while and a few updates including from other key AO devs including a potential candidate of NoScript since it is saying it is filtering an XSS attack with these now.

EDIT: Looking at the percent encoding spec, under "sub-delims" on IETF, parenthesis are reserved so they should always be encoded when used in a filename... so Mozilla is the only one that seems to have got it right. :) The rest including GH's current autolinking is incorrect according to those specs.

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Jun 25, 2014
…ginal in OpenUserJS#89

Applies to OpenUserJS#200

* I currently see no advantage of pointing the view url to the current `tree/HEAD` when clicking the url to go to GH during the import process. I do however see a disadvantage of not being able to traverse up their breadcrumb list and being locked into a specific commit hash
* Since the import process is probably dependent upon `/tree/HEAD` now leaving that alone
* Add `target="_blank"` to this url since we do this anyways with multiple entries and keeps that window/tab open for importing after checking with GH
* Fix white space issue with optional target after POST routine returns
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Jun 25, 2014
* This is considered bad form and could eventually break this controller if V8 updates to strict mode.
* Think I got all of them on a manual survey of this controller.

Applies to OpenUserJS#200 and should be clarified in OpenUserJS#19
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Jun 25, 2014
* Create another object for mustache to correctly show GH target url if percent is used in `dir` or `filename`
* Alter view page mustache usage to match

Applies to OpenUserJS#200 and OpenUserJS#200 (comment)
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Aug 13, 2014
* Apparently this isn't needed with a redirect in express under this context

Inversely applies to OpenUserJS#200 and included in OpenUserJS#288

Tested on dev and target test script on production when deployed is https://openuserjs.org/scripts/TimidScript/%5BTS%5D_Pixiv++

Hopefully this won't need to be backed out but it's small enough to redeploy
@cvzi
Copy link

cvzi commented Sep 9, 2014

I found an error with a UTF-8 file (and I hope this is the right place to report it)
I tried to update one of my script via Upload Script on https://openuserjs.org/user/add/scripts
After the file was upload I got a 302 redirect back to https://openuserjs.org/user/add/scripts instead of the source code being updated.
I think that the problem was the BOM which was added automatically by Windows' notepad.exe when I edited the file.

@Martii
Copy link
Member

Martii commented Sep 10, 2014

Tested BOM at https://github.com/Martii/UserScripts/blob/master/src/RFC%202606%C2%A73/BOM%20test/RFC%202606%C2%A73%20-%20BOM%20test.user.js

  • OUJS File upload - PASS
  • OUJS GitHub import - PASS
  • Serving user.js with BOM from OUJS - PASS
  • Serving user.js with BOM from GH - PASS

Notepad isn't known for adding BOMs but it's also gone through a lot of changes. Notepad++ on the other hand is how this one was crafted by changing to Encode with UTF-8 (which includes the BOM and verified in a hex editor starting file with EF BB BF)

Any chance that you can upload that to a GitHub (GH) repository so I can test that file instead of one that I created? Don't do a gist as I'm not sure that will maintain the BOM at all.

And yes this is the correct place. :)

@cvzi
Copy link

cvzi commented Sep 10, 2014

I tried to upload your testfile to oujs.org and it worked. Maybe my files are corrupt in another way.

I overwrote the original file, but I was able to reproduce the error. I've done this with Windows 7.
Create a new file with Notepad++ and set Encoding to UTF8 without BOM.
Add at least one 2-Byte character (I used a German umlaut ö) and save the file.
Now open the file with notepad.exe and just save it.
And now the file has the EF BB BF Bytes. And I cannot upload it to OJUS.

Here is my test file:
https://github.com/cvzi/Userscripts/tree/master/UTF8BOM_test
When I try to upload UTF8 BOM test.user.js on OUJS I get the behavior as mentioned before.
When I try to import it via OUJS GitHub import I get this error page: 400 | Specified file does not contain a userscript header.

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Sep 10, 2014
* Similar to @janekptacijarabaci fix in greasemonkey/greasemonkey#1940
* Fix compliance with STYLEGUIDE.md and usage of pre-initialized identifiers
* Currently **do not** propagate BOM with meta routine or user.js source with and without installation count increment
* BOM currently shows up in Ace as a exclamation triangle with "This character may get silently deleted by one or more browsers"

**NOTE**: Many thanks to the report by @cvzi and applies to OpenUserJS#200 and partially outlined in OpenUserJS#198.
@Martii
Copy link
Member

Martii commented Sep 10, 2014

I overwrote the original file, but I was able to reproduce the error. I've done this with Windows 7.
Create a new file with Notepad++ and set Encoding to UTF8 without BOM.
Add at least one 2-Byte character (I used a German umlaut ö) and save the file.
Now open the file with notepad.exe and just save it.
And now the file has the EF BB BF Bytes. And I cannot upload it to OJUS.

@cvzi
I've confirmed it with your file, including hex editor check, and have a PR that will probably get merged tomorrow. Needs review time for a little while then we'll get you taken care of. Thank you very much for your report. :)

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Feb 10, 2015
* IIFEs client side to prevent potential and future identifier collisions e.g. "Give a hoot, Don't Pollute" the global content scope unless we plan to support something there or someone else already does.
* Some more STYLEGUIDE.md conformance
* Some more white-space for readability
* Move all view page inline scripts to `.../includes/scripts` where they should be for possible reuse
* Always declare `type="text/javascript"` for some older browsers that don't default to JavaScript
* Always assume and declare `charset="UTF-8"` for served scripts... our inline scripts should always currently be UTF-8 ... applies to OpenUserJS#200
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Jul 22, 2015
* Crashes the server when removing or never existing `@name` but localized ones exist
* Not entirely sure why one needs to `encodedURI` and the other `decodeURI` but redirect has issue ... possibly a current *node* issue
* Change text to import since it's not uploading a script technically

Additionally applies to OpenUserJS#200, OpenUserJS#226 edge case, and a bit from OpenUserJS#655 detection
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 5, 2015
…keeps things consistent *hopefully*

* Using uppercase as mentioned at OpenUserJS#198 (comment)

Applies to OpenUserJS#678

Related to:
* OpenUserJS#348 discovery
* OpenUserJS#200
* OpenUserJS#198
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 20, 2015
* Fix flagging and auth as redirect to be encoded. *(again?)*

Applies to OpenUserJS#200
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Nov 2, 2015
* Fixes a bug where ANSI *(and probably any other encoding types)* is served from uploaded scripts
* Apparently the header CHARSET is somewhat ignored in *node* output streams

Applies to OpenUserJS#200 and wherever file uploading was first added... no issue for that in a search
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Nov 6, 2015
* Don't use `options` for JSON and plain text outputs instead use module helper ... faster/cleaner... this is also a Unit Test in this area as well as OpenUserJS#249 checking
* Added watchpoint for OpenUserJS#200 with `encodeURI`
* Only time I've encounter `nil` helper function **so far**... added notation on inconsistency and since it's the only helper function used at this time in this file condensed the `require`
* Added watchpoint for an encountered special render op... first I've seen of these so far... I know what the output looks like but unable to find a matching view for it other than it's visual reference in Admin Tools

**NOTES**
* Questions to follow
* One TODO to finish depending on answers

Originally applies to OpenUserJS#262 ... not part of OpenUserJS#262 (comment)
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Nov 17, 2015
* Some whitespace adjustment for STYLEGUIDE.md
* Drop `slug` terminology except where currently needed... this still might be completely going away but smaller steps to avoid mass breakage and increase mutual understanding
* Partial fix for an encoding component issue ... there are more TODO's needed to fix all of it
* Some controller symmetry
* Using some ...`URIComponent` as indicated in OpenUserJS#795 ... needs individual handling of username and scriptname and avoids some encoding/decoding issues

**NOTES**
* This is not done yet but getting closer
* Should be a near parallel change with a partial bug fix


Applies to OpenUserJS#819 , OpenUserJS#200, and

Originally applies to OpenUserJS#262 ... treads on OpenUserJS#262 (comment) *(preauth)*
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Nov 25, 2015
* Still a partial bug fix... this entire slug methodology still needs to be refactored which is going to be lengthy.
* Surprised no one noticed, including me until now, that issue counts were all zero on the badge.

Followup from OpenUserJS@1f82419

Applies to OpenUserJS#819, OpenUserJS#200, and

Originally applies to OpenUserJS#262 ... not part of OpenUserJS#262 (comment)
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Nov 22, 2021
* Now I recall some of it... Percent wasn't excluded in `cleanFilename` so we have to double check things. MongoDB doesn't care neither does AWS... but browsers do.

Post OpenUserJS#1847 and applies to OpenUserJS#200

NOTE:
* Test script RFC and n%
Martii added a commit that referenced this issue Nov 22, 2021
* Now I recall some of it... Percent wasn't excluded in `cleanFilename` so we have to double check things. MongoDB doesn't care neither does AWS... but browsers do.

Post #1847 and applies to #200

NOTE:
* Test script RFC and n%

Auto-merge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug You've guessed it... this means a bug is reported. needs testing Anyone can add this but it is primarily there for the Assignee indicating that Testers are wanted.
Development

No branches or pull requests

4 participants