UTF-8, MIME Type Declaration, BOMs, and other encoding standards #198

Closed
Martii opened this issue Jun 24, 2014 · 5 comments
Labels
DOC Pertains inclusively to the Documentation operations. question A question has been encountered by anyone and has remained unanswered until cleared. team biz This is similar to a meta discussion.

Comments

@Martii
Member

Martii commented Jun 24, 2014

This is a very general issue ticket to discuss and address a few inconsistencies in Node projects, including ours. I'd also like to hear about experiences with our Node project and any others.

DONE - UTF-8: All Node projects should use the uppercase label UTF-8 rather than utf-8, according to the spec. Inconsistent casing can lead to unpredictable behavior between deployments, whether on dev or production servers. Some reference sites may say the label can be lowercase, but some software components may not be smart enough to treat the two as the same.
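
A minimal sketch of keeping the label consistent, assuming the label in question is the `charset` parameter of a `Content-Type` value. The helper name `upperCaseCharset` is hypothetical and is not taken from the project's code.

```javascript
// Hypothetical helper: rewrite the charset parameter of a Content-Type
// value so the label is always uppercase (e.g. "utf-8" becomes "UTF-8").
// Values without a charset parameter are returned unchanged.
function upperCaseCharset(contentType) {
  return contentType.replace(/(charset\s*=\s*)("?)([^";\s]+)\2/i,
    function (match, prefix, quote, label) {
      return prefix + quote + label.toUpperCase() + quote;
    });
}

console.log(upperCaseCharset('text/html; charset=utf-8')); // text/html; charset=UTF-8
```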

DONE - Byte Order Mark (BOM): BOMs are said not to be used everywhere... currently we have one in at least one file. This should be remedied on a system that can control that. In my experience this usually happens when a Unicode character is inserted into a file and the file is saved. Our current STYLEGUIDE hints at this with "Avoid use of international characters because they may not read well or be understood everywhere.". Unfortunately I don't see an easy way to detect whether a PR or commit is introducing these.
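
One possible way to check a working tree for UTF-8 BOMs from Node, sketched under assumptions: the function names `hasUtf8Bom` and `findBomFiles` and the default skip list are hypothetical, and only the UTF-8 BOM (bytes EF BB BF) is considered.

```javascript
const fs = require('fs');
const path = require('path');

// True when the buffer starts with the UTF-8 BOM bytes EF BB BF.
function hasUtf8Bom(buf) {
  return buf.length >= 3 && buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF;
}

// Recursively list regular files under `dir` that start with a UTF-8 BOM.
// `skip` is an assumed list of directory names to ignore.
function findBomFiles(dir, skip) {
  skip = skip || ['node_modules', '.git'];
  let found = [];
  fs.readdirSync(dir).forEach(function (name) {
    const full = path.join(dir, name);
    const stat = fs.lstatSync(full);
    if (stat.isDirectory()) {
      if (skip.indexOf(name) === -1) {
        found = found.concat(findBomFiles(full, skip));
      }
    } else if (stat.isFile()) {
      // Read only the first three bytes instead of the whole file.
      const fd = fs.openSync(full, 'r');
      const head = Buffer.alloc(3);
      const bytesRead = fs.readSync(fd, head, 0, 3, 0);
      fs.closeSync(fd);
      if (hasUtf8Bom(head.slice(0, bytesRead))) {
        found.push(full);
      }
    }
  });
  return found;
}
```

Calling `findBomFiles('.')` from a pre-commit hook or CI step and failing when the result is non-empty would flag offending files before they are merged.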

In general, encodings may need to be explicitly defined, in contradiction to our current STYLEGUIDE, which says the server handles it.

_EDIT_:
MIME types: These should always be declared rather than having the server/client guess from file extensions.
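
A rough sketch of declaring the type explicitly on every response. The `CONTENT_TYPES` table and the `send` helper are assumptions made for illustration, not the project's actual routes or values; `res` is expected to behave like a Node `http.ServerResponse`.

```javascript
// Assumed mapping for illustration only; not taken from the project's code.
const CONTENT_TYPES = {
  html: 'text/html; charset=UTF-8',
  userscript: 'text/javascript; charset=UTF-8',
  json: 'application/json; charset=UTF-8'
};

// Hypothetical helper: always send an explicit Content-Type (MIME type plus
// charset) so neither server nor client has to guess from a file extension.
function send(res, kind, body) {
  const type = CONTENT_TYPES[kind];
  if (!type) {
    throw new Error('No explicit Content-Type declared for: ' + kind);
  }
  const payload = Buffer.from(body, 'utf8');
  res.writeHead(200, {
    'Content-Type': type,
    'Content-Length': payload.length
  });
  res.end(payload);
}
```

Inside a request handler this would be called as `send(res, 'userscript', source)`. Throwing on an unknown kind is a deliberate choice here so that a missing declaration shows up during development rather than as a guessed type in production.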

See also:

Applies to and isolated from #19. Most of this will go in STYLEGUIDE and/or CONTRIBUTING.

@sizzlemctwizzle
Member

Moved the UTF-8 bug hunting work to #200.

sizzlemctwizzle referenced this issue Jun 24, 2014
If a githubRepoPage has multiple javascriptBlobs, open the import page in a new tab.
Accepts POST requests on the import page as well.
sizzlemctwizzle referenced this issue Jun 24, 2014
This and the following commits should be PR ready, and should not
break existing routes. This means refactored code that affect more
than one route will be duplicated and renamed. We can easily cleanup
extra code after implementing the entire refactor.
@Zren
Contributor

Zren commented Jun 24, 2014

I just imported /ZrenTest/TestUserScript/folder/folder/%E3%83%86%E3%82%B9%E3%83%88.user.js. On dev, it redirects okay.

On Production it redirects to: https://openuserjs.org/scripts/zrentest/%C6%B9%C8 %C6%B9%C8
Instead of https://openuserjs.org/scripts/zrentest/%E3%83%86%E3%82%B9%E3%83%88 %E3%83%86%E3%82%B9%E3%83%88

The script imports fine.
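
The thread does not state the cause of the production redirect, so the following is only an observation: `%C6%B9%C8` is what you get if each character of the name is written as a single byte (its low 8 bits) instead of as UTF-8. A small sketch that reproduces both URLs, where `percentEncodeBytes` is a hypothetical helper:

```javascript
// Percent-encode every byte of a buffer, e.g. <E3 83 86> becomes "%E3%83%86".
function percentEncodeBytes(buf) {
  return Array.from(buf).map(function (b) {
    return '%' + (b < 16 ? '0' : '') + b.toString(16).toUpperCase();
  }).join('');
}

// The katakana file name from the import above (U+30C6 U+30B9 U+30C8).
const name = '\u30C6\u30B9\u30C8';

// Encoded as UTF-8: three bytes per character.
console.log(percentEncodeBytes(Buffer.from(name, 'utf8')));   // %E3%83%86%E3%82%B9%E3%83%88

// Encoded as a single-byte encoding: Node keeps only the low byte of each character.
console.log(percentEncodeBytes(Buffer.from(name, 'latin1'))); // %C6%B9%C8
```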

@sizzlemctwizzle
Member

@Zren

That discussion has been moved to #200. And yes, I ran into the same problem myself (as has @Martii).

@Martii Martii changed the title UTF-8 MIME Type Declaration, BOMs, and other encoding standards UTF-8, MIME Type Declaration, BOMs, and other encoding standards Jun 25, 2014
@Martii
Member Author

Martii commented Jun 25, 2014

One thing I've noticed is there usually isn't a clear path of:

  1. Input encoding and/or encoded
  2. Working encoding and/or encoded
  3. Output encoding and/or encoded

With a specific example... in our search routines, Input is currently converted to a Working ISO 8859-1 "sort of" regular-expression-compatible syntax via the deprecated `escape`, where originally it was a plain ASCII regular expression match with some UTF-8, matched against the assumed UTF-8 data. In other words, not much consistency here.
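
To illustrate why `escape` gives an ISO 8859-1 flavored result rather than UTF-8, here is a short comparison with `encodeURIComponent`. The sample string is made up for the demonstration and is not from our search code.

```javascript
// Sample input: "café" followed by a space and one katakana character.
const input = 'caf\u00E9 \u30C6';

// Deprecated escape(): Latin-1 style %XX below U+0100, %uXXXX above it.
console.log(escape(input));             // caf%E9%20%u30C6

// encodeURIComponent(): always percent-encoded UTF-8 bytes.
console.log(encodeURIComponent(input)); // caf%C3%A9%20%E3%83%86

// The two are not interchangeable: "%E9" is not valid UTF-8 on its own.
try {
  decodeURIComponent(escape('\u00E9'));
} catch (e) {
  console.log(e.name);                  // URIError
}
```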

Ideally the Working encoding should always be UTF-8, but we know regular expressions have some issues (as do some other support routines expecting something else)... which is why they aren't always reliable to use with variant Input. This is another reason why I opted out of using a regular expression when sanitizing certain linked @keys in another PR.

For the docs we should clarify for PRs that all transformations like this should be commented briefly in the source as to what the I/W/O is, perhaps with a variable name identifier prefix... unless it's in HTML/XML, where the tags could of course denote this.
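
As a sketch of what such comments and prefixes might look like, assuming hypothetical prefixes `in` and `wk` and a made-up `searchTitles` routine (none of these names come from the codebase):

```javascript
// Escape regular expression metacharacters so the text is matched literally.
function escapeRegExp(wkText) {
  return wkText.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function searchTitles(inQuery, wkTitles) {
  // I: percent-encoded UTF-8, as taken from a URL query string
  const wkQuery = decodeURIComponent(inQuery);

  // W: decoded JavaScript strings, matched literally and case-insensitively
  const wkPattern = new RegExp(escapeRegExp(wkQuery), 'i');
  const wkMatches = wkTitles.filter(function (wkTitle) {
    return wkPattern.test(wkTitle);
  });

  // O: percent-encoded UTF-8, suitable for a URL path segment
  return wkMatches.map(function (wkTitle) {
    return encodeURIComponent(wkTitle);
  });
}

console.log(searchTitles('%E3%83%86', ['\u30C6\u30B9\u30C8', 'abc']));
```

Keeping decoding at the very start and encoding at the very end means everything in between only ever sees one representation.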

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Jun 26, 2014
If running Linux a good recursive method for detecting these is:

``` sh-session
$ find . -type f -print0 | xargs -0r awk '/^\xEF\xBB\xBF/ {print FILENAME} {nextfile}'
```

Applies to OpenUserJS#200 and OpenUserJS#198
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Jun 26, 2014
…ion-4)

Applies to OpenUserJS#198 and OpenUserJS#200

**NOTE**
* Why is there a `defer` flag for ace? Could be related to OpenUserJS#148
* Currently ignoring `./public/js`
@Martii Martii added the DOC label Jun 30, 2014
@Martii
Member Author

Martii commented Jul 16, 2014

This has more or less been determined... closing.

@Martii Martii closed this as completed Jul 16, 2014
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Sep 10, 2014
* Similar to @janekptacijarabaci's fix in greasemonkey/greasemonkey#1940
* Fix compliance with STYLEGUIDE.md and usage of pre-initialized identifiers
* Currently **do not** propagate BOM with meta routine or user.js source with and without installation count increment
* BOM currently shows up in Ace as an exclamation triangle with "This character may get silently deleted by one or more browsers"

**NOTE**: Many thanks to the report by @cvzi and applies to OpenUserJS#200 and partially outlined in OpenUserJS#198.
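
For reference, a minimal sketch of not propagating a BOM when serving source. This is not the code from the commit; `stripBomFromBuffer` and `stripBomFromString` are hypothetical names, and only the UTF-8 BOM is handled.

```javascript
// Remove a leading UTF-8 BOM (bytes EF BB BF) from a buffer, if present.
function stripBomFromBuffer(buf) {
  const hasBom = buf.length >= 3 &&
    buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF;
  return hasBom ? buf.slice(3) : buf;
}

// Same idea for already-decoded text: a decoded BOM appears as U+FEFF.
function stripBomFromString(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

const withBom = Buffer.from([0xEF, 0xBB, 0xBF, 0x2F, 0x2F]); // BOM followed by "//"
console.log(stripBomFromBuffer(withBom).toString('utf8'));   // //
console.log(stripBomFromString('\uFEFF//'));                 // //
```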
Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Oct 5, 2015
…keeps things consistent *hopefully*

* Using uppercase as mentioned at OpenUserJS#198 (comment)

Applies to OpenUserJS#678

Related to:
* OpenUserJS#348 discovery
* OpenUserJS#200
* OpenUserJS#198
@OpenUserJS OpenUserJS locked as resolved and limited conversation to collaborators Apr 12, 2021