Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bot results cannot handle Unicode #5998

Closed
annevk opened this issue May 19, 2017 · 11 comments
Closed

Bot results cannot handle Unicode #5998

annevk opened this issue May 19, 2017 · 11 comments

Comments

@annevk
Copy link
Member

annevk commented May 19, 2017

Branch names containing Unicode I can sorta appreciate breaking things (although, it's 2017), but bot results?

#5976

This makes me wonder whether the bots even see the same tests I do.

cc @jugglinmike @bobholt

@annevk annevk added the infra label May 19, 2017
@bobholt
Copy link
Contributor

bobholt commented May 19, 2017

A cursory examination of the bot files shows that they all carry the utf-8 encoding declaration. Additionally, the broken encoding doesn't happen on the Edge results, which makes me think it isn't a universal-type issue.

check_stability.py that runs the tests and produces the results in the logs doesn't carry the utf-8 encoding declaration, however the logs appear fine on Travis.

The bots don't care what the tests are, they just strip lines out of the raw Travis log and post that to GitHub as a comment.

I was going to call this a Travis bug, because their raw text logs aren't properly encoded, but that's true even for the Edge log, so it shouldn't be working there, either.

I'll see if I can dig up anything else.

@bobholt bobholt self-assigned this May 19, 2017
@Hexcles
Copy link
Member

Hexcles commented Sep 29, 2017

Chromium developer @bsittler reported another case yesterday: https://travis-ci.org/w3c/web-platform-tests/jobs/281056652#L2086

I took a quick look and check_stability.py is the culprit, in particular this line: https://github.com/w3c/web-platform-tests/blob/4efe9f5bfe0eb84e3f131e7bb1d9bd91ad24c288/tools/ci/check_stability.py#L76

The root cause is this line essentially assumes the provided msg is an un-encoded string (raw string with only ASCII, or Unicode string which in Python 2 looks like u'世界'). If msg is already encoded (and the encoded result contains high bytes), an exception is thrown:

s = u'世界'
msg = s.encode('utf8')
print msg.encode('utf8')
----
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

This line encodes and decodes the message (to and from UTF-8). It seems to me the sole purpose is to get backslashreplace. If that's the case, some other approaches may be better. Otherwise, we need to test if msg is already encoded (it may or may not be, depending on the call site).

@Hexcles
Copy link
Member

Hexcles commented Sep 29, 2017

@bobholt see my comment above. By the way, I just realized this might be a separate issue from the OP, but I'm not exactly sure about the context. Could you kindly help moving my report above into a new issue if you think they are two separate issues?

@foolip
Copy link
Member

foolip commented Jan 24, 2018

@Hexcles, should this also be backlog like #8285? Are they dupes?

@Hexcles
Copy link
Member

Hexcles commented Jan 24, 2018

@foolip I am not exactly what this issue originally was. It could be something different, as no one mentioned exceptions.

@annevk
Copy link
Member Author

annevk commented Jan 25, 2018

If you follow the link in OP and then expand the results from the bot you'll find "Loading data�" and various other errors meaning the bot did not decode using UTF-8.

@alijuma
Copy link
Contributor

alijuma commented Apr 3, 2018

@Hexcles, should this still be priority:roadmap?

@Hexcles
Copy link
Member

Hexcles commented Apr 3, 2018

This, along with #8285 , should be fixed soon-ish.

Though it might be hard to find all edge cases, I suspect there are a few low hanging fruits which can cover many code paths. This would perhaps be a good first issue? @gsnedders I saw you using that label. Are there any plans for regarding these issues? (Like a GSoC or something?)

@gsnedders
Copy link
Member

@Hexcles They've partly been used to give people planning on applying to Outreachy for Mozilla to have something to do (rather than throwing them in with no guidance) for the sake of selecting applicants.

@gsnedders
Copy link
Member

So what's the scope here?

  • Make sure the wptrunner logs always output in a constant, defined encoding
  • Make sure our CI scripts handle the logs containing that constant, defined encoding
  • Make sure prbuildbot handle the logs containing that constant, defined encoding

The last of these should probably be a prbuildbot issue handled by those working on that?

@jgraham
Copy link
Contributor

jgraham commented Jul 24, 2018

AFAICT this issue was a prbuildbot issue and that's no longer running, so closing by default.

@jgraham jgraham closed this as completed Jul 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants