[Python] Autorest on Windows generates CP1252 files and not UTF-8 #943
Comments
hmm this seems to have changed recently, as my previous generation of the Batch client generated the comments correctly.... |
In fact, if you look at the files in your PR to the Python SDK in first place, it was already broken: |
You're right - interesting my diff locally doesn't display it correctly... Looks like the NodeJS generator is also doing this:
And the C# generator:
So it appears to be part of AutoRest |
So I can make a fix for converting the documentation strings to UTF-8 when generating (I'm pretty sure this would only apply to comments/docstrings, but correct me if I'm wrong). It will only be for Python though, so I'm not sure whether we want to explore a broader solution for AutoRest, or whether Python is the only language unduly affected by it. |
In fact, this can come from many places:
We need to find where the encoding is broken to be able to fix it |
I believe it's in the JSON reader, as the strings are already broken by the time they're loaded into the model. |
If it's the writer, then someone is using a Writer without setting the optional encoding parameter. This one, https://msdn.microsoft.com/en-us/library/3aadshsx(v=vs.110).aspx, specifies the encoding explicitly (it should be forced to UTF-8, for Python at least). But if you are right and it's the reader, the same concept applies: JSON is UTF-8 (it's in the spec). The reader MUST specify the UTF-8 encoding EXPLICITLY and must NOT use the default encoding. |
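As a minimal sketch of the principle in this comment (it is Python, not AutoRest's actual C# code, and the helper names `read_spec` and `write_generated_file` are hypothetical), both the reader and the writer name UTF-8 explicitly so that the platform default never comes into play:

```python
import json


def read_spec(path):
    """Load a JSON document, decoding it explicitly as UTF-8."""
    # Without encoding=..., open() may use the locale's preferred encoding,
    # which is commonly cp1252 on Windows.
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def write_generated_file(path, text):
    """Write generated source text, encoding it explicitly as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

With both ends pinned to UTF-8, the same input should produce byte-identical output on Windows and on Ubuntu, whatever the system code page is.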
It looks like this might be the line: @amarzavery , @devigned If you're happy that we can enforce UTF-8 here, I can submit a PR. |
@lmazuel , @annatisch - Yes we should enforce that. Please update that in the PR. Default value for encoding should be 'UTF-8'. |
Sweet, thanks - will get that out today... though it would be great if you could merge PR #928 first :) |
With regards to writing files it looks like this: |
Hi @amarzavery , And make sure they are acceptable? |
Actually @amarzavery , it seems that the change in SwaggerParser is causing issues in the test on Travis (Jenkins seems to be fine). TestClientModelWithNoContent [FAIL] Any thoughts on what I could do to resolve this? Is it simply a path formatting difference between fileSystem.ReadFileAsText and File.ReadAllText? Thanks! |
Actually I think I've found where I messed up :) |
@amarzavery - this can be closed now. Thanks! |
Python files have to be UTF-8, no matter what the system's default encoding is. Autorest for Python (at least Python, I didn't check the other languages) uses the system default encoding to generate the files.
For a Swagger comment like this one:
https://github.com/Azure/azure-rest-api-specs/blob/master/batch/2016-02-01.3.0/swagger/BatchService.json#L10918
where the apostrophe is a non-ASCII character stored as UTF-8, the generation on Windows is broken:
It works fine on Travis, because Ubuntu uses UTF-8 as the default system encoding.
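The failure mode can be reproduced in a few lines of Python. This is only a sketch: it assumes the apostrophe in the spec is the typographic U+2019 (the thread does not say which code point it is) and that the Windows default is cp1252, and it does not claim to mirror AutoRest's internals.

```python
# Assumption: the Swagger text contains the typographic apostrophe U+2019.
text = "the pool\u2019s state"

# Writing with cp1252 (a common Windows default) produces the single byte
# 0x92, which is not valid UTF-8, so the file cannot be decoded as UTF-8.
cp1252_bytes = text.encode("cp1252")
try:
    cp1252_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Reading UTF-8 bytes as cp1252 does not fail, but it silently turns the
# one character into three unrelated ones.
misread = text.encode("utf-8").decode("cp1252")

print(valid_utf8)      # False
print(repr(misread))
```

Either direction corrupts the docstring, which is why the problem has to be located in the reader, the writer, or both before it can be fixed.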
@annatisch ?