Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Autorest on Windows generates CP1252 files and not UTF-8 #943

Closed
lmazuel opened this issue Apr 13, 2016 · 15 comments
Closed

[Python] Autorest on Windows generates CP1252 files and not UTF-8 #943

lmazuel opened this issue Apr 13, 2016 · 15 comments

Comments

@lmazuel
Copy link
Member

lmazuel commented Apr 13, 2016

Python files have to be UTF-8, no matter the default encoding of the system. Autorest for Python (at least Python, I didn't check the other languages) uses the default encoding system to generate the files.

For Swagger comment like this:
https://github.com/Azure/azure-rest-api-specs/blob/master/batch/2016-02-01.3.0/swagger/BatchService.json#L10918

"poolInfo": {
    "$ref": "#/definitions/PoolInformation",
    "description": "Gets or sets the pool on which the Batch service runs the job’s tasks."
},

where the apostrophe is an UTF-8 character, the generation on Windows is broken:

    :param pool_info: Gets or sets the pool on which the Batch service runs
     the job’s tasks.

Works great when it's Travis, because Ubuntu uses UTF-8 as the system encoding default.

@annatisch ?

@annatisch
Copy link
Member

hmm this seems to have changed recently, as my previous generation of the Batch client generated the comments correctly....
Investigating now!

@lmazuel
Copy link
Member Author

lmazuel commented Apr 14, 2016

In fact, if you look at the files in your PR to the Python SDK in first place, it was already broken:
Azure/azure-sdk-for-python@afe71c4#diff-3600c4f9d4cec1a865a3cad1631b2e18R87

@annatisch
Copy link
Member

You're right - interesting my diff locally doesn't display it correctly...

Looks like the NodeJS generator is also doing this:

  • @member {object} [poolInfo] Gets or sets the pool on which the Batch
  • service runs the job’s tasks.

And the C# generator:
///

    /// Gets or sets the pool on which the Batch service runs the job’s
    /// tasks.
    /// </summary>

So it appears to be part of AutoRest

@annatisch
Copy link
Member

So I can make a fix for converting the documentation strings to UTF-8 when generating (I pretty sure this would only apply to comments/docstrings... but correct me if I'm wrong).

It will only be for Python though - so I'm not sure if we want to explore a broader solution for autorest, or whether Python is the only language unduly affected by it.

@lmazuel
Copy link
Member Author

lmazuel commented Apr 15, 2016

In fact, this can come for many places:

  • The JSON reader of the Swagger spec
  • The Serializer of the final Python file

We need to find where the encoding is broken to be able to fix it

@annatisch
Copy link
Member

I believe it's in the JSON reader, as the strings are already broken by the time they're loaded into the model.

@lmazuel
Copy link
Member Author

lmazuel commented Apr 15, 2016

When it's the Writer, this is someone who uses a Writer without defining the optional encoding parameter.
I'm not a C# coder, but it seems like Java:
This https://msdn.microsoft.com/en-us/library/wtbhzte9(v=vs.110).aspx will use default encoding of the system

This https://msdn.microsoft.com/en-us/library/3aadshsx(v=vs.110).aspx will specify explictly the encoding (should be forced to UTF8 in Python at least)

But if you are right, it's the reader, but same concept:
https://msdn.microsoft.com/en-us/library/ms143456(v=vs.110).aspx

JSON is UTF8 (it's in the spec). The reader MUST specify EXPLICITLY the UTF8 encoding and do NOT use the default encoding.

@devigned ?

@annatisch
Copy link
Member

It looks like this might be the line:
https://github.com/Azure/autorest/blob/master/AutoRest/Modelers/Swagger/SwaggerParser.cs#L25

@amarzavery , @devigned
Would be great if you guys could comment on what impact this might have on other languages. Or whether this can be overridden on a per-language basis.

If you're happy that we can enforce UTF-8 here, I can submit a PR.

@amarzavery
Copy link
Contributor

@lmazuel , @annatisch - Yes we should enforce that. Please update that in the PR. Default value for encoding should be 'UTF-8'.

@annatisch
Copy link
Member

Sweet thanks - will get that out today.... though it would be great it you could merge PR #928 first :)

@annatisch
Copy link
Member

@annatisch
Copy link
Member

Hi @amarzavery ,
Could you (or another Autorest/C# expert) please take a look at the changes I made here:
https://github.com/Azure/autorest/pull/959/files#diff-b56c6227edf4561087b07f4947a1ac20R6
and here:
https://github.com/Azure/autorest/pull/959/files#diff-11296cc7adeb93899ed6fdbebeade46dR5

And make sure they are acceptable?
Thanks!

@annatisch
Copy link
Member

Actually @amarzavery , it seems that the change in SwaggerParser is causing issues in the test in Travis (Jenkins seems to be fine).

TestClientModelWithNoContent [FAIL]
System.IO.FileNotFoundException : Could not find file "/home/travis/build/Azure/autorest/AutoRest/Modelers/Swagger.Tests/bin/Net45-Debug/Swagger\swagger-no-content.json".
Stack Trace:
at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, Boolean anonymous, FileOptions options) <0x7fc12030ad40 + 0x005cd> in :0
at System.IO.FileStream..ctor (System.String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, System.String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost) <0x7fc12030ac90 + 0x0004d> in :0
at (wrapper remoting-invoke-with-check) System.IO.FileStream:.ctor (string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,int,System.IO.FileOptions,string,bool,bool,bool)
at System.IO.StreamReader..ctor (System.String path, System.Text.Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize, Boolean checkHost) <0x7fc12016dac0 + 0x000e4> in :0
at System.IO.StreamReader..ctor (System.String path, System.Text.Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize) <0x7fc12016da80 + 0x00030> in :0
at System.IO.StreamReader..ctor (System.String path, System.Text.Encoding encoding) <0x7fc12016da00 + 0x00031> in :0
at (wrapper remoting-invoke-with-check) System.IO.StreamReader:.ctor (string,System.Text.Encoding)
at System.IO.File.ReadAllText (System.String path, System.Text.Encoding encoding) <0x7fc120309160 + 0x0003b> in :0
at Microsoft.Rest.Modeler.Swagger.SwaggerParser.Load (System.String path, IFileSystem fileSystem) <0x402c81b0 + 0x0003b> in :0
at Microsoft.Rest.Modeler.Swagger.SwaggerModeler.Build () <0x402c71b0 + 0x0013e> in :0
at Microsoft.Rest.Modeler.Swagger.Tests.SwaggerModelerTests.TestClientModelWithNoContent () <0x40323fa0 + 0x00158> in :0
at (wrapper managed-to-native) System.Reflection.MonoMethod:InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&)
at System.Reflection.MonoMethod.Invoke (System.Object obj, BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) <0x7fc12035a5d0 + 0x000a1> in :0

Any thoughts on what I could do to resolve this? Is it simple a path formatting difference between fileSystem.ReadFileAsText and File.ReadAllText?

Thanks!

@annatisch
Copy link
Member

Actually I think I've found where I messed up :)
All the changes I've made to support this are now found here:
https://github.com/Azure/autorest/pull/959/files#diff-b56c6227edf4561087b07f4947a1ac20R8

@annatisch
Copy link
Member

@amarzavery - this can be closed now. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants