Skip to content

Change default encoding in XML.save to UTF-8 #122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 25, 2017
Merged

Conversation

Rogach
Copy link
Contributor

@Rogach Rogach commented Nov 10, 2016

Was ISO-8859-1, which resulted in encoding errors at runtime if document
contained non-latin1 characters. Also XML spec states that
documents without xml declaration are expected to contain UTF-8 or UTF-16,
so writing in ISO-8859-1 without xml declaration (which was the default)
can easily break compliant parsers.

@ashawley
Copy link
Member

Is it worthwhile to add a scala.deprecated annotation to the method?

@Rogach
Copy link
Contributor Author

Rogach commented Nov 10, 2016

@ashawley In that case we will need to create another method (saveFile, for example) and apply the change there. Is it worth it?

@ashawley
Copy link
Member

Oh, right. I didn't mean deprecate. Sorry about that. Is there a way to emit a warning that the behavior of the method has changed?

@Rogach
Copy link
Contributor Author

Rogach commented Nov 10, 2016

@ashawley - never saw anything like that. And since scala does not have any method for silencing warnings, I doubt there is one (else everybody would be stuck looking at that warning message on each compilation).

@ashawley
Copy link
Member

Ok, good points. Maybe add a note in the method's documentation about the change then? We'll have to notify the town crier as well.

@Rogach
Copy link
Contributor Author

Rogach commented Nov 10, 2016

Done. Who's the town crier?

@SethTisue
Copy link
Member

SethTisue commented Nov 10, 2016

we can town-cry about it in the release notes for whatever Scala 2.12.x version bumps the scala-xml module version to include this

@@ -74,6 +74,10 @@ object XML extends XMLLoader[Elem] {
* Saves a node to a file with given filename using given encoding
* optionally with xmldecl and doctype declaration.
*
* Note: default encoding was ISO-8859-1 (latin1) in pre-1.0.7 scala-xml versions.
* If your code depends on charaters in non-ASCII latin1 range, specify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/charaters/characters/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shame on me :(

ashawley added a commit to ashawley/scala-xml that referenced this pull request Nov 11, 2016
Try to closely mimic bug in XML.save and XML.loadFile, but write tests
that don't use the file system.

Will fail in 1.0.6 and earlier:

    expected:<...klmnopqrstuvwxyz{|}~[ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]</x>>
    but was:<...klmnopqrstuvwxyz{|}~[????????????????????????????????????????????????????????????????????????????????????????????????]</x>>

Will be fixed in scala#122.
@ashawley
Copy link
Member

Unit test in another PR that will fail, but will be fixed by this PR.

@Rogach
Copy link
Contributor Author

Rogach commented Nov 11, 2016

@ashawley Should I include that test in this PR?

@ashawley
Copy link
Member

Sure. Feel free to cherry-pick or rebase in here.

Rogach pushed a commit to Rogach/scala-xml that referenced this pull request Nov 11, 2016
Try to closely mimic bug in XML.save and XML.loadFile, but write tests
that don't use the file system.

Will fail in 1.0.6 and earlier:

    expected:<...klmnopqrstuvwxyz{|}~[ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]</x>>
    but was:<...klmnopqrstuvwxyz{|}~[????????????????????????????????????????????????????????????????????????????????????????????????]</x>>

Will be fixed in scala#122.
@Rogach
Copy link
Contributor Author

Rogach commented Nov 16, 2016

Is there something else I need to do to get this PR accepted?

@SethTisue
Copy link
Member

SethTisue commented Feb 4, 2017

needs rebase now. LGTM otherwise and I would suggest we merge in a few days, after the rebase, if there are no actual objections

Rogach pushed a commit to Rogach/scala-xml that referenced this pull request Feb 4, 2017
Try to closely mimic bug in XML.save and XML.loadFile, but write tests
that don't use the file system.

Will fail in 1.0.6 and earlier:

    expected:<...klmnopqrstuvwxyz{|}~[ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]</x>>
    but was:<...klmnopqrstuvwxyz{|}~[????????????????????????????????????????????????????????????????????????????????????????????????]</x>>

Will be fixed in scala#122.
@Rogach
Copy link
Contributor Author

Rogach commented Feb 4, 2017

Done.

Seems rebase loses commit times, sadly (I'm more used to merge workflow).

@SethTisue
Copy link
Member

Seems rebase loses commit times, sadly

are you sure? I don't think so

(regardless, thanks for taking care of it)

@Rogach
Copy link
Contributor Author

Rogach commented Feb 4, 2017

I see it now - my log viewer was set up to display commiter (%cr) date instead of author (%ar). I fixed it, thanks for pointing it out!

@ashawley
Copy link
Member

ashawley commented Feb 4, 2017

Still LGTM!

Copy link
Member

@ashawley ashawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ashawley
Copy link
Member

If you could do one more rebase and push, I'll merge this, tomorrow.

Rogach and others added 4 commits May 24, 2017 21:25
Was ISO-8859-1, which resulted in encoding errors at runtime if document
contained non-latin1 characters. Also XML spec states that
documents without xml declaration are expected to contain UTF-8 or UTF-16
- so writing in ISO-8859-1 without xml declaration (which was the default)
can easily break compliant parsers.
Try to closely mimic bug in XML.save and XML.loadFile, but write tests
that don't use the file system.

Will fail in 1.0.6 and earlier:

    expected:<...klmnopqrstuvwxyz{|}~[ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]</x>>
    but was:<...klmnopqrstuvwxyz{|}~[????????????????????????????????????????????????????????????????????????????????????????????????]</x>>

Will be fixed in scala#122.
@Rogach
Copy link
Contributor Author

Rogach commented May 24, 2017

@ashawley - Done.

As a side note, I have huge troubles with rebase workflow - in this case, for example, I had to rebase the changes, thus rewriting history, and then had to do push -f to github (which is usually frowned upon). How rebase workflow is supposed to work in case of several distributed repositories?

@ashawley
Copy link
Member

Thanks for responding so quickly. And apologies, for troubling you by repeatedly asking you to rebase. Especially, if you aren't used to our git workflow.

Yes, rebase does rewrite git history on your branch. It looks like your branch name for this PR is called master. Is that the reason you have trouble? Usually, I create a new branch called utf-8-encoding or fix-121. That way I can

git checkout master
git pull upstream master
git checkout fix-121
git rebase master
git push -f origin fix-121

Hope that helps!

@Rogach
Copy link
Contributor Author

Rogach commented May 24, 2017

Thanks for the explanation!

No, master branch does not give much trouble (though it will be a little bit more convenient if I did name the PR branch differently). But the thought of routinely doing push -f scares me a lot (and different branch name does not save from that). Merge workflow results in a little dirtier history, but saves me from messing with history rewriting.

@ashawley
Copy link
Member

Yeah, push force is weird. In addition to git reflog, most of the git commands leave a trace in thehir output of what sha hashes you're moving between, so you can always recover or study them if you need to. I am also paranoid, so I ocassionally inspect my commits and diffs after a rebase. Although, over time and greater understanding you eventually learn to trust what rebase does, and git generally, more and more.

@ashawley ashawley merged commit c23ffd6 into scala:master May 25, 2017
@ashawley ashawley mentioned this pull request Oct 9, 2017
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants