Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove locale-dependence in Agg text rendering #310

Merged
merged 6 commits into from
Jul 12, 2018
Merged

Remove locale-dependence in Agg text rendering #310

merged 6 commits into from
Jul 12, 2018

Conversation

rkern
Copy link
Member

@rkern rkern commented Jul 6, 2018

To handle rendering Unicode text strings in the agg backend, we explicitly encode as UTF-8 byte strings in the Python layer before passing the encoded char* to the C++ API. On Unix platforms currently, it uses locale-dependent functions to decode the bytes back into codepoint integers to pass to the font engine. Under a non-UTF-8 locale, we render mojibake.

This PR vendorizes a liberally-licensed header-only library for explicitly decoding the UTF-8 bytes in the C++ layer.

@codecov-io
Copy link

codecov-io commented Jul 6, 2018

Codecov Report

Merging #310 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #310   +/-   ##
=======================================
  Coverage   33.66%   33.66%           
=======================================
  Files         206      206           
  Lines       18299    18299           
  Branches     2411     2411           
=======================================
  Hits         6161     6161           
  Misses      11747    11747           
  Partials      391      391

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f470504...d3807ff. Read the comment docs.

Copy link
Member

@jwiggins jwiggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the word mojibake. Not so much what it represents, just the way it rolls of the tongue...

Anyhow, this is an improvement overall. I'm curious if it can be done without any extra allocation (which was present in the previous version, I would argue unnecessarily).

Also, I'm a little concerned that this code throws exceptions. What do you think about catching them here?

#endif
// #if defined(_WIN32) || defined(__WIN32__) || defined(__CYGWIN__)
// #include <windows.h>
// #endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this an experiment? It should probably be removed if things are working

// font API.
std::vector<utf8::uint32_t> codepoints;
std::vector<utf8::uint32_t>::iterator p;
std::string utf8text(text);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I feel about this extra string copy...

std::vector<utf8::uint32_t> codepoints;
std::vector<utf8::uint32_t>::iterator p;
std::string utf8text(text);
utf8::utf8to32(utf8text.begin(), utf8text.end(), std::back_inserter(codepoints));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can just construct a utf8::iterator and avoid having to copy into codepoints. Additionally, if you get the length of text, I think you can construct the iterator like so: utf8::iterator<char*> p(text, text, text+length);. If that works, it would be nice to avoid the allocations.

@rkern
Copy link
Member Author

rkern commented Jul 10, 2018

I'm not too concerned about the exceptions since we are guaranteed that the input is properly-terminated UTF-8 encoded text.

@jwiggins
Copy link
Member

Sounds good to me. Thanks for the update!

@rkern rkern merged commit f546b7b into master Jul 12, 2018
@rkern rkern deleted the fix/utf-8 branch July 12, 2018 16:10
@rkern rkern mentioned this pull request Jul 12, 2018
jwiggins pushed a commit that referenced this pull request Jul 12, 2018
Remove locale-dependence in Agg text rendering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants