# Python and Pandas internationalisation
Matplotlib is a commonly used tool for basic data visualisation in Python, and is the default plotting backend for pandas.DataFrame.plot. It is also used by seaborn and wordcloud, among other libraries and tools.
The default backends for Matplotlib have a number of limitations:
- No support for the Unicode bidirectional algorithm.
- No support for complex font rendering.
This places severe limits on which natural languages can be used in titles, labels, legends, and other text elements in plots.
The mplcairo package provides an alternative backend for Matplotlib that uses Raqm and GNU FriBidi for bidirectional text layout and complex rendering of OpenType features. This makes it possible to use most languages in plots.
The key limitations of mplcairo are bugs when it is used with IPython and the lack of support for Jupyter notebooks.
Using the mplcairo backend for Matplotlib, we can display plot titles, axis labels and categorical tick labels in any language we need to support.
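As a rough sketch of how the backend might be selected, the snippet below prefers mplcairo when it can be imported and otherwise falls back to Matplotlib's built-in Agg backend. The helper names pick_backend and activate_backend are illustrative and not part of either library. The module://mplcairo.<variant> naming follows the mplcairo README, and the Agg fallback is an assumption made so the example runs on machines without mplcairo.

```python
import importlib.util


def pick_backend(variant="base"):
    """Return a backend name, preferring mplcairo when it is installed.

    `variant` is the mplcairo submodule to use, e.g. "base" for
    non-interactive output or "qt" / "tk" for a GUI window.
    """
    if importlib.util.find_spec("mplcairo") is None:
        # Assumption: fall back to the built-in raster backend. Text is then
        # drawn without bidirectional or complex-script support.
        return "Agg"
    return f"module://mplcairo.{variant}"


def activate_backend(variant="base"):
    """Switch Matplotlib to the chosen backend and return its name."""
    import matplotlib  # imported here so pick_backend() works without it

    name = pick_backend(variant)
    matplotlib.use(name)
    return name
```

Calling activate_backend() before any figure is created keeps the choice in one place, for example activate_backend("qt") in a desktop script.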
There are two missing pieces at this point:
- Display of numeric tick labels in a numeral system appropriate for the UI language.
- Choice of a plot layout appropriate to the bidirectional layout requirements of the data visualisation.
Regarding the first issue, it is possible to use matplotlib.ticker.FuncFormatter() to apply a function that converts tick values to the target numeral system and applies the necessary grouping and decimal separators.
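A minimal sketch of such a formatter function is shown below. It assumes the target numeral system is Arabic-Indic digits (U+0660 to U+0669) with the Arabic thousands separator (U+066C) and Arabic decimal separator (U+066B); other locales would need a different digit table and separators. The names to_arabic_indic and localise_y_ticks are hypothetical helpers, not Matplotlib API.

```python
# Translation table from ASCII digits to Arabic-Indic digits (U+0660-U+0669).
ARABIC_INDIC_DIGITS = {ord(str(d)): chr(0x0660 + d) for d in range(10)}

# Assumed separators for the target locale.
ARABIC_THOUSANDS_SEP = "\u066C"
ARABIC_DECIMAL_SEP = "\u066B"


def to_arabic_indic(value, pos=None, decimals=0):
    """Format a tick value with Arabic-Indic digits and separators.

    The (value, pos) signature matches what FuncFormatter passes in;
    `decimals` is an extra knob for the number of fractional digits.
    """
    # Format with ASCII grouping first, then swap separators and digits.
    text = f"{value:,.{decimals}f}"
    text = text.replace(",", ARABIC_THOUSANDS_SEP).replace(".", ARABIC_DECIMAL_SEP)
    return text.translate(ARABIC_INDIC_DIGITS)


def localise_y_ticks(ax):
    """Attach the formatter to the y-axis of an existing Axes."""
    from matplotlib.ticker import FuncFormatter  # needs Matplotlib installed

    ax.yaxis.set_major_formatter(FuncFormatter(to_arabic_indic))
```

Keeping the conversion in a plain function means it can be checked without drawing a plot, and the same function can be reused for the x-axis or for annotations.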
Regarding the second issue, it isn't always necessary to change the layout of the plot. If the plot uses a Cartesian coordinate system, it is best to keep the default layout. The layout, combined with user expectations, affects how trends in a data visualisation are interpreted. User interpretation of the visualisations and user experience are both critical inputs into a data visualisation design.
If a RTL layout is required:
- Use yaxis.tick_right() and yaxis.set_label_position("right") to reposition the y-axis to the right side of the plot.
- Use plt.gca().invert_xaxis() to invert the x-axis. This step may not be necessary. UX is an important consideration.
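The steps above can be sketched as a small helper that takes an existing Axes. The function names apply_rtl_layout and demo, the invert_x flag, and the placeholder data are all illustrative assumptions; only the Axes and axis method calls come from Matplotlib itself. The demo uses the Agg backend so that it runs anywhere; for complex scripts an mplcairo backend would be selected instead.

```python
def apply_rtl_layout(ax, invert_x=True):
    """Move the y-axis to the right and optionally reverse the x-axis."""
    ax.yaxis.tick_right()                 # tick marks and tick labels on the right
    ax.yaxis.set_label_position("right")  # axis label on the right
    if invert_x:
        ax.invert_xaxis()                 # first category is drawn on the right
    return ax


def demo(path="rtl_demo.png"):
    """Draw the same bar chart in LTR and RTL layouts side by side."""
    import matplotlib
    matplotlib.use("Agg")  # assumption: non-interactive backend for the sketch
    import matplotlib.pyplot as plt

    # Placeholder data, not taken from the article.
    categories = ["A", "B", "C"]
    values = [3, 7, 5]

    fig, (ltr_ax, rtl_ax) = plt.subplots(1, 2, figsize=(8, 3))
    ltr_ax.bar(categories, values)
    rtl_ax.bar(categories, values)
    apply_rtl_layout(rtl_ax)
    fig.savefig(path)
    plt.close(fig)
```

Passing invert_x=False keeps the x-axis direction unchanged, which suits cases where only the y-axis position needs to change.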
The following Python scripts use Sorani Kurdish data:
Fig. 1 - Kurdish bar charts in both LTR and RTL layouts.
Fig. 2 - Kurdish wordcloud.