look into using Ryu instead of Errol3 for floating point printing

Bold claim:

> simpler and approximately three times faster than the previously fastest implementation.

paper: https://dl.acm.org/citation.cfm?id=3192369
C implementation: https://github.com/ulfjack/ryu

things to measure:
 * performance
 * code size
 * whether it can work without ever using floating-point registers
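On the last point: Ryu's C implementation starts by reinterpreting the double as a 64-bit integer and splitting out the sign, biased exponent, and mantissa. All of that uses integer operations. The sketch below shows only that first step. It is a minimal illustration, not code taken from ryu. `f64_fields`, `decompose_bits`, and this `double_to_bits` are hypothetical names, and the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 mantissa bits) is assumed. One catch remains: if the entry point takes a `double`, many ABIs pass that argument in a floating-point register. A fully FP-register-free path would have to accept the raw `uint64_t` bits instead.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 binary64 value (hypothetical helper type). */
typedef struct {
    unsigned sign;      /* 1 bit */
    uint32_t exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 bits, without the implicit leading 1 */
} f64_fields;

/* Split raw binary64 bits using only integer shifts and masks.
   Taking uint64_t (not double) keeps this step free of FP registers. */
static f64_fields decompose_bits(uint64_t bits) {
    f64_fields f;
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (uint32_t)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}

/* Copy a double's bytes into an integer; memcpy avoids strict-aliasing UB.
   Note: the double argument itself may still arrive in an FP register. */
static uint64_t double_to_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

For example, `decompose_bits(double_to_bits(1.0))` gives sign 0, exponent 1023, and mantissa 0. The rest of Ryu's work (computing the shortest decimal digits from these fields) would be the part to check for any remaining floating-point use.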