The binary operations implemented in 2656 are not optimized yet. They are set up to be vectorized/parallelized, though.
2656 also has other optimization suggestions for reducing the JITed code size. All of those should be part of this PR.
How do I add myself as the Assignee here?