Edit: Oops! This lib doesn't have arbitrary-length vectors; only nalgebra has those. But they are not as responsive.
See this pattern for how to write a float reduce loop so that it can autovectorize:
Implementation in ndarray: unroll_sum
Same pattern for dot product: user forum post
It's not super pretty, and it hardcodes the number of lanes used, but for floats it's still much faster than the naive sum, which LLVM cannot vectorize: float addition is not associative, so the compiler is not allowed to reorder it on its own. (rust-lang/rust/issues/21690 would be needed for that.)
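To make the pattern concrete, here is a minimal sketch of what I mean. It is not the actual ndarray code; the names `unrolled_sum` and `unrolled_dot` and the choice of 8 lanes are my own assumptions for illustration. The idea is to keep several independent accumulators so the additions within one chunk don't depend on each other, which gives LLVM something it is allowed to vectorize.

```rust
/// Hypothetical sketch: sum a slice with eight independent accumulators.
/// The lane count (8) is hardcoded, as described above.
pub fn unrolled_sum(mut xs: &[f32]) -> f32 {
    // Eight partial sums; each lane only ever adds into itself,
    // so the original order of additions per lane is preserved.
    let mut acc = [0.0f32; 8];
    while xs.len() >= 8 {
        let (chunk, rest) = xs.split_at(8);
        for (s, &x) in acc.iter_mut().zip(chunk) {
            *s += x;
        }
        xs = rest;
    }
    // Combine the lanes pairwise, then add the leftover tail elements.
    let mut sum = ((acc[0] + acc[4]) + (acc[1] + acc[5]))
        + ((acc[2] + acc[6]) + (acc[3] + acc[7]));
    for &x in xs {
        sum += x;
    }
    sum
}

/// Hypothetical sketch: the same pattern applied to a dot product.
/// Only the common prefix of the two slices is used.
pub fn unrolled_dot(a: &[f32], b: &[f32]) -> f32 {
    let len = a.len().min(b.len());
    let (mut a, mut b) = (&a[..len], &b[..len]);
    let mut acc = [0.0f32; 8];
    while a.len() >= 8 {
        let (a_chunk, a_rest) = a.split_at(8);
        let (b_chunk, b_rest) = b.split_at(8);
        for ((s, &x), &y) in acc.iter_mut().zip(a_chunk).zip(b_chunk) {
            *s += x * y;
        }
        a = a_rest;
        b = b_rest;
    }
    let mut sum = ((acc[0] + acc[4]) + (acc[1] + acc[5]))
        + ((acc[2] + acc[6]) + (acc[3] + acc[7]));
    for (&x, &y) in a.iter().zip(b) {
        sum += x * y;
    }
    sum
}
```

Note that the result can differ slightly from a plain left-to-right sum, since the additions happen in a different order. Whether it actually vectorizes depends on the target and optimization level, so check the generated assembly if it matters.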