I'm not sure how the optimization results are currently checked. It looks like it's done by looking at the diff? I created this issue because I realized that checking only the LLVM IR may not be enough. Even when the IR carries more facts, the code generator may still emit poor code. See https://github.com/llvm/llvm-project/issues/78578.