Decide how to handle "probably" relevant comparisons #983
Comments
Do you want to try to come up with a threshold for how many test case results with the same profile but small magnitudes should be considered "definitely" relevant?
I think that probably makes sense; it'll probably be worth iterating on, but maybe we can start by saying that something like 10 benchmarks pointing in the same direction from the same profile is enough to bump us to definitely? Not sure if 10 will feel right in the long run, but it feels like a good start.
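For concreteness, here is a minimal sketch of what such a "bump to definitely" rule could look like. The names, types, and the way results are grouped are hypothetical and not the actual rustc-perf code; it only illustrates the "N results from the same profile pointing the same direction" idea:

```rust
use std::collections::HashMap;

/// Hypothetical summary of one test case result for a single profile
/// (e.g. check/debug/opt); not the actual rustc-perf types.
struct TestResultDelta {
    benchmark: String,
    profile: String,
    /// Relative change in instructions:u; positive = regression.
    pct_change: f64,
}

/// Number of results, all moving the same direction within one profile,
/// that is enough to promote "probably" to "definitely".
const SAME_DIRECTION_THRESHOLD: usize = 10;

fn bump_to_definitely(results: &[TestResultDelta]) -> bool {
    // Count regressions and improvements separately, per profile.
    let mut counts: HashMap<&str, (usize, usize)> = HashMap::new();
    for r in results {
        let entry = counts.entry(r.profile.as_str()).or_insert((0, 0));
        if r.pct_change > 0.0 {
            entry.0 += 1;
        } else if r.pct_change < 0.0 {
            entry.1 += 1;
        }
    }

    // If any single profile has >= 10 results pointing the same way,
    // treat the comparison as definitely relevant.
    counts
        .values()
        .any(|&(regressed, improved)| {
            regressed >= SAME_DIRECTION_THRESHOLD || improved >= SAME_DIRECTION_THRESHOLD
        })
}
```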
https://perf.rust-lang.org/compare.html?start=b6057bf7b7ee7c58e6a39ead02eaa13b75f908c2&end=c02371c442f811878ab3a0f5a813402b6dfd45d2&stat=instructions:u is a similar case where the regression consistently shows up across the many-assoc-items benchmark -- all of check, debug, and opt are regressed. It was judged "probably" rather than "definitely", but it seems like it should have just been in the mixed category.
https://perf.rust-lang.org/compare.html?start=a8387aef8c378a771686878062e544af4d5e2245&end=b27661eb33c74cb514dba059b47d86b6582ac1c2&stat=instructions:u is another example -- await-call-tree shows several profiles regressing.
Did this catch your eye because all of the improvements are in incremental?
Hmm... this seems pretty borderline to me. I agree it's probably worth a look, but the categorization of "probably relevant" seems correct over "definitely relevant".
No, await-call-tree across 3 profiles. The incremental improvements are typically less 'interesting', particularly when looking at incr-unchanged benchmarks -- those just have less going on, so smaller changes are more likely to show up as significant.
Hm -- so maybe our understanding is different, but the way I see it, if we expect a human to go investigate, then it should be auto-categorized as definitely. If that investigation is short, fine, but to some extent right now we would have missed this result without the perf triage reports (if we were just limiting ourselves to perf-regression labels), which seems unfortunate.
Our understanding of these categories is definitely different. It seems you have a "when in doubt it should be in triage" perspective. The original purpose of these categories was to differentiate between comparisons where we were sure that a performance change had happened (and we just need a human to help investigate the cause and determine whether the code change in question is worth accepting the performance penalty) vs. changes where we're unsure (although fairly confident) whether an actual performance change has happened or whether it's just noise (and we need a human to first determine that this change is actually a performance change and not noise). This is why the names are "definitely relevant" and "probably relevant", which describe this distinction pretty well.

In particular, I originally and purposefully created a higher bar for triage than for perf-regression labels. This is why anything that is labeled as "probably relevant" or "definitely relevant" gets a perf-regression label, but only "definitely relevant" changes get included in triage. Triage was to be reserved for changes where we are sure that there is an actual performance change.

It sounds like perhaps you're advocating for getting rid of the "probably relevant" distinction (since we've gotten relatively good at filtering out noise) and that we should only make a distinction between "definitely relevant" and "maybe relevant", the former of which we label with perf-regression labels AND note in the triage report, and the latter of which we simply ignore (as we currently do with such cases). Thoughts?
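To make the current mapping concrete, here is a rough sketch of how the categories relate to labeling and triage as described above. The enum shape and function names are hypothetical, not the actual rustc-perf implementation:

```rust
/// Hypothetical relevance categories for a comparison.
enum Relevance {
    /// We're sure a real performance change happened; a human should
    /// investigate the cause and weigh the trade-off.
    Definitely,
    /// Fairly confident, but it could still be noise.
    Probably,
    /// Likely noise; currently ignored.
    Maybe,
}

/// Anything "probably" or above gets the perf-regression label on the PR.
fn gets_perf_regression_label(r: &Relevance) -> bool {
    matches!(r, Relevance::Definitely | Relevance::Probably)
}

/// Only "definitely" relevant comparisons are included in the triage
/// report, keeping the bar for triage higher than for labeling.
fn included_in_triage(r: &Relevance) -> bool {
    matches!(r, Relevance::Definitely)
}
```

The proposal at the end of the comment would amount to dropping the `Probably` variant and applying both the label and the triage mention to `Definitely` alone.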
Based on a conversation with @Mark-Simulacrum it sounds like we might want to do the following (please correct me if I'm misremembering something):
This comparison was categorized as probably relevant - https://perf.rust-lang.org/compare.html?start=7611fe438dae91084d17022e705bf64374d5ba4b&end=bcfd3f7e88084850f87b8e34b4dcb9fceb872d00&stat=instructions:u - but I think it is definitely so, due to the wide range of -doc benchmarks affected.