Enable optimisation level -O3 for SAM QUAL+33 formatting. #1679
On long read data, the time to format SAM files is dominated by sequence and quality.
The qual[i]+33 loop that turns binary quals into printable ASCII is not vectorised by GCC unless -O3 is used. I would consider this a weakness of the compiler, but nothing I've tried has persuaded gcc (before v12) to generate vector instructions, not even the "restrict" keyword.
Hence the use of __attribute__((optimize("O3"))).
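As a rough illustration of the approach described above, here is a minimal sketch of a per-function optimisation attribute applied to the +33 loop. The name add33 comes from this PR, but the signature, the ADD33_OPT macro and the compiler-detection guard are assumptions made for the example and may not match the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro: request -O3 for a single function on GCC only.
 * Clang and icc also define __GNUC__, so they are excluded explicitly;
 * per the description they already vectorise this loop at their defaults. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#  define ADD33_OPT __attribute__((optimize("O3")))
#else
#  define ADD33_OPT
#endif

/* Convert n binary quality values in src to printable ASCII (QUAL+33) in dst.
 * The signature is an assumption for illustration. */
ADD33_OPT
static void add33(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] + 33);
}
```

Calling add33(out, qual, len) on the quality values 0, 10, 20, 30, 40 would write the bytes for "!+5?I" into out. Whether the loop is actually vectorised can be checked with GCC's -fopt-info-vec option or by inspecting the generated assembly.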
The new add33 function is approximately 15x quicker with gcc -O3 than with gcc -O2. The default optimisation levels of Clang and icc give speeds comparable to gcc -O3.
With a compressed Illumina BAM this gave just a 3% overall speed gain when decoding to SAM. At the opposite extreme, an uncompressed ONT BAM shows a 23% speed gain.