Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The use of "char" to hold signed values can lead to unexpected behaviour #242

Open
rwk-unil opened this issue Nov 13, 2024 · 1 comment · May be fixed by #243
Open

The use of "char" to hold signed values can lead to unexpected behaviour #242

rwk-unil opened this issue Nov 13, 2024 · 1 comment · May be fixed by #243

Comments

@rwk-unil
Copy link
Contributor

The phasing_hmm class defines std::vector < char > VAR_TYP; it uses this to hold signed values for example :

#define VAR_PEAK_HET 0
#define VAR_PEAK_HOM -1
#define VAR_FLAT_HET -2

The problem is that the C/C++ standard does not define char to be signed or unsigned.
See : https://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default

This is also documented in the GCC manual here https://gcc.gnu.org/onlinedocs/gcc-4.2.2/gcc/C-Dialect-Options.html :

"Each kind of machine has a default for what char should be. It is either like unsigned char by default or like signed char by default.

Ideally, a portable program should always use signed char or unsigned char when it depends on the signedness of an object. But many programs have been written to use plain char and expect it to be signed, or expect it to be unsigned, depending on the machines they were written for. This option, and its inverse, let you make such a program work with the opposite default.

The type char is always a distinct type from each of signed char or unsigned char, even though its behavior is always just like one of those two."


For GLIMPSE what this means is that if it is compiled with a compiler or machine that defaults char to unsigned, multiple tests will fail, see the example below :

The value is used in void phasing_hmm::forward() and void phasing_hmm::backward() e.g., :

		if (VAR_TYP[curr_idx_locus] >= VAR_PEAK_HET) {
			if (curr_idx_locus == (VAR_TYP.size()-1)) INIT_PEAK_HET(VAR_TYP[curr_idx_locus]);
			else if (curr_segment_locus != (segments[curr_segment_index]-1)) RUN_PEAK_HET(VAR_TYP[curr_idx_locus]);
			else COLLAPSE_PEAK_HET(VAR_TYP[curr_idx_locus]);
		} else if (VAR_TYP[curr_idx_locus] == VAR_PEAK_HOM) {
			if (curr_idx_locus == (VAR_TYP.size()-1)) INIT_PEAK_HOM(VAR_ALT[curr_idx_locus]);
			else if (curr_segment_locus != (segments[curr_segment_index]-1)) RUN_PEAK_HOM(VAR_ALT[curr_idx_locus]);
			else COLLAPSE_PEAK_HOM(VAR_ALT[curr_idx_locus]);
		} else if (VAR_TYP[curr_idx_locus] == VAR_FLAT_HET) {
			if (curr_idx_locus == (VAR_TYP.size()-1)) INIT_FLAT_HET();
			else if (curr_segment_locus != (segments[curr_segment_index]-1)) RUN_FLAT_HET();
			else COLLAPSE_FLAT_HET();
		} else vrb.error("Unknown variant type in phasing backward pass at variant " + stb.str(curr_idx_locus));

If char is unsigned the test if (VAR_TYP[curr_idx_locus] >= VAR_PEAK_HET) will always succeed as VAR_PEAK_HET is defined as 0. This is not the desired behaviour.

rwk-unil added a commit to rwk-unil/GLIMPSE that referenced this issue Nov 13, 2024
Because the C/C++ standard does not specify the signedness
of the char type the results will be compiler/machine dependent
this can lead to unexpected behaviour.

Use signed char instead of char to avoid these cases.

References :
https://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default
https://gcc.gnu.org/onlinedocs/gcc-4.2.2/gcc/C-Dialect-Options.html
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

Fixes : odelaneau#242

Signed-off-by: Rick Wertenbroek <rick.wertenbroek@unil.ch>
@rwk-unil rwk-unil linked a pull request Nov 13, 2024 that will close this issue
@rwk-unil
Copy link
Contributor Author

To test this you can run this very simple code snippet :

#include <iostream>

int main() {
    char c = -1;

    if (c >= 0)
        std::cout << "unsigned\n";
    else
        std::cout << "signed\n";

    return 0;
}
  • This returns "signed" when compiled with clang 14.0.0 on x86_64
  • This returns "unsigned" when compiled with gcc 13.2.0 on ARM64

This makes GLIMPSE_phase broken on ARM64 platforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant