Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For the time being, refuse to build on big-endian architectures #181

Merged
merged 2 commits into from
May 23, 2024

Conversation

robomics
Copy link
Contributor

hictk itself can be built on both little and big-endian architectures. However, the hic library will be quite limited on big-endian archs. These are the current limitations:

  • Creating .hic files will work, but the files will only be readable by hictk and only on other big-endian machines
  • Reading .hic files will not work, unless files were created by hictk running on big-endian machines (see above)

The above limitations are due to the use of reinterpret_cast<> and raw std::fstream methods to serialize binary data to disk. The .hic file specification does not mandate the endianness that should be used to read and write .hic files, however virtually all .hic files are generated on little-endian machines, so the byte order of the serialized data is also little-endian, making little-endian the de-facto byte order standard.

It is my understanding that machines with big-endian architecture are not that common in desktop/HPC environments used for bioinformatics, so supporting big-endian architectures is likely not a pressing matter.

Supporting big-endian architecture can be achieved by manually converting binary types stored in BinaryBuffers to little-endian before casting them to char* for serialization.
This can be achieved with the help of libraries such as Boost.Endian.

As this conversion is only needed on big-endian machines, extra care should be taken to ensure that the implementation of this feature does not add build nor runtime costs on little-endian machines (i.e. hictk should link to boost only on big-endian platforms and the code performing endianness conversion should be wrapped in if constexpr and/or disabled with std::enable_if as appropriate).

hictk itself can be built on both little and big-endian architectures.
However, the hic library will be quite limited on big-endian archs.
These are the current limitations:
- Creating .hic files will work, but the files will only be readable by
  hictk and only on other big-endian machines
- Reading .hic files will not work, unless files were created by hictk
  running on big-endian machines (see above)

The above limitations are due to the use of reinterpret_cast<> and raw
std::fstream methods to serialize binary data to disk.
The .hic file specification does not mandate the endianness that should
be used to read and write .hic files, however virtually all .hic files
are generated on little-endian machines, so the byte order of the
serialized data is also little-endian, making little-endian the de-facto
byte order standard.

It is my understanding that machines with big-endian architecture are
not that common in desktop/HPC environments used for bioinformatics, so
supporting big-endian architectures is likely not a pressing matter.

Supporting big-endian architecture can be achieved by manually
converting binary types stored in BinaryBuffers to little-endian before
casting them to char* for serialization.
This can be achieved with the help of libraries such as Boost.Endian.
As this conversion is only needed on big-endian machines, extra care
should be taken to ensure that the implementation of this feature does
not add build nor runtime costs on little-endian machines (i.e. hictk
should link to boost only on big-endian platforms and the code
performing endianness conversion should be wrapped in if constexpr
and/or disabled with std::enable_if as appropriate).
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.07%. Comparing base (9f426e7) to head (4dbdb21).
Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
+ Coverage   84.91%   85.07%   +0.16%     
==========================================
  Files         119      120       +1     
  Lines       10076    10225     +149     
==========================================
+ Hits         8556     8699     +143     
- Misses       1520     1526       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@robomics robomics merged commit 4fdd922 into main May 23, 2024
51 checks passed
@robomics robomics deleted the fix/big-endian branch May 23, 2024 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants