-
Supporting big endian seems valuable, thanks for working on this. As I commented on tensorflow/tensorflow#57060, I would mostly be wary of sprinkling endianness-handling code everywhere. That is, I would look to identify where endianness matters and build abstractions around it. For example, if this is only about how we access the data for constant buffers/arrays, we should work on the class abstracting access to the data to make it portable.
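To make the idea of a single data-access abstraction concrete, here is a minimal sketch in Python rather than XLA's C++. The class name `ConstantBufferView` and its parameters are hypothetical and are not part of TensorFlow or XLA; the sketch only assumes that the stored byte order of a buffer is known and can be declared once, so callers never handle byte order themselves.

```python
import struct


class ConstantBufferView:
    """Hypothetical read-only view over a serialized buffer.

    The byte order of the stored data is declared once, here, so that
    callers get the same values on little- and big-endian hosts.
    """

    # struct format characters for a few fixed-width element types.
    _FORMATS = {"int32": "i", "uint32": "I", "int64": "q", "float32": "f"}

    def __init__(self, data, dtype, stored_little_endian=True):
        self._data = data
        prefix = "<" if stored_little_endian else ">"
        self._struct = struct.Struct(prefix + self._FORMATS[dtype])

    def __len__(self):
        return len(self._data) // self._struct.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        offset = index * self._struct.size
        return self._struct.unpack_from(self._data, offset)[0]


# Usage: a buffer serialized as little-endian int32 values.
raw = struct.pack("<3i", 1, -2, 300)
view = ConstantBufferView(raw, "int32")
values = [view[i] for i in range(len(view))]
```

Because the byte order is fixed in the format string instead of being taken from the host, the read path behaves identically on x86 and s390x, and any endianness fix would live in this one class.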
-
@kun-lu20, as you have already looked into some of the issues, would it be possible to provide a list of the kinds of issues you have seen? That would be a great first step towards finding more general solutions for groups of issues and avoiding many small one-off fixes.
-
Hi all, the following is a list of endianness-related scenarios in TF/XLA code that we've seen so far:
Any additions or improvements to this list are highly appreciated. Hopefully this can be a good start for generic big-endian support in the TF/XLA code base.
-
Thanks @sherhut! We currently have an s390x CI running nightly builds on the TF master branch (http://ibmz-ci.osuosl.org/job/TensorFlow_IBMZ_CI/). We've run the regression tests on the fix for the XLA HLO evaluator and found no regressions. Regarding the longer-term plan, I think the SystemZ LLVM backend works on s390x, but it needs more work to add support for
-
@kun-lu20 Could you provide more information on how far your CI is from being able to pass all XLA tests? Looking at https://ibmz-ci.osuosl.org/job/TensorFlow_IBMZ_CI/, the last two builds are green, but I presume not all tests are passing?
-
Thanks @cheshire. Currently our s390x CI is not running the tests. We ran the test suite from the TensorFlow code base locally and some XLA tests in Part of the failed tests could pass when Some test cases failed due to the endianness issue, such as the failure in tensorflow/tensorflow#57067.
-
Would it be possible to know how many? And how many PRs do you anticipate needing to fix them?
-
Several XLA-related test cases in the tensorflow repo fail on s390x (a big-endian architecture) due to endianness issues, such as `//tensorflow/compiler/xla/tests:bitcast_convert_test_cpu`.

From my understanding, the root cause is that some tensor buffers may contain endian-specific data when TensorFlow/XLA code performs model input/output or serialization/deserialization. If endianness isn't taken into account, this code may run correctly on LE platforms but fail on BE platforms.

For this test case, the `Bitcast` operation cannot be done in the same way (i.e., a direct memory copy) on a BE platform as on an LE platform.

PR tensorflow/tensorflow#57067 has been raised to fix this test case on BE machines. However, a holistic solution for adding big-endian support to the XLA code base could solve similar issues once and for all.
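As an illustration of why a direct memory copy is not portable for a bitcast, here is a small Python sketch. The function names are hypothetical and this is not the code from the PR; it assumes, for illustration only, that the expected result lists the narrow elements from least-significant byte to most-significant byte, which is what a little-endian memory copy happens to produce.

```python
def bitcast_u32_to_u8_memcpy(value, byte_order):
    """Simulate a raw memory copy of a uint32 on a host with the given
    byte order ('little' or 'big'), then read it back as four uint8s."""
    return list(value.to_bytes(4, byteorder=byte_order))


def bitcast_u32_to_u8_portable(value):
    """Extract the bytes arithmetically, least-significant first, so the
    result does not depend on the host byte order."""
    return [(value >> (8 * i)) & 0xFF for i in range(4)]


value = 0x01020304
le_result = bitcast_u32_to_u8_memcpy(value, "little")
be_result = bitcast_u32_to_u8_memcpy(value, "big")
portable_result = bitcast_u32_to_u8_portable(value)
```

The two memory-copy results hold the same bytes in opposite order, while the shift-based version matches the little-endian layout on any host. A BE-aware bitcast therefore has to reorder elements or bytes explicitly instead of relying on the copy.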
Is it possible to add a plan for solving the endianness issue in the XLA code base? Any suggestions or ideas would be greatly appreciated. Thanks!