-
Notifications
You must be signed in to change notification settings - Fork 11.4k
Magic numbers are implementation-defined multibyte character literals #1518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The underlying representation of multibyte character literals is implementation-defined, and may cause data export/import issues between different builds of llama.cpp independent of endianness. Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
You are sure that the VS compiler is using an inverse endian? I found that hard to believe in terms of compatibility. |
Correction: I checked cppreference.com, which in hindsight wasn't thoroughly enough. |
I might be wrong about MSVC after all. At least on godbolt.org I got the same integer value as with GCC and Clang. Makes me wonder what cppreference.com is referring to when they mention MSVC as "a notable exception". |
The underlying representation of multibyte character literals is implementation-defined. This could, at least in principle, cause cross-build data export/import issues independent of endianness. Define magic numbers as integer literals to be on the safe side. Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
I've reworded #1520 to be more accurate about this being a hypothetical portability issue. |
The underlying representation of multibyte character literals is implementation-defined. This could, at least in principle, cause cross-build data export/import issues independent of endianness. Define magic numbers as integer literals to be on the safe side. Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
The underlying representation of multibyte character literals is implementation-defined. This could, at least in principle, cause cross-build data export/import issues independent of endianness. Define magic numbers as integer literals to be on the safe side. Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
The underlying representation of multibyte character literals is implementation-defined. This could, at least in principle, cause cross-build data export/import issues independent of endianness. Define magic numbers as integer literals to be on the safe side. Signed-off-by: Juuso Alasuutari <juuso.alasuutari@gmail.com>
Fixed by #1520. |
I don't know if MSVC is a supported compiler for this project, but even if it isn't I feel that this is an oversight. The way that the magic numbers are currently defined is implementation-defined, i.e. the compiler is free to decide their machine representation:
https://github.com/ggerganov/llama.cpp/blob/dc271c52ed65e7c8dfcbaaf84dabb1f788e4f3d0/llama.h#L23-L25
If you were to compile this with MSVC then the magic number written to disk would be in reverse byte order compared to how GCC and Clang do it. Notice that this is not an endianness issue. The cause is not a difference in architecture byte order but instead only applies to multibyte character literals.
A 100% endianness-dependent difference in the binary format would actually be easier to figure out. Just flip the endianness of every single input value and you're done. But if you were to export a weights file with an MSVC-compiled llama.cpp on an x86_64 machine, and then try to import it with a Clang-compiled program on that same machine, all the actual weights data would still be perfectly fine but the magic would be invalid. That might lead to the false conclusion that the endianness is wrong, and compensating for this during data import would fix the magic number and mangle all of the data instead.
Because most (all?) llama.cpp projects seem to use GCC or Clang I suggest defining the magic numbers as 32-bit unsigned integers that match the multibyte character literal representation of said compilers:
'ggml'
becomes0x67676d6cu
etc.The text was updated successfully, but these errors were encountered: