build : enable link-time optimizations #3859
@wro52 Could you test this branch and see if it fixes the performance in your environment? |
For me with gcc 12.3.0 under Linux it doesn't seem to change anything either way, but neither did #3833. It does, however, increase the full build time (with make) by ~20% and adds a lot of warnings from gcc.
build: 1206b5f (1446) build time: 27.87 secs
build: 6e08281 (1445) build time: 23.15 secs
build: 82a6646 (1440) build time: 22.41 secs |
Mostly duplicate symbols, right? I think I saw those when I activated LTO locally. Having the same non-private symbols, or the same symbol names, across multiple TUs is bad practice anyway, so those warnings are correct. |
No, it is just this warning repeated many times:
|
I also get these warnings - not sure how to fix. Any alternatives?
build: 6e08281 (1445)
With LTO and also before #3833 I get:
build: a6aba2c (1448)
Almost 2x slowdown. Also, on
diff --git a/ggml-quants.c b/ggml-quants.c
index fd4ee1be..5a5ed16f 100644
--- a/ggml-quants.c
+++ b/ggml-quants.c
@@ -6,6 +6,9 @@
#include <assert.h>
#include <float.h>
+#define ggml_fp16_to_fp32
+#define ggml_fp32_to_fp16
+
#ifdef __ARM_NEON
// if YCM cannot find <arm_neon.h>, make a symbolic link to it, for example:
But this works only for ARM_NEON, where there is a native F16 <-> F32 cast |
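The ARM-only caveat can be illustrated with a minimal sketch. It assumes the target exposes the `__fp16` storage type when both `__ARM_NEON` and `__ARM_FP16_FORMAT_IEEE` are defined; the helper name `demo_fp16_bits_to_fp32` is hypothetical and is not part of ggml. On such a target the cast compiles to a hardware conversion, while every other target needs a software decode of the IEEE half-precision bit pattern:

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) && defined(__ARM_FP16_FORMAT_IEEE)
// Native path (assumption: __fp16 is available): a plain cast is enough.
static inline float demo_fp16_bits_to_fp32(uint16_t bits) {
    __fp16 h;
    memcpy(&h, &bits, sizeof h);
    return (float) h;
}
#else
// Portable fallback: decode the half-precision bit pattern by hand.
static inline float demo_fp16_bits_to_fp32(uint16_t bits) {
    const uint32_t sign = (uint32_t)(bits & 0x8000u) << 16;
    const uint32_t exp  = (bits >> 10) & 0x1Fu;
    uint32_t mant = bits & 0x3FFu;
    uint32_t out;

    if (exp == 0x1Fu) {                 // Inf or NaN
        out = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {              // normal: rebias exponent from 15 to 127
        out = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {             // signed zero
        out = sign;
    } else {                            // subnormal: normalize the mantissa
        uint32_t e = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            e--;
        }
        out = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &out, sizeof f);
    return f;
}
#endif
```

Because the fallback is a real function body rather than a cast, whether the compiler can inline it into a hot loop depends on where it is defined, which is what the rest of this thread is about.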
The warnings disappear with:
diff --git a/Makefile b/Makefile
index 2a2ac850..348143e0 100644
--- a/Makefile
+++ b/Makefile
@@ -124,7 +124,7 @@ MK_CFLAGS += -Ofast -flto
MK_HOST_CXXFLAGS += -Ofast
MK_CUDA_CXXFLAGS += -O3
else
-MK_CFLAGS += -O3 -flto
+MK_CFLAGS += -O3 -flto=auto
MK_CXXFLAGS += -O3
endif |
We could leave LTO OFF by default, like before, but set it ON in the CI. (not a real solution though) |
I guess we could move the fp16 conversion functions to an internal header |
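A rough sketch of what such an internal header could look like, shown for the F32 -> F16 direction only. The file name `demo_fp16_internal.h` and the function `demo_fp32_to_fp16_bits` are hypothetical and do not reflect the actual ggml sources. The idea is that a `static inline` definition is visible in every translation unit that includes the header, so the compiler can inline it without relying on `-flto`:

```c
// demo_fp16_internal.h (hypothetical name, for illustration only)
#ifndef DEMO_FP16_INTERNAL_H
#define DEMO_FP16_INTERNAL_H

#include <stdint.h>
#include <string.h>

// Convert a float to IEEE half-precision bits, rounding to nearest even.
static inline uint16_t demo_fp32_to_fp16_bits(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    const uint32_t sign = (x >> 16) & 0x8000u;
    const uint32_t absx = x & 0x7FFFFFFFu;

    if (absx >= 0x7F800000u) {          // Inf or NaN (keep NaN quiet)
        return (uint16_t)(sign | 0x7C00u | (absx > 0x7F800000u ? 0x0200u : 0u));
    }
    if (absx >= 0x47800000u) {          // magnitude >= 65536 overflows to Inf
        return (uint16_t)(sign | 0x7C00u);
    }
    if (absx < 0x38800000u) {           // below 2^-14: fp16 subnormal or zero
        if (absx < 0x33000000u) {       // below 2^-25: rounds to zero
            return (uint16_t) sign;
        }
        const uint32_t mant  = (absx & 0x7FFFFFu) | 0x800000u;
        const uint32_t shift = 126u - (absx >> 23);   // 14..24
        uint32_t q = mant >> shift;
        const uint32_t rem  = mant & ((1u << shift) - 1u);
        const uint32_t half = 1u << (shift - 1u);
        if (rem > half || (rem == half && (q & 1u))) {
            q++;                        // may carry into the smallest normal
        }
        return (uint16_t)(sign | q);
    }

    // Normal range: rebias the exponent from 127 to 15 and drop 13 mantissa bits.
    uint32_t h = (absx - 0x38000000u) >> 13;
    const uint32_t rem = absx & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) {
        h++;                            // a carry into the exponent is intended
    }
    return (uint16_t)(sign | h);
}

#endif // DEMO_FP16_INTERNAL_H
```

The trade-off is that the function body is compiled into every translation unit that uses it, in exchange for not depending on the linker to inline across files.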
The inlining issue also used to be a problem with early versions of Lines 1618 to 1623 in 6e08281
|
we might get away with something like since ggml uses a lot of loops, |
This can work. Implemented here: #3861 |
Tried every possible speed setting - no significant influence |
Merged #3861 instead |
ref #3858
Try to restore the performance to what it was before the refactoring #3833
Seems like the ggml_fp16_to_fp32 and ggml_fp32_to_fp16 calls slow down the processing significantly. At least with ARM_NEON. Haven't confirmed for x86 architectures