fix for Error: must forward with targets before backward [#19] #21
The fix works on my machine running Ubuntu 22.04 with an Intel 13i CPU, thanks. |
A small update: the problem seems to be with -Ofast. Using -O3 and -Ofast together throws error #19, and using only -Ofast throws it as well. |
…o-fast-math disable it
Can confirm this makes tests pass. |
Honestly weird this works. Did you just find a compiler bug in gcc?! |
I found out that -Ofast enables a lot of aggressive optimizations that can alter the behaviour of the program. In particular, it enables the -ffast-math flag, which can break floating-point arithmetic. |
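To illustrate how -ffast-math can change behaviour and not just precision, here is a minimal sketch. It is not taken from this repo, and the helper names are made up for the example. -ffast-math implies -ffinite-math-only, which lets the compiler assume that NaN and infinity never occur, so a self-comparison NaN check like the one below may be folded to a constant 0 at higher optimization levels.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not from this repo): under IEEE 754, NaN is the
 * only value that compares unequal to itself, so x != x is a NaN test.
 * With -ffinite-math-only the compiler is allowed to assume x is never
 * NaN and may replace this comparison with 0. */
int is_nan_by_self_compare(float x) {
    return x != x;
}

/* Hypothetical helper: count the NaN entries in v[0..n-1]. */
int count_nans(const float *v, size_t n) {
    assert(v != NULL || n == 0);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        count += is_nan_by_self_compare(v[i]);
    }
    return count;
}
```

Built without -ffast-math, count_nans reports every NaN in the array. Built with -Ofast, the result depends on the compiler and version, which is the kind of silent behaviour change being described here.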
@ent0n29 does this make the code slower for you? |
I don't think it makes the code slower, since -O3 and -Ofast are still enabled and only -ffast-math is disabled. I don't know how much -ffast-math contributes in terms of speed, but since we are working with floats it is important to maintain machine precision, and this flag creates problems with floating-point arithmetic. More tests here -> #19 (comment) |
I have collected some data by running As a side note: both of the following instances were tested under almost the same general machine workload. Enabling
By disabling
Some stats to better evaluate the consistency of these measurements:
|
Yes, -Ofast with -ffast-math enabled seems to be the best option, but it looks like it only works on macOS, |
What about enabling
|
…lso consistency with README
After some investigation, it seems the error is caused by the combination of the gcc options "-fno-math-errno -funsafe-math-optimizations -ffinite-math-only" that "-Ofast" / "-ffast-math" enable. It can be solved by adding "-fno-finite-math-only" instead of "-fno-fast-math", which gives better performance. Btw, if I change cc to clang on my Ubuntu 22.04, it works well without modifying any other options and achieves the best running speed. |
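To check which of these modes a given build actually ended up with, something like the following sketch could help. It assumes the __FINITE_MATH_ONLY__ and __FAST_MATH__ macros that, as far as I know, GCC and Clang predefine; the function names are hypothetical and not part of this repo.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with finite-math-only
 * semantics (assumes the compiler predefines __FINITE_MATH_ONLY__). */
int finite_math_only_enabled(void) {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler reports the full fast-math mode
 * (assumes __FAST_MATH__ is defined in that case). */
int fast_math_enabled(void) {
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print the detected math mode to the given stream. */
void print_math_mode(FILE *out) {
    assert(out != NULL);
    fprintf(out, "finite-math-only: %s\n",
            finite_math_only_enabled() ? "on" : "off");
    fprintf(out, "fast-math: %s\n",
            fast_math_enabled() ? "on" : "off");
}
```

Calling print_math_mode(stderr) once at startup would show whether a build with -Ofast -fno-finite-math-only really has finite-math-only turned off. I would expect it to report "off" there, but that is worth verifying on your own compiler.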
@DongbinNie wow, it's amazing, on my ThinkPad using |
@DongbinNie I can confirm that performance is 2x with -fno-finite-math-only instead of -fno-fast-math, thanks for the help! |
@karpathy have you tried compiling with |
Sorry @ent0n29 what is the final recommendation here right now? Is it
|
Yes @karpathy, -fno-finite-math-only instead of -fno-fast-math, for an almost 2x improvement. |
I went from ~17 seconds per step with -fno-fast-math to this with -fno-finite-math-only & -march=native:
step 0: train loss 5.356086 (took 9611.016869 ms)
step 1: train loss 4.300644 (took 8780.770364 ms)
step 2: train loss 4.623083 (took 7137.313333 ms)
step 3: train loss 4.599365 (took 6426.557283 ms)
step 4: train loss 4.616659 (took 6864.495874 ms)
step 5: train loss 4.231428 (took 6635.326672 ms)
step 6: train loss 3.753164 (took 6516.244371 ms)
step 7: train loss 3.650456 (took 6362.145571 ms)
step 8: train loss 4.182243 (took 6333.873539 ms)
step 9: train loss 4.199581 (took 6407.929416 ms) |
I'm closing this to open a new, synced one. |
fixes #19