
Conversation

@jvdp1 (Collaborator) commented Jun 14, 2024

As discussed, here is a draft in which I suggest moving the optimizer from the network level to the layer level.

This is just a draft with an implementation for the dense layer only.

Here are the wall clock times using my dataset (with 2 hidden dense layers):

v0.17.0

  • Forward + backward: 4.79s
  • Update: 4.59s

Current PR

  • Forward + backward: 4.81s
  • Update: 1.40s
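
To make the idea concrete, here is a minimal Fortran sketch of the layer-level layout, not the actual neural-fortran API: the type and procedure names (`sgd_t`, `dense_t`, `minimize`, `update`) are hypothetical. Each layer stores its own optimizer instance and updates its parameters in place, instead of the network collecting parameters and gradients and calling a single network-level optimizer.

```fortran
! Hypothetical sketch of a layer that owns its optimizer.
module layer_level_optimizer_sketch
  implicit none
  private
  public :: dense_t, sgd_t

  type :: sgd_t
    real :: learning_rate = 0.01
  contains
    procedure :: minimize
  end type sgd_t

  type :: dense_t
    real, allocatable :: weights(:,:)
    real, allocatable :: dw(:,:)   ! accumulated gradients
    type(sgd_t) :: optimizer       ! optimizer stored per layer
  contains
    procedure :: update
  end type dense_t

contains

  subroutine minimize(self, param, grad)
    ! Plain SGD step applied directly to the layer's parameter array.
    class(sgd_t), intent(in) :: self
    real, intent(inout) :: param(:,:)
    real, intent(in) :: grad(:,:)
    param = param - self % learning_rate * grad
  end subroutine minimize

  subroutine update(self)
    ! The layer updates its own parameters in place and resets its
    ! gradients; nothing needs to be gathered at the network level.
    class(dense_t), intent(inout) :: self
    call self % optimizer % minimize(self % weights, self % dw)
    self % dw = 0
  end subroutine update

end module layer_level_optimizer_sketch
```

With such a layout, the network-level update would reduce to a loop over layers calling each layer's `update`, which is one plausible source of the reduction in update time reported above.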

@OneAdder (Collaborator) commented Mar 5, 2025

@jvdp1 That's actually a great idea. Apart from the obvious performance gains, it can simplify the code for combined layers. I will arrange everything in a similar fashion in my project here: https://github.com/OneAdder/llm.f
Then we can backport it here along with implementations for all the other layers.

@jvdp1 (Collaborator, Author) commented Aug 23, 2025

Closed, as the proposed changes were implemented in #222.

@jvdp1 closed this on Aug 23, 2025