The current matrix factorization router (`MFModel`) is unnecessarily complex. Given that all operations in the forward pass are linear with no activations, we can significantly simplify this model.
Currently, we're doing several steps:

1. Embedding model IDs
2. Normalizing embeddings
3. Projecting text embeddings
4. Element-wise multiplication
5. Linear classification
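
For reference, here is a minimal sketch of that multi-step forward pass in PyTorch; the layer names and dimensions are illustrative, not the exact `MFModel` code:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes, not the exact MFModel configuration.
DIM, TEXT_DIM, NUM_MODELS, NUM_CLASSES = 128, 1536, 64, 1

model_embeddings = torch.nn.Embedding(NUM_MODELS, DIM)      # 1. embed model IDs
text_proj = torch.nn.Linear(TEXT_DIM, DIM, bias=False)      # 3. project text embeddings
classifier = torch.nn.Linear(DIM, NUM_CLASSES, bias=False)  # 5. linear classification

def forward(model_id: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
    model_embed = model_embeddings(model_id)                # (batch, DIM)
    model_embed = F.normalize(model_embed, p=2, dim=-1)     # 2. normalize embeddings
    projected = text_proj(text_embed)                       # (batch, DIM)
    return classifier(model_embed * projected)              # 4. elementwise product, 5. classify
```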
Since these are all linear operations, they can be collapsed into a single matrix multiplication, `embedding * model`. This would:

- Reduce code complexity
- Improve performance
- Decrease the number of parameters
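
Concretely, for a fixed model the normalized model embedding can be folded into the projection and classifier weights ahead of time, leaving one precomputed "flattened" vector per model, so scoring a prompt becomes a single matrix multiplication. A small self-contained sketch of that algebra (random weights, illustrative shapes):

```python
import torch
import torch.nn.functional as F

# For a fixed model m with L2-normalized embedding p_m, projection W, and
# classifier weights c, the score  c @ (p_m * (W @ x))  equals  ((c * p_m) @ W) @ x,
# so the whole forward pass reduces to a dot product with a precomputed vector.
DIM, TEXT_DIM, NUM_CLASSES = 128, 1536, 1
torch.manual_seed(0)

p_m = F.normalize(torch.randn(1, DIM), p=2, dim=-1)   # normalized model embedding
W = torch.randn(DIM, TEXT_DIM)                        # text projection weights
c = torch.randn(NUM_CLASSES, DIM)                     # classifier weights
x = torch.randn(1, TEXT_DIM)                          # a text embedding

original = (p_m * (x @ W.T)) @ c.T        # embed -> normalize -> project -> multiply -> classify
flat_vector = (c * p_m) @ W               # precomputed per-model vector, (NUM_CLASSES, TEXT_DIM)
collapsed = x @ flat_vector.T             # single matrix multiplication at inference time

assert torch.allclose(original, collapsed, atol=1e-3)
```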
For an example of what this would look like, here's a flattened vector for the `mixtral-8x7b-instruct-v0.1` model:
Hi there, thank you for raising this! You are right that currently, all the operations can be collapsed into a single matrix multiplication. The reason the operations are broken up is that we were experimenting with nonlinear variants as well when training the matrix factorization router.
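
For context, a purely illustrative sketch of where such a nonlinearity would sit, and why it prevents folding everything into one matrix multiplication (this is not necessarily the exact variant that was trained):

```python
import torch
import torch.nn.functional as F

# Illustrative nonlinear variant of the same forward pass. With an activation
# between the projection and the elementwise product, the steps no longer
# collapse into a single matrix multiplication.
DIM, TEXT_DIM, NUM_MODELS, NUM_CLASSES = 128, 1536, 64, 1

model_embeddings = torch.nn.Embedding(NUM_MODELS, DIM)
text_proj = torch.nn.Linear(TEXT_DIM, DIM, bias=False)
classifier = torch.nn.Linear(DIM, NUM_CLASSES, bias=False)

def forward_nonlinear(model_id: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
    model_embed = F.normalize(model_embeddings(model_id), p=2, dim=-1)
    projected = F.relu(text_proj(text_embed))   # the nonlinearity that blocks the collapse
    return classifier(model_embed * projected)
```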