Commit
* Initial reference code commit, unchanged
* Hyena code changes for NeMo compatibility
* MCore spec override functionality + example config with Hyena
* Additional changes - now working on char-level TinyShakespeare:
  * Add missing input LayerNorm to spec (in the default attention spec it's fused with the projection Linear layer, so not explicitly defined)
  * Shape conversion at start and end of Hyena forward
* Add fftconv CUDA impl from safari
* Workaround for shape error in fftconv (see HazyResearch/safari#26 (comment))
* Explicitly convert kernel to FP32 (torch.fft doesn't support bf16); see the FFT sketch after this list
* Working run configs
* Remove sharded_state_dict from HyenaOperator (made redundant by the default implementation in Megatron)
* Update configs
* Testing TE Linear classes in HyenaOperator
* Revert to FusedDense for in/out projections after merging with 24.01.01
* Fix bug (use fused LNorm+Linear), bring back TE layers
* Configs rename + cleanup
* FlashFFTConv, multi-head support, some cleanup
* Bug fix - init FlashFFTConv with 2*seq_len; see the FlashFFTConv sketch after this list
* ModuleSpec + replace nn.Conv1d with causal_conv1d
* Remove unneeded arguments
* More cleanup, remove fftconv ref functions
* Refactor HyenaFilter + more cleanup
  * Refactor in the spirit of the implementation in the MAD-Lab repo: https://github.com/athms/mad-lab/blob/main/mad/model/layers/hyena.py
* Add missing attributions
* Remove fftconv sources
* Bug fixes
* Remove d_model from the external API, take it from TransformerConfig
* Clean up config
* Remove spec override logic (possibly push separately)
* Add tests
* Keep only megatron_gpt_config_hyena (with 153M parameters)
* Black + isort formatting changes
* Fixes following PR review:
  * Clearer names + more documentation for config params
  * Clearer README
  * Check seq len < 8K with safari-fftconv
  * Avoid 0*bias op during forward
* Fix tests following param name changes

---------

Signed-off-by: Guy Jacob <guyj@nvidia.com>
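Two of the changes above ("Explicitly convert kernel to FP32" and the 2*seq_len padding) both concern the FFT-based long convolution at the heart of the Hyena operator. The sketch below is a minimal, illustrative reconstruction in the spirit of safari's reference fftconv; the function name, shapes, and the skip/bias term are assumptions, not the actual NeMo implementation. The key points are zero-padding to 2*seq_len (so the circular FFT convolution matches a causal, linear one) and casting to fp32, since torch.fft does not accept bf16 tensors.

```python
import torch


def fft_long_conv(x: torch.Tensor, k: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Illustrative FFT-based long convolution (Hyena-style sketch, not the NeMo code).

    x:    (batch, d_model, seq_len) activations, possibly bf16
    k:    (d_model, seq_len) implicit filter produced by the Hyena filter network
    bias: (d_model,) per-channel skip/bias term
    """
    seq_len = x.shape[-1]
    fft_size = 2 * seq_len  # zero-pad to 2*L so the circular conv equals a causal conv

    # torch.fft has no bf16 support, so run the transform in fp32 and cast back at the end
    k_f = torch.fft.rfft(k.to(torch.float32), n=fft_size) / fft_size
    x_f = torch.fft.rfft(x.to(torch.float32), n=fft_size)

    y = torch.fft.irfft(x_f * k_f, n=fft_size, norm="forward")[..., :seq_len]
    y = y + x.to(torch.float32) * bias.unsqueeze(-1)  # skip connection scaled by the bias
    return y.to(x.dtype)
```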
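The "init FlashFFTConv with 2*seq_len" fix and the "seq len < 8K" check both stem from how the fused convolution backends size their FFTs: FlashFFTConv is constructed for a fixed FFT size, and safari's fused fftconv kernel only handles sequence lengths up to 8K, hence the guard. Below is a hedged usage sketch based on the public FlashFFTConv README; the import path, constructor, and call signature are assumptions about that library, not taken from this commit.

```python
import torch
from flashfftconv import FlashFFTConv  # assumed package/import name, per the FlashFFTConv README

batch, d_model, seq_len = 4, 512, 2048

# The module is built for a fixed FFT size; a causal convolution over seq_len
# samples needs zero-padding to 2*seq_len, hence initializing with 2*seq_len.
flash_conv = FlashFFTConv(2 * seq_len, dtype=torch.bfloat16).cuda()

x = torch.randn(batch, d_model, seq_len, dtype=torch.bfloat16, device="cuda")
k = torch.randn(d_model, seq_len, dtype=torch.float32, device="cuda")  # filter stays in fp32

y = flash_conv(x, k)  # (batch, d_model, seq_len)
```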