* init swissai model
* AutoModelForCausalLM
* AutoModelForCausalLM mapping
* make qk norm and post-ln optional
* fix wrong shape of qk norm: Megatron normalizes over head_dim (see the sketch after this list)
* automodel fixes
* minor fix in forward
* fix rope validation to accept llama3 scaling (config example after this list)
* `SwissAIForTokenClassification` support
* Align `SwissAI` to v4.52.4
* Align `SwissAI` to v4.53.1
* Init CUDA xIELU
* `SwissAI*`->`Apertus*`
* ci fix
* check_docstring ignore ApertusConfig
* Licensing and placeholder tests
* Placeholder doc
* XIELU syntax
* `_xielu_python` optimization (see the xIELU sketch after this list)
* Fix xIELU
* [tmp] `{beta,eps}` persistent=False
Temporary until `{beta,eps}` are saved in the checkpoint (see the buffer example after this list)
* Modular `Apertus`
* CUDA xIELU logging
* ci fix
* ci fix
* ci fix
* Update license
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update tests/models/apertus/test_modeling_apertus.py
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* `.utils.import_utils.is_torchdynamo_compiling`
* `Apertus` class ordering
* `past_key_value{->s}`, `make fix-copies`
* ci fix
* Remove unused configuration parameters
* `{beta,eps}` saved in checkpoint
* `{beta,eps}` Temporarily on CPU
* Suggestions
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* ci fix
* remove fx_compatible (deprecated)
* remove `rotary_embedding_layer`
The tests assume a config without default RoPE scaling, which does not hold for Apertus; rope scaling is already covered by other models' tests, so this is safe.
* fully removing `Mask4DTestHard` class
Not needed (for now)
* switch to `dtype` instead of `torch_dtype` (usage example after this list)
Following #39782.
* remove unused imports
* remove `cache_implementation="static"`
* add Apertus to `docs/source/en/_toctree.yml` for the doc builder
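
For context on the qk-norm shape fix above: Megatron-style QK norm is applied per attention head, so the normalized dimension is `head_dim`, not the full hidden size. A minimal sketch, assuming PyTorch's `nn.RMSNorm` (PyTorch >= 2.4) and made-up dimensions, not the actual Apertus module:

```python
import torch
from torch import nn

# Illustrative dimensions only.
batch, seq_len, num_heads, head_dim = 2, 8, 4, 64

# QK norm acts per head: the norm's shape must be head_dim,
# not num_heads * head_dim (the wrong shape the commit fixes).
q_norm = nn.RMSNorm(head_dim)

q = torch.randn(batch, seq_len, num_heads, head_dim)
q = q_norm(q)  # normalizes the last dimension (head_dim) of every head
```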
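
The rope-validation fix lets the config carry llama3-style scaling. A hypothetical config fragment: the keys are the ones transformers' llama3 rope handling expects, but the numbers are placeholders, not Apertus defaults:

```python
rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
```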
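
On the xIELU commits: xIELU is the trainable activation Apertus uses; for x > 0 it is a learned quadratic-plus-linear term, for x <= 0 an ELU-integral-style term. A minimal sketch of the `_xielu_python` idea with simplified parameter handling; treat the names, init values, and exact formula as an approximation, not the shipped code:

```python
import torch
import torch.nn.functional as F
from torch import nn

class XIELUSketch(nn.Module):
    def __init__(self, alpha_p_init=0.8, alpha_n_init=0.8, beta=0.5, eps=-1e-6):
        super().__init__()
        # Inverse-softplus init so softplus(param) recovers the init value.
        self.alpha_p = nn.Parameter(torch.log(torch.expm1(torch.tensor(alpha_p_init))))
        self.alpha_n = nn.Parameter(torch.log(torch.expm1(torch.tensor(alpha_n_init - beta))))
        # beta/eps are the buffers the `{beta,eps}` commits toggle between
        # non-persistent and persistent.
        self.register_buffer("beta", torch.tensor(beta))
        self.register_buffer("eps", torch.tensor(eps))

    def forward(self, x):
        alpha_p = F.softplus(self.alpha_p)              # positive slope for x > 0
        alpha_n = self.beta + F.softplus(self.alpha_n)  # slope > beta for x <= 0
        return torch.where(
            x > 0,
            alpha_p * x * x + self.beta * x,
            (torch.expm1(torch.min(x, self.eps)) - x) * alpha_n + self.beta * x,
        )
```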
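
The `{beta,eps}` persistence commits hinge on `register_buffer`'s `persistent` flag: non-persistent buffers are excluded from the `state_dict`, so they never reach a checkpoint. A standalone illustration:

```python
import torch
from torch import nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent=False: excluded from state_dict, never checkpointed
        # (the temporary workaround).
        self.register_buffer("eps", torch.tensor(-1e-6), persistent=False)
        # default persistent=True: saved in the checkpoint
        # (what the later commit switches to).
        self.register_buffer("beta", torch.tensor(0.5))

print(Demo().state_dict().keys())  # odict_keys(['beta'])
```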
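
And the `dtype` switch in practice: after #39782, `from_pretrained` takes `dtype` rather than the older `torch_dtype` spelling. The checkpoint name below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "swiss-ai/Apertus-8B",  # hypothetical model id
    dtype=torch.bfloat16,   # previously spelled torch_dtype
)
```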
---------
Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>