Add EnCodec model #23655

hollance · 2023-05-22T13:16:23Z

What does this PR do?

Adds the EnCodec neural codec from the High Fidelity Neural Audio Compression paper.

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

amyeroberts

Looking good! Thanks for iterating so quickly on this :)

There's just one main behaviour, regarding the handling of the padding and truncation flags and the returned outputs, which I think needs to be addressed before merging. Otherwise there are just a few nits and it's good to go.

amyeroberts · 2023-06-14T11:10:37Z

src/transformers/models/encodec/convert_encodec_checkpoint_to_pytorch.py

+
+    original_checkpoint = torch.load(checkpoint_path)
+    recursively_load_weights(original_checkpoint, model, model_name)
+    model.save_pretrained(pytorch_dump_folder_path)


It would be good to check here that the outputs of the original and converted models are the same (to some tolerance) before pushing to the Hub.

No, people can have their own fine-tuned/modified checkpoints, and we don't import the model from the original library, so the outputs are not fixed.
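For readers who do have both an original and a converted implementation available locally, the kind of check suggested above could look like the sketch below. The two `torch.nn.Linear` modules are only stand-ins for the original and converted models, and `outputs_match` and its tolerance are hypothetical, not part of the conversion script in this PR.

```python
import torch


def outputs_match(original, converted, inputs, atol=1e-5):
    """Return True if both models give the same output on `inputs`, up to `atol`."""
    original.eval()
    converted.eval()
    with torch.no_grad():
        expected = original(inputs)
        actual = converted(inputs)
    return torch.allclose(expected, actual, atol=atol)


# Stand-ins for the original and the converted model (assumption: same architecture).
torch.manual_seed(0)
original_model = torch.nn.Linear(4, 2)
converted_model = torch.nn.Linear(4, 2)

# Mimics the weight transfer that a conversion script performs.
converted_model.load_state_dict(original_model.state_dict())

dummy_input = torch.randn(3, 4)
check_passed = outputs_match(original_model, converted_model, dummy_input)
```

As the reply points out, such a check only makes sense when a reference implementation and a fixed checkpoint are both at hand.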

amyeroberts · 2023-06-14T11:31:40Z

src/transformers/models/encodec/feature_extraction_encodec.py

+                return_attention_mask=True,
+            )
+            if padding:
+                padded_inputs["padding_mask"] = padded_inputs.pop("attention_mask")


OK, when you say 'we ignore `attention_mask`', what does 'ignore' mean?

I don't believe this is what is being returned now: there's a `padding_mask` when `truncation=True`.
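The conditional rename in the diff above can be isolated in a small sketch. `rename_mask` is a hypothetical helper and the plain dict stands in for the padded batch; only the `if padding:` branch is taken from the diff. It does not model what `truncation=True` does, since that code is not shown here.

```python
def rename_mask(padded_inputs, padding):
    """Rename `attention_mask` to `padding_mask` only when `padding` is truthy."""
    outputs = dict(padded_inputs)  # copy so the caller's dict is left untouched
    if padding:
        outputs["padding_mask"] = outputs.pop("attention_mask")
    return outputs


batch = {"input_values": [[0.1, 0.2, 0.0]], "attention_mask": [[1, 1, 0]]}

with_padding = rename_mask(batch, padding=True)
without_padding = rename_mask(batch, padding=False)

print(sorted(with_padding))     # ['input_values', 'padding_mask']
print(sorted(without_padding))  # ['attention_mask', 'input_values']
```

In this sketch the name of the returned mask depends on the `padding` flag, so callers would need to know which key to expect.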

amyeroberts · 2023-06-14T11:38:06Z

tests/models/encodec/test_feature_extraction_encodec.py

+        self.assertTrue(torch.allclose(input_values[0, 0, :30], EXPECTED_INPUT_VALUES, atol=1e-6))
+        self.assertTrue(torch.allclose(input_values[0, 1, :30], EXPECTED_INPUT_VALUES * 0.5, atol=1e-6))
+
+    def test_kwargs(self):


This hasn't been resolved. In reality, we shouldn't allow both of these arguments to be passed, and we should add a test that checks an exception is raised if that happens.
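A minimal sketch of the requested guard and its test, assuming the two arguments in question are `padding` and `truncation` (as the review summary suggests). `check_padding_args`, its return value and the error message are hypothetical and only illustrate the pattern.

```python
import unittest


def check_padding_args(padding=False, truncation=False):
    """Hypothetical validation: reject calls that set both flags."""
    if padding and truncation:
        raise ValueError("Both padding and truncation were set. Make sure you only set one.")
    return {"padding": padding, "truncation": truncation}


class PaddingArgsTest(unittest.TestCase):
    def test_both_flags_raise(self):
        # Passing both flags should fail loudly instead of silently picking one.
        with self.assertRaises(ValueError):
            check_padding_args(padding=True, truncation=True)

    def test_single_flag_is_accepted(self):
        self.assertEqual(
            check_padding_args(padding=True),
            {"padding": True, "truncation": False},
        )
```

Saved to a file, the test case can be run with `python -m unittest <module name>`.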

amyeroberts · 2023-06-14T11:51:21Z

src/transformers/models/encodec/modeling_encodec.py

+        input_values (`torch.FloatTensor` of shape `(batch_size, channels, sequence_length)`, *optional*):
+            Raw audio input converted to Float and padded to the approriate length in order to be encoded using chunks
+            of length self.chunk_length and a stride of `config.chunk_stride`.
+        padding_mask (`torch.BoolTensor` of shape `(batch_size, sequence_length)`, *optional*):


In this case, is the shape of `input_values` correct?

AFAICT, if `padding_mask` is `None`, then it's created as:

padding_mask = torch.ones_like(input_values).bool()

which would imply it has the same dimensions as `input_values`. This also doesn't match the shape in the `encode` and `decode` docstrings.
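The mismatch described above can be reproduced with a few lines of PyTorch. The batch size, channel count and sequence length are arbitrary illustration values; the default-mask expression is the one quoted in the comment, and the second mask uses the shape given in the docstring.

```python
import torch

batch_size, channels, sequence_length = 2, 1, 16
input_values = torch.zeros(batch_size, channels, sequence_length)

# Default mask as quoted in the comment above: same shape as `input_values`.
default_mask = torch.ones_like(input_values).bool()

# Mask with the shape documented in the docstring.
documented_mask = torch.ones(batch_size, sequence_length, dtype=torch.bool)

print(default_mask.shape)     # torch.Size([2, 1, 16])
print(documented_mask.shape)  # torch.Size([2, 16])
```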

ArthurZucker · 2023-06-14T14:05:47Z

OK, addressed everything! Feel free to merge if it's good with you @amyeroberts

amyeroberts

Beautiful - thanks again for adding this model and iterating!

* boilerplate stuff
* messing around with the feature extractor
* fix feature extractor
* unit tests for feature extractor
* rename speech to audio
* quick-and-dirty import of Meta's code
* import weights (sort of)
* cleaning up
* more cleaning up
* move encoder/decoder args into config
* cleanup model
* rename EnCodec -> Encodec
* RVQ parameters in config
* add slow test
* add lstm init and test_init
* Add save & load
* finish EncodecModel
* remove decoder_input_values as they are ont used anywhere (not removed from doc yet)
* fix test feature extraction model name
* Add better slow test
* Fix tests
* some fixup and cleaning
* Improve further
* cleaning up quantizer
* fix up conversion script
* test don't pass, _encode_fram does not work
* update tests with output per encode and decode
* more cleanup
* rename _codebook
* remove old config cruft
* ratios & hop_length
* use ModuleList instead of Sequential
* clean up resnet block
* update types
* update tests
* fixup
* quick cleanup
* fix padding
* more styl,ing
* add patrick feedback
* fix copies
* fixup
* fix lstm
* fix shape issues
* fixup
* rename conv layers
* fixup
* fix decoding
* small conv refactoring
* remove norm_params
* simplify conv layers
* rename conv layers
* stuff
* Clean up
* Add padding logic use padding mask small conv refactoring remove norm_params simplify conv layers rename conv layers stuff add batched test update Clean up merge and update for padding fix padding fixup
* clean up more
* clean up more
* More clean ups
* cleanup convolutions
* typo
* fix typos
* fixup
* build PR doc?
* start refactoring docstring
* fix don't pad when no strid and chunk
* update docstring
* update docstring
* nits
* update going to lunch
* update config and model
* fix broken testse (becaue of the config changes)
* fix scale computation
* fixu[
* only return dict if speciefied or if config returns it
* remove todos
* update defaults in config
* update conversion script
* fix doctest
* more docstring + fixup
* nits on batched_tests
* more nits
* Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update basxed on review
* fix update
* updaet tests
* Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fixup
* add overlap and chunl_length_s
* cleanup feature extraction
* teste edge cases truncation and padding
* correct processor values
* update config encodec, nits
* fix tests
* fixup
* fix 24Hz test
* elle tests are green
* fix fixup
* Apply suggestions from code review
* revert readme changes
* fixup
* add example
* use facebook checkpoints
* fix typo
* no pipeline tests
* use slef.pad everywhere we can
* Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update based on review
* update
* update mdx
* fix bug and tests
* fixup
* fix doctest
* remove comment
* more nits
* add more coverage for `test_truncation_and_padding`
* fixup
* add last test
* fix text
* nits
* Update tests/models/encodec/test_modeling_encodec.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* take care of the last comments
* typo
* fix test
* nits
* fixup
* Update src/transformers/models/encodec/feature_extraction_encodec.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: arthur.zucker@gmail.com <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

fxmarty · 2024-03-26T17:43:52Z

src/transformers/models/encodec/modeling_encodec.py

+        if self.config.normalize:
+            # if the padding is non zero
+            input_values = input_values * padding_mask


I am wondering why `padding_mask` is not used when `normalize=False`?
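The behaviour being asked about can be seen in a small sketch of the branch quoted in the diff. Plain Python lists stand in for 1-D tensors, and `mask_if_normalizing` is a hypothetical helper; it only mirrors the quoted `if self.config.normalize:` branch and says nothing about how the rest of the model treats the mask.

```python
def mask_if_normalizing(input_values, padding_mask, normalize):
    """Zero out padded samples only when `normalize` is True, as in the quoted diff."""
    if normalize:
        return [value * keep for value, keep in zip(input_values, padding_mask)]
    return list(input_values)


samples = [0.5, -0.25, 0.75]  # last sample is padding with a non-zero value
mask = [1, 1, 0]              # 1 = real audio, 0 = padding

print(mask_if_normalizing(samples, mask, normalize=True))   # [0.5, -0.25, 0.0]
print(mask_if_normalizing(samples, mask, normalize=False))  # [0.5, -0.25, 0.75]
```

With `normalize=False` the non-zero padded sample passes through unchanged in this sketch, which is the case the question raises.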

hollance added the New model label May 22, 2023

hollance force-pushed the encodec branch from baff91d to 9c831ee Compare May 31, 2023 15:00

patrickvonplaten reviewed Jun 8, 2023

src/transformers/models/encodec/modeling_encodec.py (outdated, resolved)

patrickvonplaten reviewed Jun 8, 2023

src/transformers/models/encodec/modeling_encodec.py (outdated, resolved)

patrickvonplaten reviewed Jun 8, 2023

src/transformers/models/encodec/modeling_encodec.py (outdated, resolved)

hollance force-pushed the encodec branch from 8ec2522 to 5d27e87 Compare June 8, 2023 09:53

hollance added 12 commits June 8, 2023 11:54

boilerplate stuff

e18fd4d

messing around with the feature extractor

1c98495

fix feature extractor

4b88774

unit tests for feature extractor

d075dfe

rename speech to audio

1d6a752

quick-and-dirty import of Meta's code

56d29e6

import weights (sort of)

7229a84

cleaning up

c16b225

more cleaning up

b044acc

move encoder/decoder args into config

66978bd

cleanup model

3e1dea4

rename EnCodec -> Encodec

32d54d5

hollance force-pushed the encodec branch from 5d27e87 to 32d54d5 Compare June 8, 2023 09:54

hollance and others added 3 commits June 8, 2023 12:03

RVQ parameters in config

d919579

add slow test

027ee65

Merge branch 'encodec' of https://github.com/hollance/transformers into encodec

6d8319c

patrickvonplaten reviewed Jun 8, 2023

tests/models/encodec/test_modeling_encodec.py (outdated, resolved)

ArthurZucker and others added 7 commits June 8, 2023 10:21

add lstm init and test_init

b12741d

Add save & load

9fe5d98

finish EncodecModel

d169637

remove decoder_input_values as they are ont used anywhere (not removed from doc yet)

b744397

fix test feature extraction model name

548a5eb

Merge branch 'encodec' of https://github.com/hollance/transformers into encodec

ab24b2b

Add better slow test

ebe61e3

ArthurZucker added 13 commits June 13, 2023 21:45

update based on review

a1b3723

update

eb3427f

update mdx

c833e52

fix bug and tests

01da412

fixup

31a7e7e

fix doctest

5073411

remove comment

768d149

more nits

e57aae1

add more coverage for test_truncation_and_padding

4e69fee

fixup

5ffa9af

add last test

b25ce23

fix text

84f8c5c

nits

c0162b4

ArthurZucker requested a review from amyeroberts June 14, 2023 09:37

amyeroberts reviewed Jun 14, 2023

ArthurZucker and others added 7 commits June 14, 2023 15:29

Update tests/models/encodec/test_modeling_encodec.py

14d5e3e

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

take care of the last comments

edff1a6

typo

f5332dc

fix test

7d9fcc0

Merge branch 'encodec' of https://github.com/hollance/transformers into encodec

9d00eb2

nits

54f9d4c

fixup

e6b8333

ArthurZucker requested a review from amyeroberts June 14, 2023 14:23

amyeroberts approved these changes Jun 14, 2023

src/transformers/models/encodec/feature_extraction_encodec.py (outdated, resolved)

Update src/transformers/models/encodec/feature_extraction_encodec.py

b187648

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

ArthurZucker merged commit 0c3fdcc into huggingface:main Jun 14, 2023

sgugger changed the title ~~[WIP] add EnCodec model~~ Add EnCodec model Jun 14, 2023

fxmarty reviewed Mar 26, 2024
