Adding FlaxNoRepeatNGramLogitsProcessor #29677
Conversation
Thank you for opening the PR 🔥 In general looks good to me, I've added a few questions before approving.
And thank you for keeping the exact same tests as in our PT counterpart, it makes maintenance much simpler 🙌
data = jnp.ones((all_update_indices.shape[0],), dtype=jnp.uint16)
data = data * (jnp.arange(data.shape[0]) < batch_size * (cur_len - (self.ngram_size - 1)))  # ignore the n-grams not yet generated
Perhaps we could slice `input_ids` before creating `all_update_indices`, i.e. `input_ids = input_ids[:, :cur_len]`, and save some time/memory when creating `all_update_indices`. Or is the result slower, because `cur_len` changes each iteration?
From my experience with JAX, in order for the code to work with jit, you cannot use arrays whose shapes are not fixed. This is why I opted to pad the indices to a known size. There may be more efficient ways to do it, but I found that this works, as opposed to `[]` slices, or even dynamic slices, because `cur_len` changes.
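A minimal sketch of the constraint being described (illustrative only, not the PR's code): under jit, a boolean mask over the full, statically shaped array works, while slicing with a traced `cur_len` does not.

```python
import jax
import jax.numpy as jnp

@jax.jit
def masked_sum(input_ids, cur_len):
    # Works under jit: the array keeps its static (batch, seq_len) shape and
    # positions at or beyond `cur_len` are zeroed out with a boolean mask.
    mask = jnp.arange(input_ids.shape[1]) < cur_len
    return (input_ids * mask).sum()

@jax.jit
def sliced_sum(input_ids, cur_len):
    # Fails under jit: `cur_len` is traced, so the slice has no static stop value.
    return input_ids[:, :cur_len].sum()

ids = jnp.array([[1, 1, 2, 1], [0, 1, 0, 1]])
print(masked_sum(ids, 3))   # fine
# sliced_sum(ids, 3)        # raises IndexError: slice indices must be static
```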
I see :) I made the suggestion based on my past XLA+TF experience -- slicing `input_ids = input_ids[:, :cur_len]` is allowed there (example). But then again, JAX is usually stricter. Let's keep as you suggested 🤗
I think it also works in JAX in some situations, but in this case the error when using jit is quite explicit:

IndexError: Array slice indices must have static start/stop/step to be used with NumPy indexing syntax. Found slice(None, Traced<ShapedArray(int32[], weak_type=True)>with<DynamicJaxprTrace(level=1/0)>, None). To index a statically sized array at a dynamic position, try lax.dynamic_slice/dynamic_update_slice (JAX does not support dynamically sized arrays within JIT compiled functions).

Anyway, I took some time to benchmark different ways of doing this kind of operation and found that, in this instance, using `jax.lax.fori_loop` to update `all_update_indices` is significantly faster than using dynamic slice updates (with this plus the removal of some useless operations, I measured a >10x speedup for the jitted function). There might still be room for improvement.
The red CI can be fixed by running
Thank you for iterating 💛
Thanks for adding!
I have just one request: to update a test. I realise this is inherited, but it really should be addressed, as the test is very confusing.
return val.at[i].set(
    jnp.array(
        [
            b,
        ]
        + [jnp.array(input_ids)[b, pos + j] for j in range(self.ngram_size)]
    )
)
nit - can all be one line
Suggested change:
return val.at[i].set(jnp.array([b] + [jnp.array(input_ids)[b, pos + j] for j in range(self.ngram_size)]))
I think this was formatted like this by the black/ruff formatter. But if the one-liner passes the checks, I agree that it is clearer this way.
shape = (batch_size * (seq_len - (self.ngram_size - 1)), self.ngram_size + 1)
all_update_indices = jax.lax.fori_loop(
    0, batch_size * (cur_len - (self.ngram_size - 1)), body_fun, jnp.zeros(shape, dtype=input_ids.dtype)
)

# ignore the n-grams not yet generated
data = (
    jnp.arange(batch_size * (seq_len - (self.ngram_size - 1))) < batch_size * (cur_len - (self.ngram_size - 1))
).astype("float32")

return sparse.BCOO((data, all_update_indices), shape=(batch_size,) + (vocab_size,) * self.ngram_size)
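For context, here is a toy illustration of `jax.experimental.sparse.BCOO` (not the PR's data): the `(data, all_update_indices)` pair builds a sparse count array indexed by `(batch, token_1, ..., token_n)`.

```python
import jax.numpy as jnp
from jax.experimental import sparse

# Two bigrams observed in batch 0 over a 3-token vocabulary: index rows are
# (batch, first_token, second_token) and each data entry contributes a count.
indices = jnp.array([[0, 1, 1], [0, 1, 2]])
data = jnp.array([1.0, 1.0])
ngram_counts = sparse.BCOO((data, indices), shape=(1, 3, 3))
print(ngram_counts.todense()[0, 1])  # [0. 1. 1.] -> after token 1, tokens 1 and 2 were seen
```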
Rather than calculate `seq_len - (self.ngram_size - 1)` and `cur_len - (self.ngram_size - 1)` several times, it'll be easier to read and follow this code if they're set to variables and then reused.
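A possible shape of that refactor (the variable names are only suggestions, not the PR's):

```python
# Hoist the repeated expressions into named variables (names are illustrative).
ngrams_per_seq = seq_len - (self.ngram_size - 1)     # n-grams per padded sequence
generated_ngrams = cur_len - (self.ngram_size - 1)   # n-grams generated so far

shape = (batch_size * ngrams_per_seq, self.ngram_size + 1)
all_update_indices = jax.lax.fori_loop(
    0, batch_size * generated_ngrams, body_fun, jnp.zeros(shape, dtype=input_ids.dtype)
)

# ignore the n-grams not yet generated
data = (jnp.arange(batch_size * ngrams_per_seq) < batch_size * generated_ngrams).astype("float32")
```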
# 2-gram would forbid 2nd and 3rd token (1,2) at 1st batch and 1st token (0) at 2nd batch
self.assertListEqual(jnp.isinf(filtered_scores_2_gram).tolist(), [[False, True, True], [True, False, False]])

# 3-gram would forbid no token at 1st batch and 1st token (0) at 2nd batch
self.assertListEqual(jnp.isinf(filtered_scores_3_gram).tolist(), [[False, False, False], [True, False, False]])
I realise these are copied from other parts of the library, but the structure here is really quite confusing. I'm assuming that:

- By batch we mean sample in a minibatch, i.e. "1st batch" is `[1, 1, 2, 1]`
- By tokens we mean token ids
- The output values, e.g. `[False, True, True]`, are across the vocab
- When the token ids are referred to, e.g. "2nd and 3rd token at 1st batch", what we're referring to are the token ids in the vocabulary `[0, 1, 2]` and NOT the 2nd and 3rd token ids in `[1, 1, 2, 1]`

The comment makes this really confusing by 1) having the same positional values for the sample in the batch as in the vocab and 2) saying "at 1st batch". I'd strongly recommend rewriting this to remove this ambiguity.
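To make the mapping concrete, here is a small pure-Python sketch of the intended semantics for the first sample, using only the values quoted above:

```python
# Sample [1, 1, 2, 1] over vocab {0, 1, 2}: the observed bigrams are (1, 1),
# (1, 2), (2, 1); the last generated token is 1, so the banned *vocabulary ids*
# are {1, 2}, which is exactly the [False, True, True] row in the assertion.
sample = [1, 1, 2, 1]
ngram_size, vocab_size = 2, 3
ngrams = {tuple(sample[i:i + ngram_size]) for i in range(len(sample) - ngram_size + 1)}
prefix = tuple(sample[-(ngram_size - 1):])
banned = {ng[-1] for ng in ngrams if ng[:-1] == prefix}
print([tok in banned for tok in range(vocab_size)])  # [False, True, True]
```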
The tests were copied with minimal modifications from the PyTorch ones, so I didn't change these comments. I agree that the descriptions could be clearer.
I think more comprehensive tests could also be a good idea. For example, I didn't see at first that my first code iteration didn't work when an n-gram appears more than once, which is not a case that is tested, so my code passed the tests.
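For what it's worth, a sketch of such a follow-up test might look like the following (the import path and the `(input_ids, scores, cur_len)` call signature are assumed to match the other Flax logits-processor tests):

```python
import jax.numpy as jnp
from transformers.generation import FlaxNoRepeatNGramLogitsProcessor

def test_repeated_ngram_still_banned():
    # The bigram (1, 2) appears twice; the token that follows it (2) must still
    # be banned after the final 1, exactly as if the bigram had appeared once.
    vocab_size = 4
    input_ids = jnp.array([[1, 2, 3, 1, 2, 3, 1]])
    scores = jnp.zeros((1, vocab_size))
    processor = FlaxNoRepeatNGramLogitsProcessor(ngram_size=2)
    filtered = processor(input_ids, scores, cur_len=input_ids.shape[1])
    assert bool(jnp.isinf(filtered[0, 2]))
```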
Good point. As these are so close to the PT & TF tests, let's leave them as-is for now. If you have a test in mind and are willing to open a follow-up PR to add it, I'd be very happy to review :)
Thanks for adding this!
Thinking back on it - let's not block waiting for the test reworks, this can be done in follow-ups.
Just needs a `make fixup` run to resolve the quality checks and we should be good to merge!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Merging as there is a green light and green CI! 🥳
* fix issue with logit processor in beam search in Flax
* adding FlaxNoRepeatNGramLogitsProcessor class + unit test
* style correction and code verification
* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests
* fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams
* replace non-jit compatible masking of ngrams that are not yet generated with jittable version
* Revert "fix issue with logit processor in beam search in Flax" (this reverts commit 09b70d7)
* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor
* change the method of casting to boolean of banned tokens indices
* fix code style
* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop
* remove useless loop iterations
* set some variables that were calculated and used multiple times
* fix format
What does this PR do?
Adding the no repeat n-gram logits processor to Flax, compatible with jitting.
I also added the test `test_no_repeat_ngram_dist_processor`, adapted from the torch one, and added the `FlaxNoRepeatNGramLogitsProcessor` to `test_processor_list` and `test_processor_list_jitted`. All the tests are passing, as `RUN_SLOW=1 pytest -sv tests/generation/test_flax_logits_process.py` shows.

Note: in order to work properly within beam search, this processor needs the fix proposed in PR #29636 for the bug discussed in #29635.
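Once merged, the intended end-user path would presumably be the usual `no_repeat_ngram_size` generation argument; a rough sketch (the checkpoint is just an example):

```python
from transformers import AutoTokenizer, FlaxAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = FlaxAutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("summarize: JAX requires static shapes under jit.", return_tensors="np")
# no_repeat_ngram_size should now route through FlaxNoRepeatNGramLogitsProcessor
outputs = model.generate(**inputs, max_length=32, num_beams=4, no_repeat_ngram_size=2)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))
```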
Let me know if you have any comments or questions regarding this feature.
Who can review?
@gante
@sanchit-gandhi