FlaxGPTNeo #12493

patil-suraj · 2021-07-04T05:28:26Z

What does this PR do?

This PR adds the Flax version of GPTNeo. For local attention, it uses the fix proposed by @finetuneanon in #11630.

Thanks a lot, @finetuneanon, for proposing the solution. It's especially important in JAX/Flax, where we can't have dynamic shapes.

Official GPTNeo Flax checkpoints are up on the Hub, and the slow tests are passing.

patrickvonplaten · 2021-07-05T12:39:52Z

src/transformers/models/gpt_neo/modeling_flax_gpt_neo.py

+
+        self.causal_mask = make_causal_mask(jnp.ones((1, config.max_position_embeddings), dtype="bool"), dtype="bool")
+        if self.attention_type == "local":
+            self.causal_mask = self.causal_mask ^ jnp.tril(self.causal_mask, -config.window_size)


Maybe an additional comment here would be nice.
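
For readers following along, here is a minimal sketch of what the XOR in the hunk above appears to do. It is an illustration under stated assumptions, not the code from this PR: the helper name `local_causal_mask` is hypothetical, and a plain `jnp.tril` over a matrix of ones stands in for `make_causal_mask`. The mask is built once at a fixed size, so nothing depends on a dynamic shape.

```python
import jax.numpy as jnp


def local_causal_mask(seq_len: int, window_size: int) -> jnp.ndarray:
    """Hypothetical helper: boolean mask for causal attention limited to a window."""
    # Plain causal mask: query i may attend to every key j <= i.
    # (Stand-in for make_causal_mask, which also adds leading batch/head dims.)
    causal = jnp.tril(jnp.ones((seq_len, seq_len), dtype=bool))
    # Keys that are at least `window_size` steps behind the query: j <= i - window_size.
    too_far = jnp.tril(causal, -window_size)
    # XOR clears the too-far keys, leaving i - window_size < j <= i.
    return causal ^ too_far


mask = local_causal_mask(seq_len=5, window_size=2)
print(mask.astype(int))
```

With `window_size=2`, each row keeps only the query position itself and the one position before it, which is the local-attention band.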

patrickvonplaten

Awesome - very clean!

LysandreJik

This is very nice, thanks for working on it @patil-suraj!

LysandreJik · 2021-07-05T14:03:58Z

tests/test_modeling_flax_gpt_neo.py

+                    prepared_inputs_dict["attention_mask"][batch_idx, :start_index] = 0
+                    prepared_inputs_dict["attention_mask"][batch_idx, start_index:] = 1
+                pt_model = pt_model_class(config).eval()
+                fx_model = model_class(config, dtype=jnp.float32)


Is fx_model a common name for Flax models? It reminds me of torch.fx.

Aah, yeah, this is confusing. Maybe we could use flx or just flax for Flax models. (cc @patrickvonplaten)

patil-suraj added 4 commits July 4, 2021 10:56

flax gpt neo

da3e218

fix query scaling

de09e7b

update generation test

c1649f8

use flax model for test

b827105

patil-suraj requested review from sgugger, LysandreJik and patrickvonplaten July 4, 2021 09:31

patrickvonplaten reviewed Jul 5, 2021

patrickvonplaten approved these changes Jul 5, 2021

LysandreJik approved these changes Jul 5, 2021

patil-suraj merged commit 7a259c1 into huggingface:master Jul 6, 2021

patil-suraj deleted the flax-gpt-neo branch July 6, 2021 13:25

StellaAthena mentioned this pull request Aug 4, 2021

Simplify GPT-Neo local attention implementation #11630

Closed

5 tasks
