
Fix/adapters 2 #83

Merged · 6 commits merged into main from fix/adapters-2 on Dec 9, 2024

Conversation

@Waino (Collaborator) commented Dec 2, 2024

Some more fixes to adapters:

  • Create adapters before the StackXCoder, to ensure there is no accidental duplication.
  • Prevent the LoRA adapter from taking the wrapped layer as its child.
  • Add a --log_model_structure flag for debugging architecture issues.

@Waino requested a review from @TimotheeMickus on December 2, 2024 at 11:18
@Waino mentioned this pull request on Dec 2, 2024
Waino added 2 commits on December 2, 2024 at 19:26:
Now both training and translation use `--gpu_rank 0`.

Closes #82
@TimotheeMickus (Collaborator) left a comment

LGTM.

I have complaints on the cosmetic side (I'm really not into type hints when declaring vars) but nothing that serious

mammoth/opts.py (Outdated)
@@ -548,7 +554,7 @@ def _add_train_general_opts(parser):
         type=float,
         default=[0.3],
         nargs='+',
-        help="Dropout probability; applied in LSTM stacks.",
+        help="Dropout probability; applied in LSTM stacks. (Probably legacy?)",
@TimotheeMickus (Collaborator):

should be, since we're now delegating this to xtransformers

@@ -27,7 +36,7 @@ def _validate_adapters(cls, opts):
         """Parse corpora specified in data field of YAML file."""
         if not opts.adapters:
             return
-        adapter_opts = yaml.safe_load(opts.adapters)
+        adapter_opts = yaml_or_dict(opts.adapters, name='opts.adapters')
@TimotheeMickus (Collaborator):

not sure about this name thing — I get that you want to display a less cryptic message, but in practice we devs are the only ones who'll ever get to see that TypeError ... and you're using that function once? the stacktrace should be unambiguous enough.

@Waino (Collaborator, author):

I forgot to apply this to the other similar locations where yaml.safe_load is applied. Fixed now.

This crashed when loading opts.adapters from a checkpoint. I'm not sure why those other usages didn't crash earlier, but better safe than sorry.
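
A minimal sketch of what such a helper might look like (the actual yaml_or_dict in mammoth may differ): it accepts either a YAML string or an already-parsed dict, as happens when opts are restored from a checkpoint, and names the offending option otherwise.

from typing import Any

import yaml


def yaml_or_dict(value: Any, name: str) -> dict:
    """Parse a YAML string, or pass through a dict that was already parsed
    (e.g. options restored from a checkpoint)."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        return yaml.safe_load(value)
    raise TypeError(f'Expected a YAML string or dict for {name}, got {type(value)}')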

@@ -338,7 +380,12 @@ def build_model(
     )

     model.to(device)
-    # logger.info(model)
+    if opts.log_model_structure:
+        logger.info(model)
@TimotheeMickus (Collaborator):

feels like this should be debug rather than info, but ok, given your new flags

@Waino (Collaborator, author):

Yes, this should be logger.debug. However, we still haven't fixed the way we set up logging (AFAIK), so debug messages are never visible.
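
A generic illustration of that point (standard library logging, not mammoth's actual logging setup): debug messages only show up once both the logger and its handler are set to DEBUG.

import logging

logger = logging.getLogger('mammoth')
handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
handler.setFormatter(logging.Formatter('[%(asctime)s %(levelname)s] %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)  # without this, logger.debug(...) calls are dropped

logger.debug('model structure dump would now be visible')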

for component in task_queue_manager.get_my_distributed_components():
    logger.info(component)
for name, p in model.named_parameters():
    print(f'{p.requires_grad} {name}')
@TimotheeMickus (Collaborator):

why direct print here?
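
The alternative implied by the question, sketched (reusing the model and logger objects from the snippet above):

# Route the per-parameter report through the logger as well, so it respects
# the configured handlers and formatting instead of writing straight to stdout.
for name, p in model.named_parameters():
    logger.info(f'requires_grad={p.requires_grad} {name}')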

@@ -314,13 +340,28 @@ def build_model(
device = torch.device("cpu")
logger.info(device)

enc_adapters_by_name: Optional[Dict[str, Adapter]] = build_adapters(
@TimotheeMickus (Collaborator):

The type hint at the var declaration is a bit much. Maybe worth a comment instead? You already have a type hint for the return value of build_adapters.
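
A self-contained toy example of the two styles being debated; with a return-type annotation on the builder function, a type checker already infers the variable's type, so the annotation at the assignment is optional.

from typing import Dict, Optional


def build_things() -> Optional[Dict[str, int]]:
    return {'a': 1}


# Style 1: rely on the return annotation; checkers infer Optional[Dict[str, int]].
things = build_things()

# Style 2: repeat the annotation at the assignment (the style used in this PR).
things_explicit: Optional[Dict[str, int]] = build_things()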

    component for component in my_components
    if isinstance(component, distributed_xcoder_class)
]
attention_layer_blocks: Dict[int, Dict[str, AdaptedAttentionLayers]] = defaultdict(dict)
@TimotheeMickus (Collaborator):

Still not sold on the type hint, but it gets a pass for the nested structure, I suppose... not pretty.

single_task: if a task_id string is given, the built model contains only the components necessary for that task.
token_embs: to tie encoder and decoder embeddings, pass existing embeddings here.
"""
) -> Optional[Dict[str, Adapter]]:
@TimotheeMickus (Collaborator):

docstring this.
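
A hypothetical stub of the requested docstring. Only the return annotation and the nearby "# Create AdapterLayer objects and Adapter objects" comment are taken from the diff; the parameter names below are assumptions and may not match the real build_adapters signature.

from typing import Dict, Optional


def build_adapters(side, opts, task_queue_manager, single_task=None) -> Optional[Dict[str, 'Adapter']]:
    """Create AdapterLayer objects and group them into Adapter objects.

    Args:
        side: build encoder-side or decoder-side adapters (assumed parameter).
        opts: parsed options; adapter configuration comes from opts.adapters (assumed).
        task_queue_manager: source of this device's distributed components (assumed).
        single_task: if given, build only the adapters needed for that task (assumed).

    Returns:
        A dict mapping adapter names to Adapter objects, or None when no
        adapters are configured.
    """
    raise NotImplementedError('illustrative stub only')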

"""
) -> Optional[Dict[str, Adapter]]:
# Create AdapterLayer objects and Adapter objects
adapters_by_name: Optional[Dict[str, Adapter]]
@TimotheeMickus (Collaborator):

Not into type hints on declarations.

@@ -190,25 +152,89 @@ def build_xcoder(
        )
    else:
        raise ValueError(f'Unrecognized adapter_type {adapter_opts["adapter_type"]}')
    layer_stack_index = adapter_params['layer_stack_index']
@TimotheeMickus (Collaborator):

I would get this directly from the dict on line 159, but sure.

distributed_xcoder_class: type
if side == Side.encoder:
    distributed_xcoder_class = DistributedEncoderAttentionLayersBlock
else:
@TimotheeMickus (Collaborator):

so e.g. side=None maps to the decoder blocks? feels like a wasted opportunity to halt and catch fire.

@Waino (Collaborator, author):

According to the type hint, side is not optional. But sure, let's double-check.
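
For reference, the stricter dispatch the reviewer is suggesting, sketched; the decoder class name below is assumed by analogy with DistributedEncoderAttentionLayersBlock and may not match the actual codebase.

distributed_xcoder_class: type
if side == Side.encoder:
    distributed_xcoder_class = DistributedEncoderAttentionLayersBlock
elif side == Side.decoder:
    distributed_xcoder_class = DistributedDecoderAttentionLayersBlock  # assumed name
else:
    # halt and catch fire instead of silently treating anything else as a decoder
    raise ValueError(f'Unexpected side: {side}')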

@TimotheeMickus (Collaborator) commented:

On an unrelated note, it might be helpful to have a logging_opts, just like we have a model_opts and a data_opts.
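
A hypothetical sketch of such a group, following the _add_train_general_opts pattern shown earlier in this thread; the function name and plain-argparse calls are illustrative rather than mammoth's actual option API.

def _add_logging_opts(parser):
    """Collect logging-related flags in one group, like model_opts and data_opts."""
    group = parser.add_argument_group('Logging')
    group.add_argument(
        '--log_model_structure',
        action='store_true',
        help='Log the model structure, distributed components, and per-parameter '
             'requires_grad status after building the model.',
    )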

@Waino (Collaborator, author) commented Dec 9, 2024

> I'm really not into type hints when declaring vars

I've configured nvim to do linting and type checking automatically, and in most projects I aim for zero type-checking errors (in mammoth, this is far from the case). I've developed a habit of type hinting a lot because of this.

@Waino merged commit 56ac097 into main on Dec 9, 2024
2 checks passed
@Waino deleted the fix/adapters-2 branch on December 9, 2024 at 10:42
@TimotheeMickus (Collaborator) commented:

> I'm really not into type hints when declaring vars
>
> I've configured nvim to do linting and type checking automatically, and in most projects I aim for zero type-checking errors (in mammoth, this is far from the case). I've developed a habit of type hinting a lot because of this.

If you want to enforce type hinting in this project, I think you'd need to incorporate the relevant checks into the Actions workflow.
