LayoutLM Token Classification not learning #8524
Comments
Is there any update on this issue?
Hi there! I have been investigating the model by writing integration tests, and it turns out it outputs the same tensors as the original repository on the same input data, so there are no issues (I tested this both for the base model and for LayoutLMForTokenClassification). However, the model is poorly documented in my opinion; I needed to look at the original repository first to understand everything. I made a demo notebook that showcases how to fine-tune HuggingFace's LayoutLM implementation. Let me know if this helps you!
I have experienced the same issue. I realized that the model files hosted here are different from the weights in the original repo. I was using weights from the original repo and the model couldn't load them at the start of training, so I was starting from a randomly initialized model instead of a pre-trained one. That's why it was not learning much on the downstream task. I solved the issue by using the model files from HuggingFace.
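A quick way to check for this weight mismatch is to load the checkpoint from the hub and read the warning that from_pretrained prints about weights it could not load or had to newly initialize. The sketch below assumes the microsoft/layoutlm-base-uncased hub checkpoint and a hypothetical label count:

```python
# Minimal sketch: loading the hub checkpoint so the weights match the transformers
# implementation. from_pretrained warns about any checkpoint weights it could not use;
# with the hub checkpoint, only the token classification head should be newly initialized.
from transformers import LayoutLMForTokenClassification

model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased",
    num_labels=13,  # hypothetical: set to the number of NER labels in your dataset
)
```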
This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions. If you think this still needs to be addressed, please comment on this thread.
Environment info
transformers version: 3.4.0
Information
Model I am using (Bert, XLNet ...): LayoutLMForTokenClassification
The problem arises when using: my own scripts
The task I am working on is: my own task
NER task. I've reproduced the implementation of the Dataset, compute_metrics, and the other helper functions from the original microsoft/layoutlm repo.
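For reference, a compute_metrics along the lines of the original repo might look roughly like the sketch below; the seqeval-based scores and the -100 padding label follow the original microsoft/layoutlm example, while the label list shown is a hypothetical FUNSD-style placeholder:

```python
# Sketch of a seqeval-based compute_metrics, assuming labels were encoded with the
# original layoutlm preprocessing (padding/subword positions carry the label id -100).
import numpy as np
from seqeval.metrics import f1_score, precision_score, recall_score

# Hypothetical FUNSD-style label list; replace with the labels of your dataset.
label_list = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]

def compute_metrics(p):
    # p.predictions has shape (batch, seq_len, num_labels); p.label_ids has shape (batch, seq_len)
    predictions = np.argmax(p.predictions, axis=2)
    true_predictions = [
        [label_list[pred] for pred, lab in zip(pred_row, lab_row) if lab != -100]
        for pred_row, lab_row in zip(predictions, p.label_ids)
    ]
    true_labels = [
        [label_list[lab] for pred, lab in zip(pred_row, lab_row) if lab != -100]
        for pred_row, lab_row in zip(predictions, p.label_ids)
    ]
    return {
        "precision": precision_score(true_labels, true_predictions),
        "recall": recall_score(true_labels, true_predictions),
        "f1": f1_score(true_labels, true_predictions),
    }
```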
When I initially tried the original repo and its training script, the model managed to learn and gave reasonable results after very few epochs. After reimplementing with HuggingFace's transformers, the model doesn't learn at all, even after a much higher number of epochs.
To reproduce
Model loading and trainer configuration:
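(The original snippet is not reproduced here; a minimal sketch of the kind of model loading and Trainer configuration described might look like the following, where the label count, hyperparameters, and datasets are hypothetical placeholders and compute_metrics is the function sketched above.)

```python
# Minimal sketch of loading LayoutLM and configuring the Trainer for token classification.
# Hyperparameters, paths, and num_labels are hypothetical; train_dataset / eval_dataset are
# assumed to be torch Datasets built as in the original microsoft/layoutlm repo (not shown).
from transformers import (
    LayoutLMForTokenClassification,
    LayoutLMTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased",
    num_labels=13,  # hypothetical label count
)

training_args = TrainingArguments(
    output_dir="./layoutlm-ner",     # hypothetical output path
    num_train_epochs=5,              # hypothetical
    per_device_train_batch_size=16,  # hypothetical
    learning_rate=5e-5,              # hypothetical
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # assumed: features with input_ids, bbox, attention_mask, labels
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,  # seqeval-based metrics as sketched earlier
)
trainer.train()
```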
Expected behavior
Similar results to the original repo, given that the same parameters are passed to the trainer and the Dataset is identical after processing the data.
Is this due to the ongoing integration of this model? Is the setup wrong?