Describe the bug
Hello,
I was trying to fine-tune an mT5 model (google/mT5 series) on a custom dataset that follows the text format given in your documentation for the column data loader. I have been trying to figure out what is happening, but I think there is a problem in the way the model is being saved and loaded. I am sharing the files I changed (they use the base template of this example).
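The dataset files themselves are not attached, but they follow the plain two-column format that ColumnCorpus expects: one token and one NER tag per line, with blank lines separating sentences. The fragment below is only illustrative (made-up tokens, BIO-style tags, labels taken from my tag set), not actual lines from train-80.txt:

We O
acknowledge O
support O
from O
the O
European B-Organization
Southern I-Organization
Observatory I-Organization
. O

Observations O
were O
made O
with O
the O
Hubble B-Telescope
Space I-Telescope
Telescope I-Telescope
. O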
To Reproduce
run_ner.py

import inspect
import json
import logging
import os
import sys
from dataclasses import dataclass, field

import torch
from transformers import HfArgumentParser

import flair
from flair import set_seed
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

logger = logging.getLogger("flair")
logger.setLevel(level="INFO")


@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "The model checkpoint for weights initialization."},
    )
    layers: str = field(default="-1", metadata={"help": "Layers to be fine-tuned."})
    subtoken_pooling: str = field(
        default="first",
        metadata={"help": "Subtoken pooling strategy used for fine-tuning."},
    )
    hidden_size: int = field(default=256, metadata={"help": "Hidden size for NER model."})
    use_crf: bool = field(default=False, metadata={"help": "Whether to use a CRF on top or not."})


@dataclass
class TrainingArguments:
    num_epochs: int = field(default=10, metadata={"help": "The number of training epochs."})
    batch_size: int = field(default=8, metadata={"help": "Batch size used for training."})
    mini_batch_chunk_size: int = field(
        default=1,
        metadata={"help": "If smaller than batch size, batches will be chunked."},
    )
    learning_rate: float = field(default=5e-05, metadata={"help": "Learning rate"})
    seed: int = field(default=42, metadata={"help": "Seed used for reproducible fine-tuning results."})
    device: str = field(default="cuda:0", metadata={"help": "CUDA device string."})
    weight_decay: float = field(default=0.0, metadata={"help": "Weight decay for optimizer."})
    embeddings_storage_mode: str = field(default="none", metadata={"help": "Defines embedding storage method."})


@dataclass
class FlertArguments:
    context_size: int = field(default=0, metadata={"help": "Context size when using FLERT approach."})
    respect_document_boundaries: bool = field(
        default=False,
        metadata={"help": "Whether to respect document boundaries or not when using FLERT."},
    )


@dataclass
class DataArguments:
    dataset_name: str = field(metadata={"help": "Flair NER dataset name."})
    dataset_arguments: str = field(default="", metadata={"help": "Dataset arguments for Flair NER dataset."})
    output_dir: str = field(
        default="resources/taggers/ner",
        metadata={"help": "Defines output directory for final fine-tuned model."},
    )


def get_flair_corpus(data_args):
    ner_task_mapping = {}

    for name, obj in inspect.getmembers(flair.datasets.sequence_labeling):
        if inspect.isclass(obj):
            if name.startswith("NER") or name.startswith("CONLL") or name.startswith("WNUT"):
                ner_task_mapping[name] = obj

    dataset_args = {}
    dataset_name = data_args.dataset_name

    if data_args.dataset_arguments:
        dataset_args = json.loads(data_args.dataset_arguments)

    if dataset_name not in ner_task_mapping:
        raise ValueError(f"Dataset name {dataset_name} is not a valid Flair datasets name!")

    return ner_task_mapping[dataset_name](**dataset_args)


def main():
    parser = HfArgumentParser((ModelArguments, TrainingArguments, FlertArguments, DataArguments))

    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        (
            model_args,
            training_args,
            flert_args,
            data_args,
        ) = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
    else:
        (
            model_args,
            training_args,
            flert_args,
            data_args,
        ) = parser.parse_args_into_dataclasses()

    set_seed(training_args.seed)

    flair.device = training_args.device

    # custom column-format corpus instead of a built-in Flair NER dataset
    columns = {0: 'tokens', 1: 'ner'}
    corpus: Corpus = ColumnCorpus('some_directory/astrobert_models/Model_3(mT5)', columns,
                                  train_file='train-80.txt',
                                  test_file='test-10.txt',
                                  dev_file='val-10.txt',
                                  )

    logger.info(corpus)

    tag_type: str = "ner"

    tag_dictionary = corpus.make_label_dictionary(tag_type, add_unk=False)
    logger.info(tag_dictionary)

    embeddings = TransformerWordEmbeddings(
        model=model_args.model_name_or_path,
        layers=model_args.layers,
        subtoken_pooling=model_args.subtoken_pooling,
        fine_tune=True,
        allow_long_sentences=True,
        use_context=flert_args.context_size,
        respect_document_boundaries=flert_args.respect_document_boundaries,
    )

    tagger = SequenceTagger(
        hidden_size=model_args.hidden_size,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type=tag_type,
        use_crf=model_args.use_crf,
        use_rnn=False,
        allow_unk_predictions=True,
        reproject_embeddings=True,
    )

    trainer = ModelTrainer(tagger, corpus)

    trainer.fine_tune(
        data_args.output_dir,
        learning_rate=training_args.learning_rate,
        mini_batch_size=training_args.batch_size,
        mini_batch_chunk_size=training_args.mini_batch_chunk_size,
        max_epochs=training_args.num_epochs,
        embeddings_storage_mode=training_args.embeddings_storage_mode,
        weight_decay=training_args.weight_decay,
        param_selection_mode=False,
        use_final_model_for_eval=False,
        save_final_model=False,
    )

    torch.save(model_args, os.path.join(data_args.output_dir, "model_args.bin"))
    torch.save(training_args, os.path.join(data_args.output_dir, "training_args.bin"))

    # finally, print model card for information
    tagger.print_model_card()


if __name__ == "__main__":
    main()
(I am trying to reproduce results from this repo: https://github.com/MLlab4CS/Astro-mT5/tree/main). That work uses the google/mT5-large model for fine-tuning, but I am using google/mT5-base, which has the same architecture with fewer parameters.
Also, this is using the add-t5-encoder-support branch for running the code.
Expected behavior
The expected behaviour is that the parameters passed to trainer.fine_tune above should allow me to save the best model and then run the test evaluation with it. But I am unable to do so.
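Once training finishes, I would expect to be able to load the saved best model and run predictions roughly like this (a minimal sketch; the checkpoint path is only an example of wherever fine_tune writes best-model.pt):

from flair.data import Sentence
from flair.models import SequenceTagger

# example path: the directory passed as output_dir to trainer.fine_tune
tagger = SequenceTagger.load("resources/taggers/ner/best-model.pt")

sentence = Sentence("Observations were made with the Hubble Space Telescope .")
tagger.predict(sentence)
print(sentence.to_tagged_string())

Instead, the run crashes when the trainer itself tries to reload best-model.pt for the final test, as shown below.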
Logs and Stack traces
Stack Trace with the training log:
2024-09-02 22:12:56,024 Reading data from some_directory/astrobert_models/Model_3(mT5)
2024-09-02 22:12:56,024 Train: some_directory/astrobert_models/Model_3(mT5)/train-80.txt
2024-09-02 22:12:56,025 Dev: some_directory/astrobert_models/Model_3(mT5)/val-10.txt
2024-09-02 22:12:56,025 Test: some_directory/astrobert_models/Model_3(mT5)/test-10.txt
2024-09-02 22:13:02,297 Corpus: 2028 train + 226 dev + 251 test sentences
2024-09-02 22:13:02,298 Computing label dictionary. Progress:
2028it [00:00, 22607.38it/s]
2024-09-02 22:13:02,408 Dictionary created for label 'ner' with 31 values: Organization (seen 9269 times), Citation (seen 7050 times), Person (seen 4895 times), Grant (seen 4199 times), Wavelength (seen 3773 times), CelestialObject (seen 3035 times), Formula (seen 2860 times), Model (seen 2531 times), Telescope (seen 1929 times), Location (seen 1817 times), Software (seen 1154 times), Observatory (seen 1036 times), Survey (seen 1034 times), Instrument (seen 912 times), CelestialObjectRegion (seen 619 times), ComputingFacility (seen 496 times), Fellowship (seen 495 times), Dataset (seen 448 times), Collaboration (seen 370 times), EntityOfFutureInterest (seen 347 times)
2024-09-02 22:13:02,408 Dictionary with 31 tags: Organization, Citation, Person, Grant, Wavelength, CelestialObject, Formula, Model, Telescope, Location, Software, Observatory, Survey, Instrument, CelestialObjectRegion, ComputingFacility, Fellowship, Dataset, Collaboration, EntityOfFutureInterest, URL, Archive, Database, TextGarbage, Mission, CelestialRegion, Proposal, Identifier, Tag, ObservationalTechniques, Event
/home/bob2/.local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:560: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
warnings.warn(
/home/bob2/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
2024-09-02 22:13:08,487 SequenceTagger predicts: Dictionary with 126 tags: <unk>, O, S-Organization, B-Organization, E-Organization, I-Organization, S-Citation, B-Citation, E-Citation, I-Citation, S-Person, B-Person, E-Person, I-Person, S-Grant, B-Grant, E-Grant, I-Grant, S-Wavelength, B-Wavelength, E-Wavelength, I-Wavelength, S-CelestialObject, B-CelestialObject, E-CelestialObject, I-CelestialObject, S-Formula, B-Formula, E-Formula, I-Formula, S-Model, B-Model, E-Model, I-Model, S-Telescope, B-Telescope, E-Telescope, I-Telescope, S-Location, B-Location, E-Location, I-Location, S-Software, B-Software, E-Software, I-Software, S-Observatory, B-Observatory, E-Observatory, I-Observatory
2024-09-02 22:13:09,364 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Model: "SequenceTagger(
(embeddings): TransformerWordEmbeddings(
(model): T5EncoderModel(
(shared): Embedding(250112, 768)
(encoder): T5Stack(
(embed_tokens): Embedding(250112, 768)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
(relative_attention_bias): Embedding(32, 12)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1-11): 11 x T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(word_dropout): WordDropout(p=0.05)
(locked_dropout): LockedDropout(p=0.5)
(embedding2nn): Linear(in_features=1536, out_features=1536, bias=True)
(linear): Linear(in_features=1536, out_features=128, bias=True)
(loss_function): ViterbiLoss()
(crf): CRF()
)"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Corpus: "Corpus: 2028 train + 226 dev + 251 test sentences"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Parameters:
2024-09-02 22:13:09,365 - learning_rate: "0.000050"
2024-09-02 22:13:09,365 - mini_batch_size: "4"
2024-09-02 22:13:09,365 - patience: "3"
2024-09-02 22:13:09,365 - anneal_factor: "0.5"
2024-09-02 22:13:09,365 - max_epochs: "5"
2024-09-02 22:13:09,365 - shuffle: "True"
2024-09-02 22:13:09,365 - train_with_dev: "False"
2024-09-02 22:13:09,365 - batch_growth_annealing: "False"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Model training base path: "content/mt5-large"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,366 Device: cuda:0
2024-09-02 22:13:09,366 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,366 Embeddings storage mode: none
2024-09-02 22:13:09,366 ----------------------------------------------------------------------------------------------------
2024-09-02 22:14:22,599 epoch 1 - iter 50/507 - loss 5.21869016 - samples/sec: 2.73 - lr: 0.000010
2024-09-02 22:15:31,374 epoch 1 - iter 100/507 - loss 4.76969707 - samples/sec: 2.91 - lr: 0.000020
2024-09-02 22:16:44,454 epoch 1 - iter 150/507 - loss 3.84992501 - samples/sec: 2.74 - lr: 0.000030
2024-09-02 22:17:57,165 epoch 1 - iter 200/507 - loss 3.22765532 - samples/sec: 2.75 - lr: 0.000040
2024-09-02 22:19:07,797 epoch 1 - iter 250/507 - loss 2.81055829 - samples/sec: 2.83 - lr: 0.000049
2024-09-02 22:20:24,791 epoch 1 - iter 300/507 - loss 2.47280144 - samples/sec: 2.60 - lr: 0.000049
2024-09-02 22:21:34,641 epoch 1 - iter 350/507 - loss 2.25822920 - samples/sec: 2.86 - lr: 0.000048
2024-09-02 22:22:49,561 epoch 1 - iter 400/507 - loss 2.06685372 - samples/sec: 2.67 - lr: 0.000047
2024-09-02 22:24:04,744 epoch 1 - iter 450/507 - loss 1.91565943 - samples/sec: 2.66 - lr: 0.000046
2024-09-02 22:25:15,756 epoch 1 - iter 500/507 - loss 1.80107189 - samples/sec: 2.82 - lr: 0.000045
2024-09-02 22:25:23,133 ----------------------------------------------------------------------------------------------------
2024-09-02 22:25:23,133 EPOCH 1 done: loss 1.7909 - lr 0.000045
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [00:37<00:00, 1.50it/s]
2024-09-02 22:26:01,071 Evaluating as a multi-label problem: False
2024-09-02 22:26:01,123 DEV : loss 0.40007856488227844 - f1-score (micro avg) 0.4167
2024-09-02 22:26:01,134 BAD EPOCHS (no improvement): 4
2024-09-02 22:26:01,134 saving best model
2024-09-02 22:26:02,344 ----------------------------------------------------------------------------------------------------
2024-09-02 22:27:14,119 epoch 2 - iter 50/507 - loss 0.66224097 - samples/sec: 2.79 - lr: 0.000043
2024-09-02 22:28:26,077 epoch 2 - iter 100/507 - loss 0.66289136 - samples/sec: 2.78 - lr: 0.000042
2024-09-02 22:29:43,508 epoch 2 - iter 150/507 - loss 0.66188128 - samples/sec: 2.58 - lr: 0.000041
2024-09-02 22:30:56,096 epoch 2 - iter 200/507 - loss 0.64561237 - samples/sec: 2.76 - lr: 0.000040
2024-09-02 22:32:07,025 epoch 2 - iter 250/507 - loss 0.63093977 - samples/sec: 2.82 - lr: 0.000039
2024-09-02 22:33:13,665 epoch 2 - iter 300/507 - loss 0.62267017 - samples/sec: 3.00 - lr: 0.000038
2024-09-02 22:34:27,071 epoch 2 - iter 350/507 - loss 0.61492844 - samples/sec: 2.72 - lr: 0.000037
2024-09-02 22:35:41,670 epoch 2 - iter 400/507 - loss 0.60867990 - samples/sec: 2.68 - lr: 0.000036
2024-09-02 22:36:53,006 epoch 2 - iter 450/507 - loss 0.60102799 - samples/sec: 2.80 - lr: 0.000035
2024-09-02 22:38:06,344 epoch 2 - iter 500/507 - loss 0.59238830 - samples/sec: 2.73 - lr: 0.000034
2024-09-02 22:38:15,044 ----------------------------------------------------------------------------------------------------
2024-09-02 22:38:15,045 EPOCH 2 done: loss 0.5919 - lr 0.000034
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:26<00:00, 1.52s/it]
2024-09-02 22:39:41,854 Evaluating as a multi-label problem: False
2024-09-02 22:39:41,895 DEV : loss 0.258797824382782 - f1-score (micro avg) 0.6063
2024-09-02 22:39:41,907 BAD EPOCHS (no improvement): 4
2024-09-02 22:39:41,907 saving best model
2024-09-02 22:39:50,782 ----------------------------------------------------------------------------------------------------
2024-09-02 22:40:52,674 epoch 3 - iter 50/507 - loss 0.53751011 - samples/sec: 3.23 - lr: 0.000032
2024-09-02 22:42:07,553 epoch 3 - iter 100/507 - loss 0.51292905 - samples/sec: 2.67 - lr: 0.000031
2024-09-02 22:43:15,788 epoch 3 - iter 150/507 - loss 0.52074144 - samples/sec: 2.93 - lr: 0.000030
2024-09-02 22:44:29,978 epoch 3 - iter 200/507 - loss 0.50887246 - samples/sec: 2.70 - lr: 0.000029
2024-09-02 22:45:44,776 epoch 3 - iter 250/507 - loss 0.50465450 - samples/sec: 2.67 - lr: 0.000028
2024-09-02 22:46:53,595 epoch 3 - iter 300/507 - loss 0.49652591 - samples/sec: 2.91 - lr: 0.000027
2024-09-02 22:48:03,269 epoch 3 - iter 350/507 - loss 0.49103096 - samples/sec: 2.87 - lr: 0.000026
2024-09-02 22:49:22,787 epoch 3 - iter 400/507 - loss 0.48587132 - samples/sec: 2.52 - lr: 0.000025
2024-09-02 22:50:40,318 epoch 3 - iter 450/507 - loss 0.47988559 - samples/sec: 2.58 - lr: 0.000024
2024-09-02 22:51:53,871 epoch 3 - iter 500/507 - loss 0.47534172 - samples/sec: 2.72 - lr: 0.000022
2024-09-02 22:52:02,896 ----------------------------------------------------------------------------------------------------
2024-09-02 22:52:02,896 EPOCH 3 done: loss 0.4754 - lr 0.000022
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:27<00:00, 1.53s/it]
2024-09-02 22:53:30,026 Evaluating as a multi-label problem: False
2024-09-02 22:53:30,067 DEV : loss 0.22028639912605286 - f1-score (micro avg) 0.6517
2024-09-02 22:53:30,079 BAD EPOCHS (no improvement): 4
2024-09-02 22:53:30,079 saving best model
2024-09-02 22:53:39,030 ----------------------------------------------------------------------------------------------------
2024-09-02 22:54:58,710 epoch 4 - iter 50/507 - loss 0.42972222 - samples/sec: 2.51 - lr: 0.000021
2024-09-02 22:56:09,934 epoch 4 - iter 100/507 - loss 0.42529253 - samples/sec: 2.81 - lr: 0.000020
2024-09-02 22:57:18,254 epoch 4 - iter 150/507 - loss 0.41949796 - samples/sec: 2.93 - lr: 0.000019
2024-09-02 22:58:35,158 epoch 4 - iter 200/507 - loss 0.41590241 - samples/sec: 2.60 - lr: 0.000018
2024-09-02 22:59:42,396 epoch 4 - iter 250/507 - loss 0.42134116 - samples/sec: 2.97 - lr: 0.000017
2024-09-02 23:00:51,994 epoch 4 - iter 300/507 - loss 0.42124508 - samples/sec: 2.87 - lr: 0.000016
2024-09-02 23:02:06,538 epoch 4 - iter 350/507 - loss 0.41991969 - samples/sec: 2.68 - lr: 0.000015
2024-09-02 23:03:16,007 epoch 4 - iter 400/507 - loss 0.41864415 - samples/sec: 2.88 - lr: 0.000014
2024-09-02 23:04:30,849 epoch 4 - iter 450/507 - loss 0.41877229 - samples/sec: 2.67 - lr: 0.000012
2024-09-02 23:05:43,238 epoch 4 - iter 500/507 - loss 0.41600581 - samples/sec: 2.76 - lr: 0.000011
2024-09-02 23:05:52,670 ----------------------------------------------------------------------------------------------------
2024-09-02 23:05:52,670 EPOCH 4 done: loss 0.4157 - lr 0.000011
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:27<00:00, 1.53s/it]
2024-09-02 23:07:20,127 Evaluating as a multi-label problem: False
2024-09-02 23:07:20,169 DEV : loss 0.20156854391098022 - f1-score (micro avg) 0.6764
2024-09-02 23:07:20,181 BAD EPOCHS (no improvement): 4
2024-09-02 23:07:20,181 saving best model
2024-09-02 23:07:29,094 ----------------------------------------------------------------------------------------------------
2024-09-02 23:08:41,206 epoch 5 - iter 50/507 - loss 0.41014725 - samples/sec: 2.77 - lr: 0.000010
2024-09-02 23:09:55,703 epoch 5 - iter 100/507 - loss 0.40355902 - samples/sec: 2.68 - lr: 0.000009
2024-09-02 23:11:06,169 epoch 5 - iter 150/507 - loss 0.40052907 - samples/sec: 2.84 - lr: 0.000008
2024-09-02 23:12:16,356 epoch 5 - iter 200/507 - loss 0.40273058 - samples/sec: 2.85 - lr: 0.000007
2024-09-02 23:13:28,812 epoch 5 - iter 250/507 - loss 0.39995092 - samples/sec: 2.76 - lr: 0.000006
2024-09-02 23:14:41,129 epoch 5 - iter 300/507 - loss 0.39412877 - samples/sec: 2.77 - lr: 0.000005
2024-09-02 23:15:54,505 epoch 5 - iter 350/507 - loss 0.39045605 - samples/sec: 2.73 - lr: 0.000004
2024-09-02 23:17:07,290 epoch 5 - iter 400/507 - loss 0.39085101 - samples/sec: 2.75 - lr: 0.000002
2024-09-02 23:18:20,001 epoch 5 - iter 450/507 - loss 0.38970339 - samples/sec: 2.75 - lr: 0.000001
2024-09-02 23:19:30,506 epoch 5 - iter 500/507 - loss 0.38807320 - samples/sec: 2.84 - lr: 0.000000
2024-09-02 23:19:42,705 ----------------------------------------------------------------------------------------------------
2024-09-02 23:19:42,705 EPOCH 5 done: loss 0.3880 - lr 0.000000
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:27<00:00, 1.53s/it]
2024-09-02 23:21:09,993 Evaluating as a multi-label problem: False
2024-09-02 23:21:10,034 DEV : loss 0.19652396440505981 - f1-score (micro avg) 0.6858
2024-09-02 23:21:10,046 BAD EPOCHS (no improvement): 4
2024-09-02 23:21:10,046 saving best model
2024-09-02 23:21:20,453 ----------------------------------------------------------------------------------------------------
2024-09-02 23:21:20,454 loading file content/mt5-large/best-model.pt
Traceback (most recent call last):
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/run_ner.py", line 382, in <module>
main()
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/run_ner.py", line 363, in main
trainer.fine_tune(
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/trainers/trainer.py", line 919, in fine_tune
return self.train(
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/trainers/trainer.py", line 836, in train
final_score = self.final_test(
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/trainers/trainer.py", line 949, in final_test
self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/nn/model.py", line 142, in load
state = torch.load(f, map_location="cpu")
File "/home/bob2/.local/lib/python3.10/site-packages/torch/serialization.py", line 1025, in load
return _load(opened_zipfile,
File "/home/bob2/.local/lib/python3.10/site-packages/torch/serialization.py", line 1446, in _load
result = unpickler.load()
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/transformer.py", line 1004, in __setstate__
embedding = self.create_from_state(saved_config=config, **state)
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/token.py", line 62, in create_from_state
return cls(**state)
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/token.py", line 49, in __init__
TransformerEmbeddings.__init__(
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/transformer.py", line 810, in __init__
self.tokenizer = self._tokenizer_from_bytes(tokenizer_data)
File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/transformer.py", line 335, in _tokenizer_from_bytes
return AutoTokenizer.from_pretrained(temp_dir, add_prefix_space=True)
File "/home/bob2/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 880, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/bob2/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2110, in from_pretrained
return cls._from_pretrained(
File "/home/bob2/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2336, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/bob2/.local/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py", line 120, in __init__
super().__init__(
File "/home/bob2/.local/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 124, in __init__
slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
File "/home/bob2/.local/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py", line 151, in __init__
self.sp_model.Load(vocab_file)
File "/home/bob2/.local/lib/python3.10/site-packages/sentencepiece/__init__.py", line 367, in Load
return self.LoadFromFile(model_file)
File "/home/bob2/.local/lib/python3.10/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
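From the stack trace, the failure happens while best-model.pt is being deserialized and the stored tokenizer is rebuilt via _tokenizer_from_bytes, not during training itself. Below is a minimal sketch that should exercise the same save/load round trip without a full training run; it assumes the add-t5-encoder-support branch, and the checkpoint file name and the tiny tag dictionary are placeholders I made up for illustration:

from flair.data import Dictionary, Sentence
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# same kind of embeddings as in run_ner.py (mT5 encoder only)
embeddings = TransformerWordEmbeddings(
    model="google/mt5-base",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    allow_long_sentences=True,
)

# minimal placeholder tag dictionary, just enough to construct a tagger
tag_dictionary = Dictionary(add_unk=False)
tag_dictionary.add_item("O")
tag_dictionary.add_item("S-Telescope")

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
)

# save and immediately reload: this goes through the same tokenizer
# serialization path that fails when the trainer reloads best-model.pt
tagger.save("mt5_roundtrip_test.pt")
reloaded = SequenceTagger.load("mt5_roundtrip_test.pt")

sentence = Sentence("Observations were made with the Hubble Space Telescope .")
reloaded.predict(sentence)
print(sentence.to_tagged_string())

If the reload in this sketch fails with the same "TypeError: not a string", that would suggest the problem is in how the T5/sentencepiece tokenizer is serialized into the checkpoint rather than in my dataset or training settings.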
Screenshots
No response
Additional Context
Please let me know if you need any more context, or a very small sample dataset to reproduce this behaviour. Thanks in advance for any assistance.
Environment
Versions:
Flair: 0.13.1
Pytorch: 2.3.1+cu121
Transformers: 4.41.2
GPU: True