
Fix TypeError: Object of type int64 is not JSON serializable #24340

Merged
merged 10 commits into huggingface:main on Jun 27, 2023
Conversation

xiaoli
Contributor

@xiaoli commented Jun 18, 2023

What does this PR do?

Fixes "TypeError: Object of type int64 is not JSON serializable".

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

@amyeroberts
Collaborator

Hi @xiaoli, thanks for opening this PR.

Could you provide some more information about when the error occurs? Does this happen when running with the values from the example readme?

@xiaoli
Contributor Author

xiaoli commented Jun 20, 2023

Hi @amyeroberts, it happened when executing ./run_no_trainer.sh; everything works smoothly except the last step, which saves the results to a JSON file.

I got this error: TypeError: Object of type int64 is not JSON serializable, so this commit tries to fix that.

This happened on my Ubuntu 22.04 workstation.

@xiaoli
Contributor Author

xiaoli commented Jun 20, 2023

(transformers) ➜  token-classification git:(main) ./run_no_trainer.sh && echo $(date +%d.%m.%y-%H:%M:%S)
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `0`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
06/20/2023 10:54:40 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: mps

Mixed precision type: no

Downloading builder script: 100%|████████████████████████████████████████████| 9.57k/9.57k [00:00<00:00, 8.80MB/s]
Downloading metadata: 100%|██████████████████████████████████████████████████| 3.73k/3.73k [00:00<00:00, 9.41MB/s]
Downloading readme: 100%|████████████████████████████████████████████████████| 12.3k/12.3k [00:00<00:00, 16.9MB/s]
Downloading and preparing dataset conll2003/conll2003 to /Users/xiaoliwang/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98...
Downloading data: 100%|████████████████████████████████████████████████████████| 983k/983k [00:00<00:00, 3.57MB/s]
Generating train split:   0%|                                                    | 0/14041 [00:00<?, ? examples/s]06/20/2023 10:54:47 - INFO - datasets_modules.datasets.conll2003.9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98.conll2003 - ⏳ Generating examples from = /Users/xiaoliwang/.cache/huggingface/datasets/downloads/extracted/31a52031f62b2a9281d3b6c2723006e2fa05b33157a4249729067b79f7aa068a/train.txt
Generating validation split:   0%|                                                | 0/3250 [00:00<?, ? examples/s]06/20/2023 10:54:48 - INFO - datasets_modules.datasets.conll2003.9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98.conll2003 - ⏳ Generating examples from = /Users/xiaoliwang/.cache/huggingface/datasets/downloads/extracted/31a52031f62b2a9281d3b6c2723006e2fa05b33157a4249729067b79f7aa068a/valid.txt
Generating test split:   0%|                                                      | 0/3453 [00:00<?, ? examples/s]06/20/2023 10:54:48 - INFO - datasets_modules.datasets.conll2003.9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98.conll2003 - ⏳ Generating examples from = /Users/xiaoliwang/.cache/huggingface/datasets/downloads/extracted/31a52031f62b2a9281d3b6c2723006e2fa05b33157a4249729067b79f7aa068a/test.txt
Dataset conll2003 downloaded and prepared to /Users/xiaoliwang/.cache/huggingface/datasets/conll2003/conll2003/1.0.0/9a4d16a94f8674ba3466315300359b0acd891b68b6c8743ddf60b9c702adce98. Subsequent calls will reuse this data.
100%|█████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1282.14it/s]
loading configuration file config.json from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading configuration file config.json from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading file vocab.txt from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/vocab.txt
loading file tokenizer.json from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/tokenizer_config.json
loading configuration file config.json from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/config.json
Model config BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.31.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

Downloading model.safetensors: 100%|███████████████████████████████████████████| 440M/440M [00:22<00:00, 19.8MB/s]
loading weights file model.safetensors from cache at /Users/xiaoliwang/.cache/huggingface/hub/models--bert-base-uncased/snapshots/a265f773a47193eed794233aa2a0f0bb6d3eaa63/model.safetensors
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForTokenClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
06/20/2023 10:55:15 - INFO - __main__ - Sample 622 of the training set: {'input_ids': [101, 2522, 6657, 15222, 6962, 1015, 19739, 20486, 2072, 1014, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'labels': [-100, 3, -100, -100, -100, 0, 3, -100, -100, 0, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]}.
06/20/2023 10:55:15 - INFO - __main__ - Sample 12142 of the training set: {'input_ids': [101, 2019, 26354, 4861, 2056, 2008, 9779, 9048, 2015, 1010, 2007, 2095, 1011, 2203, 2727, 7045, 1997, 2149, 1002, 2184, 1012, 1023, 2454, 1998, 10067, 1997, 1002, 2184, 1012, 1019, 2454, 1010, 2052, 2022, 3205, 2006, 1996, 5548, 4518, 3863, 1010, 2021, 2106, 2025, 2360, 2043, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'labels': [-100, 0, 3, 0, 0, 0, 3, -100, -100, 0, 0, 0, -100, -100, 0, 0, 0, 7, -100, 0, -100, -100, 0, 0, 0, 0, 0, 0, -100, -100, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]}.
06/20/2023 10:55:15 - INFO - __main__ - Sample 4570 of the training set: {'input_ids': [101, 2117, 2679, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'labels': [-100, 0, 0, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]}.
Downloading builder script: 100%|████████████████████████████████████████████| 6.34k/6.34k [00:00<00:00, 9.02MB/s]
06/20/2023 10:55:18 - INFO - __main__ - ***** Running training *****
06/20/2023 10:55:18 - INFO - __main__ -   Num examples = 14041
06/20/2023 10:55:18 - INFO - __main__ -   Num Epochs = 3
06/20/2023 10:55:18 - INFO - __main__ -   Instantaneous batch size per device = 8
06/20/2023 10:55:18 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 8
06/20/2023 10:55:18 - INFO - __main__ -   Gradient Accumulation steps = 1
06/20/2023 10:55:18 - INFO - __main__ -   Total optimization steps = 5268
 33%|███████████████████████▋                                               | 1756/5268 [24:08<1:29:30,  1.53s/it]epoch 0: {'LOC_precision': 0.9499192245557351, 'LOC_recall': 0.9602612955906369, 'LOC_f1': 0.9550622631293991, 'LOC_number': 1837, 'MISC_precision': 0.8572972972972973, 'MISC_recall': 0.8600867678958786, 'MISC_f1': 0.858689767190038, 'MISC_number': 922, 'ORG_precision': 0.8539482879105521, 'ORG_recall': 0.9112602535421327, 'ORG_f1': 0.8816738816738816, 'ORG_number': 1341, 'PER_precision': 0.9776810016330975, 'PER_recall': 0.9766177270255574, 'PER_f1': 0.9771490750816105, 'PER_number': 1839, 'overall_precision': 0.9214876033057852, 'overall_recall': 0.9387102205758545, 'overall_f1': 0.9300191842522312, 'overall_accuracy': 0.9868336482091035}
 67%|████████████████████████████████████████████████▋                        | 3512/5268 [50:27<18:04,  1.62it/s]epoch 1: {'LOC_precision': 0.9637760702524698, 'LOC_recall': 0.9559063690800218, 'LOC_f1': 0.9598250888220825, 'LOC_number': 1837, 'MISC_precision': 0.8524251805985552, 'MISC_recall': 0.89587852494577, 'MISC_f1': 0.8736118455843469, 'MISC_number': 922, 'ORG_precision': 0.892675852066715, 'ORG_recall': 0.9179716629381058, 'ORG_f1': 0.9051470588235293, 'ORG_number': 1341, 'PER_precision': 0.9721925133689839, 'PER_recall': 0.9885807504078303, 'PER_f1': 0.9803181450525748, 'PER_number': 1839, 'overall_precision': 0.9322847682119205, 'overall_recall': 0.9481394174103385, 'overall_f1': 0.940145254194841, 'overall_accuracy': 0.9880217361665661}
100%|███████████████████████████████████████████████████████████████████████| 5268/5268 [1:15:39<00:00,  1.44it/s]epoch 2: {'LOC_precision': 0.9538378958668814, 'LOC_recall': 0.9673380511703865, 'LOC_f1': 0.9605405405405405, 'LOC_number': 1837, 'MISC_precision': 0.8783351120597652, 'MISC_recall': 0.8926247288503254, 'MISC_f1': 0.8854222700376547, 'MISC_number': 922, 'ORG_precision': 0.9074759437453738, 'ORG_recall': 0.9142431021625652, 'ORG_f1': 0.9108469539375927, 'ORG_number': 1341, 'PER_precision': 0.9751619870410367, 'PER_recall': 0.9820554649265906, 'PER_f1': 0.978596586290978, 'PER_number': 1839, 'overall_precision': 0.9381975678827253, 'overall_recall': 0.94830779592524, 'overall_f1': 0.9432255903533747, 'overall_accuracy': 0.9891513935687436}
Configuration saved in /tmp/test-ner/config.json
Model weights saved in /tmp/test-ner/pytorch_model.bin
tokenizer config file saved in /tmp/test-ner/tokenizer_config.json
Special tokens file saved in /tmp/test-ner/special_tokens_map.json
Traceback (most recent call last):
  File "/Users/xiaoliwang/repo/research/huggingface/transformers/examples/pytorch/token-classification/run_ner_no_trainer.py", line 784, in <module>
    main()
  File "/Users/xiaoliwang/repo/research/huggingface/transformers/examples/pytorch/token-classification/run_ner_no_trainer.py", line 780, in main
    json.dump(all_results, f)
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
100%|███████████████████████████████████████████████████████████████████████| 5268/5268 [1:17:11<00:00,  1.14it/s]
Traceback (most recent call last):
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/site-packages/accelerate/commands/launch.py", line 969, in launch_command
    simple_launcher(args)
  File "/Users/xiaoliwang/development/miniforge3/envs/transformers/lib/python3.11/site-packages/accelerate/commands/launch.py", line 625, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/xiaoliwang/development/miniforge3/envs/transformers/bin/python3.11', 'run_ner_no_trainer.py', '--model_name_or_path', 'bert-base-uncased', '--dataset_name', 'conll2003', '--output_dir', '/tmp/test-ner', '--pad_to_max_length', '--task_name', 'ner', '--return_entity_level_metrics']' returned non-zero exit status 1.

I have reproduced this on my MacBook Air M1 with MPS acceleration enabled. The full error messages, posted above, are the same as on my Ubuntu workstation.
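The failure is easy to reproduce in isolation: the evaluation step returns metrics as numpy scalars, which the stdlib json encoder rejects. A minimal sketch (the dictionary contents are illustrative, mirroring the metrics logged above):

```python
import json

import numpy as np

# seqeval-style metrics come back as numpy scalar types,
# e.g. entity counts as np.int64 -- not native Python ints.
all_results = {"ORG_number": np.int64(1341), "overall_f1": 0.9432}

try:
    json.dumps(all_results)
except TypeError as err:
    print(err)  # Object of type int64 is not JSON serializable
```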

Collaborator

@amyeroberts left a comment


@xiaoli Thanks for explaining and adding this fix!

Could we instead convert the np.int64 values in all_results before saving? That way we don't blindly try to serialize everything to int within the json.dump call.
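A sketch of that suggestion, using a small hypothetical helper (to_python is not part of the PR) that walks the results and casts numpy scalars to native Python types before serialization:

```python
import json

import numpy as np

def to_python(obj):
    # Recursively convert numpy scalar types to native Python types;
    # the container handling covers the dict/list shapes seen in metrics.
    if isinstance(obj, dict):
        return {key: to_python(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_python(value) for value in obj]
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    return obj

all_results = {"ORG_number": np.int64(1341), "overall_f1": np.float64(0.94)}
print(json.dumps(to_python(all_results)))  # {"ORG_number": 1341, "overall_f1": 0.94}
```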

@xiaoli
Contributor Author

xiaoli commented Jun 22, 2023

@amyeroberts Thanks for your comments!

I think your idea is good, and I understand that your intention is to avoid blindly converting everything to int.

But according to this page https://docs.python.org/3/library/json.html

> If specified, default should be a function that gets called for objects that can't otherwise be serialized. It should return a JSON encodable version of the object or raise a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError). If not specified, [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) is raised.

From my understanding, the default parameter simply supplies a fallback converter function, and in this case that function is just a concise int(). I don't think we need to write a new handler covering all the different object types here, because the only value we cannot serialize is np.int64.
If something more comes up in the future, I could certainly write a new handler to take care of it, but for the time being I think default=int is a good enough solution :)
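For reference, the default=int approach described above, as a minimal sketch (the dictionary is illustrative):

```python
import json

import numpy as np

all_results = {"ORG_number": np.int64(1341)}

# json.dump/json.dumps call `default` only for objects the encoder
# cannot serialize natively; here int() converts the np.int64 values.
print(json.dumps(all_results, default=int))  # {"ORG_number": 1341}
```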

@xiaoli
Contributor Author

xiaoli commented Jun 26, 2023

Hi @amyeroberts, I have changed it a little bit, as you suggested before :)

Collaborator

@amyeroberts left a comment


Thanks for fixing!

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jun 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts
Collaborator

@xiaoli For the quality CI checks, you'll need to run make style at the top level of the repo and push any changes that are applied. Once this is done, CI should all be green and branch good to merge in 👍

@xiaoli
Contributor Author

xiaoli commented Jun 26, 2023

> @xiaoli For the quality CI checks, you'll need to run make style at the top level of the repo and push any changes that are applied. Once this is done, CI should all be green and branch good to merge in 👍

@amyeroberts Thanks for the instructions, but I'm afraid quite a few files were changed after running make style:

(transformers) ➜  transformers git:(main) ✗ git status
On branch main
Your branch is ahead of 'origin/main' by 8 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   examples/research_projects/codeparrot/scripts/human_eval.py
	modified:   examples/research_projects/fsner/src/fsner/tokenizer_utils.py
	modified:   examples/research_projects/jax-projects/big_bird/prepare_natural_questions.py
	modified:   examples/research_projects/luke/run_luke_ner_no_trainer.py
	modified:   examples/research_projects/lxmert/modeling_frcnn.py
	modified:   examples/research_projects/visual_bert/modeling_frcnn.py
	modified:   src/transformers/generation/logits_process.py
	modified:   src/transformers/generation/tf_logits_process.py
	modified:   src/transformers/generation/tf_utils.py
	modified:   src/transformers/keras_callbacks.py
	modified:   src/transformers/models/bert/convert_bert_pytorch_checkpoint_to_original_tf.py
	modified:   src/transformers/models/bigbird_pegasus/convert_bigbird_pegasus_tf_to_pytorch.py
	modified:   src/transformers/models/deta/modeling_deta.py
	modified:   src/transformers/models/dpr/tokenization_dpr.py
	modified:   src/transformers/models/dpr/tokenization_dpr_fast.py
	modified:   src/transformers/models/pegasus/convert_pegasus_tf_to_pytorch.py
	modified:   src/transformers/models/sam/processing_sam.py
	modified:   tests/generation/test_framework_agnostic.py
	modified:   tests/models/codegen/test_modeling_codegen.py
	modified:   tests/models/data2vec/test_modeling_data2vec_audio.py
	modified:   tests/models/encodec/test_modeling_encodec.py
	modified:   tests/models/gpt2/test_modeling_gpt2.py
	modified:   tests/models/gptj/test_modeling_gptj.py
	modified:   tests/models/hubert/test_modeling_hubert.py
	modified:   tests/models/mctct/test_modeling_mctct.py
	modified:   tests/models/rwkv/test_modeling_rwkv.py
	modified:   tests/models/sew/test_modeling_sew.py
	modified:   tests/models/sew_d/test_modeling_sew_d.py
	modified:   tests/models/speecht5/test_modeling_speecht5.py
	modified:   tests/models/unispeech/test_modeling_unispeech.py
	modified:   tests/models/unispeech_sat/test_modeling_unispeech_sat.py
	modified:   tests/models/wav2vec2/test_modeling_flax_wav2vec2.py
	modified:   tests/models/wav2vec2/test_modeling_wav2vec2.py
	modified:   tests/models/wav2vec2_conformer/test_modeling_wav2vec2_conformer.py
	modified:   tests/models/wavlm/test_modeling_wavlm.py
	modified:   tests/models/whisper/test_modeling_whisper.py
	modified:   tests/onnx/test_onnx.py
	modified:   tests/test_modeling_tf_common.py
	modified:   tests/test_tokenization_common.py
	modified:   tests/trainer/test_trainer_seq2seq.py
	modified:   utils/check_copies.py
	modified:   utils/create_dummy_models.py
	modified:   utils/tests_fetcher.py

no changes added to commit (use "git add" and/or "git commit -a")

@xiaoli
Contributor Author

xiaoli commented Jun 26, 2023

@amyeroberts make style changes are committed, thank you 😁

@amyeroberts merged commit 239ace1 into huggingface:main on Jun 27, 2023