Hi, I am using the Colab code exactly as in your demo.

Model config

The general config is the same as below, except that OUTPUT_DIR is changed to the directory where I decompressed your BERT-Base model, and the batch size is set to 8 because I'm running on a V100 (16 GB). I also set MAX_EVAL_EXAMPLES=100, because computing the full evaluation performance takes too long.
import os

BERT_PRETRAINED_DIR = '/search/odin/Data/pre-trained-models/bert/uncased_L-12_H-768_A-12/'
# Estimator model_dir; here it holds the MS MARCO fine-tuned model.
OUTPUT_DIR = '/search/odin/Data/marco-passage-ranking/models/BERT_Base_trained_on_MSMARCO/'
DATA_DIR = '/search/odin/Data/marco-passage-ranking/tfrecord/'
USE_TPU = False
DO_TRAIN = False  # Whether to run training.
DO_EVAL = True  # Whether to run evaluation.
TRAIN_BATCH_SIZE = 8
EVAL_BATCH_SIZE = 8
LEARNING_RATE = 1e-6
NUM_TRAIN_STEPS = 100
NUM_WARMUP_STEPS = 40000
MAX_SEQ_LENGTH = 512
SAVE_CHECKPOINTS_STEPS = 10
ITERATIONS_PER_LOOP = 100
NUM_TPU_CORES = 8
BERT_CONFIG_FILE = os.path.join(BERT_PRETRAINED_DIR, 'bert_config.json')
# Points at the original pre-trained BERT weights, not the MS MARCO fine-tuned ones.
INIT_CHECKPOINT = os.path.join(BERT_PRETRAINED_DIR, 'bert_model.ckpt')
MSMARCO_OUTPUT = False  # Write the predictions to a MS-MARCO-formatted file.
MAX_EVAL_EXAMPLES = 100  # Maximum number of examples to be evaluated.
NUM_EVAL_DOCS = 1000  # Number of docs per query in the dev and eval files.
METRICS_MAP = ['MAP', 'RPrec', 'NDCG', 'MRR', 'MRR@10']
Logging
The logging and performance are listed as follows. My concerns are:
Is the model loaded properly from your fine-tuned checkpoint? By the way, no log line containing *INIT_FROM_CKPT* appears.
Why is the performance of the trained model so poor? MRR@10 = 0.01 on the first 100 eval queries. Is that expected, given that I only evaluate 100 examples (100 × 1000 query-passage pairs are actually predicted)?
If the model is not loaded properly, how should I load it instead? Is there any example code?
WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:101: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
W1214 14:42:01.364466 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:101: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W1214 14:42:01.366216 140473733826368 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fc2139b9cb0>) includes params argument, but params are not passed to Estimator.
W1214 14:42:01.737018 140473733826368 estimator.py:1994] Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fc2139b9cb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '/search/odin/Data/marco-passage-ranking/models/BERT_Base_trained_on_MSMARCO/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc2104ed710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I1214 14:42:01.739134 140473733826368 estimator.py:212] Using config: {'_model_dir': '/search/odin/Data/marco-passage-ranking/models/BERT_Base_trained_on_MSMARCO/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 10, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fc2104ed710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I1214 14:42:01.740160 140473733826368 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W1214 14:42:01.740942 140473733826368 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:***** Running evaluation *****
I1214 14:42:01.741715 140473733826368 <ipython-input-3-e0f70c5ba30e>:280] ***** Running evaluation *****
INFO:tensorflow: Batch size = 8
I1214 14:42:01.742430 140473733826368 <ipython-input-3-e0f70c5ba30e>:281] Batch size = 8
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W1214 14:42:01.750143 140473733826368 deprecation.py:506] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenSequenceFeature is deprecated. Please use tf.io.FixedLenSequenceFeature instead.
W1214 14:42:01.832499 140473733826368 module_wrapper.py:139] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenSequenceFeature is deprecated. Please use tf.io.FixedLenSequenceFeature instead.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
W1214 14:42:01.833697 140473733826368 module_wrapper.py:139] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
W1214 14:42:01.834634 140473733826368 module_wrapper.py:139] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:190: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W1214 14:42:02.021957 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:190: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:458: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W1214 14:42:02.024990 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:458: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:743: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W1214 14:42:02.076209 140473733826368 deprecation.py:323] From /search/odin/Codes/marco-passage-ranking/modeling.py:743: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1214 14:42:02.077801 140473733826368 deprecation.py:323] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /search/odin/Codes/marco-passage-ranking/modeling.py:314: The name tf.erf is deprecated. Please use tf.math.erf instead.
W1214 14:42:02.173272 140473733826368 module_wrapper.py:139] From /search/odin/Codes/marco-passage-ranking/modeling.py:314: The name tf.erf is deprecated. Please use tf.math.erf instead.
WARNING:tensorflow:From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1214 14:42:04.507863 140473733826368 deprecation.py:323] From /root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:Read 10000 examples in 136 secs. Metrics so far:
W1214 14:44:17.816093 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 10000 examples in 136 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:44:17.818014 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00090869 0. 0.07455417 0.00085925 0. ]
W1214 14:44:17.818776 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00090869 0. 0.07455417 0.00085925 0. ]
WARNING:tensorflow:Read 20000 examples in 262 secs. Metrics so far:
W1214 14:46:24.263339 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 20000 examples in 262 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:46:24.265303 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00100144 0. 0.08361872 0.00097672 0. ]
W1214 14:46:24.266042 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00100144 0. 0.08361872 0.00097672 0. ]
WARNING:tensorflow:Read 30000 examples in 388 secs. Metrics so far:
W1214 14:48:30.680611 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 30000 examples in 388 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:48:30.682710 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00108026 0. 0.0901455 0.00106377 0. ]
W1214 14:48:30.683455 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00108026 0. 0.0901455 0.00106377 0. ]
WARNING:tensorflow:Read 40000 examples in 515 secs. Metrics so far:
W1214 14:50:37.156615 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 40000 examples in 515 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:50:37.158547 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00102465 0. 0.08564832 0.00101229 0. ]
W1214 14:50:37.159287 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00102465 0. 0.08564832 0.00101229 0. ]
WARNING:tensorflow:Read 50000 examples in 641 secs. Metrics so far:
W1214 14:52:43.648328 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 50000 examples in 641 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:52:43.650337 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00102337 0. 0.08508468 0.00101348 0. ]
W1214 14:52:43.651077 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00102337 0. 0.08508468 0.00101348 0. ]
WARNING:tensorflow:Read 60000 examples in 768 secs. Metrics so far:
W1214 14:54:50.169556 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 60000 examples in 768 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:54:50.171495 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00112496 0. 0.08702416 0.00111672 0. ]
W1214 14:54:50.172239 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00112496 0. 0.08702416 0.00111672 0. ]
WARNING:tensorflow:Read 70000 examples in 894 secs. Metrics so far:
W1214 14:56:56.690977 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 70000 examples in 894 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:56:56.692907 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.00109861 0. 0.0863206 0.00109154 0. ]
W1214 14:56:56.693676 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.00109861 0. 0.0863206 0.00109154 0. ]
WARNING:tensorflow:Read 80000 examples in 1021 secs. Metrics so far:
W1214 14:59:03.240334 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 80000 examples in 1021 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 14:59:03.242281 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.01356263 0.0125 0.09699456 0.01355645 0.0125 ]
W1214 14:59:03.243049 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.01356263 0.0125 0.09699456 0.01355645 0.0125 ]
WARNING:tensorflow:Read 90000 examples in 1148 secs. Metrics so far:
W1214 15:01:09.778834 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 90000 examples in 1148 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 15:01:09.780769 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.01214304 0.01111111 0.09414841 0.01213754 0.01111111]
W1214 15:01:09.781500 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.01214304 0.01111111 0.09414841 0.01213754 0.01111111]
WARNING:tensorflow:Read 100000 examples in 1274 secs. Metrics so far:
W1214 15:03:16.311386 140473733826368 <ipython-input-3-e0f70c5ba30e>:355] Read 100000 examples in 1274 secs. Metrics so far:
WARNING:tensorflow:MAP RPrec NDCG MRR MRR@10
W1214 15:03:16.313336 140473733826368 <ipython-input-3-e0f70c5ba30e>:356] MAP RPrec NDCG MRR MRR@10
WARNING:tensorflow:[0.01104599 0.01 0.09408713 0.01104105 0.01 ]
W1214 15:03:16.314079 140473733826368 <ipython-input-3-e0f70c5ba30e>:357] [0.01104599 0.01 0.09408713 0.01104105 0.01 ]
INFO:tensorflow:Eval dev:
I1214 15:03:16.407423 140473733826368 <ipython-input-3-e0f70c5ba30e>:368] Eval dev:
INFO:tensorflow:MAP RPrec NDCG MRR MRR@10
I1214 15:03:16.408445 140473733826368 <ipython-input-3-e0f70c5ba30e>:369] MAP RPrec NDCG MRR MRR@10
INFO:tensorflow:[0.01104599 0.01 0.09408713 0.01104105 0.01 ]
I1214 15:03:16.409163 140473733826368 <ipython-input-3-e0f70c5ba30e>:370] [0.01104599 0.01 0.09408713 0.01104105 0.01 ]
An exception has occurred, use %tb to see the full traceback.
SystemExit
/root/Softwares/anaconda3/envs/tf1.15/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3426: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
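The timestamps in the log above also show why a full evaluation is slow. The arithmetic below is a small sketch: it takes the measured rate from the log (100 queries × 1000 candidates scored in 1274 s) and extrapolates to a larger query set. The query count of 6980 is an assumption (the size of the MS MARCO "dev.small" query set); the dev file used in the Colab may contain a different number of queries, and the helper names are hypothetical.

```python
def eval_throughput(num_examples, seconds):
    """Query-passage pairs scored per second."""
    return num_examples / seconds


def estimated_hours(num_queries, docs_per_query, examples_per_sec):
    """Wall-clock hours to score num_queries * docs_per_query pairs at a fixed rate."""
    return num_queries * docs_per_query / examples_per_sec / 3600.0


# Figures taken from the log: 100 queries x 1000 candidates in 1274 s on a V100.
rate = eval_throughput(100 * 1000, 1274)
print(f"{rate:.1f} pairs/s")  # 78.5 pairs/s

# Assumption: 6980 queries (MS MARCO dev.small); adjust to your dev file.
print(f"{estimated_hours(6980, 1000, rate):.1f} h")  # 24.7 h
```

At roughly 78 pairs per second, scoring 1000 candidates for each of several thousand queries takes on the order of a day on this setup, which is consistent with limiting MAX_EVAL_EXAMPLES during debugging.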
Is the model loaded from your fine-tuned checkpoint properly? BTW, no logging info like INIT_FROM_CKPT occurs.
It seems that the checkpoint is not being loaded.
Why is the trained model performance so poor? MRR@10 = 0.01 for the top 100 eval examples. Is that expected? Since I only run for 100 eval examples (100 * 1000 entries are actually predicted.)
MRR@10 should be at least 0.30.
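To put 0.01 in perspective, here is a minimal sketch of MRR@10 and of its expected value under a random ranking, assuming exactly one relevant passage per query among 1000 candidates (a simplification; some MS MARCO queries have more). The function names are hypothetical and this is not the Colab's metric code. The running MRR@10 values in the log (0.0125 after 80 queries, 0.0111 after 90, 0.01 after 100) equal 1/80, 1/90 and 1/100, so the reciprocal ranks sum to exactly 1. Under the one-relevant-passage assumption, that is consistent with a single query getting its relevant passage at rank 1 and no other query getting a hit in the top 10.

```python
def mrr_at_k(ranked_labels_per_query, k=10):
    """Mean reciprocal rank at cutoff k.

    Each inner list holds 0/1 relevance labels for one query's candidates,
    already sorted by model score (best first). A query whose first relevant
    candidate falls outside the top k contributes 0.
    """
    total = 0.0
    for labels in ranked_labels_per_query:
        for rank, is_relevant in enumerate(labels[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_labels_per_query)


def random_mrr_at_k(num_candidates, k=10):
    """Expected MRR@k if the single relevant passage lands at a uniformly random rank."""
    return sum(1.0 / rank for rank in range(1, k + 1)) / num_candidates


# Hypothetical ranking matching the log: 1 of 100 queries has its relevant
# passage at rank 1; the other 99 have it somewhere below rank 10.
hit = [1] + [0] * 999
miss = [0] * 999 + [1]
print(mrr_at_k([hit] + [miss] * 99))     # 0.01
print(round(random_mrr_at_k(1000), 4))   # 0.0029
```

A random ranking of 1000 candidates scores about 0.003, so 0.01 on 100 queries is of the same order as chance and far below 0.30, which fits the diagnosis that the fine-tuned weights are not being used.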
If the model is loaded improperly, how shall I load the model instead? Any example code?
I would first try to use a "dummy" path in which no checkpoint exists. If the log is identical to what you have now, then the problem is in BERT_PRETRAINED_DIR.
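Alongside the dummy-path test, you can inspect a directory directly to see whether it holds a checkpoint that TensorFlow would find. The sketch below uses only the standard library and approximates what `tf.train.latest_checkpoint` does: it reads the text-format `checkpoint` state file and checks that the referenced `.index` file exists. The helper names are hypothetical and not part of the Colab code.

```python
import glob
import os
import re


def find_latest_checkpoint(model_dir):
    """Return the checkpoint prefix named in model_dir/checkpoint, or None.

    Rough stdlib approximation of tf.train.latest_checkpoint: the prefix must
    have a matching .index file on this machine.
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file) as f:
        match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', f.read(), re.MULTILINE)
    if not match:
        return None
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        prefix = os.path.join(model_dir, prefix)  # relative paths are resolved against model_dir
    return prefix if os.path.isfile(prefix + ".index") else None


def list_checkpoint_prefixes(model_dir):
    """All checkpoint prefixes present in model_dir, judged by their .index files."""
    return sorted(p[: -len(".index")] for p in glob.glob(os.path.join(model_dir, "*.index")))
```

For example, call both functions on the OUTPUT_DIR from the config above. If `find_latest_checkpoint` returns None while `list_checkpoint_prefixes` is non-empty, the checkpoint files are present but the `checkpoint` state file is missing or points to a path that does not exist on this machine, so the Estimator would not restore them automatically. Another option worth trying, not confirmed in this thread, is setting INIT_CHECKPOINT to the fine-tuned prefix that `list_checkpoint_prefixes` reports.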