TensorFlow 2.6 error with JAX/FLAX implementation #14265

Closed
stefan-it opened this issue Nov 3, 2021 · 8 comments

@stefan-it
Collaborator

stefan-it commented Nov 3, 2021

Hi guys,

This is probably a TPU-related bug. It appears when using the JAX/FLAX implementation in combination with TensorFlow 2.6.0 and 2.6.1:

Traceback (most recent call last):                                  
  File "/home/stefan/transformers/src/transformers/file_utils.py", line 2147, in _get_module                                                                                            
    return importlib.import_module("." + module_name, self.__name__)                                                                                                                    
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)                                                                                                                         
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import     
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load                                                                                                                     
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked                                                                                                                     
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                          
  File "/home/stefan/transformers/src/transformers/modeling_tf_utils.py", line 637, in <module>                                                                                         
    class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, PushToHubMixin):                                                                                      
  File "/home/stefan/dev/lib/python3.8/site-packages/tensorflow/python/util/lazy_loader.py", line 62, in __getattr__                                                                    
    module = self._load()                     
  File "/home/stefan/dev/lib/python3.8/site-packages/tensorflow/python/util/lazy_loader.py", line 45, in _load                                                                          
    module = importlib.import_module(self.__name__)                                                                                                                                                                                                                                                                                                                             
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module                                                                                                           
    return _bootstrap._gcd_import(name[level:], package, level)             
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import                                                                                                                       
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load                                                                                                                     
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked                                                                                                            
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                          
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import                                                                                                                       
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load                                                                                                                     
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked                                                                                                            
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                          
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import                                                                                                                       
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked  
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import                                                                                                                       
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load                                                                                                                     
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked   
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module                                                                                                                                                                                                                                                                                                       
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed                                                                                                          
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/__init__.py", line 25, in <module>                                                                                           
    from keras import models       
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/models.py", line 20, in <module>
    from keras import metrics as metrics_module           
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/metrics.py", line 26, in <module>                                                                                            
    from keras import activations                                                                                                                                                       
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/activations.py", line 20, in <module>                             
    from keras.layers import advanced_activations                                                                                                                                       
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/layers/__init__.py", line 23, in <module>                                
    from keras.engine.input_layer import Input                                                                                                                                          
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/engine/input_layer.py", line 21, in <module>                                  
    from keras.engine import base_layer                                                                                                                                                 
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/engine/base_layer.py", line 43, in <module>                                            
    from keras.mixed_precision import loss_scale_optimizer                                                                                                                              
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/mixed_precision/loss_scale_optimizer.py", line 18, in <module>             
    from keras import optimizers                                                                                                                                                        
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/optimizers.py", line 26, in <module>                                                                 
    from keras.optimizer_v2 import adadelta as adadelta_v2                                                                                                                              
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/optimizer_v2/adadelta.py", line 22, in <module>                                                                                                                                                                                                                                                                      
    from keras.optimizer_v2 import optimizer_v2                                                                                                                                         
  File "/home/stefan/dev/lib/python3.8/site-packages/keras/optimizer_v2/optimizer_v2.py", line 36, in <module>                                                                 
    keras_optimizers_gauge = tf.__internal__.monitoring.BoolGauge(                                                                                                                      
  File "/home/stefan/dev/lib/python3.8/site-packages/tensorflow/python/eager/monitoring.py", line 360, in __init__                                               
    super(BoolGauge, self).__init__('BoolGauge', _bool_gauge_methods,                                                                                                                   
  File "/home/stefan/dev/lib/python3.8/site-packages/tensorflow/python/eager/monitoring.py", line 135, in __init__                                                                                                                                                                                                                                                              
    self._metric = self._metric_methods[self._label_length].create(*args)                                                                                                               
tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.

The above exception was the direct cause of the following exception:                                                                                                                    
                                                                                                                                                                                        
Traceback (most recent call last):                                                                                                                                                      
  File "/home/stefan/transformers/src/transformers/file_utils.py", line 2147, in _get_module                                                                                                                                                                                                                                                                                    
    return importlib.import_module("." + module_name, self.__name__)                                                                                                                    
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/stefan/transformers/src/transformers/models/__init__.py", line 19, in <module> 
    from . import (
  File "/home/stefan/transformers/src/transformers/models/layoutlm/__init__.py", line 22, in <module>
    from .configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
  File "/home/stefan/transformers/src/transformers/models/layoutlm/configuration_layoutlm.py", line 22, in <module>
    from ...onnx import OnnxConfig, PatchingSpec
  File "/home/stefan/transformers/src/transformers/onnx/__init__.py", line 17, in <module>
    from .convert import export, validate_model_outputs
  File "/home/stefan/transformers/src/transformers/onnx/convert.py", line 23, in <module>
    from .. import PreTrainedModel, PreTrainedTokenizer, TensorType, TFPreTrainedModel, is_torch_available
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/stefan/transformers/src/transformers/file_utils.py", line 2137, in __getattr__ 
    module = self._get_module(self._class_to_module[name])
  File "/home/stefan/transformers/src/transformers/file_utils.py", line 2149, in _get_module 
    raise RuntimeError(
RuntimeError: Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
Another metric with the same name already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_mlm_flax.py", line 45, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
  File "/home/stefan/transformers/src/transformers/file_utils.py", line 2137, in __getattr__ 
    module = self._get_module(self._class_to_module[name])
  File "/home/stefan/transformers/src/transformers/file_utils.py", line 2149, in _get_module 
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.auto because of the following error (look up to see its traceback):
Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
Another metric with the same name already exists.
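
The three chained tracebacks above are linked by "direct cause" markers, which Python prints when an exception is re-raised with `raise ... from ...`. The following is a minimal sketch of that pattern and of walking `__cause__` back to the original error. The names `failing_import`, `lazy_get_module` and `root_cause` are hypothetical stand-ins for illustration and are not the Transformers implementation; the `ValueError` merely stands in for TensorFlow's `AlreadyExistsError`.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow explicit `raise ... from ...` links back to the first exception."""
    seen = set()
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against an accidental cycle in the chain
        exc = exc.__cause__
    return exc


def failing_import():
    # Hypothetical stand-in for the failing import; the real error type in the
    # traceback is tensorflow's AlreadyExistsError.
    raise ValueError("Another metric with the same name already exists.")


def lazy_get_module(name: str):
    # Hypothetical wrapper that mimics the re-raise visible in the traceback.
    try:
        failing_import()
    except Exception as e:
        raise RuntimeError(
            f"Failed to import {name} because of the following error "
            f"(look up to see its traceback):\n{e}"
        ) from e


try:
    lazy_get_module("transformers.modeling_tf_utils")
except RuntimeError as err:
    cause = root_cause(err)
    print(type(cause).__name__, "-", cause)
```

Reading the chain this way points at the innermost failure (the duplicate metric registration during the keras import) rather than at the outer `RuntimeError` wrappers.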

I could reproduce it using the run_mlm_flax.py example, e.g. with:

python3 run_mlm_flax.py --model_type bert --config_name /mnt/datasets/bert-base-historic-multilingual-64k-cased --tokenizer_name /mnt/datasets/bert-base-historic-multilingual-64k-cased --train_file /mnt/datasets/hlms/bl_1800-1900_extracted.txt --validation_file /mnt/datasets/hlms/english_validation.txt --max_seq_length 512 --per_device_train_batch_size 16 --learning_rate 1e-4 --num_train_epochs 10 --preprocessing_num_workers 16 --output_dir /mnt/datasets/bert-base-historic-multilingual-64k-cased-512 --save_steps 2500 --eval_steps 2500 --warmup_steps 10000

It does not appear when using TensorFlow 2.5.0. I'm using the latest master version of both Transformers and Datasets.

@patrickvonplaten
Contributor

Thanks a lot for the issue @stefan-it! Would it be fine for you to stick to TensorFlow version 2.5.0 for now?
We'll definitely take a look and try to fix it asap, but it might take a few days since @patil-suraj is on holiday right now.

@ftesser

ftesser commented Nov 4, 2021

Hi all, I get a similar issue, and I think it is related to this issue:

tensorflow/tensorflow#52922

Hopefully it will be solved by TF 2.6.2:

This would be fixed in ~12 hours by a release of TF 2.6.2 patch release and TF 2.7.0 release.

tensorflow/tensorflow#52922 (comment)

@frgfm

frgfm commented Nov 4, 2021

Hello there 👋

I ran into much the same problem on a CI build job today, and it wasn't occurring yesterday. So I investigated, and the culprit seems to be keras 2.7 rather than tensorflow: keras-team/keras#15585

On my end, the solution was to constrain the keras version to <2.7, but I'll report back if a more stable fix is implemented 👍
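
For anyone wanting to check their own environment before pinning, here is a rough sketch that compares the installed tensorflow and keras versions. Treating "same major.minor" as the compatibility rule is an assumption made for illustration, based only on the tensorflow 2.6.x / keras 2.7 combination reported in this thread; the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.6.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(tf_version: str, keras_version: str) -> bool:
    # Assumption: tensorflow X.Y is expected to run with keras X.Y.
    return major_minor(tf_version) == major_minor(keras_version)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


tf_v = installed_version("tensorflow")
keras_v = installed_version("keras")
if tf_v is None or keras_v is None:
    print("tensorflow and/or keras is not installed; nothing to compare")
elif versions_match(tf_v, keras_v):
    print(f"OK: tensorflow {tf_v} and keras {keras_v} share a minor version")
else:
    print(f"Mismatch: tensorflow {tf_v} with keras {keras_v}; consider pinning keras")
```

If a mismatch shows up, a constraint such as `keras<2.7` in the requirements file is the workaround described above.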

@patrickvonplaten
Contributor

@avital @skye @marcvanzee - I think there's a problem with the new keras release and JAX on TPU. Could you guys maybe check? :-)

@marcvanzee
Contributor

Hi @patrickvonplaten, I checked the issue. Perhaps I missed something, but it doesn't look like a Flax/JAX/TPU issue to me.

I could indeed reproduce the problem on my machine with a similar stack trace, but from reading the stack trace, it seems like there is a conflict in importing modules from Keras.

What makes you think this is related to JAX on TPU?

@marcvanzee
Contributor

I managed to reproduce this issue by installing tensorflow==2.6.1 and keras==2.7 and running:

from keras import optimizers

(Suggested in keras-team/keras#15579)
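
To try the one-line reproduction without breaking the current session, one option is to run it in a fresh interpreter and inspect stderr. This is only a sketch: `try_import` is a hypothetical helper, and the outcome depends entirely on which tensorflow and keras versions are installed.

```python
import subprocess
import sys
from typing import Tuple

# Error text taken from the traceback reported in this issue.
KNOWN_ERROR = "Another metric with the same name already exists"


def try_import(statement: str) -> Tuple[bool, str]:
    """Run a statement in a fresh interpreter; return (succeeded, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", statement],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


ok, stderr = try_import("from keras import optimizers")
if ok:
    print("import succeeded")
elif KNOWN_ERROR in stderr:
    print("reproduced: duplicate metric registration")
else:
    print("import failed for a different reason (keras may not be installed)")
```

Running the import in a subprocess keeps each attempt independent, so the check can be repeated after changing the installed versions.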

@patrickvonplaten
Contributor

Gotcha! Sorry, yeah, in this case it does not seem to be related to JAX/FLAX at all, but to TensorFlow. Sorry for pinging you here.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
