Issue when saving TF model with a tokenizer as a custom layer #422
Comments
Am I the only one to get this error?
Is it just the BertTokenizer? I'll pass this on to somebody more familiar with Keras.
Thanks! I missed this over the holidays. We'll take a look.
I am also running into this issue and found a similar work-around. In particular, I found that the BertTokenizer needs to be wrapped in a Lambda layer:

```python
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as K
import tensorflow_text as text


class TspBertTokenizer(keras.layers.Layer):
    def __init__(self, vocab_file, cls_token_id=None, sep_token_id=None, **kwargs):
        super(TspBertTokenizer, self).__init__(**kwargs)
        self.vocab_file = vocab_file
        bert_tokenizer = text.BertTokenizer(self.vocab_file, token_out_type=tf.int32, lower_case=True)
        # Wrapping the tokenize call in a Lambda layer is the part that makes the model exportable.
        self.tokenize = keras.layers.Lambda(lambda text_input: bert_tokenizer.tokenize(text_input), name="bert_tokenizer")
        # Look up the special-token ids from the wordpiece vocabulary unless they were passed in explicitly.
        basic_tokenizer, wordpiece_tokenizer = bert_tokenizer.submodules
        self.cls_token_id = cls_token_id if cls_token_id is not None else K.get_value(wordpiece_tokenizer.tokenize("[CLS]")[0]).item()
        self.sep_token_id = sep_token_id if sep_token_id is not None else K.get_value(wordpiece_tokenizer.tokenize("[SEP]")[0]).item()

    def call(self, nlp_input):
        word_tokens = self.tokenize(nlp_input)
        # Collapse the ragged word/wordpiece dimensions into one flat token sequence per example.
        flattened_tokens = word_tokens.merge_dims(1, -1)
        return flattened_tokens

    def get_config(self):
        return {
            "vocab_file": self.vocab_file,
            "cls_token_id": self.cls_token_id,
            "sep_token_id": self.sep_token_id,
            **super(TspBertTokenizer, self).get_config(),
        }
```

Then it can be added to a Keras model. I think this functionally works and an export can be done. However, it is not clear if performance is ideal. I get the following:

```
[1,0]<stderr>:WARNING:tensorflow:AutoGraph could not transform <bound method TspBertTokenizer.call of <__main__.TspBertTokenizer object at 0x7fc57fa9d3a0>> and will run it as-is.
[1,0]<stderr>:Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
[1,0]<stderr>:Cause: Unable to locate the source code of <bound method TspBertTokenizer.call of <__main__.TspBertTokenizer object at 0x7fc57fa9d3a0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
[1,0]<stderr>:To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
[1,0]<stderr>:AutoGraph could not transform <bound method TspBertTokenizer.call of <__main__.TspBertTokenizer object at 0x7fc57fa9d3a0>> and will run it as-is.
[1,0]<stderr>:Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
[1,0]<stderr>:Cause: Unable to locate the source code of <bound method TspBertTokenizer.call of <__main__.TspBertTokenizer object at 0x7fc57fa9d3a0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
[1,0]<stderr>:To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
[1,0]<stderr>:2021-07-08 16:12:23.837909: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: pericles/nlp_input/cross_nlp/tsp_bert_tokenizer/bert_tokenizer/RaggedFromUniformRowLength/RowPartitionFromUniformRowLength/assert_greater_equal/Assert/AssertGuard/branch_executed/_107
```

I do not know whether any of these warnings degrade performance or hurt model accuracy. Any feedback on whether these warnings are an issue, or on better work-arounds, would be much appreciated! This is on TF 2.5. I also filed tensorflow/models#10115 as a downstream issue; see the gist there for the export issue without the Lambda.
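For reference, a minimal usage sketch of the work-around above; the vocab path, model name, and save directory are placeholders rather than anything from the original post, and whether the export is clean beyond the warnings shown is exactly the open question:

```python
import tensorflow as tf
import tensorflow.keras as keras

# Tiny functional model whose only step is the tokenizer layer defined above.
nlp_input = keras.Input(shape=(), dtype=tf.string, name="nlp_input")
token_ids = TspBertTokenizer("vocab.txt")(nlp_input)  # "vocab.txt" is a placeholder path
model = keras.Model(inputs=nlp_input, outputs=token_ids)

# Local inference, then the export step this thread is about.
print(model(tf.constant(["hello world"])))
tf.saved_model.save(model, "/tmp/tokenizer_export")
reloaded = tf.saved_model.load("/tmp/tokenizer_export")
```

The AutoGraph warning itself suggests decorating TspBertTokenizer.call with @tf.autograph.experimental.do_not_convert to silence it, but that only suppresses the message and does not answer the performance question.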
Thanks for the report! I'll take a look at this and see if we can get a fix pushed soon.
Hi,
I am trying to create a TensorFlow model with the Keras API that includes the tokenizing process inside the model. It seems to work for inference locally, but when I save the model with `tf.saved_model.save`, I get an error. I am wondering if there is something wrong in my current code, or if this is currently not possible. My tokenizer, which uses the BertTokenizer from tensorflow_text (I took the code from a discussion in this forum and modified it):
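The original snippet did not make it into this thread. Below is a hedged reconstruction of the general shape being described: a custom Keras layer that calls text.BertTokenizer.tokenize directly, without the Lambda wrapping from the work-around above. The class and argument names are assumptions, not the author's actual code.

```python
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow_text as text


class BertTokenizerLayer(keras.layers.Layer):  # hypothetical name
    def __init__(self, vocab_file, **kwargs):
        super().__init__(**kwargs)
        self.vocab_file = vocab_file
        # The tokenizer is built once and its tokenize() is called directly in call(),
        # which is the variant that reportedly fails at tf.saved_model.save time.
        self.bert_tokenizer = text.BertTokenizer(self.vocab_file, token_out_type=tf.int32, lower_case=True)

    def call(self, nlp_input):
        word_tokens = self.bert_tokenizer.tokenize(nlp_input)
        # Flatten the ragged word/wordpiece dimensions into one token sequence per example.
        return word_tokens.merge_dims(1, -1)

    def get_config(self):
        return {"vocab_file": self.vocab_file, **super().get_config()}
```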
My current model:
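Likewise, the model snippet was not captured. A minimal sketch, continuing from the hypothetical BertTokenizerLayer above, of a model that feeds raw strings straight into the tokenizer and then hits the save error (the vocab path and save directory are placeholders):

```python
import tensorflow as tf
import tensorflow.keras as keras

# Raw strings in, flat token ids out.
nlp_input = keras.Input(shape=(), dtype=tf.string, name="nlp_input")
token_ids = BertTokenizerLayer("vocab.txt")(nlp_input)  # "vocab.txt" is a placeholder path
model = keras.Model(inputs=nlp_input, outputs=token_ids)

model(tf.constant(["hello world"]))           # inference works locally
tf.saved_model.save(model, "/tmp/nlp_model")  # this is the step that raises the error on TF 2.3.1
```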
PS: I am using TF 2.3.1.