I'm trying to apply a preprocessor that removes HTML tags from the fields of a TensorFlow dataset. I use it in the t5 implementation, which uses seqio.
In my_preprocessor, if I use text = row[field_index], it works fine: during training it prints several rows and continues. But if I use text = normalize_text(row[field_index]), it hangs in an infinite or very long loop, as if it were trying to apply the function to every row of the dataset!
from typing import Callable, Optional, Sequence

import tensorflow as tf


def normalize_text(text):
    """Lowercase and remove quotes and tags from a TensorFlow string."""
    text = tf.strings.lower(text)
    # Strip a pair of single quotes, keeping the text between them.
    text = tf.strings.regex_replace(text, "'(.*)'", r"\1")
    # Replace each HTML tag with a single space.
    text = tf.strings.regex_replace(text, "<[^>]+>", " ")
    return text


def make_add_field_names_preprocessor(
    field_names: Sequence[str], field_indices: Optional[Sequence[int]] = None,
) -> Callable:
    def my_preprocessor(ds):
        def to_inputs_and_targets(*row):
            ret = {}
            count = 0
            tf.print("=======================")
            for field_name, field_index in zip(field_names, field_indices):
                # if I use this, it works okay
                text = row[field_index]
                # if I use the following it falls to a long or endless loop
                text = normalize_text(row[field_index])
                tf.print(count, "Row:", text)
                ret[field_name] = text
                count += 1
            return ret
        return ds.map(to_inputs_and_targets,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return my_preprocessor
Update: The preprocessor gets two field names, "inputs" for row[1] and "targets" for row[2]. I noticed that the problem occurs mainly when I call normalize_text on row[2].
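One way to check the two patterns in isolation, outside the tf.data pipeline, is to mirror them in plain Python. This is only a rough sketch: normalize_text_py is a hypothetical helper, and it assumes that Python's re module treats these two simple patterns the same way as the regex engine behind tf.strings.regex_replace, which may not hold for more complex patterns. It shows what the quote and tag replacements do to a single string, including the fact that the greedy '(.*)' spans from the first single quote to the last one on a line.

```python
import re


def normalize_text_py(text: str) -> str:
    """Plain-Python stand-in for normalize_text (hypothetical helper, for inspection only)."""
    text = text.lower()
    # Greedy match: spans from the first single quote to the last one on the line.
    text = re.sub(r"'(.*)'", r"\1", text)
    # Each HTML-like tag is replaced by one space.
    text = re.sub(r"<[^>]+>", " ", text)
    return text


if __name__ == "__main__":
    # Made-up sample strings, not rows from the actual dataset.
    for sample in ["He said 'Hello <b>World</b>' today", "it's a dog's life"]:
        print(repr(normalize_text_py(sample)))
```

If the strings in row[2] come out as expected here, the patterns themselves are probably not what differs between the two fields, and the next thing to look at would be how the map function is executed.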