[RoFormer] Fix some issues #12397
Thanks for these fixes. 2 main comments:
- your PR updates the `utils_qa.py` file of the Tensorflow examples. Any reason why this is the case?
- the jieba dependency of the tokenizer should be implemented in `file_utils.py` and `testing_utils.py`
- is_world_process_zero: bool = True,
+ log_level: Optional[int] = logging.WARNING,
Is there a reason you updated this? Same question for the lines below.
I don't think this RoFormer PR needs to update the `utils_qa.py` file of the Tensorflow examples.
I forked transformers before the PR at 276bc14, and that PR did not run the `fix-copies` command.
When I ran `fix-copies`, the file `examples/tensorflow/question-answering/utils_qa.py` changed as a result.
-     import rjieba
+     import jieba
  except ImportError:
      raise ImportError(
-         "You need to install rjieba to use RoFormerTokenizer."
-         "See https://pypi.org/project/rjieba/ for installation."
+         "You need to install jieba to use RoFormerTokenizer."
+         "See https://pypi.org/project/jieba/ for installation."
      )
- self.jieba = rjieba
+ self.jieba = jieba
Is this the correct way of handling the jieba dependency @LysandreJik?
We decided to handle it this way because it's the only model that requires it. If other models arrive that need it, we will upstream it, as is done for the other models.
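The approach described here, importing the optional dependency inside the tokenizer and raising a descriptive error when it is missing, could look roughly like the sketch below. It is only an illustration and not the code in this PR: `load_optional_dependency` and `SegmentingTokenizer` are hypothetical names, and the default module name is an assumption.

```python
import importlib


def load_optional_dependency(module_name: str, used_by: str):
    """Import `module_name` on demand, with a descriptive error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"You need to install {module_name} to use {used_by}. "
            f"See https://pypi.org/project/{module_name}/ for installation."
        ) from err


class SegmentingTokenizer:
    """Hypothetical stand-in for a tokenizer that needs an optional segmenter."""

    def __init__(self, segmenter_module: str = "jieba"):
        # Imported here rather than at module top level, so importing the
        # library itself never fails for users who do not need this tokenizer.
        self.segmenter = load_optional_dependency(segmenter_module, type(self).__name__)
```

Keeping the import local to the one class that needs it avoids adding a shared availability helper until a second model requires the same package.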
  try:
-     import rjieba
+     import jieba
  except ImportError:
      raise ImportError(
-         "You need to install rjieba to use RoFormerTokenizer."
-         "See https://pypi.org/project/rjieba/ for installation."
+         "You need to install jieba to use RoFormerTokenizer."
+         "See https://pypi.org/project/jieba/ for installation."
      )
- self.jieba = rjieba
+ self.jieba = jieba
If you already have the `try: except` block at the init of the tokenizer, is this required here?
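One possible reason for a second import outside `__init__`, assuming the hunk sits in a state-restoring method such as `__setstate__`: module objects cannot be pickled, so an object holding one typically drops the reference when serialized and imports the module again when restored, because `__init__` is not called at that point. A minimal sketch with the hypothetical class `PicklableTokenizer`, using the standard-library `json` module as a stand-in for the real dependency:

```python
import copy
import importlib


class PicklableTokenizer:
    """Hypothetical tokenizer that holds a reference to an imported module."""

    def __init__(self, segmenter_module: str = "json"):
        self.segmenter_module = segmenter_module
        self.segmenter = importlib.import_module(segmenter_module)

    def __getstate__(self):
        state = self.__dict__.copy()
        # Module objects cannot be pickled or deep-copied, so drop the reference.
        state["segmenter"] = None
        return state

    def __setstate__(self, state):
        self.__dict__ = state
        # __init__ is not called when restoring state, so import again here.
        self.segmenter = importlib.import_module(self.segmenter_module)


# copy.deepcopy goes through the same __getstate__/__setstate__ hooks as pickle.
restored = copy.deepcopy(PicklableTokenizer())
```

Whether the error handling needs to be repeated in the restoring method is the open question in this thread; the sketch only shows why the import itself has to happen twice.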
tests/test_tokenization_roformer.py
Outdated
- def is_rjieba_available():
-     return importlib.util.find_spec("rjieba") is not None
+ def is_jieba_available():
+     return importlib.util.find_spec("jieba") is not None


- def require_rjieba(test_case):
+ def require_jieba(test_case):
      """
      Decorator marking a test that requires Jieba. These tests are skipped when Jieba isn't installed.
      """
-     if not is_rjieba_available():
-         return unittest.skip("test requires rjieba")(test_case)
+     if not is_jieba_available():
+         return unittest.skip("test requires jieba")(test_case)
      else:
          return test_case
Thanks, I have changed this :) .
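For reference, the skip-decorator pattern in the hunk above can be written generically for any package. This is a sketch under assumptions: `is_package_available` and `require_package` are hypothetical names and are not helpers from `file_utils.py` or `testing_utils.py`.

```python
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    # find_spec returns None for a top-level package that cannot be found.
    return importlib.util.find_spec(name) is not None


def require_package(name: str):
    """Decorator factory: skip the decorated test when `name` is not installed."""

    def decorator(test_case):
        if not is_package_available(name):
            return unittest.skip(f"test requires {name}")(test_case)
        return test_case

    return decorator


@require_package("surely_not_installed_pkg_xyz")
class SkippedTest(unittest.TestCase):
    def test_anything(self):
        self.fail("never runs because the class is skipped")
```

Applied to a test class, the decorator marks every test in it as skipped rather than failing at import time.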
- is_world_process_zero (:obj:`bool`, `optional`, defaults to :obj:`True`):
-     Whether this process is the main process or not (used to determine if logging/saves should be done).
+ log_level (:obj:`int`, `optional`, defaults to ``logging.WARNING``):
+     ``logging`` log level (e.g., ``logging.WARNING``)
Same comment here.
- logger.setLevel(logging.INFO if is_world_process_zero else logging.WARN)
+ logger.setLevel(log_level)
Same comment here.
- print(f"Saving predictions to {prediction_file}.")
+ logger.info(f"Saving predictions to {prediction_file}.")
  with open(prediction_file, "w") as writer:
      writer.write(json.dumps(all_predictions, indent=4) + "\n")
- print(f"Saving nbest_preds to {nbest_file}.")
+ logger.info(f"Saving nbest_preds to {nbest_file}.")
  with open(nbest_file, "w") as writer:
      writer.write(json.dumps(all_nbest_json, indent=4) + "\n")
  if version_2_with_negative:
-     print(f"Saving null_odds to {null_odds_file}.")
+     logger.info(f"Saving null_odds to {null_odds_file}.")
Same comment here.
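Taken together, the `utils_qa.py` hunks replace the `is_world_process_zero` flag with an explicit `log_level` argument and route the save messages through the logger instead of `print`. A reduced sketch of that shape follows; `save_predictions`, the logger name, and the file name `predictions.json` are hypothetical and do not reproduce the real function.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger("qa_postprocess_sketch")


def save_predictions(all_predictions, output_dir, log_level=logging.WARNING):
    # The caller chooses the verbosity; nothing is derived from the process rank.
    logger.setLevel(log_level)
    prediction_file = os.path.join(output_dir, "predictions.json")
    logger.info(f"Saving predictions to {prediction_file}.")
    with open(prediction_file, "w") as writer:
        writer.write(json.dumps(all_predictions, indent=4) + "\n")
    return prediction_file


with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_predictions({"q1": "answer"}, tmp_dir, log_level=logging.INFO)
    with open(path) as reader:
        reloaded = json.load(reader)
```

With the default of `logging.WARNING`, the save messages are suppressed unless the caller asks for `logging.INFO` or lower.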
- logger.setLevel(logging.INFO if is_world_process_zero else logging.WARN)
+ logger.setLevel(log_level)
Same comment here.
- is_world_process_zero: bool = True,
+ log_level: Optional[int] = logging.WARNING,
Same comment here.
- is_world_process_zero (:obj:`bool`, `optional`, defaults to :obj:`True`):
-     Whether this process is the main process or not (used to determine if logging/saves should be done).
+ log_level (:obj:`int`, `optional`, defaults to ``logging.WARNING``):
+     ``logging`` log level (e.g., ``logging.WARNING``)
Same comment here.
import jieba

self.jieba = jieba
Same question here for @LysandreJik.
It would be nice to leave the try/except statement.
Ok thanks, I will just let Lysandre review the
This looks good, but why did you change `rjieba` to `jieba`? Is the latter better?
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@LysandreJik I found
cc @Narsil
@JunnYu Should work now: We do update the API regularly with dependencies,
@Narsil thank you!
@LysandreJik
Great, LGTM! Thanks @JunnYu!
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@patil-suraj