-
Notifications
You must be signed in to change notification settings - Fork 2.1k
[T5] T5 in ParlAI #3519
[T5] T5 in ParlAI #3519
Conversation
does anyone know why circle can't find tests to run? did i mess something up? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, seems reasonable at a quick glance, but will defer to others with more context for approval
I'm re-requesting reviews because the model now lives in
|
|
Running all gpu tests locally, I get:
we can ignore the first two; the third might be an issue with my downloaded model for that? (the assertion that failed was |
all tests pass!! PR is ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me. Thanks for adding this!
parlai/agents/hugging_face/dict.py
Outdated
self.override_special_tokens(opt) | ||
for i in range(self.tokenizer.vocab_size): | ||
token = self.tokenizer._convert_id_to_token(i) | ||
self.add_token(token) | ||
self.freq[token] = 1 | ||
self._unk_token_idx = self.hf_tokenizer.unk_token_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these lines be swapped? Right now _unk_token_idx
from override_special_tokens
is overridden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
except ImportError: | ||
raise ImportError('Please run `pip install transformers`.') | ||
|
||
|
||
version = float('.'.join(transformers.__version__.split('.')[:2])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; Can we identify this as a global more distinctly somehow? So that we don't end up mixing this up with some other variable named "version". Perhaps just all caps (VERSION
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i'll make it more explicit (HF_VERSION
)
failing gpu test is flaky |
Patch description
Port of HuggingFace's T5 model for ParlAI. Tagging people I thought might be interested, This PR serves a dual purpose:
The second point is relevant in that I tried writing a generalized version of a
DictionaryAgent
that wraps a HF tokenizer.On that note - I am seeking feedback on whether this would be more appropriate in a
utils
file (e.g.parlai.utils.huggingface
)Testing steps
Extensive CI; HF had some nice integration tests I copied over, and I used similar integration tests as the ones used for BART.