updated BERT processing to current interface #1223

Merged (1 commit) on Nov 19, 2020


piotrpiekos
Contributor

Adapts the old BERT processing code to the new interface. For now it only fine-tunes BERT on a chosen GLUE dataset, without evaluating on the test set.

I didn't use SubwordTextEncoder because it produces different token_ids for words that consist of more than one token (e.g. "neglecting").
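To illustrate why the tokenizer matters here: a pretrained BERT checkpoint is tied to the ids of its own WordPiece vocabulary, so any encoder that splits a multi-token word differently, or numbers the pieces differently, feeds the wrong embeddings to the model. Below is a minimal sketch of greedy longest-match-first WordPiece splitting. The vocabulary `TOY_VOCAB`, its ids, and the helper names are hypothetical and chosen only for illustration; they are not the real BERT vocabulary or the code in this PR.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (NOT the real BERT vocabulary or ids).
TOY_VOCAB = {"[UNK]": 0, "neglect": 1, "##ing": 2, "the": 3, "##s": 4}


def encode(word, vocab=TOY_VOCAB):
    """Map a word to the ids of its WordPiece tokens."""
    return [vocab[piece] for piece in wordpiece_tokenize(word, vocab)]


pieces = wordpiece_tokenize("neglecting", TOY_VOCAB)
ids = encode("neglecting")
print(pieces, ids)
```

With this toy vocabulary the word is split into two pieces, and the ids depend entirely on the vocabulary used. An encoder built on a different subword scheme would generally return different ids for the same word.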
I didn't upgrade tfds_stream to data_stream, because BERT models may need to run some processing after batching. This should be revisited once more BERT-family models are implemented.
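The PR does not say which post-batch step is meant. As an illustration only, the sketch below shows one common kind of processing that can only happen after batching: padding every sequence to the longest one in its batch and building the matching attention mask. The function names `batch_stream` and `pad_and_mask`, the pad id, and the token ids are all hypothetical placeholders, not part of Trax.

```python
def batch_stream(examples, batch_size):
    """Group an iterable of examples into lists of at most batch_size items."""
    batch = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def pad_and_mask(batches, pad_id=0):
    """Post-batch step: pad to the longest sequence and build a mask."""
    for batch in batches:
        max_len = max(len(ids) for ids in batch)
        padded = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
        # 1 marks a real token, 0 marks padding.
        mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
        yield padded, mask


# Placeholder token-id sequences of different lengths.
examples = [[101, 7, 102], [101, 7, 8, 9, 102], [101, 102]]
results = list(pad_and_mask(batch_stream(examples, batch_size=2)))
for padded, mask in results:
    print(padded, mask)
```

The padded length depends on which examples end up in the same batch, which is why this step cannot run on individual examples before batching.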

google-cla bot added the "cla: yes" label on Nov 15, 2020
afrozenator added the "ready to pull" label (Added when the PR is ready to be merged.) on Nov 16, 2020
@afrozenator
Contributor

Hi @piotrekp1 - Thanks for adding this! I'll accept it right now without changes, but in a future PR could you move it to trax/data instead and add a simple unit test there? (That package should have example tests with testdata that will make this easy.)

@afrozenator
Contributor

@henrykmichalewski as FYI

@copybara-service copybara-service bot merged commit be84553 into google:master Nov 19, 2020
@piotrpiekos
Contributor Author

Hi @afrozenator. I'll do that, thank you!
