Add Sentencepiece binding to torchtext, plus example to build torchtext dataset with sentencepiece #597

Merged 71 commits on Sep 26, 2019

Commits on Aug 26, 2019

  1. Add sentencepiece dependency to torchtext package

    Guanheng Zhang committed Aug 26, 2019
    1be09e2
  2. Add generate_sentencepiece_tokenizer_model

    Guanheng Zhang committed Aug 26, 2019
    160930f

Commits on Aug 29, 2019

  1. Some tests.

    Guanheng Zhang committed Aug 29, 2019
    3abe2a8

Commits on Sep 3, 2019

  1. 4e230e9

Commits on Sep 9, 2019

  1. Add a sentencepiece example.

    Guanheng Zhang committed Sep 9, 2019
    030fc4e
  2. Reset text_classification dataset.

    Guanheng Zhang committed Sep 9, 2019
    2a219b0
  3. Merge branch 'sentence_piece' into example_spm

    Guanheng Zhang committed Sep 9, 2019
    853edbc
  4. Train a model if spm is not provided.

    Guanheng Zhang committed Sep 9, 2019
    69f2ae0
  5. Allow to call setup_datasets() func

    Guanheng Zhang committed Sep 9, 2019
    7bf8146

Commits on Sep 10, 2019

  1. Remove vocab.

    Guanheng Zhang committed Sep 10, 2019
    5f4ee68

Commits on Sep 11, 2019

  1. Minor changes

    Guanheng Zhang committed Sep 11, 2019
    c41eea3
  2. Delete vocab.

    Guanheng Zhang committed Sep 11, 2019
    eb6fa30
  3. Add train file.

    Guanheng Zhang committed Sep 11, 2019
    457cd39
  4. Add args to train.py file.

    Guanheng Zhang committed Sep 11, 2019
    c883dd1

Commits on Sep 12, 2019

  1. Add a run script.

    Guanheng Zhang committed Sep 12, 2019
    943538b
  2. flake8

    Guanheng Zhang committed Sep 12, 2019
    d6eae59
  3. Update the test for get_tokenizer

    Guanheng Zhang committed Sep 12, 2019
    43b78a3
  4. Add a test file.

    Guanheng Zhang committed Sep 12, 2019
    98470a0
  5. Fix flake8

    Guanheng Zhang committed Sep 12, 2019
    82ecd71
  6. Add test to cover generate_sp_tokenizer() function.

    Guanheng Zhang committed Sep 12, 2019
    535be98
  7. Add docs

    Guanheng Zhang committed Sep 12, 2019
    c787099
  8. Add spm_data_generator() func.

    Guanheng Zhang committed Sep 12, 2019
    f0da161
  9. flake8

    Guanheng Zhang committed Sep 12, 2019
    1d40431
  10. Fix flake8

    Guanheng Zhang committed Sep 12, 2019
    251ac60
  11. Add test to cover spm_data_generator

    Guanheng Zhang committed Sep 12, 2019
    a2b628c
  12. Minor

    Guanheng Zhang committed Sep 12, 2019
    80b0eec
  13. change printout

    Guanheng Zhang committed Sep 12, 2019
    c375ae3
  14. skip test_get_tokenizer_sentencepiece in python2 envir.

    Guanheng Zhang committed Sep 12, 2019
    d2d5438

Commits on Sep 13, 2019

  1. Flake8

    Guanheng Zhang committed Sep 13, 2019
    4959eb6
  2. add coding utf-8

    Guanheng Zhang committed Sep 13, 2019
    f9d5314
  3. add coding utf-8

    Guanheng Zhang committed Sep 13, 2019
    75015ed
  4. flake8

    Guanheng Zhang committed Sep 13, 2019
    80fc33e
  5. 5fd36de

Commits on Sep 18, 2019

  1. Revise based on reviewers' feedback.

    Guanheng Zhang committed Sep 18, 2019
    7505649
  2. Add sentencepiece transform.

    Guanheng Zhang committed Sep 18, 2019
    556a3bc
  3. Add test to cover SentencePieceTransform.

    Guanheng Zhang committed Sep 18, 2019
    3f48f2b
  4. Remove sentencepiece from get_tokenizer.

    Guanheng Zhang committed Sep 18, 2019
    b7d12ed
  5. Fix lint error.

    Guanheng Zhang committed Sep 18, 2019
    2436af0

Commits on Sep 19, 2019

  1. Update test result.

    Guanheng Zhang committed Sep 19, 2019
    d5e159d
  2. flake8

    Guanheng Zhang committed Sep 19, 2019
    c37b500
  3. Update README.md

    Guanheng Zhang committed Sep 19, 2019
    f4e4f15
  4. Switch to torch.nn.Module.

    Guanheng Zhang committed Sep 19, 2019
    ac2a8ee
  5. Add docs for SentencePieceTransform.

    Guanheng Zhang committed Sep 19, 2019
    2f859f4
  6. Remove spm_data_generator.

    Guanheng Zhang committed Sep 19, 2019
    7fea165

Commits on Sep 20, 2019

  1. Add sentencepiece functionals.

    Guanheng Zhang committed Sep 20, 2019
    5d84383
  2. Fix a failed test.

    Guanheng Zhang committed Sep 20, 2019
    50c5896
  3. 7ffba4a
  4. Remove spm_data_generator docs.

    Guanheng Zhang committed Sep 20, 2019
    1fbda39

Commits on Sep 23, 2019

  1. move generate_sp_tokenizer to functional.py.

    Guanheng Zhang committed Sep 23, 2019
    3eb82e6
  2. Fix the docstring of lr-garmma.

    Guanheng Zhang committed Sep 23, 2019
    5812ce6
  3. Flake8

    Guanheng Zhang committed Sep 23, 2019
    f84a933

Commits on Sep 24, 2019

  1. 8402863
  2. transform generator.

    Guanheng Zhang committed Sep 24, 2019
    70f09f9
  3. Revise

    Guanheng Zhang committed Sep 24, 2019
    4d6881b
  4. fix example

    Guanheng Zhang committed Sep 24, 2019
    59752a7
  5. Fix flake8

    Guanheng Zhang committed Sep 24, 2019
    0e9844c
  6. use sp transform generator.

    Guanheng Zhang committed Sep 24, 2019
    3e2d8b8
  7. Merge text_classification and sentencepiece examples.

    Guanheng Zhang committed Sep 24, 2019
    6215f95

Commits on Sep 25, 2019

  1. Add README file.

    Guanheng Zhang committed Sep 25, 2019
    6740b9d
  2. Merge sentencepiece example to text_classification.

    Guanheng Zhang committed Sep 25, 2019
    0d9162a
  3. Add warning message.

    Guanheng Zhang committed Sep 25, 2019
    803f24f
  4. a few comments.

    Guanheng Zhang committed Sep 25, 2019
    ce8b558
  5. Minor.

    Guanheng Zhang committed Sep 25, 2019
    200a4e4

Commits on Sep 26, 2019

  1. Minor edits.

    Guanheng Zhang committed Sep 26, 2019
    8aabcdd
  2. Change the test names.

    Guanheng Zhang committed Sep 26, 2019
    b93149f
  3. Move transforms to functional.

    Guanheng Zhang committed Sep 26, 2019
    08dd450
  4. Flake8

    Guanheng Zhang committed Sep 26, 2019
    6a69800
  5. Change to --sp-vocab-size example train.py.

    Guanheng Zhang committed Sep 26, 2019
    f01e201
  6. Fix an error in unit tests.

    Guanheng Zhang committed Sep 26, 2019
    20534ea
  7. set vocab size.

    Guanheng Zhang committed Sep 26, 2019
    7136d8c
  8. Fix docs.

    Guanheng Zhang committed Sep 26, 2019
    76f8d8c