Fix prefix in get_input_fn / Fix assertion for even number of batch size #121

Open
wants to merge 6 commits into base: master

@bzantium bzantium commented Jul 4, 2019

This may be the solution for Issues #120, #111, #85

Author

@bzantium bzantium left a comment

Added pass_id and task flags to control the file path in train_gpu.py.

Bagdu commented Jul 4, 2019

I have changed the code, but I still get the same error: TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

@bzantium bzantium changed the title Fix prefix in get_input_fn Fix prefix in get_input_fn / Fix assertion for even number of batch size Jul 4, 2019
@bzantium bzantium closed this Jul 4, 2019
@bzantium bzantium reopened this Jul 4, 2019
@Bagdu Bagdu left a comment

I have changed everything correctly, but I still get this error:
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

Author

bzantium commented Jul 4, 2019

Can you show me the command lines you used to run data_utils.py and train_gpu.py, and the file names in the "tfrecords" folder? They should look like:
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.json
train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.tfrecords
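
The two example names above suggest that the file name encodes the preprocessing settings, so a mismatch in any of them would make the training script look for files that do not exist. Below is a minimal sketch of that idea. The helpers `expected_stem` and `record_info_name` are hypothetical and only reproduce the names shown in this thread; the real naming code in the repository may differ, and the two integers after `train` are assumed to be the task and pass id.

```python
def expected_stem(bsz, seq_len, reuse_len, mask_alpha, mask_beta, num_predict,
                  uncased=False, task=0, pass_id=0):
    """Rebuild the name stem seen in this thread (hypothetical helper)."""
    parts = [
        "train-{}-{}".format(task, pass_id),
        "bsz-{}".format(bsz),
        "seqlen-{}".format(seq_len),
        "reuse-{}".format(reuse_len),
    ]
    if uncased:
        # The extra tag that shows up when the uncased flag is on.
        parts.append("uncased")
    # Only the bidirectional ("bi") case appears in the thread.
    parts.append("bi")
    parts.append("alpha-{}".format(mask_alpha))
    parts.append("beta-{}".format(mask_beta))
    parts.append("fnp-{}".format(num_predict))
    return ".".join(parts)


def record_info_name(**settings):
    """Name of the JSON file that accompanies the .tfrecords file."""
    return "record_info-" + expected_stem(**settings) + ".json"


settings = dict(bsz=32, seq_len=512, reuse_len=256,
                mask_alpha=6, mask_beta=1, num_predict=85)
print(expected_stem(**settings) + ".tfrecords")
print(record_info_name(**settings))
```

Comparing the printed names with the actual directory listing is a quick way to see which setting differs.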

Bagdu commented Jul 4, 2019

Step 1: python data_utils.py --bsz_per_host=32 --num_core_per_host=16 --seq_len=512 --reuse_len=256 --input_glob=books-sentences.txt --save_dir=fix --num_passes=20 --bi_data=True --sp_path=/home/ubuntu/giorgi/sp10m.cased.v3.model --mask_alpha=6 --mask_beta=1 --num_predict=85
/home/ubuntu/xlnet/fix/tfrecords is the directory where the tfrecords are stored.

Step 2: sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/fix/tfrecords --train_batch_size=32 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'

Author

bzantium commented Jul 4, 2019

Step 1: python data_utils.py --bsz_per_host=32 --num_core_per_host=16 --seq_len=512 --reuse_len=256 --input_glob=books-sentences.txt --save_dir=training --num_passes=20 --bi_data=True --sp_path=/home/ubuntu/giorgi/sp10m.cased.v3.model --mask_alpha=6 --mask_beta=1 --num_predict=85
/home/ubuntu/xlnet/training/tfrecords is the directory where the tfrecords are stored.

Step 2: sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/training/tfrecords --train_batch_size=32--seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'

Can you add a space between --train_batch_size=32 and --seq_len=512 in step 2?

Bagdu commented Jul 4, 2019

I made a mistake when copying the command.

Step 2: sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/fix/tfrecords --train_batch_size=32 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir=my_model

Bagdu commented Jul 4, 2019

image
These are the tfrecords files that were generated by running data_utils.py.

Author

bzantium commented Jul 4, 2019

I think you should change the "uncased" flag to False in data_utils.py. If you don't want to run data_utils.py again, you can simply rename the files
from: train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.json
to: train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.json
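
The renaming described above could be scripted roughly as follows. This is a minimal sketch: `drop_uncased_tag` is a hypothetical helper, not part of the repository, and it only renames files. If the JSON file itself lists the record file names, those entries would presumably need the same change, which this sketch does not attempt.

```python
from pathlib import Path


def drop_uncased_tag(directory, tag=".uncased."):
    """Rename every file in `directory` whose name contains `tag`.

    The tag is collapsed to a single dot, so
    "x.reuse-256.uncased.bi.y" becomes "x.reuse-256.bi.y".
    Returns a list of (old_name, new_name) pairs.
    """
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and tag in path.name:
            target = path.with_name(path.name.replace(tag, "."))
            if target.exists():
                # Never overwrite an existing file silently.
                raise FileExistsError("refusing to overwrite {}".format(target))
            path.rename(target)
            renamed.append((path.name, target.name))
    return renamed
```

For example, `drop_uncased_tag("fix/tfrecords")` would rename the two files listed above and leave every other file alone.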

Bagdu commented Jul 4, 2019

image

I have changed the flag to flags.DEFINE_bool("uncased", False, help="Use uncased inputs or not."), but I get the same error: TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

When I train on GPU, the use_tpu flag in data_utils.py should be False, I think.

Bagdu commented Jul 5, 2019

Hi, is there any news about my case?

Author

bzantium commented Jul 6, 2019

I think everything looks OK, but can you try a relative path? For example:
sudo python3 train_gpu.py --record_info_dir=fix2/tfrecords --train_batch_size=32 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'

When training is restarted, prev_step is -1, so curr_loss for the first print would be calculated incorrectly.
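
A small numeric illustration of that remark, assuming the training loop reports the accumulated loss divided by `curr_step - prev_step` (this formula is an assumption based on the comment, and `reported_loss` and `steps_since_last_report` are hypothetical names):

```python
def reported_loss(total_loss, curr_step, prev_step):
    """Average the accumulated loss over the steps since the last report."""
    return total_loss / (curr_step - prev_step)


def steps_since_last_report(curr_step, prev_step, resume_step=None):
    """One possible fix: use the resume step when no report has happened yet."""
    if prev_step < 0 and resume_step is not None:
        prev_step = resume_step
    return curr_step - prev_step


# Training resumes at step 1000 and runs 100 steps with a loss of 2.0 each.
total = 100 * 2.0

# With prev_step = -1 the divisor is 1101 instead of 100, so the value is far too small.
wrong = reported_loss(total, curr_step=1100, prev_step=-1)

# With prev_step set to the resume step the average is the true per-step loss.
right = reported_loss(total, curr_step=1100, prev_step=1000)

print(wrong, right)
```

Only the first report after a restart is affected, because prev_step is updated to the current step after each report.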

Aaradhyaiitr commented Jul 9, 2019

In order to do pre-training:

Apart from the changes mentioned above, also make the following changes:

  1. The batch size should be the same in both train_gpu.py and data_utils.py.
  2. In line 776 of data_utils.py, change uncased to None (uncased=None).
  3. In line 236 of modeling.py, change assert bsz%2 == 0 to tf.debugging.assert_equal(bsz%2,0).
  4. Add --uncased=True as an argument to train_gpu.py.
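
Item 1 can be checked before launching a run by comparing the two command lines. The sketch below is only an illustration: `parse_flags`, `mismatched_flags`, and the `PAIRED_FLAGS` mapping are hypothetical, and the pairing of `bsz_per_host` with `train_batch_size` is an assumption based on the commands posted in this thread.

```python
import shlex

# Assumed pairing of data_utils.py flags with their train_gpu.py counterparts.
PAIRED_FLAGS = {
    "bsz_per_host": "train_batch_size",
    "seq_len": "seq_len",
    "reuse_len": "reuse_len",
}


def parse_flags(command):
    """Collect --name=value tokens from a shell command into a dict."""
    flags = {}
    for token in shlex.split(command):
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
    return flags


def mismatched_flags(data_cmd, train_cmd):
    """Return (data_flag, value, train_flag, value) for each paired flag that differs."""
    data_flags = parse_flags(data_cmd)
    train_flags = parse_flags(train_cmd)
    problems = []
    for data_name, train_name in PAIRED_FLAGS.items():
        if data_name in data_flags and train_name in train_flags:
            if data_flags[data_name] != train_flags[train_name]:
                problems.append((data_name, data_flags[data_name],
                                 train_name, train_flags[train_name]))
    return problems
```

A missing space such as `--train_batch_size=32--seq_len=512` also shows up here, because the whole token is read as the value of `train_batch_size`.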

Author

bzantium commented Jul 9, 2019

Thank you for summarizing the commits!

5 participants