Fix prefix in get_input_fn / Fix assertion for even number of batch size #121

bzantium · 2019-07-04T08:12:27Z

This may be the solution for Issues #120, #111, #85

bzantium

Add pass_id and task to the flags to control the file path in train_gpu.py.

Bagdu · 2019-07-04T09:07:43Z

This may be the solution for Issues #120, #111, #85

I have changed the code, but I have the same error. TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

Bagdu

I have changed everything correctly, but I still get an error:
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

bzantium · 2019-07-04T11:37:08Z

I have changed everything correctly, but I still get an error:
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

Can you show me your command line for running data_utils.py and train_gpu.py?
And the file names in the "tfrecords" folder, like:
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.json
train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.tfrecords
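The naming pattern in the two example file names can be sketched as a small helper. This is only an illustration reconstructed from the names quoted in this thread: `build_record_basename` and its parameter names are hypothetical, the order of `task` and `pass_id` in `train-0-0` is an assumption, and the real logic in data_utils.py may differ.

```python
def build_record_basename(split, task, pass_id, bsz, seq_len, reuse_len,
                          mask_alpha, mask_beta, num_predict,
                          uncased=False, data_tag="bi"):
    """Hypothetical helper that rebuilds the base name seen in this thread.

    data_tag="bi" is the only value that appears in the thread; the order of
    task and pass_id is assumed (both are 0 in the examples).
    """
    parts = [
        "{}-{}-{}".format(split, task, pass_id),
        "bsz-{}".format(bsz),
        "seqlen-{}".format(seq_len),
        "reuse-{}".format(reuse_len),
    ]
    if uncased:
        # The ".uncased" tag shows up only when the uncased flag is on.
        parts.append("uncased")
    parts.append(data_tag)
    parts.append("alpha-{}".format(mask_alpha))
    parts.append("beta-{}".format(mask_beta))
    parts.append("fnp-{}".format(num_predict))
    return ".".join(parts)


def record_file_names(basename):
    """Return the (tfrecords, record_info) pair for one base name."""
    return basename + ".tfrecords", "record_info-" + basename + ".json"


base = build_record_basename("train", 0, 0, bsz=32, seq_len=512, reuse_len=256,
                             mask_alpha=6, mask_beta=1, num_predict=85)
tfrecords_name, record_info_name = record_file_names(base)
```

If the names produced this way do not match what is on disk, one of the flags passed to the two scripts probably differs.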

Bagdu · 2019-07-04T11:41:54Z

I have changed everything correctly, but I still get an error:
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

Can you show me your command line for running data_utils.py and train_gpu.py?

step 1 - python data_utils.py --bsz_per_host=32 --num_core_per_host=16 --seq_len=512 --reuse_len=256 --input_glob=books-sentences.txt --save_dir=fix --num_passes=20 --bi_data=True --sp_path=/home/ubuntu/giorgi/sp10m.cased.v3.model --mask_alpha=6 --mask_beta=1 --num_predict=85.
/home/ubuntu/xlnet/fix/tfrecords is the directory where the tfrecords are stored.

step 2 - sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/fix/tfrecords --train_batch_size=32 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'

bzantium · 2019-07-04T11:50:14Z

I have changed everything correctly, but I still get an error:
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

Can you show me your command line for running data_utils.py and train_gpu.py?

step 1 - python data_utils.py --bsz_per_host=32 --num_core_per_host=16 --seq_len=512 --reuse_len=256 --input_glob=books-sentences.txt --save_dir=training --num_passes=20 --bi_data=True --sp_path=/home/ubuntu/giorgi/sp10m.cased.v3.model --mask_alpha=6 --mask_beta=1 --num_predict=85.
/home/ubuntu/xlnet/training/tfrecords is the directory where the tfrecords are stored.

step 2 - sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/training/tfrecords --train_batch_size=32--seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'

Can you add space between --train_batch_size=32 and --seq_len=512 for step 2?
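A missing space like this is easy to overlook in a long command. As a rough sketch (the helper name `find_fused_flags` is made up for illustration, and the check is only a heuristic), the command can be tokenized and scanned for tokens that contain a second `--`:

```python
import shlex


def find_fused_flags(command):
    """Return tokens that look like two flags glued together.

    Heuristic only: a flag whose value legitimately contains "--"
    would be reported as well.
    """
    fused = []
    for token in shlex.split(command):
        # A normal flag starts with "--" and has no further "--" inside it.
        if token.startswith("--") and "--" in token[2:]:
            fused.append(token)
    return fused


command = ("python3 train_gpu.py --record_info_dir=training/tfrecords "
           "--train_batch_size=32--seq_len=512 --reuse_len=256")
suspects = find_fused_flags(command)
```

Here `suspects` holds the single token `--train_batch_size=32--seq_len=512`, which the flag parser would otherwise receive as one argument.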

Bagdu · 2019-07-04T11:55:52Z

I have changed everything correctly, but I still get an error:
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

Can you show me your command line for running data_utils.py and train_gpu.py?

step 1 - python data_utils.py --bsz_per_host=32 --num_core_per_host=16 --seq_len=512 --reuse_len=256 --input_glob=books-sentences.txt --save_dir=training --num_passes=20 --bi_data=True --sp_path=/home/ubuntu/giorgi/sp10m.cased.v3.model --mask_alpha=6 --mask_beta=1 --num_predict=85.
/home/ubuntu/xlnet/training/tfrecords is the directory where the tfrecords are stored.
step 2 - sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/training/tfrecords --train_batch_size=32--seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'

Can you add space between --train_batch_size=32 and --seq_len=512 for step 2?

I made a mistake when copying.

step 2 - sudo python3 train_gpu.py --record_info_dir=/home/ubuntu/xlnet/fix/tfrecords --train_batch_size=32 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir=my_model .

Bagdu · 2019-07-04T11:57:41Z

These are the tfrecords that were generated by running data_utils.py.

bzantium · 2019-07-04T12:03:23Z

These are the tfrecords that were generated by running data_utils.py.

I think you should change the "uncased" flag to False in data_utils.py. Alternatively, you can simply rename the files
from: train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.json
to: train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.json
if you don't want to run data_utils.py again.
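The rename suggested above could be scripted roughly as follows. This is a sketch, not part of the PR: `strip_uncased_tag` is a hypothetical helper, and it only renames files on disk. If the record_info JSON also stores file names internally, those entries would need to be updated separately, which is not handled here.

```python
import os


def strip_uncased_tag(record_dir, dry_run=True):
    """Drop the ".uncased" tag from file names in record_dir.

    Returns the list of (old_name, new_name) pairs. With dry_run=True
    nothing is renamed, so the plan can be inspected first.
    """
    renames = []
    for name in sorted(os.listdir(record_dir)):
        if ".uncased." not in name:
            continue
        new_name = name.replace(".uncased.", ".", 1)
        renames.append((name, new_name))
        if not dry_run:
            os.rename(os.path.join(record_dir, name),
                      os.path.join(record_dir, new_name))
    return renames
```

Calling `strip_uncased_tag("fix/tfrecords")` first and checking the returned pairs before rerunning with `dry_run=False` keeps the operation easy to verify.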

Bagdu · 2019-07-04T12:28:34Z

These are the tfrecords that were generated by running data_utils.py.

I think you should change the "uncased" flag to False in data_utils.py. Alternatively, you can simply rename the files
from: train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.json
to: train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.json

I have changed it to flags.DEFINE_bool("uncased", False, help="Use uncased inputs or not."), but I get the same error: TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.

When training on GPU, I think the use_tpu flag in data_utils.py should be False.

Bagdu · 2019-07-05T12:53:07Z

These are the tfrecords that were generated by running data_utils.py.

I think you should change the "uncased" flag to False in data_utils.py. Alternatively, you can simply rename the files
from: train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.uncased.bi.alpha-6.beta-1.fnp-85.json
to: train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.tfrecords
record_info-train-0-0.bsz-32.seqlen-512.reuse-256.bi.alpha-6.beta-1.fnp-85.json
if you don't want to run data_utils.py again.

Hi, is there any news about my case?

bzantium · 2019-07-06T01:58:28Z

Hi, is there any news about my case?

I think everything looks OK, but can you try a relative path? For example:
sudo python3 train_gpu.py --record_info_dir=fix2/tfrecords --train_batch_size=32 --seq_len=512 --reuse_len=256 --mem_len=384 --perm_size=256 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --untie_r=True --model_dir='my_model'
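Before launching training, it may help to confirm that the directory passed to --record_info_dir actually contains files with the expected names. The sketch below is an assumption-laden check, not code from the repository: `list_record_info_files` is a hypothetical helper and the `record_info-train-*.json` pattern is inferred from the file names quoted in this thread. One possible cause of the error, not confirmed here, is that no file matches and an empty file list reaches the input pipeline.

```python
import glob
import os


def list_record_info_files(record_info_dir, split="train"):
    """Return record_info JSON files for the given split, sorted by name.

    The pattern is inferred from the names shown in this thread.
    """
    pattern = os.path.join(record_info_dir,
                           "record_info-{}-*.json".format(split))
    return sorted(glob.glob(pattern))


def describe_record_dir(record_info_dir):
    """Build a short human-readable summary of what was found."""
    matches = list_record_info_files(record_info_dir)
    if not matches:
        return "no record_info files found in {}".format(record_info_dir)
    return "{} record_info file(s) found in {}".format(len(matches),
                                                       record_info_dir)
```

Running `describe_record_dir("fix2/tfrecords")` from the same working directory as train_gpu.py shows whether the relative path resolves to the intended folder.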

When training is restarted, prev_step is -1, so curr_loss for the first print would be calculated incorrectly.

Aaradhyaiitr · 2019-07-09T11:04:02Z

In order to do pre-training:

Apart from the above-mentioned changes, note and change the following as well:

The batch size should be the same in both train_gpu.py and data_utils.py.
In line 776 of data_utils.py, change uncased to None (uncased=None).
In line 236 of modeling.py, change assert bsz%2 == 0 to tf.debugging.assert_equal(bsz%2, 0).
Add --uncased=True as an argument to train_gpu.py.
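The batch-size points in this list can be checked before either script is run. The following is a minimal sketch under stated assumptions: `check_batch_flags` is a hypothetical helper, the flag names are taken from the commands quoted earlier in this thread, and treating `bsz_per_host` and `train_batch_size` as directly comparable assumes a single host.

```python
def check_batch_flags(data_utils_flags, train_gpu_flags):
    """Compare flags shared by the two scripts and return a list of problems.

    Assumes a single host, so bsz_per_host and train_batch_size should match.
    """
    problems = []
    bsz_data = data_utils_flags["bsz_per_host"]
    bsz_train = train_gpu_flags["train_batch_size"]
    if bsz_data != bsz_train:
        problems.append("batch size differs: {} vs {}".format(bsz_data,
                                                              bsz_train))
    if bsz_train % 2 != 0:
        # The assertion discussed in this PR requires an even batch size.
        problems.append("batch size {} is not even".format(bsz_train))
    for key in ("seq_len", "reuse_len"):
        if data_utils_flags[key] != train_gpu_flags[key]:
            problems.append("{} differs: {} vs {}".format(
                key, data_utils_flags[key], train_gpu_flags[key]))
    return problems


data_flags = {"bsz_per_host": 32, "seq_len": 512, "reuse_len": 256}
train_flags = {"train_batch_size": 32, "seq_len": 512, "reuse_len": 256}
issues = check_batch_flags(data_flags, train_flags)
```

With the values used in this thread `issues` is empty; any mismatch shows up as a readable message instead of a failure deep inside the input pipeline.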

bzantium · 2019-07-09T11:18:53Z

In order to do pre-training:

Apart from the above-mentioned changes, note and change the following as well:

The batch size should be the same in both train_gpu.py and data_utils.py.

In line 776 of data_utils.py, change uncased to None (uncased=None).

In line 236 of modeling.py, change assert bsz%2 == 0 to tf.debugging.assert_equal(bsz%2, 0).

Add --uncased=True as an argument to train_gpu.py.

Thank you for summarizing the commits!

bzantium added 2 commits July 4, 2019 17:11

Fix prefix in get_input_fn

9965647

fix errors with get_input_fn in train_gpu

1b61bf2

fix minor issue

4c574d6

fix assertion for even number of batch size

7ef2f2d

bzantium changed the title ~~Fix prefix in get_input_fn~~ Fix prefix in get_input_fn / Fix assertion for even number of batch size Jul 4, 2019

bzantium closed this Jul 4, 2019

bzantium reopened this Jul 4, 2019

Bagdu reviewed Jul 4, 2019

change default value of uncased to False

1ea884d

Fix curr_loss calculation

0391d1e

When training is restarted, prev_step is -1, so curr_loss for the first print would be calculated incorrectly.
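The effect can be illustrated with a small numeric sketch. The thread does not show the training loop, so this assumes the printed value is the accumulated loss divided by the number of steps since the last print, computed as `curr_step - prev_step`. The function name `average_loss` and the numbers are made up for illustration, and initializing `prev_step` from the restored step is one possible fix, not necessarily what commit 0391d1e does.

```python
def average_loss(total_loss, curr_step, prev_step):
    """Mean loss over the steps since the last print (assumed formula)."""
    return total_loss / (curr_step - prev_step)


# Hypothetical restart: the checkpoint was saved at step 10000 and
# 500 further steps are run, each with a loss of 2.0.
restored_step = 10000
steps_run = 500
total_loss = 2.0 * steps_run
curr_step = restored_step + steps_run

# prev_step left at -1: the divisor counts steps that were never run
# in this session, so the printed loss is far too small.
wrong = average_loss(total_loss, curr_step, prev_step=-1)

# prev_step set to the restored step: the divisor equals the number of
# steps actually accumulated since the restart.
right = average_loss(total_loss, curr_step, prev_step=restored_step)
```

In this example `right` is 2.0, while `wrong` is below 0.1 because the divisor is 10501 instead of 500.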

This was referenced Nov 20, 2019

Merge from bzantium vochicong/xlnet#1

Merged

Merging various fixes for Colab, Cloud TPU, TPU Pod, ... #247

Open
