early stopping invalid #535

Closed
zhang-ge-hao opened this issue Mar 31, 2023 · 4 comments
Labels: bug (Something isn't working)

Comments

@zhang-ge-hao (Contributor) commented Mar 31, 2023

model:    megatron-345m
image:    nvcr.io/nvidia/pytorch:21.11-py3
command:  python /workspace/FasterTransformer/examples/pytorch/gpt/multi_gpu_gpt_example.py --output_len 512 --max_batch_size 1 --end_id 13 --time

I set end_id to 13, the id that corresponds to the English full-stop punctuation mark ("."), expecting it to trigger the early-stopping mechanism.
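
(A quick sanity check, not part of the original report: the id-to-token mapping can be read straight from the gpt2-vocab.json file shown as vocab_file in the arguments dump below. A minimal Python sketch, assuming that path is valid:)

# Sanity-check sketch: confirm that token id 13 maps to "." in the
# GPT-2 BPE vocabulary file used by the example (path from the arguments dump).
import json

with open("../models/gpt2-vocab.json") as f:
    vocab = json.load(f)                       # token string -> token id

id_to_token = {tid: tok for tok, tid in vocab.items()}
print(repr(id_to_token[13]))                   # should print '.'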

Result: it takes 1543.90 ms, which suggests that early stopping is not happening:

Loading layer_num from config.ini,    previous: 24,    current: 24
Loading max_seq_len from config.ini,    previous: 1024,    current: 1024
Loading weights_data_type from config.ini,    previous: fp32,    current: fp32
Loading head_num from config.ini,    previous: 16,    current: 16
Loading size_per_head from config.ini,    previous: 64,    current: 64
Loading tensor_para_size from config.ini,    previous: 1,    current: 1

=================== Arguments ===================
layer_num.....................: 24
input_len.....................: 1
output_len....................: 512
head_num......................: 16
size_per_head.................: 64
vocab_size....................: 50304
beam_width....................: 1
top_k.........................: 1
top_p.........................: 0.0
temperature...................: 1.0
len_penalty...................: 0.0
beam_search_diversity_rate....: 0.0
tensor_para_size..............: 1
pipeline_para_size............: 1
ckpt_path.....................: ../models/megatron-models/c-model/345m/1-gpu
lib_path......................: ./lib/libth_transformer.so
vocab_file....................: ../models/gpt2-vocab.json
merges_file...................: ../models/gpt2-merges.txt
start_id......................: 50256
end_id........................: 13
max_batch_size................: 1
repetition_penalty............: 1.0
presence_penalty..............: 0.0
min_length....................: 0
max_seq_len...................: 1024
inference_data_type...........: fp32
time..........................: True
sample_input_file.............: None
sample_output_file............: None
enable_random_seed............: False
skip_end_tokens...............: False
detokenize....................: True
use_jieba_tokenizer...........: False
int8_mode.....................: 0
weights_data_type.............: fp32
return_cum_log_probs..........: 0
shared_contexts_ratio.........: 1.0
banned_words..................: 
use_gpt_decoder_ops...........: False
=================================================

[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] Skip NCCL initialization since requested tensor/pipeline parallel sizes are equals to 1.
[FT][INFO] Device NVIDIA TITAN RTX
[INFO] batch 0, beam 0:
[Context]
<|endoftext|>

[Output]


The first of the two-day conference, which will be held at the University of California, Berkeley, will be held on Thursday, March 15, from 9 a.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

[INFO] GPT time costs: 1543.90 ms
zhang-ge-hao added the bug label Mar 31, 2023
@zhang-ge-hao (Contributor, Author) commented Mar 31, 2023

I found that some early-stopping code was removed after v5.3.

The code that was removed from ParallelGpt.cc:

if (*generation_should_stop_) {
    break;
}
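
For context, here is a minimal sketch of the pattern that check implements, written in Python with hypothetical names such as step_fn (not FT's actual code): the per-step generation loop exits as soon as every sequence in the batch has produced end_id, instead of always running the full output_len steps.

# Conceptual sketch only (hypothetical helper names, not FT code).
def generate(step_fn, output_len, end_id, batch_size):
    finished = [False] * batch_size
    outputs = [[] for _ in range(batch_size)]
    for _ in range(output_len):
        tokens = step_fn()                     # one decoding step -> one token per sequence
        for i, tok in enumerate(tokens):
            if not finished[i]:
                outputs[i].append(tok)
                finished[i] = (tok == end_id)
        if all(finished):                      # analogue of *generation_should_stop_
            break                              # early stop: skip the remaining steps
    return outputs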

@byshiue (Collaborator) commented Mar 31, 2023

This is a known issue; please refer to issue #487.

@zhang-ge-hao (Contributor, Author) commented Mar 31, 2023

@byshiue

Well, I fixed this issue in my scenario. You can take a look at my PR #536 and check whether the fix is fully correct for FT.

I have also updated the description at the top to make it clearer.

The case originally takes 1543.90 ms, but only 114.79 ms once early stopping is activated.

@zhang-ge-hao (Contributor, Author) commented Mar 31, 2023

@byshiue

Oh, I see that you have already fixed this problem in the same way, but that fix causes another problem.

zhang-ge-hao closed this as not planned Mar 31, 2023