Revert "Force to generate "inference count" tokens" #455

peterchen-intel · 2024-05-21T13:30:09Z

Reverts #289 to unblock the release, since it causes a performance regression in some models. (Investigation of the cause is in progress.)

This reverts commit ee0f75a.

* Fix noise images generated for '--num' > 1 in Stable Diffusion sample (openvinotoolkit#441)
  Fixes openvinotoolkit#405
* update optimum intel commit in llm bench (openvinotoolkit#444)
* Fix an attempt to add a string value to a numerical value (openvinotoolkit#447)
* output no hook data warning when it is text gen model (openvinotoolkit#449)
* Fix md5 hash for env that does not support usedforsecurity arg (openvinotoolkit#445)
  I got an error running benchmarking on my working machine (python3.8, ubuntu20) due to unsupported args for hashlib.
  ```
  [ ERROR ] An exception occurred
  [ INFO ] Traceback (most recent call last):
    File "benchmark.py", line 532, in main
      iter_data_list, pretrain_time = CASE_TO_BENCH[model_args['use_case']](model_path, framework, args.device, model_args, args.num_iters)
    File "benchmark.py", line 194, in run_text_generation_benchmark
      run_text_generation(input_text, num, model, tokenizer, args, iter_data_list, warmup_md5, prompt_idx, bench_hook, model_precision, proc_id)
    File "benchmark.py", line 131, in run_text_generation
      result_md5_list.append(hashlib.md5(result_text.encode(), usedforsecurity=False).hexdigest())
  TypeError: openssl_md5() takes at most 1 argument (2 given)
  ```
  Based on this [StackOverflow issue](https://stackoverflow.com/questions/54717862/how-do-i-know-if-the-usedforsecurity-flag-is-supported-by-hashlib-md5), not all clients support this argument, so using hashlib.new("md5") instead of hashlib.md5 should be safe in both cases.
* fix path based configuration (openvinotoolkit#456)
* Revert "Force to generate "inference count" tokens" (openvinotoolkit#455)
  Reverts openvinotoolkit#289 to unblock the release, since it causes a performance regression in some models. (Investigation of the cause is in progress.)
* enable
* libtbb-dev
* move
* slash
* install
* core_genai_dev
* remove export
* rreorganaise components
* add SOVERSION, and requirements-build.txt
* repalce SKBUILD with EXCLUDE_FROM_ALL because the effect is the same
* fix NAMELINK_COMPONENT
* remove extraline
* add soft restrictions
* Fix build to unblock packaging
* verify beam search 1st token optimization (openvinotoolkit#426)
  The minimum version of transformers to get 1st and 2nd tokens latency is v4.40-release.
* Output median min and avg values to csv (openvinotoolkit#450)
  Co-authored-by: Chen Peter <peter.chen@intel.com>
* improve naming
* install samples
* remove quotes
* use main target name because an alias can't be specified in cmake --target
* define CMAKE_BUILD_PARALLEL_LEVEL
* Ensure ./requirements-build.txt won't outdate
* Use ./requirements-build.txt in python lib build
* Add missing &&
* Test Debug
* add matrix for windows_genai_package
* openvino_tokenizers from form
* update openvino_tokenizers
* update openvino_tokenizers
* update openvino_tokenizers
* revert openvino_tokenizers
* tokenizers from fork
* update tokenizers
* centos7_2024.2.0.dev
* copy target
* revert tokenizers
* reapply useful changes
* copy so only
* Update tokenizers, centos7_2024.2.0.dev
* single thread
* ubuntu22
* nightyl
* --pre --extra-index-url
* update tokenizers
* space
* move --pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
* release tokenizers
* merge
* downgrade tokenizers
* downgrade
* two steps
* downgrade tokenizers
* dont setupvars
* source
* fix
* submodule
* releases/2024/2 tokenizers
* fix-2
* rebase
* use make
* comment
* CMAKE_GENERATOR=Unix Makefiles
* update openvino
* space
* optimum-cli from fork
* different commit
* from branch
* remove exrtra-index for SD
* reorder pip install
* revert unwanted changes
* Ubuntu-22
* openvino_tokenizers~=2024.2.0.0
* remove -pre . --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
* upgrade to prerelease
* revert requirements.txt
* remove --pre, setupvars
* get openvino_tokenizers._ext_path
* take release pybind, fix soversion, and tokenizers folder
* spelling
* dont copy libs
* put ov_tokenizers_path back
* GENAI_BUILD_DIR=../../build
* Add extension near to genai library
* include openvino/util/file_util.hpp
* get_absolute_file_path
* remove namepsace
* # include <limits.h>
* more than one .
* till next dot
* _ext_path
* -1
* +1
* +1
* path
* ext name
* with_openvino_tokenizers
* char
* revert test
* tokenizers from form
* update fork
* lib
* fix cherry-pick
* update fork
* dont spoil source dir
* Generator expressions to disable appending a per-configuration subdirectory
* remove versions
* fix path
* try
* try
* verbose
* spelling
* rename file
* remove build.tool-args
* Release
* dont speciify targets
* revert 81ec069
* Update tests
* No rule to make target package
* skip step
* test tokenizers are loaded
* CPU
* dont test Debug
* retrigger
* minor
* 16-cores
* retrigger
* retrigger
* retrigger
* -x
* str
* less verbose
* less verbose
* less
* less
* more
* no cache
* conflicts
* cache
* export
* cached save
* rename
* rename
* 16-cores
* no larg
* save memory
* retrigger
* predownload
* comment
* export name
* exports
* supress
* revert
* test_operator_wit_callback_batch_fail
* run test_beam_search_decoding only
* test_decoding
* remove Phi
* all
* add return bool to streamer to stop generation
* add return bool to streamer to stop generation
* add return bool to streamer to stop generation
* add return bool to streamer to stop generation
* dont test StopCriteria.EARLY because it fails
* update
* remove sudo apt-get install libtbb-dev
* submodule from fork
* update submodule
* update submodule
* update submodule
* update submodule
* set upstream submodule, add copyright headers, shorten commands
* space
* dir link
* retrigger
* update
* skip
* test
* put optimum-intel[openvino] back
* flake8
* flake8
* optimum[openvino]==1.20.0
* update tests/python_tests/requirements.txt

---------

Co-authored-by: Yaroslav Tarkan <yaroslav.tarkan@intel.com>
Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
Co-authored-by: guozhong wang <guozhong.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: Pavel Esir <pavel.esir@intel.com>

commit adec0e0
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 11 14:32:45 2024 +0400

    Remove extra token desc

commit a64f30a
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 11 13:36:01 2024 +0400

    Working sampler

commit 05048ff
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 11 13:23:43 2024 +0400

    check

commit e349418
Merge: bfaa55a 0b1ce98
Author: Irina Efode <irina.efode@intel.com>
Date:   Mon Jun 10 23:11:58 2024 +0400

    Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into penalties

commit 0b1ce98
Merge: 16d857e 2da1556
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date:   Mon Jun 10 18:52:20 2024 +0400

    Merge pull request openvinotoolkit#21 from iefode/n_support

    Support num_return_seq for multinomial case

commit bfaa55a
Author: Irina Efode <irina.efode@intel.com>
Date:   Mon Jun 10 17:42:01 2024 +0400

    Fix tests

commit fa0efb6
Author: Irina Efode <irina.efode@intel.com>
Date:   Mon Jun 10 16:41:04 2024 +0400

    Config tests

commit 7551303
Author: Irina Efode <irina.efode@intel.com>
Date:   Mon Jun 10 15:34:14 2024 +0400

    Implement LogitTransformers. todo config check

commit 16d857e
Merge: 76148c5 1ee4f38
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date:   Mon Jun 10 10:41:27 2024 +0200

    Merge remote-tracking branch 'upstream/master' into ct-beam-search

commit 1ee4f38
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Sun Jun 9 18:26:57 2024 +0800

    Add option --prompt_index (openvinotoolkit#481)

    Run the corresponding prompt according to the option prompt index

commit 9902928
Author: Pavel Esir <pavel.esir@gmail.com>
Date:   Fri Jun 7 20:57:47 2024 +0200

    Generate pipeline (openvinotoolkit#334)

    An LLM returns logits with the probabilities of each token; these probabilities can be converted to tokens/words with different techniques: greedy decoding, beam search decoding, random sampling, etc. This requires writing user-unfriendly post-processing even for the simplest scenario of greedy decoding. To make life easier, we combined all decoding scenarios into a single function call, where the decoding method and parameters are specified by arguments.

    In this PR we provide a user-friendly API for text generation inspired by the `generate` method from the HuggingFace transformers library.

    - [x] enable calling tokenizers/detokenizers from LLMPipeline
    - [ ] add callback for streaming mode - done partially, need to improve
    - [x] rewritten samples with the current approach: [causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83](https://github.com/pavel-esir/openvino.genai/blob/generate_pipeline/text_generation/causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83)
    - [x] Multibatch greedy decoding
    - [ ] Speculative decoding
    - [ ] Grouped Beam Search decoding: ready for batch 1, need to rebase multibatch support after merging openvinotoolkit#349
    - [x] Random sampling

    Example 1: Greedy search generation
    ```
    LLMPipeline pipe(model_path, device);

    // Will try to load config from generation_config.json,
    // but if not found, default values for greedy search will be used
    GenerationConfig config = pipe.generation_config();

    cout << pipe(prompt, config.max_new_tokens(20));
    ```

    Example 2: TextStreaming mode
    ```
    LLMPipeline pipe(model_path, device);

    GenerationConfig config = pipe.generation_config();

    auto text_streamer = TextStreamer{pipe};
    auto text_streamer_callback = [&text_streamer](std::vector<int64_t>&& tokens, LLMPipeline& pipe){
        text_streamer.put(tokens[0]);
    };

    pipe(prompt, config.max_new_tokens(20).set_callback(text_streamer_callback));
    text_streamer.end();
    ```

    CVS-132907 CVS-137920

    ---------

    Co-authored-by: Wovchena <vladimir.zlobin@intel.com>
    Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
    Co-authored-by: Alexander Suvorov <alexander.suvorov@intel.com>
    Co-authored-by: Yaroslav Tarkan <yaroslav.tarkan@intel.com>
    Co-authored-by: Xiake Sun <xiake.sun@intel.com>
    Co-authored-by: wenyi5608 <93560477+wenyi5608@users.noreply.github.com>
    Co-authored-by: Ekaterina Aidova <ekaterina.aidova@intel.com>
    Co-authored-by: guozhong wang <guozhong.wang@intel.com>
    Co-authored-by: Chen Peter <peter.chen@intel.com>

commit 2da1556
Author: Irina Efode <irina.efode@intel.com>
Date:   Thu Jun 6 19:24:45 2024 +0400

    library/src/continuous_batching_pipeline.cpp

commit 7b48fa4
Author: Irina Efode <irina.efode@intel.com>
Date:   Thu Jun 6 15:03:05 2024 +0400

    enable streaming for greedy

commit 5c601e0
Author: Irina Efode <irina.efode@intel.com>
Date:   Thu Jun 6 13:29:47 2024 +0400

    Comments

commit 4f73d36
Author: Irina Efode <irina.efode@intel.com>
Date:   Wed Jun 5 22:46:04 2024 +0400

    Enable frequency and presence penalties

commit 5e49c46
Author: Irina Efode <irina.efode@intel.com>
Date:   Wed Jun 5 11:56:31 2024 +0400

    Fix python tests

commit eb4a219
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 4 22:38:43 2024 +0400

    fix assert place

commit f4d8461
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 4 22:22:37 2024 +0400

    Correct accumulation

commit 55448a1
Merge: 1128792 76148c5
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 4 18:56:42 2024 +0400

    Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into n_support

commit 1128792
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 4 18:52:38 2024 +0400

    test

commit e245041
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue Jun 4 18:52:03 2024 +0400

    Apply comments

commit 561cde0
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Tue Jun 4 16:27:08 2024 +0800

    using sdpa for statble diffusion (openvinotoolkit#458)

    Co-authored-by: Chen Peter <peter.chen@intel.com>

commit 04510d4
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Jun 3 17:37:41 2024 +0000

    Bump optimum[openvino] from 1.19.2 to 1.20.0 in /text_generation/causal_lm/cpp (openvinotoolkit#467)

commit db4a88f
Merge: e5d33f5 b63bda2
Author: Irina Efode <irina.efode@intel.com>
Date:   Mon Jun 3 13:17:32 2024 +0400

    Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into n_support

commit e5d33f5
Merge: fe29df9 bcdcefc
Author: Irina Efode <irina.efode@intel.com>
Date:   Fri May 31 14:11:13 2024 +0400

    Merge remote-tracking branch 'ilavrenov_upstream/ct-beam-search' into n_support

commit fe29df9
Author: Irina Efode <irina.efode@intel.com>
Date:   Fri May 31 14:06:51 2024 +0400

    Tests + Readme

commit 7af72aa
Author: Irina Efode <irina.efode@intel.com>
Date:   Wed May 29 15:16:23 2024 +0400

    Squashed commit of the following:

commit 28af66d
Author: Anastasiia Pnevskaia <anastasiia.pnevskaia@intel.com>
Date:   Tue May 28 15:40:15 2024 +0200

    Added cache_size to python binding of scheduler config.

commit 65a793a
Author: Anastasiia Pnevskaia <anastasiia.pnevskaia@intel.com>
Date:   Tue May 28 15:12:16 2024 +0200

    Fixed tests.

commit 033558e
Author: Irina Efode <irina.efode@intel.com>
Date:   Wed May 29 00:40:48 2024 +0400

    One more change

commit dbae0bf
Merge: f992591 2c2799f
Author: Irina Efode <irina.efode@intel.com>
Date:   Wed May 29 00:38:52 2024 +0400

    Merge master, without py tests

commit a5b14c7
Author: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>
Date:   Tue May 28 16:15:42 2024 +0200

    grammar corrector support in WWB (openvinotoolkit#462)

    This PR introduces support for `AutoForSeq2SeqLM` models in WWB. Previously, WWB only supported `AutoForCasualLM`, assuming that the `generate` method copies the prompt to the output. But AutoForSeq2SeqLM generates output differently: there is no copy of the prompt, and it directly generates output.

    The fix was checked on the [example](https://gist.github.com/ljaljushkin/5a489a27cd0020ddbd42ea7ae54be688). It evaluates grammar correction with a Seq2Seq model using WWB.

commit f992591
Author: Irina Efode <irina.efode@intel.com>
Date:   Tue May 28 17:39:17 2024 +0400

    tmp

commit 7e771f1
Author: Liwenke <wenkex.li@intel.com>
Date:   Tue May 28 15:24:15 2024 +0800

    Note for wikitext data set connection issue (openvinotoolkit#452)

    Co-authored-by: Chen Peter <peter.chen@intel.com>

commit 24ef06e
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Tue May 28 14:23:19 2024 +0800

    Force to generate more tokens (openvinotoolkit#457)

commit 1ed7539
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Tue May 28 09:44:45 2024 +0800

    Correct flan-t5 output size (openvinotoolkit#451)

    openvinotoolkit#358

    ---------

    Co-authored-by: Chen Peter <peter.chen@intel.com>

commit b5a9f28
Author: Irina Efode <irina.efode@intel.com>
Date:   Mon May 27 23:48:03 2024 +0400

    Extend in beam support

commit edc53e5
Author: Irina Efode <irina.efode@intel.com>
Date:   Fri May 24 17:59:48 2024 +0400

    remove extra

commit 9038308
Author: Irina Efode <irina.efode@intel.com>
Date:   Fri May 24 16:20:13 2024 +0400

    Improve multinomial

commit c453e3e
Author: Irina Efode <irina.efode@intel.com>
Date:   Fri May 24 15:42:48 2024 +0400

    Support num_return_seq for multinomial case

commit e6f05c6
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Thu May 23 11:36:50 2024 +0800

    Output median min and avg values to csv (openvinotoolkit#450)

    Co-authored-by: Chen Peter <peter.chen@intel.com>

commit 25909cc
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Thu May 23 11:12:27 2024 +0800

    verify beam search 1st token optimization (openvinotoolkit#426)

    The minimum version of transformers to get 1st and 2nd tokens latency is v4.40-release.

commit 03e78fe
Author: Chen Peter <peter.chen@intel.com>
Date:   Wed May 22 13:06:11 2024 +0800

    Revert "Force to generate "inference count" tokens" (openvinotoolkit#455)

    Reverts openvinotoolkit#289 to unblock the release, since it causes a performance regression in some models. (Investigation of the cause is in progress.)

commit 05a0f36
Author: Ekaterina Aidova <ekaterina.aidova@intel.com>
Date:   Tue May 21 20:33:26 2024 +0400

    fix path based configuration (openvinotoolkit#456)

commit 41b07d3
Author: Ekaterina Aidova <ekaterina.aidova@intel.com>
Date:   Fri May 17 06:20:18 2024 +0400

    Fix md5 hash for env that does not support usedforsecurity arg (openvinotoolkit#445)

    I got an error running benchmarking on my working machine (python3.8, ubuntu20) due to unsupported args for hashlib.
    ```
    [ ERROR ] An exception occurred
    [ INFO ] Traceback (most recent call last):
      File "benchmark.py", line 532, in main
        iter_data_list, pretrain_time = CASE_TO_BENCH[model_args['use_case']](model_path, framework, args.device, model_args, args.num_iters)
      File "benchmark.py", line 194, in run_text_generation_benchmark
        run_text_generation(input_text, num, model, tokenizer, args, iter_data_list, warmup_md5, prompt_idx, bench_hook, model_precision, proc_id)
      File "benchmark.py", line 131, in run_text_generation
        result_md5_list.append(hashlib.md5(result_text.encode(), usedforsecurity=False).hexdigest())
    TypeError: openssl_md5() takes at most 1 argument (2 given)
    ```
    Based on this [StackOverflow issue](https://stackoverflow.com/questions/54717862/how-do-i-know-if-the-usedforsecurity-flag-is-supported-by-hashlib-md5), not all clients support this argument, so using hashlib.new("md5") instead of hashlib.md5 should be safe in both cases.

commit d473e96
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Fri May 17 10:09:27 2024 +0800

    output no hook data warning when it is text gen model (openvinotoolkit#449)

commit cad3abb
Author: guozhong wang <guozhong.wang@intel.com>
Date:   Thu May 16 17:28:49 2024 +0800

    Fix an attempt to add a string value to a numerical value (openvinotoolkit#447)

commit 93f7670
Author: Ekaterina Aidova <ekaterina.aidova@intel.com>
Date:   Thu May 16 11:49:08 2024 +0400

    update optimum intel commit in llm bench (openvinotoolkit#444)

commit d73346c
Author: Yaroslav Tarkan <yaroslav.tarkan@intel.com>
Date:   Wed May 15 14:24:30 2024 +0300

    Fix noise images generated for '--num' > 1 in Stable Diffusion sample (openvinotoolkit#441)

    Fixes openvinotoolkit#405
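
The openvinotoolkit#445 commit message above attributes the `TypeError` to `hashlib.md5` rejecting the `usedforsecurity` keyword on some Python builds, and suggests `hashlib.new("md5")` as the portable alternative. A minimal sketch of that idea follows, assuming only the standard library; the helper name `md5_hexdigest` is hypothetical, and this is not necessarily the exact change made in `benchmark.py`.

```python
import hashlib


def md5_hexdigest(text: str) -> str:
    """Return the MD5 hex digest of `text` without passing `usedforsecurity`.

    Hypothetical helper for illustration: it builds the hash object with
    hashlib.new("md5") and no extra keyword arguments, so it does not depend
    on whether the interpreter accepts the `usedforsecurity` flag.
    """
    hasher = hashlib.new("md5")
    hasher.update(text.encode())
    return hasher.hexdigest()


if __name__ == "__main__":
    # The digest is used here only as a fingerprint of generated text,
    # not for any security purpose.
    print(md5_hexdigest("hello"))
```

Because the digest only fingerprints benchmark output for comparison between runs, dropping the security-related flag should not change the results, though builds that disable MD5 entirely may still need separate handling.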

Revert "Force to generate "inference count" tokens (#289)"

1780237

This reverts commit ee0f75a.

github-actions bot added the "category: llm_bench" label (Label for tool/llm_bench folder) May 21, 2024

wgzintel approved these changes May 22, 2024

Merge branch 'master' into revert-289-guozhong/more_tokens_to_generate

0c1d393

peterchen-intel assigned peterchen-intel and eaidova and unassigned peterchen-intel May 22, 2024

eaidova approved these changes May 22, 2024

eaidova merged commit 03e78fe into master May 22, 2024
3 checks passed

eaidova deleted the revert-289-guozhong/more_tokens_to_generate branch May 22, 2024 05:06
