
Merge releases/2024/3 into master #731

Merged
Changes from all commits (56 commits):
e4637b3  Workaround (#618)  [Wovchena, Jul 15, 2024]
423c8e3  Revert to python3  [Wovchena, Jul 15, 2024]
8ad336c  Revert to python3 (#622)  [akladiev, Jul 15, 2024]
1b1b2f0  Fix cmake Python var name (#624)  [Wovchena, Jul 15, 2024]
70b74ad  Add ContinuousBatchingPipeline constructor similar to LLMPipeline (#604)  [Wovchena, Jul 15, 2024]
f0c2677  Clear beam search info when generate() is finished. (#630)  [popovaan, Jul 15, 2024]
73badf6  Update nncf_utils.py (#616) (#633)  [KodiaqQ, Jul 16, 2024]
25655e3  Workaround cmake packaging (#634)  [Wovchena, Jul 16, 2024]
754f6d7  Save licensing_genai into docs to align with OpenVINO (#637)  [Wovchena, Jul 16, 2024]
e5247e0  Update submodule (#638)  [Wovchena, Jul 16, 2024]
2d1fa3b  Add Llama3 (#620)  [Wovchena, Jul 17, 2024]
489a87d  nightly->rc1 (#621)  [Wovchena, Jul 17, 2024]
67f0467  Add OpenVINOGenAITargets to core_genai_dev COMPONENT (#642)  [Wovchena, Jul 17, 2024]
1969160  Apply todo, initialize detokenizer's cache (#647)  [Wovchena, Jul 22, 2024]
0e0f6a9  Cherry-pick static LLM pipeline changes (#654)  [TolyaTalamanov, Jul 22, 2024]
cb100cb  [Continuous batching] Replace standard max_element call with custom l…  [mzegla, Jul 11, 2024]
f0e4190  wip  [pavel-esir, Jul 12, 2024]
7cab496  add detokenization metric; refactor split to perf_conter & perf_metrics  [pavel-esir, Jul 19, 2024]
bb1113c  refactor structure, add python sample  [pavel-esir, Jul 22, 2024]
7bf42f1  Cherry-pick custom max_element loop (#662)  [mzegla, Jul 22, 2024]
0a8f0d9  add more preicise durations  [pavel-esir, Jul 22, 2024]
bad01b9  Add note for pybind ov::Tensor issue (#659)  [as-suvorov, Jul 22, 2024]
cb0da0a  [OV 24.3]Fix multinomial sample CMakeList (#658)  [sammysun0711, Jul 22, 2024]
bc92248  add Readme for tests (#664)  [pavel-esir, Jul 23, 2024]
90320f4  add cpp Readme, ensured correct batch processing, add PerfMetrics to …  [pavel-esir, Jul 23, 2024]
aeec730  use MeanStdPair  [pavel-esir, Jul 23, 2024]
56eeafc  [2024.3] Fix symbol encode error (#629)  [yatarkan, Jul 24, 2024]
8934a0e  [release branch] Add infer request queue for tokenizers and allow for…  [dkalinowski, Jul 24, 2024]
12f8e44  Add max_new_tokens to every generate call in src/README.md (#670)  [pavel-esir, Jul 24, 2024]
f9e45e1  Add CB naive chat (#644)  [Wovchena, Jul 24, 2024]
03590c5  return back py::object -> AnyMap (#679)  [pavel-esir, Jul 24, 2024]
53945f7  Update openvino_tokenizers (#680)  [Wovchena, Jul 24, 2024]
a769b33  Allow dev and rc tokenizers (#681)  [Wovchena, Jul 24, 2024]
e449ffe  Fix chat templates with slices, add tokenizer config for `mistralai/M…  [yatarkan, Jul 25, 2024]
406393f  Prefix caching. (#675)  [popovaan, Jul 26, 2024]
c45aed5  Merge remote-tracking branch 'upstream/releases/2024/3' into add_perf…  [pavel-esir, Jul 26, 2024]
be2fdaf  resolve conflicts  [pavel-esir, Jul 26, 2024]
b00bcd8  apply comments  [pavel-esir, Jul 26, 2024]
60e7188  uset getter and cache evaluate results  [pavel-esir, Jul 26, 2024]
e553ef5  update Readme's  [pavel-esir, Jul 26, 2024]
3bfbab5  StaticLLMPipeline dangling models hotfix (#693)  [TolyaTalamanov, Jul 26, 2024]
102f00a  add generation time metrics (#613)  [andrei-kochin, Jul 26, 2024]
06c57b7  Remove Dockerfile (#700)  [mzegla, Jul 29, 2024]
e286469  StaticLLMPipeline - align u4 zero points (#705)  [TolyaTalamanov, Jul 30, 2024]
2a80828  Disable broken test (#707)  [Wovchena, Jul 31, 2024]
d89cdcb  update optimum commit for releases/2024/3 (#711)  [eaidova, Jul 31, 2024]
2428a3a  change commit for optimum  [eaidova, Jul 31, 2024]
1473e7f  Merge branch 'releases/2024/3' into ea/upd_opt_commit  [eaidova, Jul 31, 2024]
8cb12b2  change commit for optimum (#714)  [andrei-kochin, Jul 31, 2024]
2f778f3  Add perf metric docstrings (#713)  [pavel-esir, Jul 31, 2024]
2dc6b64  rc1->rc2 (#695)  [Wovchena, Jul 31, 2024]
3bfdd3f  Docs for version compatibility (#692)  [yatarkan, Jul 31, 2024]
a295fe1  update requirements.txt (#721)  [wgzintel, Aug 1, 2024]
4743003  Merge branch 'releases/2024/3' into merge-releases/2024/3-into-master  [Wovchena, Aug 2, 2024]
b30a262  fix merge  [Wovchena, Aug 2, 2024]
fb80ce7  fix merge  [Wovchena, Aug 2, 2024]
2 changes: 1 addition & 1 deletion samples/python/chat_sample/README.md
````diff
@@ -41,4 +41,4 @@ If you encounter an exception indicating a missing "chat template" when launching
 The following template can be used as a default, but it may not work properly with every model:
 ```
 "chat_template": "{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|im_start|>user\n' + message['content'] + '<|im_end|>\n<|im_start|>assistant\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>\n'}}{% endif %}{% endfor %}",
-```
+```
````
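For illustration, the prompt this default chat template produces can be reproduced with a small pure-Python sketch. This is a hand-written equivalent of the Jinja logic above for clarity, not part of the sample and not the pipeline's actual renderer:

```python
def apply_default_chat_template(messages):
    """Mimic the default chat template shown above.

    `messages` is a list of {"role": ..., "content": ...} dicts, as in the
    Jinja template: user turns open an assistant turn, assistant turns close it.
    """
    prompt = ""
    for message in messages:
        if message["role"] == "user":
            prompt += ("<|im_start|>user\n" + message["content"]
                       + "<|im_end|>\n<|im_start|>assistant\n")
        elif message["role"] == "assistant":
            prompt += message["content"] + "<|im_end|>\n"
    return prompt

# A two-turn history renders as:
# <|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello!<|im_end|>\n
history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(apply_default_chat_template(history))
```

This makes the template's behavior easy to check: only `user` and `assistant` roles are handled, and every user turn ends by opening an assistant turn so the model continues from there.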
8 changes: 5 additions & 3 deletions src/cpp/src/llm_pipeline_static.cpp
````diff
@@ -161,8 +161,10 @@ StaticLLMPipeline::StaticLLMPipeline(
     */
     ov::Core core;
     // (1) Read the template model - this will be kvcache model
-    auto kvcache_model = core.read_model(path / "openvino_model.xml");
-    // (2) TODO: Expose KV-cache input and output layers from kvcache model
+    m_kvcache_model = core.read_model(path / "openvino_model.xml");
+    // (2) Expose KV-cache input and output layers from kvcache model
+    ov::pass::StatefulToStateless().run_on_model(m_kvcache_model);
+    align_u4_zp_constants(m_kvcache_model);
     // (3) Clone the model - this will be prefill
     m_prefill_model = m_kvcache_model->clone();
     m_prefill_model->set_friendly_name(m_kvcache_model->get_friendly_name() + "_prefill");
@@ -179,7 +181,7 @@
         m_prefill_model, device, extract_config_or_default(config, "PREFILL_CONFIG")
     ).create_infer_request();
     m_kvcache_request = core.compile_model(
-        kvcache_model, device, extract_config_or_default(config, "GENERATE_CONFIG")
+        m_kvcache_model, device, extract_config_or_default(config, "GENERATE_CONFIG")
     ).create_infer_request();
     // (7) Initialize tensors
     prepare_for_new_conversation();
````
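The constructor flow in this hunk reads one template model, stores it as a member (rather than a local, so it outlives compilation of both infer requests), then clones it for prefill. That ownership pattern can be sketched with toy Python stand-ins; the class and attribute names below are illustrative only, not the OpenVINO C++ or Python API:

```python
import copy

class ToyModel:
    """Stand-in for an ov::Model; real code obtains one via core.read_model()."""
    def __init__(self, name):
        self.friendly_name = name

    def clone(self):
        # Mirrors ov::Model::clone(): an independent copy of the graph.
        return copy.deepcopy(self)

class ToyStaticPipeline:
    def __init__(self, path):
        # (1) Read the template model - this becomes the kvcache model.
        # Keeping it as an attribute (not a local) mirrors the change above:
        # the model must stay alive for both compile steps and beyond.
        self.kvcache_model = ToyModel("openvino_model")
        # (3) Clone the model - this becomes the prefill model, with a
        # distinct friendly name derived from the original.
        self.prefill_model = self.kvcache_model.clone()
        self.prefill_model.friendly_name = (
            self.kvcache_model.friendly_name + "_prefill"
        )

pipe = ToyStaticPipeline("model_dir")
```

The point of the sketch is the lifetime fix: in the old code the kvcache model was a stack local (`auto kvcache_model = ...`) referenced after the constructor's scope in spirit, whereas the new code promotes it to the member `m_kvcache_model` used by both `compile_model` calls.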