b2699 #109

Nexesenex · 2024-04-19T19:12:46Z

No description provided.


* iq1_bn: improve CUDA TG

  On RTX-3080 TG-128 (Bitnet-1.58b-3B) goes from 318 t/s to 340 t/s. I see I have on the front page 301 t/s, so pretty nice improvement since then.

* iq2_bn(CUDA): quants are not 4-byte aligned

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
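To put the quoted token-generation figures in relative terms, here is a minimal sketch that converts the three throughput numbers from the commit message (301, 318 and 340 t/s) into percentage gains. The helper name `speedup_pct` is hypothetical and is not part of the repository; only the three numbers come from the message above.

```python
def speedup_pct(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain in percent, going from before_tps to after_tps."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return (after_tps - before_tps) / before_tps * 100.0


# Figures quoted in the commit message (tokens per second, TG-128 on an RTX-3080).
front_page_tps = 301.0  # older number the author mentions from the front page
previous_tps = 318.0    # before this change
current_tps = 340.0     # after this change

if __name__ == "__main__":
    print(f"vs. previous:   {speedup_pct(previous_tps, current_tps):.1f}%")
    print(f"vs. front page: {speedup_pct(front_page_tps, current_tps):.1f}%")
```

Under these numbers the change is roughly a 7% gain over the previous build and about 13% over the older front-page figure.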

nopperl and others added 3 commits April 19, 2024 11:35

Implement the OLMo architecture (#6741)

9958c81

* implement olmo architecture
* remove unused variable
* remove unused moe branch
* remove check for weight
* remove superfluous moe, bias and rope tensors
* clarified comment
* fix clamp_kqv setting
* remove obsolete parameter name filter

server: static: upstream upgrade (#6765)

637e9a8

ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748)

0e4802b

Nexesenex merged commit c46ce14 into Nexesenex:downstream Apr 19, 2024

4 participants
