Mistral 7B, Albert Q. Jiang+, N/A, arXiv'23 #1309

AkihikoWatanabe · 2024-05-24T03:55:02Z

URL

https://arxiv.org/pdf/2310.06825

Affiliations

Albert Q. Jiang, N/A
Alexandre Sablayrolles, N/A
Arthur Mensch, N/A
Chris Bamford, N/A
Devendra Singh Chaplot, N/A
Diego de las Casas, N/A
Florian Bressand, N/A
Gianna Lengyel, N/A
Guillaume Lample, N/A
Lucile Saulnier, N/A
Lélio Renard Lavaud, N/A
Marie-Anne Lachaux, N/A
Pierre Stock, N/A
Teven Le Scao, N/A
Thibaut Lavril, N/A
Thomas Wang, N/A
Timothée Lacroix, N/A
William El Sayed, N/A

Abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

Translation (by gpt-3.5-turbo)

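Mistral 7B v0.1 is a 7-billion-parameter language model designed for strong performance and efficiency. Mistral 7B outperforms Llama 2 13B on all evaluated benchmarks and surpasses Llama 1 34B in reasoning, mathematics, and code generation. The model uses grouped-query attention (GQA) for fast inference, combined with sliding window attention (SWA) to handle sequences of arbitrary length effectively while reducing inference cost. A model fine-tuned to follow instructions, Mistral 7B -- Instruct, is also provided; it outperforms the Llama 2 13B -- Chat model on both human and automated benchmarks. The models are released under the Apache 2.0 license.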

Summary (by gpt-3.5-turbo)

Mistral 7B v0.1 is a 7-billion-parameter language model that uses GQA for fast inference, combined with SWA. Mistral 7B -- Instruct outperforms the Llama 2 13B -- Chat model, and the models are released under the Apache 2.0 license.

AkihikoWatanabe · 2024-05-24T04:01:20Z

See also models such as #1237 and #1279.

The motivation: as models scale up, inference latency grows and the compute cost becomes too high to be practical, so the goal is fast inference with a small parameter count.
To that end, the model uses sliding window attention and grouped-query attention (#1271).
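
As a rough illustration of the two mechanisms named above, here is a minimal sketch in plain Python. The function names (`sliding_window_mask`, `kv_head_for_query_head`) and the small sizes are hypothetical, chosen for readability; they are not taken from the Mistral code base. The mask shows which earlier positions a token may attend to under causal sliding window attention, and the mapping shows how several query heads share one key/value head under grouped-query attention.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Causal sliding window: j must not be in the future (j <= i) and must lie
    within the last `window` positions (i - j < window).
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the key/value head it shares under GQA."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared K/V head
    return q_head // group_size


if __name__ == "__main__":
    # Toy sizes (assumptions for illustration only).
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

Each row of the mask has at most `window` allowed positions, so the per-token attention cost is bounded by the window size rather than by the full sequence length. With fewer key/value heads than query heads, less key/value state has to be stored and read during decoding.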

With fewer parameters, it outperforms Llama 2 on a variety of tasks.

The instruction-tuned model achieved a higher Elo rating on Chatbot Arena than 13B models.

AkihikoWatanabe · 2024-05-24T05:29:32Z

The context length is 8192 tokens.
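
To give a feel for why the number of key/value heads matters at this context length, here is a back-of-the-envelope sketch of the key/value cache size for one sequence. The architecture values (32 layers, 32 query heads, 8 key/value heads, head dimension 128) are assumed from the paper's architecture table and do not appear in this note; the 16-bit cache precision and the helper name `kv_cache_bytes` are assumptions for illustration. The sketch ignores any rolling-buffer trimming of the cache to the attention window.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both K and V in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


GIB = 1024 ** 3

# Assumed values (see the lead-in): 8 K/V heads with grouped-query attention.
gqa = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=8, head_dim=128)
# Hypothetical comparison: one K/V head per query head (standard multi-head attention).
mha = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=32, head_dim=128)

print(f"GQA cache: {gqa / GIB:.1f} GiB, MHA cache: {mha / GIB:.1f} GiB")
# prints "GQA cache: 1.0 GiB, MHA cache: 4.0 GiB"
```

Under these assumptions, sharing key/value heads cuts the cache for a full 8192-token sequence by a factor of four.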

AkihikoWatanabe added the Pocket label May 24, 2024

AkihikoWatanabe changed the title ~~Mistral 7B~~ Mistral 7B, Albert Q. Jiang+, N/A, arXiv'23 May 24, 2024

AkihikoWatanabe mentioned this issue May 24, 2024

Gemma: Open Models Based on Gemini Research and Technology, 2024 #1277
