Model | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
---|---|---|---|---|
ResNet-50 (fps) | 20 | 2,070 | 7,200 | 10,000 |
BERT-Large (sen/s) | 12 | 362 | 406 | 410 |
Falcon7B-decode (t/s) | 32 | 135 | 135 | 140 |
U-Net | coming soon | | | |
T5 small | coming soon | | | |
Bloom | coming soon | | | |

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.
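The gap between the two throughput columns is easier to read as time per batch. The sketch below assumes batches are processed back to back, so time per batch is the batch size divided by throughput, and treats the difference between the end-to-end and device-only times as host overhead. The helper names are hypothetical; the numbers are copied from the table above.

```python
# Hypothetical helpers: estimate per-batch time and the host-side share of it,
# assuming both throughput columns were measured at the same batch size and
# batches run back to back (time per batch = batch / throughput).


def per_batch_ms(batch: int, items_per_s: float) -> float:
    """Wall time for one batch, in milliseconds."""
    return batch / items_per_s * 1000.0


def host_overhead_ms(batch: int, end_to_end: float, device: float) -> float:
    """Observed (host) time per batch minus kernel-only time per batch."""
    return per_batch_ms(batch, end_to_end) - per_batch_ms(batch, device)


# (name, batch, end-to-end throughput, device throughput) from the table above
rows = [
    ("ResNet-50", 20, 2070.0, 7200.0),
    ("BERT-Large", 12, 362.0, 406.0),
    ("Falcon7B-decode", 32, 135.0, 135.0),
]

for name, batch, e2e, dev in rows:
    total = per_batch_ms(batch, e2e)
    overhead = host_overhead_ms(batch, e2e, dev)
    print(f"{name}: {total:.2f} ms/batch, ~{overhead:.2f} ms host overhead "
          f"({overhead / total:.0%})")
```

Under these assumptions, most of ResNet-50's end-to-end time per batch is spent outside the kernels, while the Falcon7B decode figures show no measurable host overhead.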

Model | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
---|---|---|---|---|
Falcon-7B-decode (t/s/u) | 32 | 6.6 | 11.6 | 14 |
Mistral-7B-decode (t/s/u) | 32 | 3.3 | 12.6 | 14 |
Mamba-2.8B-decode (t/s/u) | 32 | coming soon | 17 | |
Stable Diffusion 1.4 512x512 | 1 | coming soon | | |
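The decode figures above are reported per user (t/s/u, read here as tokens per second per user). Assuming each of the 32 users in a batch receives one token per decode step, aggregate throughput is batch × t/s/u and the time between tokens for a single user is the reciprocal of t/s/u. A minimal sketch with hypothetical helper names:

```python
# Hypothetical helpers: convert per-user decode throughput (t/s/u) into
# aggregate tokens/s and per-user time between tokens.
# Assumption: every user in the batch gets one token per decode step.


def aggregate_tokens_per_s(batch: int, tokens_per_s_per_user: float) -> float:
    """Total tokens generated per second across all users in the batch."""
    return batch * tokens_per_s_per_user


def per_token_latency_ms(tokens_per_s_per_user: float) -> float:
    """Milliseconds between consecutive tokens for one user."""
    return 1000.0 / tokens_per_s_per_user


# (name, batch, end-to-end t/s/u, device t/s/u) from the table above
for name, batch, e2e, dev in [
    ("Falcon-7B-decode", 32, 6.6, 11.6),
    ("Mistral-7B-decode", 32, 3.3, 12.6),
]:
    print(f"{name}: end-to-end {aggregate_tokens_per_s(batch, e2e):.1f} t/s "
          f"({per_token_latency_ms(e2e):.0f} ms/token per user), "
          f"device {aggregate_tokens_per_s(batch, dev):.1f} t/s")
```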

Model | Batch | Throughput |
---|---|---|
Falcon40B | coming soon | |
LLaMA-2-70B | coming soon | |
Mixtral-8x7B | coming soon | |
ResNet50 (data parallel) | coming soon | |
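
```python
import torch
import ttnn

# Open device 0; the context manager closes it on exit.
with ttnn.manage_device(device_id=0) as device:
    # Host-side inputs: b has a single row and broadcasts across the 5 rows of a.
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))

    # Move both tensors to the device as bfloat16 in tile layout.
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    # Element-wise addition runs on the device.
    output = a + b

    # Copy the result back to the host as a torch.Tensor and print it.
    output = ttnn.to_torch(output)
    print(output)
```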
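With `ttnn.TILE_LAYOUT`, tensors are stored on the device as 32x32 tiles, so small inputs such as the (5, 7) and (1, 7) tensors above are padded up to tile boundaries. The sketch below only computes that padded shape. `padded_tile_shape` is a hypothetical helper, not part of ttnn, and the 32x32 tile size is stated as an assumption in the code.

```python
# Hypothetical helper (not a ttnn API): compute the shape a tensor occupies
# once its last two dimensions are padded up to whole tiles.
# Assumption: 32x32 tiles; the input must have at least two dimensions.

TILE_HEIGHT = 32
TILE_WIDTH = 32


def round_up(value: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def padded_tile_shape(shape: tuple) -> tuple:
    """Pad the last two dims to tile multiples; leading dims are unchanged."""
    *leading_dims, height, width = shape
    return (*leading_dims, round_up(height, TILE_HEIGHT), round_up(width, TILE_WIDTH))


for shape in [(5, 7), (1, 7), (64, 100)]:
    print(shape, "->", padded_tile_shape(shape))
```

Both inputs in the example above fit in a single padded tile, which is why such small tensors can still be placed in tile layout.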
![TT-Metalium logo](/adifrancescoTT/tt-metal/raw/main/docs/source/common/_static/tt_metalium_w_logo.png)
TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.
Get started with simple kernels.