Pull requests: pytorch/FBGEMM
Pull requests list
Add Cutlass FP8 Grouped Gemm to Quantize Bench
cla signed
fb-exported
#3530
opened Dec 23, 2024 by
jwfromm
Fix out-of-bound load in row scaling
cla signed
fb-exported
#3527
opened Dec 23, 2024 by
htyu
change pmt require grad to false when detached
cla signed
fb-exported
#3525
opened Dec 22, 2024 by
duduyi2013
remove output dtype restriction in ssd tbe
cla signed
fb-exported
#3524
opened Dec 22, 2024 by
duduyi2013
Check src & dst dtypes in allgather to prevent silent failures.
cla signed
fb-exported
#3523
opened Dec 20, 2024 by
ChenheliHua
Improve performance of prefill mode FP8 Grouped Gemm
cla signed
fb-exported
#3522
opened Dec 20, 2024 by
jwfromm
Add fused_moe kernel to ck_extension
cla signed
fb-exported
#3518
opened Dec 19, 2024 by
sijiac
Cherry-pick CK PR #1636 for fp8 GEMM rowwise for 70B Prefill
cla signed
fb-exported
#3517
opened Dec 19, 2024 by
zjing14
env variable to select rounding mode
cla signed
fb-exported
#3515
opened Dec 19, 2024 by
hhyuanf
Optimized backward pass for ROCm devices (pt 2)
ciflow/rocm
cla signed
fb-exported
module: rocm
#3511
opened Dec 18, 2024 by
q10
Back out "Manual loop unroll for rocm inference"
ciflow/rocm
cla signed
fb-exported
module: rocm
#3506
opened Dec 15, 2024 by
brad-mengchi
[fbgemm_gpu] Add support for CUDA 12.6 builds in OSS
cla signed
#3503
opened Dec 13, 2024 by
q10
migrate "jagged_flash_attention"
cla signed
fb-exported
#3490
opened Dec 10, 2024 by
brad-mengchi
Optimized backward pass for ROCm devices (#3367)
ciflow/rocm
cla signed
fb-exported
module: rocm
#3468
opened Dec 6, 2024 by
q10
Use GEMM kernel for KleidiAI to accelerate FP16Benchmark
cla signed
#3440
opened Dec 3, 2024 by
milpuz01
Make check_feature_gate_key PT2 compatible
cla signed
fb-exported
#3426
opened Nov 30, 2024 by
sryap
Make check_feature_gate_key PT2 compatible
cla signed
fb-exported
#3425
opened Nov 30, 2024 by
sryap
Add NEON and SVE implementations for Float16 conversions
cla signed
#3424
opened Nov 28, 2024 by
annop-w