-
Notifications
You must be signed in to change notification settings - Fork 8
Add CI testing and benchmarking #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📊 Test Results for Small Benchmark/Test Suite1593b62 (2025_11_13_02_34_35) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite1593b62 (2025_11_13_02_34_35) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
📊 Test Results for Small Benchmark/Test Suitefbf1591 (2025_11_13_03_40_08) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suitefbf1591 (2025_11_13_03_40_08) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
📊 Test Results for Small Benchmark/Test Suite469e14c (2025_11_13_15_02_25) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite469e14c (2025_11_13_15_02_25) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
📊 Test Results for Small Benchmark/Test Suitec3de77d (2025_11_13_16_08_30) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suitec3de77d (2025_11_13_16_08_30) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only point is the .github folder structure, I just prefer to have .yml files with a meaningful name, whatever the approach (composable or reusable_workflow).
📊 Test Results for Small Benchmark/Test Suite56543a2 (2025_11_13_17_59_12) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite56543a2 (2025_11_13_17_59_12) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
Added
cion every push to mainPR Merge Checklist
develcommit and pointing todevel.