Add `attention_bias` argument to the transformer block and transformer layer modules, addressing a change in MCore (#9734)
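As a rough illustration of the change, the sketch below shows how an optional `attention_bias` keyword can be threaded from a transformer layer's `forward` down into the attention computation, where it is added to the attention scores before the softmax. This is a minimal toy example with hypothetical class names, not the actual NeMo/MCore module code.

```python
import math
import torch

class SelfAttention(torch.nn.Module):
    """Toy single-head self-attention accepting an optional additive bias."""
    def __init__(self, hidden: int):
        super().__init__()
        self.qkv = torch.nn.Linear(hidden, 3 * hidden)
        self.out = torch.nn.Linear(hidden, hidden)
        self.hidden = hidden

    def forward(self, x, attention_bias=None):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.hidden)
        if attention_bias is not None:
            # Additive bias (e.g. relative-position terms) applied to the
            # raw attention scores before the softmax.
            scores = scores + attention_bias
        return self.out(torch.softmax(scores, dim=-1) @ v)

class TransformerLayer(torch.nn.Module):
    """Toy layer that forwards attention_bias to its attention module."""
    def __init__(self, hidden: int):
        super().__init__()
        self.attn = SelfAttention(hidden)
        self.norm = torch.nn.LayerNorm(hidden)

    def forward(self, x, attention_bias=None):
        return x + self.attn(self.norm(x), attention_bias=attention_bias)

layer = TransformerLayer(8)
x = torch.randn(2, 4, 8)        # (batch, seq, hidden)
bias = torch.zeros(2, 4, 4)     # (batch, seq, seq) additive bias
y = layer(x, attention_bias=bias)
print(tuple(y.shape))           # (2, 4, 8)
```

A zero bias leaves the output unchanged, so existing call sites that omit the argument keep their previous behavior.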
| Job | Run time |
|---|---|
| | 25s |
| | 22s |
| | 29s |
| | 1m 16s |