I am trying to replicate the Stage-1 performance (88.0% mAP in Table 2 of the paper) by training a model with the code from the ASC repository, replacing their model file (models.py) with the TSM model file (models_stage1_tsm.py) provided in this repository. The best performance I could get was 83.27% mAP.
Would you be able to either share the code you used to train the first-stage networks, or share the training parameters that differ from the first stage of ASC? Specifically, I am looking for the initial learning rate, the number of steps after which the learning rate drops, the total number of epochs, the method used to choose the best-performing model, and the range of random factors used to reduce the volume of the audio files during data augmentation. It would be great if you could also share any other differences between your training method and the first-stage method of the ASC repository.
I think you can use the checkpoints we provided to replicate the performance: link
We plan to release our Stage-1 code in the future, but we basically used ASC.
For the negative sampling technique, please refer to TalkNet: link. You can simply apply this to audio_clip.
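For concreteness, here is a minimal sketch of TalkNet-style negative-sampling augmentation applied to an audio clip: mix in a randomly chosen clip from another video as noise at a random SNR. This is not this repository's exact implementation; the function name, `audio_pool`, and the SNR range are illustrative assumptions.

```python
import random
import numpy as np

def negative_sample_audio(audio_clip, audio_pool, snr_db_range=(-5, 5)):
    """Sketch of TalkNet-style negative sampling: mix a clip drawn
    from other videos into the target audio as noise at a random SNR.
    `audio_pool` and `snr_db_range` are illustrative, not from this repo.
    """
    noise = random.choice(audio_pool)
    # Crop or pad the noise so it matches the target clip's length.
    if len(noise) < len(audio_clip):
        noise = np.pad(noise, (0, len(audio_clip) - len(noise)))
    else:
        start = random.randint(0, len(noise) - len(audio_clip))
        noise = noise[start:start + len(audio_clip)]
    # Scale the noise to hit the sampled signal-to-noise ratio.
    snr_db = random.uniform(*snr_db_range)
    clip_power = np.mean(audio_clip ** 2) + 1e-8
    noise_power = np.mean(noise ** 2) + 1e-8
    scale = np.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return audio_clip + scale * noise
```

For the exact sampling strategy (e.g., how the noise clip is chosen and scaled), defer to TalkNet's data loader linked above.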
Training params:
- initial lr: 1e-3
- #epochs: 100
- lr scheduler: cosine annealing
- criterion for the best model: validation loss
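Here is a minimal PyTorch sketch of how these parameters might be wired together. The optimizer choice (Adam) and the `model`, `train_loader`, `val_loader`, and `evaluate` names are assumptions for illustration, not confirmed details of the actual training code.

```python
import torch

# Assumed: `model`, `train_loader`, `val_loader`, `evaluate` stand in
# for the repository's actual objects; Adam is an assumed optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

best_val_loss = float("inf")
for epoch in range(100):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        loss = model.compute_loss(batch)  # hypothetical loss API
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine-annealing step per epoch

    # Model selection: keep the checkpoint with the lowest validation loss.
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_stage1.pth")
```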