Add Resemblyzer for Speaker Similarity Evaluation & Bug fixes #75

Merged
merged 5 commits into from
Dec 29, 2023

Conversation

Merakist
Collaborator

Description

This pull request updates speaker similarity evaluation in the Amphion project. The existing RawNet3-based evaluation produced counterintuitive results, so this PR adds a Resemblyzer-based calculation alongside it. It also includes bug fixes and improved GPU support.

Objective

  • These changes aim to improve the accuracy of speaker similarity evaluation by adding Resemblyzer as a second reference alongside the current RawNet3 model.
  • The current speaker_similarity.py compares the average characteristics of all files in one directory against the average characteristics of all files in the other directory.
  • The new resemblyzer_similarity.py uses Resemblyzer to compare individual files across the two directories and only then averages the scores, which yields more accurate results.
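
To illustrate the difference between the two strategies described above, here is a minimal sketch in plain Python. It is not the code in speaker_similarity.py or resemblyzer_similarity.py: the function names are hypothetical, the embeddings are toy vectors rather than real speaker embeddings, and it assumes that "individual files across the two directories" means every cross-directory pair.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def directory_level_similarity(ref_embeds, deg_embeds):
    """Average each directory first, then compare the two averages."""
    return cosine(mean_vector(ref_embeds), mean_vector(deg_embeds))


def pairwise_similarity(ref_embeds, deg_embeds):
    """Compare every cross-directory pair (an assumption), then average."""
    scores = [cosine(r, d) for r, d in product(ref_embeds, deg_embeds)]
    return sum(scores) / len(scores), scores


if __name__ == "__main__":
    # Toy embeddings: two orthogonal "speakers" in each directory.
    ref = [[1.0, 0.0], [0.0, 1.0]]
    deg = [[1.0, 0.0], [0.0, 1.0]]
    print("directory-level:", directory_level_similarity(ref, deg))
    print("pairwise mean:  ", pairwise_similarity(ref, deg)[0])
```

With these toy inputs the directory-level score is close to 1 even though half of the file pairs are orthogonal, while the pairwise mean reflects the mismatched pairs. This is the kind of effect that averaging before comparing can hide.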

Testing

Changes

  • Amphion/bins/calc_metrics.py:

    • Fixed missing "fs" argument in line 160.
    • Added functionality to select between RawNet3 and Resemblyzer models for speaker similarity calculations.
  • Amphion/egs/metrics/run.sh:

    • Added support for automatic GPU allocation for calculating metrics. The script now detects a free GPU and allocates it for model processing.
  • Amphion/evaluation/metrics/similarity/resemblyzer_similarity.py:

    • New script added for computing speaker similarity using the Resemblyzer model.
  • Amphion/env.sh:

    • Included Resemblyzer as a new environment dependency.
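
For reviewers unfamiliar with automatic GPU allocation, the idea can be sketched as follows. This is a Python illustration only, not the logic in run.sh (which is a shell script and whose detection method is not shown here). It assumes "free" means the GPU with the least memory in use, and the helper names are hypothetical. The nvidia-smi query flags used are standard.

```python
import subprocess

# Standard nvidia-smi query: one "index, memory.used" line per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output):
    """Parse nvidia-smi CSV output into {gpu_index: memory_used_mib}."""
    usage = {}
    for line in output.strip().splitlines():
        try:
            index, used = (field.strip() for field in line.split(","))
            usage[int(index)] = int(used)
        except ValueError:
            # Skip lines that do not look like "index, used".
            continue
    return usage


def pick_free_gpu(usage):
    """Return the index of the least-loaded GPU, or None if there is none."""
    if not usage:
        return None
    return min(usage, key=lambda index: (usage[index], index))


def detect_free_gpu():
    """Query nvidia-smi and return a GPU index, or None if unavailable."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return pick_free_gpu(parse_gpu_memory(result.stdout))


if __name__ == "__main__":
    gpu = detect_free_gpu()
    print("No GPU detected" if gpu is None else f"CUDA_VISIBLE_DEVICES={gpu}")
```

The chosen index would then be exported as CUDA_VISIBLE_DEVICES before the metric models are loaded.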

Usage

When calculating speaker similarity with Amphion/egs/metrics/run.sh, the user will be prompted to select a model (RawNet3/Resemblyzer). If Resemblyzer is selected, an overall similarity result will be printed in the terminal and per-utterance similarity results will be saved in a .csv file under the dump_dir.
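
As a rough picture of the Resemblyzer output described above, the sketch below writes per-utterance scores to CSV and returns the overall mean. The column names, the function name, and the example file names are assumptions for illustration. The actual layout of the .csv file under dump_dir may differ.

```python
import csv
import io


def write_similarity_report(rows, out):
    """Write (reference, generated, score) rows as CSV and return the mean score.

    `rows` is a list of (reference_path, generated_path, similarity) tuples;
    `out` is any writable text file object.
    """
    writer = csv.writer(out)
    writer.writerow(["reference", "generated", "similarity"])  # hypothetical header
    total = 0.0
    for reference, generated, score in rows:
        writer.writerow([reference, generated, f"{score:.4f}"])
        total += score
    return total / len(rows) if rows else float("nan")


if __name__ == "__main__":
    # Hypothetical per-utterance results.
    results = [("ref/a.wav", "gen/a.wav", 0.8), ("ref/b.wav", "gen/b.wav", 0.6)]
    buffer = io.StringIO()
    overall = write_similarity_report(results, buffer)
    print(buffer.getvalue())
    print(f"Overall similarity: {overall:.4f}")
```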

Request

Requesting a review for the proposed changes and subsequent merge into the main branch.

@Merakist Merakist requested a review from lmxue December 29, 2023 08:59
@lmxue lmxue requested a review from VocodexElysium December 29, 2023 12:49
Collaborator

@lmxue lmxue left a comment

Please delete the repeated code.

@@ -33,6 +40,9 @@ while true; do
esac
done

######## Set CUDA_VISIBLE_DEVICES ###########
export CUDA_VISIBLE_DEVICES=$gpu
Collaborator

export CUDA_VISIBLE_DEVICES=$gpu on line 44 and CUDA_VISIBLE_DEVICES=$gpu on line 47 are redundant. Lines 43-44 can be deleted.

@Merakist Merakist requested a review from lmxue December 29, 2023 13:23
Collaborator

@lmxue lmxue left a comment

It's a good PR template.

@lmxue lmxue changed the title from "Add Resemblyzer for Speaker Similarity evaluation & Bug fixes" to "Add Resemblyzer for Speaker Similarity Evaluation & Bug fixes" Dec 29, 2023
@lmxue lmxue merged commit b4495b2 into open-mmlab:main Dec 29, 2023
1 check passed
2 participants