Questions Regarding Custom Data Testing in Audio Fragment Identification #41

Closed
devkya opened this issue Dec 26, 2023 · 1 comment
devkya commented Dec 26, 2023

Hello, I found this paper quite interesting and encountered some issues during the testing process with my custom data. I apologize if my questions seem naive, as I lack some knowledge in this area.

In my case, I don't need an algorithm that identifies a specific song among many using audio fragments. I have a single audio file (2-3 hours) and need to find where in this file a given audio fragment (3-5 seconds) begins (I understand from checking past issues that I need to customize the process to obtain the start timestamp).

  1. Is this code suitable for such a scenario?

  2. I trained using the provided mini dataset. Then I ran python run.py generate --source CUSTOM_SOURCE_ROOT_DIR --output FP_OUTPUT_DIR --skip_dummy to generate fingerprints for my custom data, which is a single audio file. Afterward, I wanted to evaluate a short audio fragment (a 3-5 second wav file) but wasn't sure how to proceed. Also, is this a meaningful process?

  3. Should my custom audio data also be included in the training?

Thank you.

mimbres commented Jan 17, 2024

@devkya Sorry for the late reply😅

  1. Yes, if:

    • your 3-hour-long audio has moderately unique segments,
    • or you want to search for all the segments similar to the query.
  2. As described in Fingerprint Generation, you should prepare a source directory that contains subfolders named test_query and test_db. You may put the 3-hour audio in test_db, and some sliced queries in test_query. In fact, you won't need to slice them yourself; fingerprints are sliced into segments anyway. Then follow the description under Search & Evaluation.

  • EDIT: If you don't assume any specific noise in the queries, don't slice them; just reuse the 3-hour audio as the query. This way, the frame indices of the queries and the db will match exactly.
  3. No, unless you expect a very different type of data from music. If you expect speech-only data, for example, I recommend re-training on some speech-only data. And if you expect a specific type of acoustic noise added to your queries, you should use a similar type of noise for augmentation during training. The point is that neural-FP mainly learns which kinds of sound sources to discard in similarity search.
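Since the original question is about recovering where a fragment begins, here is a minimal sketch of turning a matched db segment index into a start timestamp. The 0.5 s hop length and the function name are assumptions for illustration; substitute the segment hop from your own fingerprint configuration.

```python
# Hypothetical helper: map the index of the best-matching database
# segment back to a start time in the long recording. The 0.5 s hop
# is an assumption; use the hop length from your fingerprint config.
def segment_index_to_timestamp(seg_index: int, hop_sec: float = 0.5) -> float:
    """Return the start time (in seconds) of a fingerprint segment."""
    return seg_index * hop_sec

# Example: if the top-1 match for the query is db segment 7200,
# the fragment starts about 7200 * 0.5 = 3600 s into the file.
print(segment_index_to_timestamp(7200))  # 3600.0
```

Note that if you reuse the full 3-hour audio as the query (per the EDIT above), the matched query and db segment indices should line up one-to-one, which makes this conversion a useful sanity check.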

Hope this helps, good luck!

@mimbres mimbres added the question Further information is requested label Jan 17, 2024
@mimbres mimbres self-assigned this Jan 17, 2024
@mimbres mimbres closed this as completed Jan 24, 2024