wav_to_spectogram.py stops converting before it should #11
Comments
Since I've mentioned that I'm using 3-second segments, what do you think about increasing the pixel_per_second value from 50 to 100? Then I'd have 129x300x1 spectrograms, which might make it easier for the C(R)NN to detect patterns, wouldn't it? I'm still a newbie at this, sorry!
Hmm, interesting behaviour... could it be that your English samples contain lots of silence? To your second question:
I've checked the samples and they seem to be all right. I also tried commenting out the two lines that ignore samples containing lots of silence, and the same behaviour occurred. I also tried it on my whole dataset. The first output is the output after segmentation of the files, which shows how many of them there are. The second output is the output of wav_to_spectogram.py; as you can see, the same thing happened once again. And thanks for the advice regarding the 3-second segments! :)
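The arithmetic behind the 129x300x1 figure can be sketched as below. This is only an illustration, assuming the spectrogram width is the segment length multiplied by the pixels-per-second setting and the height stays at 129; the function name `expected_shape` is hypothetical and not part of the repository.

```python
def expected_shape(segment_seconds, pixels_per_second, height=129, channels=1):
    """Return [height, width, channels] for the spectrogram of one segment.

    Assumption: width = segment length in seconds * pixels per second.
    """
    width = int(round(segment_seconds * pixels_per_second))
    return [height, width, channels]


print(expected_shape(3, 50))    # 3-second segments at 50 px/s
print(expected_shape(3, 100))   # 3-second segments at 100 px/s
print(expected_shape(10, 50))   # the original 10-second segments
```

Whatever value is chosen, the --shape argument of wav_to_spectogram.py would need to match it.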
I really don't know what the problem is... |
Thanks for replying! If you're interested in taking a look at it, I have included the |
I seem to have figured out what was happening. It isn't a problem isolated to the English samples I used. I eventually removed them from my big dataset and ran wav_to_spectogram.py again, and the same thing happened with French. Basically, when the SpectrogramGenerator is run, the segmented files are turned into spectrograms by Sox. Since Sox calculates the width from the -X (capital X) parameter, which is the pixels-per-second parameter, it sometimes, for reasons unknown to me, outputs the wrong dimensions - instead of [129, 150, 1] it produces [129, 149, 1]. Therefore, I tried adding the -x (small x) parameter, which defines the overall width, at this line, with the appropriate value for it. It seems to have solved the issue, but I'd like to hear your comment on this. If that's fine, I'll make a pull request. Thanks!
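A minimal sketch of the idea described above, not the repository's actual code: the helper names `build_sox_command` and `png_size` are hypothetical, and the exact Sox invocation used by SpectrogramGenerator may differ. The first function builds a Sox call that fixes the total width with -x rather than relying on -X; the second reads the width and height from a PNG header so that off-by-one outputs such as 149 instead of 150 can be spotted.

```python
import struct


def build_sox_command(wav_path, png_path, width=150, height=129):
    """Build a sox spectrogram command with a fixed pixel width.

    -x sets the total width in pixels, -y the height per channel,
    -m requests a monochrome image, -r suppresses axes and legends.
    -X (pixels per second) is deliberately left out.
    """
    return ["sox", "-V0", wav_path, "-n", "spectrogram",
            "-x", str(width), "-y", str(height),
            "-m", "-r", "-o", png_path]


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file: %s" % path)
    # Bytes 16-23 hold width and height as big-endian unsigned ints.
    return struct.unpack(">II", header[16:24])
```

The command list could be passed to `subprocess.check_call`, and `png_size` run over the generated files to list any whose width is not the expected 150.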
Hmm, interesting problem. I'm not sure but reading the manual page of Sox, it seems that |
Hi,
I'm working with four languages and for each I have downloaded only one video so that I can check that the scripts work as they should before running them on my VM on the cloud.
The issue I have is that the script wav_to_spectogram.py acts weird with one language.
The languages and the number of segmented .wav files for each are:
So, the expected result after running the script is 38 or 39 .png spectrograms for each language, since the language with the fewest .wav files is French. It executes as it should when I run it for all the languages except English:
But running the script with English manages to count only 13 files in English, even though there are 42:
I still haven't come up with an explanation for why this is happening, so any clue would be a great help!
Here's the sources.yml that I used to download the videos, in case someone prefers to check it themselves.
I'd also note that I'm working with 3-second segments, so anyone recreating what I am doing needs to change the number of seconds by which the files are split. It is on line 66 in download_youtube.py, from:
command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "10", output_filename]
to:
command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "3", output_filename]
For the same reason, it is necessary to change the size of the output spectrogram on line 70 in wav_to_spectogram.py from:
parser.add_argument('--shape', dest='shape', default=[129, 500, 1], type=int, nargs=3)
to:
parser.add_argument('--shape', dest='shape', default=[129, 150, 1], type=int, nargs=3)
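As a possible alternative to editing the default, the shape can presumably be passed on the command line, since --shape is declared with nargs=3 and type=int. The sketch below reproduces only that one argument in a standalone parser; the wrapper function `parse_shape` is hypothetical, and it is an assumption that the script uses the parsed value everywhere the shape matters.

```python
import argparse


def parse_shape(argv):
    """Parse only the --shape argument, as declared in wav_to_spectogram.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--shape', dest='shape', default=[129, 500, 1],
                        type=int, nargs=3)
    return parser.parse_args(argv).shape


print(parse_shape([]))                               # falls back to the default
print(parse_shape(['--shape', '129', '150', '1']))   # 3-second segments
```

On the command line this would correspond to adding `--shape 129 150 1` to the usual wav_to_spectogram.py invocation.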
Thank you!