wav_to_spectogram.py stops converting before it should #11

ibro45 · 2018-11-23T21:09:40Z

Hi,

I'm working with four languages and for each I have downloaded only one video so that I can check that the scripts work as they should before running them on my VM on the cloud.

The issue I have is that the script wav_to_spectogram.py behaves oddly with one language.
The languages and the number of segmented .wav files for each are:

Croatian - 64
English - 42
French - 39
Spanish - 45

So, the expected result after running the script is 38 or 39 .png spectrograms for each language, since the language with the fewest .wav files is French. It executes as it should when I run it for all the languages except English:

But running the script with English finds only 13 English files, even though there are 42:

I still haven't come up with an explanation for why it's happening, so any clue would be a great help!

Here's the sources.yml that I used to download the videos, in case someone prefers to check it themselves.

croatian:
  users:
    -
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdEh39boAuP-JPeDR7dy6wih

english:
  users:
    - 
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdHSp1oIY4L_t5xX0dFV3GMH

french:
  users:
    - 
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdEgT-oLhk11Xjbev7Q02F3-

spanish:
  users:
    - 
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdHpKmps4DaomrqXd8VmZV1g

I'd also note that I'm working with 3-second segments, so anyone recreating what I am doing needs to change the number of seconds by which the files are split. This is on line 66 in download_youtube.py, from:

command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "10", output_filename]

to:

command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "3", output_filename]

For the same reason, it is necessary to change the size of the output spectrogram on line 70 in wav_to_spectogram.py from:

parser.add_argument('--shape', dest='shape', default=[129, 500, 1], type=int, nargs=3)

to:

parser.add_argument('--shape', dest='shape', default=[129, 150, 1], type=int, nargs=3)
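The relationship between the two changes above can be written down as a small sketch. It assumes the spectrogram width is simply the segment length in seconds multiplied by the pixels-per-second setting (50 in this thread); the function name `spectrogram_shape` is hypothetical and not part of the repository.

```python
def spectrogram_shape(segment_seconds, pixels_per_second, height=129, channels=1):
    """Return the expected [height, width, channels] for one audio segment.

    Assumption: width = segment length (s) * pixels per second.
    """
    width = int(round(segment_seconds * pixels_per_second))
    return [height, width, channels]


# 10-second segments at 50 px/s give the original default shape,
# 3-second segments at 50 px/s give the shape used in this issue.
print(spectrogram_shape(10, 50))
print(spectrogram_shape(3, 50))
```

Under the same assumption, 3-second segments at 100 pixels per second would give a width of 300.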

Thank you!

ibro45 · 2018-11-24T15:59:49Z

Since I mentioned that I'm using 3-second segments, I'm interested in what you think about increasing pixel_per_second from 50 to 100. Then I'd have 129x300x1 spectrograms, which might make it easier for the C(R)NN to detect patterns, wouldn't it? I'm still a newbie at this, sorry!

Bartzi · 2018-11-27T13:01:24Z

Hmm, interesting behaviour... could it be that your English samples contain lots of silence?
Have a look at this line of code. Everything that contains silence is just skipped.

To your second question:
That depends on the size of the actual regions in the voice samples. It could get better, but it might also not help... You might need to increase the size of the receptive field of the network in order to capture meaningful features.
But it is worth a try =)

ibro45 · 2018-11-30T15:55:42Z

I've checked the samples, and they seem to be alright. I also tried commenting out the two lines that ignore samples containing lots of silence, and the same behaviour occurred.

I also tried it on my whole dataset. The first output is from the segmentation of the files and shows how many of them there are. The second output is from wav_to_spectogram.py. As you can see, the same thing happened once again.

And thanks for the advice regarding the 3-second segments! :)

Bartzi · 2018-12-08T14:32:08Z

I really don't know what the problem is...
The iterator definitely stops when working on English for some reason... but I'm afraid I cannot help you further from this end without access to the data...

ibro45 · 2018-12-08T14:38:14Z

Thanks for replying! If you're interested in taking a look at it, I have included the content of sources.yml in the initial post. Each playlist contains just one video per language, whose purpose was to test that everything behaves as it should before running it on the cloud, so downloading the data won't be any trouble.

ibro45 · 2018-12-27T15:32:43Z

I seem to have figured out what was happening.

It isn't a problem isolated to the English samples I used. I eventually removed them from my big dataset and ran wav_to_spectogram.py again, and the same thing happened with French.

Basically, when the SpectrogramGenerator is run, the segmented files are turned into spectrograms by SoX. Since SoX calculates the width from the -X (capital X) parameter, which is the pixels-per-second parameter, it sometimes, for reasons unknown to me, outputs the wrong dimensions: [129, 149, 1] instead of [129, 150, 1].
(Note that I'm using 3-second segments and 50 pixels per second.)
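The thread does not establish why the width is sometimes 149. One thing that may be worth checking, purely as an assumption, is whether some segments are slightly shorter than 3 seconds, since a shorter file at a fixed pixels-per-second rate could plausibly yield fewer columns. The sketch below uses only Python's standard `wave` module; the helper names and the tolerance are hypothetical.

```python
import wave
from pathlib import Path


def segment_duration(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_short_segments(directory, expected_seconds=3.0, tolerance=0.001):
    """List (file name, duration) pairs for segments shorter than expected.

    Assumption: every segment is supposed to last `expected_seconds`.
    """
    short = []
    for path in sorted(Path(directory).glob("*.wav")):
        duration = segment_duration(path)
        if duration < expected_seconds - tolerance:
            short.append((path.name, duration))
    return short
```

Calling `find_short_segments` on the directory that holds one language's segments would list any files that fall short of 3 seconds, which could then be compared against the files that produce 149-pixel spectrograms.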

Therefore, I tried adding the -x (lowercase x) parameter, which defines the overall width, with the appropriate value, at this line.
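As a rough illustration of that change, the sketch below builds a SoX `spectrogram` command that passes both -X and -x. The repository's actual command is not shown in this thread, so the function name, argument order, and the use of the -m (monochrome) and -r (raw, no axes) options are assumptions; only the idea of adding -x with the target width comes from the comment above.

```python
def build_sox_command(wav_path, png_path, pixels_per_second=50, width=150, height=129):
    """Build a SoX command line that renders one .wav file as a spectrogram.

    Hypothetical helper: the real command in the repository may differ.
    """
    return [
        "sox", str(wav_path), "-n", "spectrogram",
        "-y", str(height),             # Y-axis size in pixels
        "-X", str(pixels_per_second),  # X-axis pixels per second
        "-x", str(width),              # X-axis size in pixels (the added option)
        "-m", "-r",                    # monochrome, raw image without axes
        "-o", str(png_path),
    ]


print(build_sox_command("segment.wav", "segment.png"))
```

With SoX installed, the returned list could be passed to `subprocess.check_call` to produce the image.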

It seems to have solved the issue, but I wonder what your thoughts on this are. If that's fine, I'll make a pull request.

Thanks!

Bartzi · 2019-01-14T08:47:08Z

Hmm,

interesting problem. I'm not sure, but reading the manual page of SoX, it seems that -x only sets the maximum width of the spectrogram. But all in all that should not be a problem, since the audio snippets should always have the same length, so I would be very happy to have a look at a nice PR 😄

ibro45 closed this as completed Jan 28, 2019

ibro45 mentioned this issue May 15, 2019

Cannot predict some files #14

Open
