
Interpreting the result #31

Open

jayakrishnanmm opened this issue Sep 12, 2023 · 7 comments

Labels: question (Further information is requested)

Comments

@jayakrishnanmm
After following the inference steps, I got the values below for u1-u5, p, and w1-w3:

u1=tensor([[1.7443]]) u2=tensor([[1.5404]]) u3=tensor([[1.7297]]) u4=tensor([[1.7074]]) u5=tensor([[1.7606]]) p=tensor([[[1.1559],
[1.2266],
[1.2165],
[1.1115],
[1.1052],
[1.1074],
[1.0690],
[1.2223],
[1.0949],
[1.1671],
[1.0795],
[1.2557],
[1.0595],
[1.1116],
[1.1818],
[1.1300],
[1.2001],
[1.1101],
[1.1616],
[1.0864],
[1.1390],
[0.7162],
[0.8037],
[0.8568],
[0.8601],
[0.8054],
[0.8418],
[0.8683],
[0.7827],
[0.8825],
[0.6441],
[0.7901],
[0.7464],
[0.6433],
[0.8020],
[0.8223],
[0.7503],
[0.7563],
[0.8885],
[0.8561],
[0.8105],
[0.8625],
[0.8481],
[0.8317],
[0.8435],
[0.8590],
[0.8139],
[0.7567],
[0.8845],
[0.8129]]]) w1=tensor([[[ 0.1104],
[ 0.2297],
[ 0.2281],
[ 0.0758],
[ 0.0577],
[ 0.1400],
[-0.0202],
[ 0.1290],
[ 0.0133],
[ 0.2836],
[ 0.0878],
[ 0.3509],
[ 0.0595],
[ 0.0864],
[ 0.1327],
[ 0.0924],
[ 0.1755],
[ 0.0542],
[ 0.1502],
[ 0.0426],
[ 0.1247],
[ 0.9526],
[ 1.0063],
[ 1.0826],
[ 1.0663],
[ 0.9944],
[ 1.0674],
[ 1.1030],
[ 1.0209],
[ 1.0798],
[ 0.8870],
[ 1.0020],
[ 0.9713],
[ 0.8827],
[ 1.0125],
[ 1.0476],
[ 0.9834],
[ 0.9916],
[ 1.1105],
[ 1.0714],
[ 1.0451],
[ 1.0725],
[ 1.0760],
[ 1.0540],
[ 1.0640],
[ 1.0696],
[ 1.0384],
[ 0.9810],
[ 1.0873],
[ 1.0260]]]) w2=tensor([[[0.6134],
[0.7956],
[0.9271],
[0.6699],
[0.5889],
[0.6262],
[0.4851],
[0.6197],
[0.5322],
[0.9736],
[0.7261],
[1.0064],
[0.5336],
[0.6623],
[0.6925],
[0.6142],
[0.7239],
[0.5258],
[0.6993],
[0.5545],
[0.7373],
[0.9153],
[0.9858],
[1.0829],
[1.0741],
[1.0285],
[1.0639],
[1.0860],
[0.9937],
[1.1015],
[0.8865],
[1.0654],
[0.9615],
[0.9004],
[0.9985],
[1.0304],
[0.9705],
[0.9877],
[1.0782],
[1.0342],
[1.0029],
[1.0279],
[1.0328],
[1.0081],
[1.0391],
[1.0626],
[1.0167],
[0.9367],
[1.0728],
[1.0083]]]) w3=tensor([[[0.9717],
[1.0951],
[1.1173],
[0.9834],
[0.9371],
[0.9385],
[0.8971],
[1.0128],
[0.9022],
[1.1262],
[0.9963],
[1.1767],
[0.9003],
[0.9701],
[0.9989],
[0.9520],
[1.0238],
[0.9401],
[1.0122],
[0.9360],
[1.0347],
[1.0048],
[1.0965],
[1.1611],
[1.1419],
[1.1097],
[1.1247],
[1.1732],
[1.0983],
[1.1891],
[0.9894],
[1.1176],
[1.0471],
[0.9793],
[1.0938],
[1.1114],
[1.0798],
[1.0866],
[1.2085],
[1.1529],
[1.0992],
[1.1474],
[1.1448],
[1.1297],
[1.1249],
[1.1632],
[1.1026],
[1.0581],
[1.1813],
[1.1074]]])

Now, how do I interpret this result?

@YuanGongND (Owner) commented Sep 12, 2023

Hi there,

Please check Section 3.1 of the paper:

We re-scale utterance and word-level scores to 0-2, making them on the same scale as the phoneme scores.

So all scores should be in the range 0-2 (though outliers are possible).

p is the phone-level score; there should be one value for each phone.

u is the utterance score; u1-u5 should correspond to the headers here: https://github.com/YuanGongND/gopt/blob/bed909daf8eca035095871e51642525acc5b9b55/src/traintest.py#L39C5-L39C84

w is the word score; w1-w3 should be:

word_header_score = ['accuracy', 'stress', 'total']

-Yuan
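
A minimal sketch (not from the repo) of pairing these outputs with their score names; the utterance header order is an assumption taken from the linked traintest.py line and should be verified there, and the word headers are the ones quoted above:

# Assumed order for u1-u5 (verify against the linked traintest.py line).
utt_headers = ['accuracy', 'completeness', 'fluency', 'prosodic', 'total']
# Word headers for w1-w3, as quoted above.
word_headers = ['accuracy', 'stress', 'total']

def label_scores(u_scores, w_scores):
    # u_scores: the five utterance tensors [u1, ..., u5]; w_scores: [w1, w2, w3]
    utt = {name: float(u) for name, u in zip(utt_headers, u_scores)}
    word = dict(zip(word_headers, w_scores))
    return utt, word

# Example: utt, word = label_scores([u1, u2, u3, u4, u5], [w1, w2, w3])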

@jayakrishnanmm (Author) commented Sep 12, 2023

Hi,
Thanks for the clarification. But I still don't get why p and w1-w3 have a size of 50. Is this the size of the English phone dictionary? Where is it defined?

@YuanGongND (Owner) commented Sep 13, 2023

The word score is propagated to the phone level, i.e., the word scores you get are still at the phone level; they just receive word-level supervision. See below for how we process that in training:

import numpy as np

def process_feat_seq_word(feat, keys, labels):
    key_set = []
    for i in range(keys.shape[0]):
        cur_key = keys[i].split('.')[0]
        key_set.append(cur_key)
    utt_cnt = len(list(set(key_set)))
    print('In total utterance number : ' + str(utt_cnt))
    # -1 means n/a (padded positions keep this value)
    seq_label = np.zeros([utt_cnt, 50, 4]) - 1
    prev_utt_id = keys[0].split('.')[0]
    row = 0
    for i in range(feat.shape[0]):
        cur_utt_id, cur_tok_id = keys[i].split('.')[0], int(keys[i].split('.')[1])
        if cur_utt_id != prev_utt_id:
            row += 1
            prev_utt_id = cur_utt_id
        # the word-level scores are written at every phone (token) position of that word
        seq_label[row, cur_tok_id, 0:3] = labels[i, 3:6]
        seq_label[row, cur_tok_id, 3] = labels[i, 1]
    return seq_label

For inference, it is easy: you just need to average the scores of each word's phones. E.g., for word1 (phones 1, 2, 3, 4) and word2 (phones 5, 6), you will get 4 word scores for word 1 and 2 word scores for word 2; average the 4 scores for word 1 and the 2 scores for word 2 (see the sketch below).
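
A minimal sketch of that averaging (not repo code; word_ids is a hypothetical per-phone word index you would take from your own forced alignment, since the model output carries no word boundaries):

import numpy as np

# Collapse phone-level word scores to one score per word by averaging.
# word_ids gives the word index of each phone position, e.g. [0, 0, 0, 0, 1, 1]
# for word1 = (phones 1-4) and word2 = (phones 5-6).
def average_word_scores(phone_level_scores, word_ids):
    scores = np.asarray(phone_level_scores, dtype=float).reshape(-1)
    word_ids = np.asarray(word_ids)
    return [scores[word_ids == w].mean() for w in np.unique(word_ids)]

# Example with the toy alignment above (first 6 phone positions of w1):
# average_word_scores(w1.detach().cpu().numpy().reshape(-1)[:6], [0, 0, 0, 0, 1, 1])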

50 is the sequence cutoff length (the maximum number of phone tokens per utterance); it is not related to the phone vocabulary, which is also not 50.

@YuanGongND (Owner)
You would expect the word-level scores and the phone-level scores to have the same length.

@jayakrishnanmm (Author)
Got it. What about the unused phone/word-level scores? Are they junk values?

@YuanGongND (Owner)
I cannot recall whether the code automatically trims the padded tokens, but you should ignore the scores at the padded positions.
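
A minimal sketch of that trimming (an assumption, not repo code; num_phones is the actual phone count of the utterance from your own alignment, since the model always emits 50 positions because of the sequence cutoff):

import numpy as np

# Keep only the scores at real phone positions and drop the padded tail.
def trim_padded(scores, num_phones):
    scores = np.asarray(scores).reshape(-1)  # e.g. p or w1 with shape (1, 50, 1)
    return scores[:num_phones]

# Example: if the utterance has 21 phones, the last 29 positions are padding:
# p_valid = trim_padded(p.detach().cpu().numpy(), 21)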

@jayakrishnanmm (Author)
Yes, I did that.
