
Interpreting the result #31

Open

jayakrishnanmm opened this issue Sep 12, 2023 · 7 comments

Labels: question (Further information is requested)

Comments

@jayakrishnanmm
After following the inference steps, I got the values below for u1-u5, p, and w1-w3:

u1=tensor([[1.7443]]) u2=tensor([[1.5404]]) u3=tensor([[1.7297]]) u4=tensor([[1.7074]]) u5=tensor([[1.7606]]) p=tensor([[[1.1559],
[1.2266],
[1.2165],
[1.1115],
[1.1052],
[1.1074],
[1.0690],
[1.2223],
[1.0949],
[1.1671],
[1.0795],
[1.2557],
[1.0595],
[1.1116],
[1.1818],
[1.1300],
[1.2001],
[1.1101],
[1.1616],
[1.0864],
[1.1390],
[0.7162],
[0.8037],
[0.8568],
[0.8601],
[0.8054],
[0.8418],
[0.8683],
[0.7827],
[0.8825],
[0.6441],
[0.7901],
[0.7464],
[0.6433],
[0.8020],
[0.8223],
[0.7503],
[0.7563],
[0.8885],
[0.8561],
[0.8105],
[0.8625],
[0.8481],
[0.8317],
[0.8435],
[0.8590],
[0.8139],
[0.7567],
[0.8845],
[0.8129]]]) w1=tensor([[[ 0.1104],
[ 0.2297],
[ 0.2281],
[ 0.0758],
[ 0.0577],
[ 0.1400],
[-0.0202],
[ 0.1290],
[ 0.0133],
[ 0.2836],
[ 0.0878],
[ 0.3509],
[ 0.0595],
[ 0.0864],
[ 0.1327],
[ 0.0924],
[ 0.1755],
[ 0.0542],
[ 0.1502],
[ 0.0426],
[ 0.1247],
[ 0.9526],
[ 1.0063],
[ 1.0826],
[ 1.0663],
[ 0.9944],
[ 1.0674],
[ 1.1030],
[ 1.0209],
[ 1.0798],
[ 0.8870],
[ 1.0020],
[ 0.9713],
[ 0.8827],
[ 1.0125],
[ 1.0476],
[ 0.9834],
[ 0.9916],
[ 1.1105],
[ 1.0714],
[ 1.0451],
[ 1.0725],
[ 1.0760],
[ 1.0540],
[ 1.0640],
[ 1.0696],
[ 1.0384],
[ 0.9810],
[ 1.0873],
[ 1.0260]]]) w2=tensor([[[0.6134],
[0.7956],
[0.9271],
[0.6699],
[0.5889],
[0.6262],
[0.4851],
[0.6197],
[0.5322],
[0.9736],
[0.7261],
[1.0064],
[0.5336],
[0.6623],
[0.6925],
[0.6142],
[0.7239],
[0.5258],
[0.6993],
[0.5545],
[0.7373],
[0.9153],
[0.9858],
[1.0829],
[1.0741],
[1.0285],
[1.0639],
[1.0860],
[0.9937],
[1.1015],
[0.8865],
[1.0654],
[0.9615],
[0.9004],
[0.9985],
[1.0304],
[0.9705],
[0.9877],
[1.0782],
[1.0342],
[1.0029],
[1.0279],
[1.0328],
[1.0081],
[1.0391],
[1.0626],
[1.0167],
[0.9367],
[1.0728],
[1.0083]]]) w3=tensor([[[0.9717],
[1.0951],
[1.1173],
[0.9834],
[0.9371],
[0.9385],
[0.8971],
[1.0128],
[0.9022],
[1.1262],
[0.9963],
[1.1767],
[0.9003],
[0.9701],
[0.9989],
[0.9520],
[1.0238],
[0.9401],
[1.0122],
[0.9360],
[1.0347],
[1.0048],
[1.0965],
[1.1611],
[1.1419],
[1.1097],
[1.1247],
[1.1732],
[1.0983],
[1.1891],
[0.9894],
[1.1176],
[1.0471],
[0.9793],
[1.0938],
[1.1114],
[1.0798],
[1.0866],
[1.2085],
[1.1529],
[1.0992],
[1.1474],
[1.1448],
[1.1297],
[1.1249],
[1.1632],
[1.1026],
[1.0581],
[1.1813],
[1.1074]]])

Now, how do I interpret this result?

@YuanGongND (Owner) commented Sep 12, 2023

Hi there,

Please check Section 3.1 of the paper:

We re-scale utterance and word-level scores to 0-2, making them on the same scale as the phoneme scores.

So all scores should be in the range 0-2 (though outliers are possible).

p is the phone-level score; there should be one value for each phone.

u is the utterance score; u1-u5 should correspond to the headers here: https://github.com/YuanGongND/gopt/blob/bed909daf8eca035095871e51642525acc5b9b55/src/traintest.py#L39C5-L39C84

w is the word score; w1-w3 should be:

word_header_score = ['accuracy', 'stress', 'total']

-Yuan
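
A minimal sketch (not from the repo) of pairing these outputs with their score names; the utterance header order is an assumption taken from the linked traintest.py line and should be verified there, and the word headers are the ones quoted above:

# Assumed order for u1-u5 (verify against the linked traintest.py line).
utt_headers = ['accuracy', 'completeness', 'fluency', 'prosodic', 'total']
# Word headers for w1-w3, as quoted above.
word_headers = ['accuracy', 'stress', 'total']

def label_scores(u_scores, w_scores):
    # u_scores: the five utterance tensors [u1, ..., u5]; w_scores: [w1, w2, w3]
    utt = {name: float(u) for name, u in zip(utt_headers, u_scores)}
    word = dict(zip(word_headers, w_scores))
    return utt, word

# Example: utt, word = label_scores([u1, u2, u3, u4, u5], [w1, w2, w3])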

@jayakrishnanmm (Author) commented Sep 12, 2023

Hi,
Thanks for the clarification. But I still don't get why p and w1-w3 have a size of 50. Is this the size of the English phone dictionary? Where is it defined?

@YuanGongND (Owner) commented Sep 13, 2023

The word score is propagated to the phone level, i.e., the word scores you get are still at the phone level; they just receive word-level supervision. See below for how we process that in training:

import numpy as np

def process_feat_seq_word(feat, keys, labels):
    key_set = []
    for i in range(keys.shape[0]):
        cur_key = keys[i].split('.')[0]
        key_set.append(cur_key)
    utt_cnt = len(list(set(key_set)))
    print('In total utterance number : ' + str(utt_cnt))
    # -1 means n/a (padded positions keep this value)
    seq_label = np.zeros([utt_cnt, 50, 4]) - 1
    prev_utt_id = keys[0].split('.')[0]
    row = 0
    for i in range(feat.shape[0]):
        cur_utt_id, cur_tok_id = keys[i].split('.')[0], int(keys[i].split('.')[1])
        if cur_utt_id != prev_utt_id:
            row += 1
            prev_utt_id = cur_utt_id
        # the word-level scores are written at every phone (token) position of that word
        seq_label[row, cur_tok_id, 0:3] = labels[i, 3:6]
        seq_label[row, cur_tok_id, 3] = labels[i, 1]
    return seq_label

For inference, it is easy: you just need to average the scores of each word's phones. E.g., for word1 (phones 1, 2, 3, 4) and word2 (phones 5, 6), you will get 4 word scores for word 1 and 2 word scores for word 2; average the 4 scores for word 1 and the 2 scores for word 2 (see the sketch below).
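
A minimal sketch of that averaging (not repo code; word_ids is a hypothetical per-phone word index you would take from your own forced alignment, since the model output carries no word boundaries):

import numpy as np

# Collapse phone-level word scores to one score per word by averaging.
# word_ids gives the word index of each phone position, e.g. [0, 0, 0, 0, 1, 1]
# for word1 = (phones 1-4) and word2 = (phones 5-6).
def average_word_scores(phone_level_scores, word_ids):
    scores = np.asarray(phone_level_scores, dtype=float).reshape(-1)
    word_ids = np.asarray(word_ids)
    return [scores[word_ids == w].mean() for w in np.unique(word_ids)]

# Example with the toy alignment above (first 6 phone positions of w1):
# average_word_scores(w1.detach().cpu().numpy().reshape(-1)[:6], [0, 0, 0, 0, 1, 1])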

50 is the sequence cutoff length (the maximum number of phone tokens per utterance); it is not related to the phone vocabulary, which is also not 50.

@YuanGongND (Owner)
You would expect the word-level scores and the phone-level scores to have the same length.

@jayakrishnanmm (Author)
Got it. What about the unused phone/word-level scores? Are they junk values?

@YuanGongND (Owner)
I cannot recall whether the code automatically trims the padded tokens, but you should ignore the scores at the padded positions.
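
A minimal sketch of that trimming (an assumption, not repo code; num_phones is the actual phone count of the utterance from your own alignment, since the model always emits 50 positions because of the sequence cutoff):

import numpy as np

# Keep only the scores at real phone positions and drop the padded tail.
def trim_padded(scores, num_phones):
    scores = np.asarray(scores).reshape(-1)  # e.g. p or w1 with shape (1, 50, 1)
    return scores[:num_phones]

# Example: if the utterance has 21 phones, the last 29 positions are padding:
# p_valid = trim_padded(p.detach().cpu().numpy(), 21)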

@jayakrishnanmm (Author)
Yes, I did that.
