I am using the `kaldilm` library to convert ARPA files to the FST text format in order to build an n-gram language model as an FST. The ARPA file lists every n-gram in full, together with its probability. The FST text format, however, only shows state numbers. When the data is small, it's easy to work out which state number corresponds to which n-gram history. As the data grows, it becomes much harder to tell which state represents which history (sequence of previous words).
One possible solution would be to traverse the graph (e.g. with DFS) and label each state based on its predecessor states and the arcs between them. However, this would take a long time when the data, and therefore the model, is large. It would be much easier to have a dictionary that maps each state number in the final FST text file to the history it represents.
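A traversal may not need to be expensive. If the FST follows the same layout as Kaldi's `arpa2fst` (an assumption, not something confirmed here for `kaldilm`), then a single breadth-first pass that visits each arc once recovers every state's history. The assumed layout is as follows. There is one state per history. Every non-empty history has exactly one backoff arc, with input label `#0`, that leads to the history with its first word dropped. Word arcs carry the word as input label and lead to the longest existing suffix of (history + word). Under these assumptions, a state's history length equals the number of backoff arcs needed to reach the empty-history state. The destination of any word arc then has as its history the last *k* words of (source history + word). The function `label_states`, its parameters, the assumption that the start state is the `<s>` history, and the toy trigram FST below are all hypothetical illustrations, not part of `kaldilm`.

```python
from collections import deque

# Toy trigram FST in OpenFst text format (hypothetical data, for illustration).
# Intended states: 0=<s>, 1=empty history, 2=a, 3=b, 4="<s> a", 5="a b".
EXAMPLE_FST = """\
0 4 a a 0.5
0 1 #0 <eps> 0.3
4 5 b b 0.4
4 2 #0 <eps> 0.2
1 2 a a 1.1
1 3 b b 1.2
2 5 b b 0.7
2 1 #0 <eps> 0.1
3 1 #0 <eps> 0.1
5 3 #0 <eps> 0.2
5 2 a a 0.9
5 0.6
"""


def label_states(fst_lines, backoff_label="#0", start_state=None,
                 start_history=("<s>",)):
    """Return {state: history tuple}, assuming a Kaldi-style backoff n-gram FST."""
    word_arcs = {}  # src -> [(dst, word), ...]
    backoff = {}    # src -> destination of its (assumed unique) backoff arc
    states = set()
    first_state = None
    for line in fst_lines:
        fields = line.split()
        if not fields:
            continue
        src = int(fields[0])
        if first_state is None:
            first_state = src  # OpenFst text format: first line's source is the start state
        states.add(src)
        if len(fields) < 4:  # final-state line: "state [weight]"
            continue
        dst, label = int(fields[1]), fields[2]
        states.add(dst)
        if label == backoff_label:
            backoff[src] = dst
        else:
            word_arcs.setdefault(src, []).append((dst, label))

    # History length = number of backoff arcs needed to reach the empty history.
    length = {}
    for s in states:
        chain = []
        while s not in length and s in backoff:
            chain.append(s)
            s = backoff[s]
        n = length.setdefault(s, 0)  # a state without a backoff arc is the empty history
        for t in reversed(chain):
            n += 1
            length[t] = n

    # Seed the start state and the empty-history state(s), then BFS over word arcs.
    start = first_state if start_state is None else start_state
    history = {start: tuple(start_history)}
    queue = deque([start])
    for s in states:
        if length[s] == 0 and s not in history:
            history[s] = ()
            queue.append(s)
    while queue:
        src = queue.popleft()
        for dst, word in word_arcs.get(src, []):
            if dst in history:
                continue
            k = length[dst]
            # dst's history is the last k words of history(src) + word.
            history[dst] = (history[src] + (word,))[-k:] if k else ()
            queue.append(dst)
    return history


if __name__ == "__main__":
    labels = label_states(EXAMPLE_FST.splitlines())
    for state in sorted(labels):
        print(state, " ".join(labels[state]) or "<empty history>")
```

Because every label assigned this way is exact, the first word arc that reaches a state already determines that state's history. Each arc is therefore processed at most once. If the FST text uses integer ids instead of symbols, pass the id of `#0` as `backoff_label` and map the words back through the symbol table afterwards.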
Note: I know there is a `--keep-symbols` option, but it only stores the symbols on the arcs, whereas I'm interested in what each state represents.
Is there a way to figure out which state represents which history in the FST text file? Thank you for your help.