How to figure out which state represents which history in FST text file? #31

Banaf89 · 2023-04-24T09:32:07Z

I am using the kaldilm library to convert ARPA files to the FST text format in order to build an n-gram language model as an FST. In the ARPA file, every n-gram is written out in full along with its probabilities. In the FST text format, however, the states appear only as numbers. It is easy to work out which number corresponds to which n-gram when the model is small, but as the amount of data grows, it becomes harder to tell which state represents which history (the sequence of preceding words).

One possible solution would be to run a depth-first search (DFS) on the graph and label each state according to the states and arcs that lead to it, but that would take a long time when the data, and hence the model, is large. It would be much easier if I had a dictionary that shows what each state number in the final FST text file represents.
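For illustration, here is a minimal sketch of the traversal described above, written as a breadth-first rather than depth-first search so that shorter histories are labelled first. It is not part of kaldilm, and the names `read_arcs`, `label_states`, and `SAMPLE` are hypothetical. It rests on several assumptions that I have not checked against kaldilm's output: the file was written with symbols kept, so arc labels are words; backoff arcs carry `#0`, `<eps>`, or `0` as their input label; the start state is the source of the first line, as in the OpenFst text format; and the start state stands for the history `<s>`. The labels are only exact if every existing history state can be reached from the state of its one-word-shorter prefix.

```python
from collections import deque


def read_arcs(lines):
    """Parse OpenFst text-format lines into (start_state, {src: [(label, dst)]})."""
    arcs = {}
    start = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if start is None:
            # In the OpenFst text format the source of the first line is the start state.
            start = int(fields[0])
        if len(fields) >= 4:
            # Arc line: src dst ilabel olabel [weight]. Shorter lines mark final states.
            src, dst, ilabel = int(fields[0]), int(fields[1]), fields[2]
            arcs.setdefault(src, []).append((ilabel, dst))
    return start, arcs


def label_states(lines, backoff_labels=("#0", "<eps>", "0"), bos="<s>"):
    """Return {state: tuple of words} guessing the history each state represents."""
    start, arcs = read_arcs(lines)
    if start is None:
        return {}

    # Follow backoff arcs from the start state until a state has none;
    # that state is taken to be the empty-history (unigram) state.
    root, seen = start, set()
    while root not in seen:
        seen.add(root)
        backoff = [dst for label, dst in arcs.get(root, ()) if label in backoff_labels]
        if not backoff:
            break
        root = backoff[0]

    # Breadth-first search over word arcs only, first from the empty-history
    # state, then from the start state if it was not reached that way.
    history = {}
    for seed, seed_label in ((root, ()), (start, (bos,))):
        if seed in history:
            continue
        history[seed] = seed_label
        queue = deque([seed])
        while queue:
            state = queue.popleft()
            for label, dst in arcs.get(state, ()):
                if label in backoff_labels or dst in history:
                    continue
                history[dst] = history[state] + (label,)
                queue.append(dst)
    return history


# Hand-written toy bigram FST (made up for this example, not produced by kaldilm).
SAMPLE = """\
0 1 #0 <eps> 0.5
0 2 a a 1.0
1 2 a a 2.0
1 1 b b 2.5
1 3 </s> </s> 3.0
2 1 #0 <eps> 0.3
2 1 b b 0.7
2 3 </s> </s> 1.2
3
"""

if __name__ == "__main__":
    for state, words in sorted(label_states(SAMPLE.splitlines()).items()):
        print(state, " ".join(words) or "(empty history)")
```

Each arc is visited once, so the running time is linear in the size of the text file. The cost is mostly memory, since the whole arc list and one label per state are held in a dictionary.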

Note: I know we have an argument called --keep-symbols but it only stores information about the arcs while I'm interested in knowing what each state represents.

Is there a way to figure out which state represents which history in the FST text file? Thank you for your help.

csukuangfj · 2023-04-24T09:34:52Z

> Is there a way to figure out which state represents which history in the FST text file? Thank you for your help.

That information is an implementation detail and can be accessed only from C++. Please read the C++ code if you want to learn more.

kaldilm/kaldilm/csrc/arpa_lm_compiler.cc

Line 142 in d2fff41

HistoryMap history_;
