Write a utility to assign weights to a compiled transducer based on a corpus #16
Comments
Is it similar to supervised tagger training? @flammie
Write the utility to process tagged corpus and the binary lttoolbox file and return weighted analyses. Closes apertium#16
Pretty much, I'd say; a unigram tagger should work exactly the same, if I haven't missed anything.
So one way this could work is:
Questions:
I don't think even OpenFst has a defined intersection of weighted or two-tape automata; they just do the encoded intersection, where a:b::W is treated as a special symbol in an ordinary automaton intersection. It might be possible to add weights by way of an intersection algorithm, at least when the automata are mostly synchronised; otherwise I'd just go with composing.

For the experiments I published on weighting automata we did compose(A, B), or at most compose(minus'(A', B'), B), which does something similar to priority union, I guess. It required some trickery though. One could even just do union(A, B), since B is a gold corpus with good tags, right?

With the compose method you mainly lose out if there is a non-1:1 relation in the direction you compose, I think, e.g. if you have foo+X:bar and foo+X:baz. The part of A that doesn't get weighted by the corpus should usually receive the penalty weight for unseen tokens.
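To make the unigram idea and the unseen-token penalty above concrete, here is a minimal sketch in Python. It is not how lttoolbox does it; it only assumes that the tagged corpus can be read as a flat list of analysis strings, and it uses add-one smoothing as one possible way to derive the penalty. The function names and the `foo<n>`-style analyses are invented for illustration.

```python
import math
from collections import Counter


def unigram_weights(tagged_analyses, smoothing=1.0):
    """Turn corpus counts into tropical weights (negative log probabilities).

    Returns (weights, unseen_penalty). Assumption: one extra bucket of
    smoothed probability mass is reserved for all unseen analyses.
    """
    counts = Counter(tagged_analyses)
    total = sum(counts.values())
    # Denominator covers every seen analysis plus one "unseen" bucket.
    denom = total + smoothing * (len(counts) + 1)
    weights = {
        analysis: -math.log((count + smoothing) / denom)
        for analysis, count in counts.items()
    }
    unseen_penalty = -math.log(smoothing / denom)
    return weights, unseen_penalty


def weigh(analyses, weights, unseen_penalty):
    """Attach a weight to each analysis and sort best (lowest) first."""
    weighted = [(a, weights.get(a, unseen_penalty)) for a in analyses]
    return sorted(weighted, key=lambda pair: pair[1])


# Hypothetical corpus: "foo<n>" was tagged twice, "foo<v>" once.
corpus = ["foo<n>", "foo<n>", "foo<v>"]
weights, penalty = unigram_weights(corpus)
ranked = weigh(["foo<v>", "foo<adj>", "foo<n>"], weights, penalty)
```

Frequent analyses get lower weights, and anything the corpus never saw falls back to the penalty, which is always higher than the weight of any seen analysis.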
The reason for not just doing I was thinking compose would work, but we also don't have an implementation of |
#161 adds a compose (optional on matching sub-paths), though it is not very extensively tested :) Also, I have no idea what the expected value of composed weights would be.
I think weights are just added together in our WFSAs? Or, theoretically, combined using the weight structure's semiring's collect operation, but we've always used the tropical semiring, which is just +.
So whatever operation you use on weights when following arcs should be used when composing? And if you want to compose |
@flammie newest uses + |
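As a rough illustration of what "uses +" means for composed weights, the sketch below composes two weighted relations under the usual tropical semiring convention: weights along a matched path are added, and when several paths yield the same pair, the lowest weight is kept. This is a toy model over dictionaries, not the lttoolbox implementation from #161; the `compose` function and the example symbols are assumptions made for illustration.

```python
def compose(a, b):
    """Compose two weighted relations given as {(input, output): weight}.

    A pair (x, z) is in the result if some y links (x, y) in `a`
    to (y, z) in `b`.
    """
    result = {}
    for (x, y), w1 in a.items():
        for (y2, z), w2 in b.items():
            if y != y2:
                continue
            weight = w1 + w2  # tropical "times": weights along a path add up
            key = (x, z)
            # tropical "plus": among alternative paths, keep the cheapest
            result[key] = min(weight, result.get(key, float("inf")))
    return result


# Two routes from "x" to "z": via "y" (1.0 + 2.0) and via "y2" (0.5 + 3.0).
a = {("x", "y"): 1.0, ("x", "y2"): 0.5}
b = {("y", "z"): 2.0, ("y2", "z"): 3.0}
composed = compose(a, b)
```

Here the route through `y` costs 3.0 and the route through `y2` costs 3.5, so the composed relation keeps 3.0 for the pair `("x", "z")`.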
I imagine it will be called lt-reweight. It should have two arguments: grn.automorf.bin and grn.tagged.
Where grn.tagged looks like:
And the output of the analyser for e.g. poytugañeʼẽ is:
So, the analyses should be weighted.