Write::Treex renders the iset structure invalid #23
I tracked the problem down, but I don't see an easy solution. The … I understand that this hack helps the writer make sure that the list of values appears in the Treex file as … But I don't like the idea that any code outside Lingua::Interset::FeatureStructure (or even outside Lingua::Interset) accesses the attributes of the FeatureStructure object directly. Any thoughts?
Do you have a minimal test case? Some treex one-liner?
Yes. First a correct example:
prints this:

    # sent_id a_tree-en-s1-root
    1 who who PRON _ PronType=Int,Rel 0 root _ _

Then add the bug (Write::Treex):
prints this:

    # sent_id a_tree-en-s1-root
    1 who who PRON _ PronType=Int|rel 0 root _ _
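To make the failure mode concrete, here is a small Python model of what the two outputs above suggest. It is only a sketch under assumptions: `FeatureStructure`, `write_treex_hack` and `conllu_feats` are hypothetical stand-ins, not the real Perl API of Lingua::Interset or Treex. The model assumes multi-valued features are stored as lists, that the writer joins them into a "|" string by writing into the object's hash directly, and that a later CoNLL-U formatter treats a leftover string as a single value.

```python
# Hypothetical Python model of the bug; the class and function names are
# illustrative stand-ins, not the real Treex/Interset Perl API.


class FeatureStructure:
    """Stand-in for Lingua::Interset::FeatureStructure (multi-values kept as lists)."""

    def __init__(self):
        self._features = {}

    def set(self, name, value):
        # The sanctioned setter normalizes 'int|rel' into ['int', 'rel'].
        if isinstance(value, str) and "|" in value:
            value = value.split("|")
        self._features[name] = value


def write_treex_hack(fs):
    """Writer that flattens multi-values by mutating the object's hash directly."""
    for name, value in list(fs._features.items()):
        if isinstance(value, list):
            fs._features[name] = "|".join(value)  # the object is now corrupt
    return dict(fs._features)


def conllu_feats(fs):
    """CoNLL-U style FEATS column; expects multi-values to be lists."""
    names = {"prontype": "PronType"}  # hypothetical name mapping
    parts = []
    for name, value in sorted(fs._features.items()):
        if isinstance(value, list):
            shown = ",".join(v.capitalize() for v in value)
        else:
            shown = value.capitalize()  # a string is treated as one value
        parts.append(f"{names.get(name, name)}={shown}")
    return "|".join(parts) or "_"


fs = FeatureStructure()
fs.set("prontype", "int|rel")
print(conllu_feats(fs))  # PronType=Int,Rel
write_treex_hack(fs)
print(conllu_feats(fs))  # PronType=Int|rel
```

The second line reproduces the broken `PronType=Int|rel` only because the string survived in the hash; any reader that runs after the writer sees the corrupted value.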
No, it calls … One workaround would be to check whether Write::Treex is the last block (there is an ugly way to do it) and, if it is not, deserialize iset here. I will think about a better solution.
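The "deserialize only if Write::Treex is not the last block" workaround can be sketched as follows. This is a hypothetical Python model, not Treex code: feature structures are plain dicts, and the runner passes an explicit `is_last` flag to each block, which is an assumption standing in for whatever mechanism Treex would actually use to detect the last block.

```python
# Hypothetical model of the "deserialize only if not last" workaround.
# Feature structures are plain dicts here; all names are illustrative.


def flatten(feats):
    """Join list values in place, as the Write::Treex hack does."""
    for name, value in list(feats.items()):
        if isinstance(value, list):
            feats[name] = "|".join(value)


def restore(feats):
    """Split '|'-joined strings back into lists (the 'deserialize' step)."""
    for name, value in list(feats.items()):
        if isinstance(value, str) and "|" in value:
            feats[name] = value.split("|")


def write_treex(doc, is_last):
    flatten(doc)
    written = dict(doc)
    if not is_last:
        restore(doc)  # pay for the extra pass only when later blocks read iset
    return written


def write_conllu(doc, is_last):
    # Expects list values; a leftover string is emitted verbatim.
    return {k: ",".join(v) if isinstance(v, list) else v for k, v in doc.items()}


def run_scenario(doc, blocks):
    outputs = []
    for i, block in enumerate(blocks):
        outputs.append(block(doc, is_last=(i == len(blocks) - 1)))
    return outputs


print(run_scenario({"prontype": ["int", "rel"]}, [write_treex, write_conllu]))
```

With Write::Treex followed by Write::CoNLLU, the model restores the lists before the second writer runs; when Write::Treex is last, the restore pass is skipped.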
Can't it always be deserialized after the file has been written?
NB: The problem is not urgent for me. Now that I know the bug is not directly in Interset, I can easily adjust the scenario and work around it. But of course we want to solve it eventually.
That would slow down every Write::Treex run. Multiple writers in one scenario are a rare use case (but fully legal, so I agree we should fix this issue).
I know it would slow it down (although I don't know by how much), but does that really matter if it is the last block in the scenario?
I have not benchmarked it, but you would need to access all nodes in the document and try to reset all …
In most cases, yes. If multiple documents are processed by the same job, you have to wait until the writer finishes. Even in single-document processing on one machine, I always wait until treex finishes before I (or some script) try to access the output Treex file.
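The cost being discussed is a second pass over every node after the file is written. The following is a minimal Python sketch of "always deserialize after writing", under stated assumptions: `Node`, `FeatureStructure` and `write_treex` are hypothetical stand-ins, not the real Treex classes, and the restore step simply calls the sanctioned `set()` on every feature so the object returns to its canonical list form.

```python
# Hypothetical sketch of "deserialize after writing": the Node, FeatureStructure
# and write_treex names are stand-ins, not the real Treex/Interset API.


class FeatureStructure:
    def __init__(self):
        self._features = {}

    def set(self, name, value):
        # Normalize a '|'-joined string into a list of values.
        if isinstance(value, str) and "|" in value:
            value = value.split("|")
        self._features[name] = value


class Node:
    def __init__(self, form, iset):
        self.form = form
        self.iset = iset


def write_treex(nodes, restore=True):
    """Flatten multi-values for output, then optionally restore every node."""
    serialized = []
    for node in nodes:
        feats = node.iset._features
        for name, value in list(feats.items()):
            if isinstance(value, list):
                feats[name] = "|".join(value)  # in-place flattening, as in the hack
        serialized.append((node.form, dict(feats)))
    if restore:
        # The extra cost: a second pass over all nodes and all their features.
        for node in nodes:
            for name, value in list(node.iset._features.items()):
                node.iset.set(name, value)
    return serialized


fs = FeatureStructure()
fs.set("prontype", "int|rel")
print(write_treex([Node("who", fs)]))
print(fs._features)
```

The extra work is linear in the number of nodes and features. A writer that built the joined strings in a local copy would avoid both the mutation and the second pass; whether Write::Treex can do that depends on how Treex serializes node attributes, which this thread does not show.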
This bug surfaces when Interset is used after the document has been written. Normally it does not happen, because Write::Treex tends to be the last block of the scenario. But try to write the document in two different formats, i.e. make your last two blocks Write::Treex and Write::CoNLLU (in this order), and make sure that there is a node with a multi-valued Interset feature, e.g. prontype=int|rel. The Treex file is written correctly, while the CoNLL-U file is not. Apparently, when Write::CoNLLU takes its turn, the Lingua::Interset::FeatureStructure object is already corrupt. Somehow the "int|rel" string makes it down to the hash value, which would not happen if the set() method of the FeatureStructure were used.
Therefore I suspect that there is a bug in one of the methods of the Node::Interset role.