Fix syntax annotation and add syntactic movement support (through alignments/relations) (T062) #138
Comments
Hi Alex, That sounds like a bug indeed; a new field should appear for syntax annotation when you click the plus button. I'll see if I can reproduce it and fix it (might take a few days). |
The bug reproduces, with a JavaScript error:
|
OK, this should be fixed now in v0.7.14. |
Thank you so much. I just updated to this version and do not encounter this bug. |
I have another question: what is the best way to handle syntactic movement annotation (i.e. inserting a null element that is co-indexed with another linguistic material in a sentence) in FLAT? |
Could you give a specific example of such an annotation perhaps? |
Here is an example from Syntactic annotation manual for AAPCAppE: |
Thanks for the elaborate report, that indeed looks like another bug (syntactic annotation hasn't been used very widely yet, so I'm afraid you're kind of the guinea pig in this, sorry). I'll investigate and fix it soon! (I'll get back to you on the syntactic movement issue as well; that may prove challenging in the current setup.) |
I really like the FoLiA paradigm and would love to use FLAT for our project. For syntactic movement annotation, the simplest solution we are thinking of is just inserting the trace symbols into the text and then assigning syntactic labels to them as to normal tokens. As I understand it, we can do this kind of insertion using FLAT, right? Again, thank you so much for your great support. |
Yes, that simple solution would indeed be a decent workaround and should work right away (after I fix the bug you reported). Perhaps in the underlying FoLiA tokenisation you can also mark such pseudo/trace elements by assigning a special class to the word elements (
<s xml:id="s1">
<w xml:id="w1" class="normal">
<t>What</t>
<pos class="WPRO" />
</w>
<w xml:id="w2" class="normal">
<t>is</t>
<pos class="BEP" />
</w>
<w xml:id="w3" class="normal">
<t>your</t>
<pos class="PRO$" />
</w>
<w xml:id="w4" class="normal">
<t>name</t>
<pos class="N" />
</w>
<w xml:id="w.bep-2" class="empty">
<t>*-2</t>
</w>
<w xml:id="w.wnp-1" class="empty">
<t>*T*-1</t>
</w>
<w xml:id="w5">
<t>?</t>
<pos class="PUNC" />
</w>
<syntax>
<su class="CP-QUE-MAT">
<su xml:id="s1.WNP-1" class="WNP">
<wref id="w1" />
</su>
<su class="IP-SUB">
<su xml:id="s1.BEP-2" class="BEP">
<wref id="w2" />
</su>
<su class="NP-SBJ">
<su class="NP-POS">
<wref id="w3" />
</su>
<wref id="w4" />
</su>
<su class="VP">
<su class="BEP">
<wref id="w.bep-2" />
</su>
<su class="NP-PRD">
<wref id="w.wnp-1" />
</su>
</su>
</su>
<su class="PUNC">
<wref id="w5" />
</su>
</su>
</syntax>
</s>
I'm not really a fan of using extra words/tokens ( I'm also thinking about what the most elegant representation would be from a FoLiA perspective. I'm not very knowledgeable about syntactic movement, but I guess these trace elements should ideally not be expressed in the tokenisation layer but only as part of the syntax tree? The co-indexation should then also be expressed explicitly rather than by convention, which could be done with FoLiA's alignments (basically higher-order references). I came up with something like this:
<s xml:id="s1">
<w xml:id="w1">
<t>What</t>
<pos class="WPRO" />
</w>
<w xml:id="w2">
<t>is</t>
<pos class="BEP" />
</w>
<w xml:id="w3">
<t>your</t>
<pos class="PRO$" />
</w>
<w xml:id="w4">
<t>name</t>
<pos class="N" />
</w>
<w xml:id="w5">
<t>?</t>
<pos class="PUNC" />
</w>
<syntax>
<su class="CP-QUE-MAT">
<su xml:id="s1.WNP-1" class="WNP">
<wref id="w1" />
</su>
<su class="IP-SUB">
<su xml:id="s1.BEP-2" class="BEP">
<wref id="w2" />
</su>
<su class="NP-SBJ">
<su class="NP-POS">
<wref id="w3" />
</su>
<wref id="w4" />
</su>
<su class="VP">
<su class="BEP">
<alignment class="A-movement">
<aref id="s1.BEP-2" type="su"/>
</alignment>
</su>
<su class="NP-PRD">
<alignment class="Wh-movement">
<aref id="s1.WNP-1" type="su"/>
</alignment>
</su>
</su>
</su>
<su class="PUNC">
<wref id="w5" />
</su>
</su>
</syntax>
</s>
This looks much cleaner to me than the workaround, though it's currently impossible to do in FLAT and would demand an extension (or we could solve it in a postprocessing conversion script, though that is less elegant). What do you think? Thanks for considering FLAT for your project! :) It would indeed be great if it can be applied to your task, and it is a great test run for syntax annotation for us as well. |
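For reference, the postprocessing route mentioned above could look roughly like the sketch below. It is only an illustration, not FLAT or FoLiA library code: the function name `traces_to_alignments` is made up, the mapping from trace notation (`*T*`, `*`) to alignment classes is an assumption based on the two examples in this thread, and co-indexed units are found purely by the convention that their `xml:id` ends in `-<index>`. It also assumes a namespace-free fragment like the ones shown here; a real FoLiA document would need namespace-qualified tag names.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed mapping from trace notation to alignment class (illustrative only).
TRACE_CLASSES = {"*T*": "Wh-movement", "*": "A-movement"}
TRACE_PATTERN = re.compile(r"(\*T\*|\*)-(\d+)")


def traces_to_alignments(sentence):
    """Turn wrefs to class="empty" trace tokens into alignments (hypothetical helper)."""
    # 1. Collect the trace tokens: token id -> (trace kind, co-index).
    traces = {}
    for w in sentence.findall("w"):
        match = TRACE_PATTERN.fullmatch(w.findtext("t", default=""))
        if w.get("class") == "empty" and match:
            traces[w.get(XML_ID)] = match.groups()

    # 2. Find antecedents: syntactic units whose id ends in "-<index>".
    antecedents = {}
    for su in sentence.iter("su"):
        match = re.search(r"-(\d+)$", su.get(XML_ID, ""))
        if match:
            antecedents[match.group(1)] = su.get(XML_ID)

    # 3. Replace each wref to a trace token by an alignment to its antecedent.
    resolved = set()
    for su in list(sentence.iter("su")):
        for i, child in enumerate(list(su)):
            trace = traces.get(child.get("id")) if child.tag == "wref" else None
            if trace and trace[1] in antecedents:
                kind, index = trace
                alignment = ET.Element("alignment", {"class": TRACE_CLASSES[kind]})
                ET.SubElement(alignment, "aref", {"id": antecedents[index], "type": "su"})
                su[i] = alignment
                resolved.add(child.get("id"))

    # 4. Drop the trace tokens that were resolved from the tokenisation layer.
    for w in sentence.findall("w"):
        if w.get(XML_ID) in resolved:
            sentence.remove(w)
    return sentence
```

Parsing the first fragment with `ET.fromstring(...)` and passing the resulting `<s>` element to this function should give a tree shaped like the second fragment; trace tokens without a matching antecedent are left untouched.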
I'm debugging the original issue with the disappearing 'They' and can confirm this indeed goes wrong (just documenting this mostly for my own fixing process; I'll answer the other questions in a separate comment):
The second query seems to be the culprit and shouldn't have been formed (nor should the fourth). Additionally, after doing this I end up with an inconsistency in the front end when hovering over the annotation: stack trace:
Todo:
|
That sounds about right yes. Whether I can visualize the alignments as nicely in the syntax tree viewer remains to be seen though.
Not really a bug as such, the visualisation was only designed to represent syntactic annotation. But inclusion of part of speech tags makes sense. (I was a bit unsure whether to represent certain parts as PoS or syntactic unit (or even both) when translating your example).
True, that is indeed problematic in the workaround approach and makes it less ideal. FLAT doesn't really do structure editing (adding words/sentences/etc) yet (this has long been planned in #5) and focusses mostly on annotation. Perhaps it's best to focus on the more elegant solution (with alignments, planned also in #84).
It sounds feasible yes, I think I should be able to implement the necessary extensions and bugfixes in the coming two/three months |
Further debugging: Somehow I did end up in a correct state despite the respan error: When adding a V under VP the representation seemed fine but the tree visualizer couldn't visualize it properly (perhaps due to one child being a
When trying to add a
|
Those respans I said were wrong actually are not wrong! I first respan the parent so it doesn't include the |
After adding a syntactic unit on the first word, the order is wrong:
<syntax>
<su annotator="flat" annotatortype="manual" class="S" datetime="2018-10-18T12:56:21" xml:id="issue138.text.su.1">
<wref id="issue138.p.1.s.1.w.2" t="brought"/>
<wref id="issue138.p.1.s.1.w.3" t="the"/>
<wref id="issue138.p.1.s.1.w.4" t="documents"/>
<wref id="issue138.p.1.s.1.w.5" t="on"/>
<wref id="issue138.p.1.s.1.w.6" t="Tuesday"/>
<wref id="issue138.p.1.s.1.w.7" t="."/>
<su annotator="flat" annotatortype="manual" class="PRON" datetime="2018-10-18T12:56:21" xml:id="issue138.text.su.2">
<wref id="issue138.p.1.s.1.w.1" t="They"/>
</su>
</su>
</syntax>
(I thought this might explain subissue (c), the tree visualisation going wrong, but no, that also goes wrong if the order is correct.) Todo:
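To make the expected order concrete: in the fragment above, the nested `PRON` unit covering "They" should come before the `wref` elements for words 2 to 7. The sketch below restores that order by sorting the children of each syntactic unit on the document position of the first word they cover. It is a minimal illustration under assumptions, not how FLAT or the FoLiA libraries compute the insertion point: the helper names and the `position` dictionary (word id to position in the sentence) are hypothetical, and children that are neither `wref` nor `su` simply end up last.

```python
import xml.etree.ElementTree as ET


def first_position(element, position):
    """Document position of the first word referenced at or below this element."""
    if element.tag == "wref":
        return position[element.get("id")]
    # Elements that cover no words at all sort last.
    return min(
        (first_position(child, position) for child in element),
        default=float("inf"),
    )


def sort_span_children(su, position):
    """Recursively order the children of a syntactic unit by the words they cover."""
    for child in su:
        if child.tag == "su":
            sort_span_children(child, position)
    # sorted() is stable, so children with equal keys keep their relative order.
    su[:] = sorted(su, key=lambda child: first_position(child, position))
```

For the example above, `position` could be built as `{f"issue138.p.1.s.1.w.{i}": i for i in range(1, 8)}`.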
|
The final respan, which clears all that remains of the parent syntactic unit (common when the parent unit is fully covered by child syntactic units), doesn't happen and is instead a kind of no-operation:
Probably because I didn't allow RESPANs to be empty, but that is valid and necessary here. So, new subissue replacing (b) (which turned out not to be wrong):
|
Small update: I haven't forgotten about this, but since this issue requires changes in FoLiA and its libraries, I'm taking it along in the development of FoLiA v2.0, which is currently in full swing (and which depends on a fair amount of other new stuff too, so it takes some time). |
I'm sure that you are working hard on certain radical changes. Thank you. |
proycon/flat#138 and serving as an example and test for hidden tokens (#51)
proycon/flat#138 and serving as an example and test for hidden tokens (#58)
Now that FoLiA v2 is released, I'm working on FLAT again and making progress regarding this issue. I hadn't commented on this yet though:
That would be possible by introducing a morphology layer on the "lemme" word, with two morphemes, and then link to the morphemes from the syntax layer. FoLiA supports that but it may require some additional work in FLAT still. |
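A rough sketch of what that could look like, written as a small Python script so that the references can be checked. Everything in the embedded fragment is an assumption for illustration: the ids, the split of "lemme" into "let" and "me", the following word "go" and the syntactic classes are invented, and the fragment is namespace-free, unlike a real FoLiA document. The helper names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative fragment: the syntax layer refers to morphemes instead of the token.
FRAGMENT = """
<s xml:id="s1">
  <w xml:id="s1.w1">
    <t>lemme</t>
    <morphology>
      <morpheme xml:id="s1.w1.m1"><t>let</t></morpheme>
      <morpheme xml:id="s1.w1.m2"><t>me</t></morpheme>
    </morphology>
  </w>
  <w xml:id="s1.w2"><t>go</t></w>
  <syntax>
    <su class="VP">
      <su class="V"><wref id="s1.w1.m1" /></su>
      <su class="NP"><wref id="s1.w1.m2" /></su>
      <su class="V"><wref id="s1.w2" /></su>
    </su>
  </syntax>
</s>
"""


def index_by_id(root):
    """Map every xml:id in the fragment (words and morphemes alike) to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def covered_text(su, targets):
    """Collect the text of everything a syntactic unit refers to, in order."""
    parts = []
    for child in su:
        if child.tag == "wref":
            parts.append(targets[child.get("id")].findtext("t"))
        elif child.tag == "su":
            parts.extend(covered_text(child, targets))
    return parts


sentence = ET.fromstring(FRAGMENT)
targets = index_by_id(sentence)
print(covered_text(sentence.find("syntax/su"), targets))
```

The point of the design is that a `wref` can resolve to a morpheme just as it resolves to a word, so the tokenisation layer keeps "lemme" as a single token while the syntax tree still sees two units.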
subissue d) (Insertion point of ADD queries should be computed in case of nested span elements (su)) seems okay now. |
A new subissue f arose, which may be related to e) (Allow RESPAN NONE on parent when inserting children (without deleting the parent span) ).
This seems an instance of what @luutuntin already reported here:
FQL queries:
This seems caused by the |
Subissue f is fixed now, e is also confirmed solved. Subissue c remains still. |
Tree visualisation (subissue c) is fixed now as well |
A short summary of dependencies to be implemented for proper syntactic movement support:
These should be realisable in the short term (I hope), even though #84 is a fairly big component. The next list covers what is additionally needed if you want to refer to sub-parts of a token (i.e. morphemes) rather than to a whole token. A workaround is to adapt the tokenisation layer (in a preprocessing step prior to FLAT).
|
@luutuntin Hi! FLAT v0.8.0 was released a bit over a week ago, implementing a lot of syntax annotation fixes stemming from this issue. As I already mentioned in the previous summary post, I plan to implement relations (#84) for v0.9.0 (aiming for the beginning of June, as I have some other priorities in other projects first). I just wanted to check whether you are still planning on using FLAT for your syntactic movement annotation task, and what your timeline is. Can you also let me know whether issue proycon/folia#50 has more or less priority for you? |
We are starting morphological annotation now; therefore, issue proycon/folia#50 has more priority for us. |
In addition, proycon/flat#134 is also relevant to us, because our annotators of morphological analysis may detect some errors in the transcripts and want to make certain corrections in FLAT, for instance, adding quotes. |
Sorry for my ignorance. This is solved by proycon/flat/#145. And I just realized that in the FoLiA documentation the introduction and specification of hidden token annotation are the same as those of token annotation, which should not be the case. You might have forgotten to update them. |
Right, good point, something went wrong there indeed. I'll fix it! |
… visualise span annotation that don't span anything (proycon/flat#138)
I released FLAT v0.9.0 last week. It implements the essentials that should enable syntactic movement annotation, as mentioned in my comment from April 18th. Proper support for alternative annotations is also implemented in the latest release. I suggest we open a new issue to continue the discussion on additional features needed for syntactic movement (this one is getting rather long and most of it has been resolved). |
That sounds great. Thank you. |
Hi Maarten,
I'm exploring FLAT for our annotation project.
Opening this example in FLAT, I can access a variety of annotation types (see the first attached picture); but when I choose "Syntactic Unit -- syntax-set" and click the "Add" button, nothing happens (see the second attached picture). Am I missing something?
Best,
Alex
PS. I did try to create a FoLiA file with two declared annotation types, POS and Syntax, and encountered the same problem: I can add POS annotations but not Syntax ones.