Remaining issues/lack of annotation consistency #638
Several responses here:
Hi @caifand, about the first point: I went through the creators, and out of 1120 creator annotations there are only 15 "person" creators (1.3%). I marked them with an attribute.
Cool, thanks! By the way, how do you work with TEI XML? In Python?
Yes, Python has nice libraries for reading and manipulating XML, for instance … (much easier to use than the Java ones, I think). That said, working with XML in general remains painful by design ;)
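As an illustration of working with TEI XML in Python, here is a minimal sketch using the standard library's `xml.etree.ElementTree` (not necessarily the library the comment had in mind). It counts creator annotations and those marked as persons. It assumes annotations are encoded as `<rs type="creator">` in the TEI namespace and that person creators carry a `subtype="person"` attribute; that attribute name is hypothetical, since the actual one is not given here.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, assumed to be the default namespace of the annotated files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_creators(tei_xml: str) -> tuple[int, int]:
    """Return (number of creator annotations, number marked as persons).

    Assumes annotations are encoded as <rs type="creator"> and that person
    creators carry a hypothetical subtype="person" attribute.
    """
    root = ET.fromstring(tei_xml)
    # ElementTree supports a small XPath subset, including [@attr='value'] filters.
    creators = root.findall(".//tei:rs[@type='creator']", TEI_NS)
    persons = [rs for rs in creators if rs.get("subtype") == "person"]
    return len(creators), len(persons)


if __name__ == "__main__":
    # Toy document, not taken from the corpus.
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
    We used <rs type="software">SPSS</rs> (<rs type="creator">IBM</rs>) and a
    script by <rs type="creator" subtype="person">J. Doe</rs>.
    </p></body></text></TEI>"""
    total, persons = count_creators(sample)
    print(f"{persons}/{total} person creators ({persons / total:.1%})")
    # prints: 1/2 person creators (50.0%)
```

For files on disk, `ET.parse(path).getroot()` replaces `ET.fromstring`, and the counts can be summed over the corpus.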
Here are some remaining issues we observed in the current annotation scheme:
So far we have kept the type name `creator`, but in practice this annotation almost always marks the "publisher" of the software; a person name is exceptional. We could therefore use `publisher` as the name for this annotation instead, given that the annotated entities have not always created the software but may simply commercialize it (for instance, IBM is not the creator of SPSS; it acquired SPSS Inc.).

Sometimes the name of the software publisher is used to refer to the software itself (PMC3534176): MathWorks is the company developing MATLAB (the correct name was introduced at the beginning of the paper, but a strange shift in referring expressions happened in the middle of the paper!). It is hard to decide how to annotate this case.
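Applying the proposed renaming of `creator` to `publisher` to already annotated TEI files could look like the following minimal sketch. It assumes, as above, that annotations are `<rs>` elements in the TEI namespace whose `type` attribute holds the annotation name; the function name and defaults are illustrative only.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}

# Serialize TEI elements with a default xmlns instead of generated "ns0:" prefixes.
ET.register_namespace("", TEI_URI)


def rename_annotation_type(tei_xml: str, old: str = "creator", new: str = "publisher") -> str:
    """Change every <rs type=old> into <rs type=new> and return the new XML string."""
    root = ET.fromstring(tei_xml)
    for rs in root.iterfind(f".//tei:rs[@type='{old}']", TEI_NS):
        rs.set("type", new)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
           '<rs type="software">SPSS</rs> (<rs type="creator">IBM</rs>)'
           '</p></body></text></TEI>')
    print(rename_annotation_type(doc))
```

Note that ElementTree drops comments and the XML declaration on this round trip, so for a real migration lxml (which keeps comments) or a careful text-level substitution may be preferable.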
We leave "MAS5" (the acronym of "Affymetrix Microarray Suite") unannotated, although it could be valuable for disambiguation. At present, software names are always treated as a single continuous chunk.
As an improvement, we could use non-continuous software name annotation like this:
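One possible way to represent non-continuous names, sketched in Python under our own assumptions (this is not a TEI encoding), is to store each mention as a tuple of character spans and rebuild the full name by joining the pieces. The `Mention` class, `find_span` helper and the example sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A software mention stored as one or more (start, end) character spans.

    Hypothetical representation: a continuous name has a single span, a
    non-continuous one (e.g. "ArcGIS ... Server") has several.
    """
    label: str
    spans: tuple[tuple[int, int], ...]

    def surface(self, text: str) -> str:
        # Join the pieces in document order to rebuild the full name.
        return " ".join(text[start:end] for start, end in sorted(self.spans))


def find_span(text: str, token: str, start: int = 0) -> tuple[int, int]:
    """Offsets of the first occurrence of token at or after position start."""
    i = text.index(token, start)
    return i, i + len(token)


# Made-up sentence with one continuous and one non-continuous mention.
text = "Maps were produced with ArcGIS Desktop and Server."
desktop = Mention("software", (find_span(text, "ArcGIS Desktop"),))
server = Mention("software", (find_span(text, "ArcGIS"), find_span(text, "Server")))

print(desktop.surface(text))  # ArcGIS Desktop
print(server.surface(text))   # ArcGIS Server
```

The shared token "ArcGIS" belongs to both mentions, which a single continuous chunk cannot express.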
We observed this case sometimes encoded as a separate software entity (as in the first example above), sometimes with both annotated together as one entity, and sometimes with only the framework annotated (as in the second example above). This case is infrequent, and we have not yet fixed an annotation rule for it.
However, we have not yet handled the "GraphPad Prism" case, where the software name is actually Prism and its publisher is GraphPad; it should normally be annotated like the "Microsoft Excel" case.
Similarly, "Lotus Notes" is always annotated as a single name, not as "Notes" from Lotus Inc. (it is now called IBM Notes, but that is another story). So inconsistencies remain here for the moment.
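A simple consistency check for the "GraphPad Prism" and "Lotus Notes" cases could flag software spans that still start with a known publisher name. This is a minimal sketch: the publisher list is a hypothetical, hand-maintained example, and treating the first token as the publisher is a naive assumption that will not fit every name.

```python
from typing import Iterable, Optional

# Hypothetical, hand-maintained list of publishers that often prefix product names.
KNOWN_PUBLISHERS = frozenset({"Microsoft", "GraphPad", "Lotus"})


def split_publisher(
    mention: str, publishers: frozenset = KNOWN_PUBLISHERS
) -> tuple[Optional[str], str]:
    """Split e.g. "GraphPad Prism" into ("GraphPad", "Prism").

    Naive assumption: the publisher, if any, is the first whitespace-separated
    token. Names not starting with a known publisher are returned unchanged.
    """
    head, sep, rest = mention.partition(" ")
    if sep and head in publishers and rest:
        return head, rest
    return None, mention


def flag_inconsistent(mentions: Iterable[str]) -> list[str]:
    """Return software spans that still include a known publisher name."""
    return [m for m in mentions if split_publisher(m)[0] is not None]


if __name__ == "__main__":
    spans = ["Excel", "GraphPad Prism", "Lotus Notes", "SPSS"]
    print(flag_inconsistent(spans))  # ['GraphPad Prism', 'Lotus Notes']
```

Whether "Lotus Notes" should be split at all is exactly the open question above, so the contents of the publisher list encode an annotation policy rather than a fact.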