What about Dates? #113

rcostu · 2014-11-17T16:43:05Z

Hi,

I have lately been working on part-of-speech tagging for Spanish, where we tend to follow the EAGLES standard, which uses the tag "w" for dates.

What about using a special tag for dates, such as DATE?

If not, how are they supposed to be tagged within this new standard?

dan-zeman · 2014-11-17T17:54:31Z

I would say that in "17 November 2014", "November" is NOUN and "17" and "2014" are NUM. In "17. 11. 2014", the numbers are NUM and the dots are PUNCT. Alternatively, one could consider "17" an ordinal numeral, in which case it would become an ADJ with the feature NumType=Ord.
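
A minimal sketch of how the analyses above could be written down, using simplified CoNLL-U-style rows. The helper `to_rows` and the reduced four-column layout (ID, FORM, UPOS, FEATS) are assumptions made for illustration; real CoNLL-U has ten columns.

```python
def to_rows(tokens):
    """Render (form, upos, feats) triples as tab-separated rows.

    Hypothetical helper: only ID, FORM, UPOS and FEATS are emitted;
    the remaining CoNLL-U columns are left out for brevity.
    """
    return [
        f"{i}\t{form}\t{upos}\t{feats}"
        for i, (form, upos, feats) in enumerate(tokens, start=1)
    ]

# "17 November 2014": month name as NOUN, numbers as NUM.
written_out = [("17", "NUM", "_"), ("November", "NOUN", "_"), ("2014", "NUM", "_")]

# "17. 11. 2014": numbers as NUM, dots as PUNCT.
numeric = [("17", "NUM", "_"), (".", "PUNCT", "_"), ("11", "NUM", "_"),
           (".", "PUNCT", "_"), ("2014", "NUM", "_")]

# Alternative reading: "17" as an ordinal numeral.
ordinal_day = [("17", "ADJ", "NumType=Ord"), ("November", "NOUN", "_"), ("2014", "NUM", "_")]

for analysis in (written_out, numeric, ordinal_day):
    print("\n".join(to_rows(analysis)), end="\n\n")
```

Every token keeps an ordinary tag in all three variants; nothing in the rows themselves marks the sequence as a date.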

dan-zeman · 2014-11-17T19:00:28Z

BTW, if you have a link to a source describing the EAGLES standard, I'd love to know about it. In the past, I tried several times to learn more about EAGLES but I found it tough to find resources on-line.

rcostu · 2014-11-17T19:04:30Z

Yes, that annotation is logical. I was just wondering whether, since some standards annotate dates as a separate PoS, it may be worth considering adding one, rather than delaying their detection to a later stage of the pipeline.

A couple of links on EAGLES:

The official website: http://www.ilc.cnr.it/EAGLES96/home.html
Spanish tagset of EAGLES: http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html

The official information is hard to find, though.

dan-zeman · 2014-11-17T20:47:07Z

Thanks, @rcostu! So it looks like the Freeling analyzer generally tries to follow the EAGLES approach to annotation, but the "W" category is their extension of the EAGLES standard, unless I am missing something. It does not appear at the EAGLES site here:

http://www.ilc.cnr.it/EAGLES96/annotate/node20.html#SECTION00063000000000000000
http://www.ilc.cnr.it/EAGLES96/annotate/node16.html#cmobli

It is true that some corpora have a special tag for date/time expressions. (E.g. AnCora (Catalan and Spanish) has a "w" tag :-)) I personally am not much in favor of that, since a date is always a compound expression made of "normal" words that have their own morphological and syntactic properties. Dates can occur in various structures and they can be incomplete ("Jan 6" vs. "6 January 2005"), so I think it is better to capture the relations and leave the interpretation to a specialized module. But apparently that is not the only way to do it.
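
As a rough sketch of such a specialized module: the function below scans tokens that already carry ordinary UPOS tags and groups a month name with its neighbouring numerals. The English-only month list, the name `find_date_spans`, and the adjacency heuristic are all assumptions for illustration; a real module would more likely work from the dependency relations.

```python
# Assumed, English-only list of month names and common abbreviations.
MONTHS = {
    "january", "jan", "february", "feb", "march", "mar", "april", "apr",
    "may", "june", "jun", "july", "jul", "august", "aug", "september",
    "sep", "october", "oct", "november", "nov", "december", "dec",
}

def find_date_spans(tagged):
    """Return (start, end) index pairs, end exclusive, of date-like spans.

    `tagged` is a list of (form, upos) pairs. A span is a month name
    tagged NOUN plus any directly adjacent NUM tokens; a bare month
    name with no numeral next to it is not reported.
    """
    spans = []
    for i, (form, upos) in enumerate(tagged):
        if upos != "NOUN" or form.lower().rstrip(".") not in MONTHS:
            continue
        start, end = i, i + 1
        # Extend leftwards over adjacent numerals ("6 January").
        while start > 0 and tagged[start - 1][1] == "NUM":
            start -= 1
        # Extend rightwards over adjacent numerals ("Jan 6", "January 2005").
        while end < len(tagged) and tagged[end][1] == "NUM":
            end += 1
        if end - start > 1:
            spans.append((start, end))
    return spans

full = [("He", "PRON"), ("arrived", "VERB"), ("on", "ADP"),
        ("6", "NUM"), ("January", "NOUN"), ("2005", "NUM"), (".", "PUNCT")]
partial = [("Jan", "NOUN"), ("6", "NUM")]

print(find_date_spans(full))
print(find_date_spans(partial))
```

Both the complete and the incomplete date are picked up from the same plain NOUN/NUM tags, so no dedicated date tag is needed at the tagging stage.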

spyysalo · 2014-11-19T15:33:24Z

+1 for "normal" POS tags for the words that make up dates. (Also, UD v1, including the POS tags, is frozen until at least Oct 2015.)

On a related note, I would argue that in languages using the period as the ordinal indicator (such as Finnish, see e.g. http://en.wikipedia.org/wiki/Ordinal_indicator#Finnish) the period is part of the token and the analysis ADJ[NumType=Ord] is most appropriate for e.g. 5. in 5. maaliskuuta "5th of March".

rcostu · 2014-11-20T15:56:55Z

Yes, @dan-zeman. It seems that the Freeling team decided to add it themselves. In fact, they use the AnCora corpus, which is developed at the same university.

I was asking just in case this had been considered, or in case any other language uses similar tags to mark this information.

I also support using normal POS tags for each word in a date, with further processing done afterwards just to extract the knowledge that it is a date, and so on.

OT: Is there any Spanish contribution to this project, or am I the first one?

spyysalo · 2014-11-20T16:59:08Z

Great, I believe the original issue is then resolved, closing.

dan-zeman · 2014-11-20T20:40:40Z

@rcostu: Yes, I believe you are the first to work on UD for Spanish. You can have a look at the "stanfordized" AnCora we have in HamleDT 2.0, but it predates UD and it is an automatic conversion only. For POS tags and morphosyntactic features, you can have a look at the tagset conversion table that I uploaded here: http://universaldependencies.github.io/docs/tagset-conversion/es-conll2009-uposf.html (also an automatic approximation, see the disclaimer; "w" is one of the tags for which it did not do a good job).

rcostu · 2014-11-20T22:08:18Z

@dan-zeman Thanks for the info. I will look at them, as I am working on getting corpora such as AnCora to work with UD, and towards proper, standard constituency-to-dependency parsing.

I have made my own conversion from EAGLES to UD, and I will contribute the list when I can. I am also interested in contributing to the Spanish conversion and documentation of UD.

I could also review that automatic conversion to get a proper CONLL09 -> UD conversion.

What is the best way to contribute? Forking and pull-requesting?

spyysalo · 2014-11-24T18:01:40Z

> What is the best way to contribute? Forking and pull-requesting?

If you're interested in contributing an entire treebank (or several), I'd suggest first proposing this so that current project members have an opportunity to comment (to avoid overlap etc.). You might want to open a separate issue for this or just contact people by email (comments on this issue are unlikely to be widely read).

hans · 2014-12-21T17:42:51Z

Hi @rcostu and all — I worked with Christopher Manning this summer on Spanish NLP, and as a side-effect of some of the work I produced some documentation for Spanish relations: https://github.com/UniversalDependencies/docs/tree/spanish/_es

Unfortunately, I don't have the bandwidth to continue this at the moment, but I figure I should mention it somewhere so the docs don't get lost.

Most of the examples are short excerpts from AnCora. We were thinking it might be possible to produce a reliable UD Spanish corpus by synthesizing the HamleDT output (which is lossy and at times very incorrect) with the original AnCora dependency treebank.

spyysalo closed this as completed Nov 20, 2014

This was referenced May 28, 2017

In consistency in the annotation of date expression UniversalDependencies/UD_English-EWT#22

Open

Dates, times, addresses, currency, numbers in v2 guidelines and English data #455

Open

martinpopel mentioned this issue Apr 24, 2018

Spelled-out numbers #198

Closed
