
Sentence segmentation

andrely edited this page Aug 7, 2013 · 3 revisions

The following characters occur in sentence-final position in various Norwegian Bokmål corpora:

Norsk Avis korpus:

! ? . " ) ]

Oslo-Bergen Tagger development corpus and Språkbanken Gullkorpus (130606):

! " ) . : ? | »

(| marks headlines in those corpora).

Norwegian Bokmål Wikipedia dump (20130702):

! .

(Possible bliki tokenization artifact?)

Oslo Parallel corpus Bokmål texts:

; : ! ? . ' " ) ?
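
An inventory like the ones above could be gathered with a short script. The following is a minimal sketch, assuming the corpus has already been split into one sentence per string; the function name `sentence_final_chars` and the sample sentences are made up for illustration and are not taken from any of the corpora listed here.

```python
from collections import Counter


def sentence_final_chars(sentences):
    """Count the last non-whitespace character of each sentence.

    Hypothetical helper: assumes `sentences` is an iterable of strings,
    one already-segmented sentence per string.
    """
    counts = Counter()
    for sentence in sentences:
        stripped = sentence.rstrip()
        if stripped:  # skip empty or whitespace-only entries
            counts[stripped[-1]] += 1
    return counts


# Invented sample data, only to show the shape of the input.
sample = [
    "Dette er en setning.",
    "Er dette en setning?",
    "Han sa: «Ja!»",
    "Overskrift |",
]

counts = sentence_final_chars(sample)
for char, n in counts.most_common():
    print(char, n)
```

Counting rather than just collecting the characters makes it easier to tell a genuine sentence-final character from a rare tokenization artifact.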
