-
Notifications
You must be signed in to change notification settings - Fork 3
/
README
67 lines (46 loc) · 1.96 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
NAME
Lingua::EN::Sentence - split text into sentences
SYNOPSIS
use Lingua::EN::Sentence qw( get_sentences add_acronyms );
add_acronyms('lt','gen'); ## adding support for 'Lt. Gen.'
my $text = q{
A sentence usually ends with a dot, exclamation or question mark optionally followed by a space!
A string followed by 2 carriage returns denotes a sentence, even though it doesn't end in a dot
Dots after single letters such as U.S.A. or in numbers like -12.34 will not cause a split
as well as common abbreviations such as Dr. I. Smith, Ms. A.B. Jones, Apr. Calif. Esq.
and (some text) ellipsis such as ... or . . are ignored.
Some valid cases canot be deteected, such as the answer is X. It cannot easily be
differentiated from the single letter-dot sequence to abbreviate a person's given name.
Numbered points within a sentence will not cause a split 1. Like this one.
See the code for all the rules that apply.
This string has 7 sentences.
};
my $sentences=get_sentences($text); # Get the sentences.
foreach my $sent (@$sentences)
{
$i++;
print("SENTENCE $i:$sent\n");
}
DESCRIPTION
The C<Lingua::EN::Sentence> module contains the function get_sentences, which
splits text into its constituent sentences, based on a regular expression and a
list of abbreviations (built in and given).
Certain well know exceptions, such as abbreviations, may cause incorrect
segmentations. But some of them are already integrated into this code and are
being taken care of. Still, if you see that there are words causing the
get_sentences function to fail, you can add those to the module, so it notices them.
Note that abbreviations are case sensitive, so 'Mrs.' is recognised but not 'mrs.'
INSTALLATION
To install this module, type the following:
perl Makefile.PL
make
make test
make install
or
perl Build.PL
build
build test
build install
MAINTAINER
This project was originated by Shlomo Yona. Currently maintained
by Kim Ryan