srx-english

github.com/apohllo/srx-english

DESCRIPTION

‘srx-english’ is a Ruby library containing English sentence and word segmentation rules. The sentence segmentation rules are based on rules defined by Marcin Miłkowski: morfologik.blogspot.com/2009/11/talking-about-srx-in-lt-during-ltc.html

FEATURES/PROBLEMS

This library is generated by ‘srx2ruby’, which has some limitations and might not be 100% compliant with the SRX standard.

INSTALL

Standard RubyGems installation:

$ gem install srx-english

BASIC USAGE

The library defines the SRX::English::SentenceSplitter class, which lets you iterate over the matched sentences:

require 'srx/english/sentence_splitter'

text = <<-END
  This is e.g. Mr. Smith, who talks slowly... And this is another sentence.
END

splitter = SRX::English::SentenceSplitter.new(text)
splitter.each do |sentence|
  puts sentence.gsub(/\n|\r/,"")
end
# This is e.g. Mr. Smith, who talks slowly...
# And this is another sentence.

# Word segmentation: each token is yielded with its type and offsets.
require 'srx/english/word_splitter'

sentence = 'My home is my castle.'
splitter = SRX::English::WordSplitter.new(sentence)
splitter.each do |word,type,start_offset,end_offset|
  puts "'#{word}' #{type}"
end
# 'My' word
# ' ' other
# 'home' word
# ' ' other
# 'is' word
# ' ' other
# 'my' word
# ' ' other
# 'castle' word
# '.' punct
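
The token stream above mixes words, whitespace and punctuation, so a common follow-up step is to keep only the words. The sketch below shows one way to do that. It assumes only what the example above shows: an object whose `each` yields `(word, type, start_offset, end_offset)`. The `words_only` helper and the `StandInSplitter` class are hypothetical names introduced here. The stand-in is a rough imitation that lets the snippet run without the gem. It is not the library's implementation, and its offset convention (end offset exclusive) is an assumption.

```ruby
# Collect only the tokens tagged as words from any splitter-like object.
# Assumes the object responds to #each and yields
# (word, type, start_offset, end_offset), as in the example above.
def words_only(splitter)
  words = []
  splitter.each do |word, type, _start_offset, _end_offset|
    # to_s keeps the check valid whether type is a Symbol or a String.
    words << word if type.to_s == 'word'
  end
  words
end

# Hypothetical stand-in for SRX::English::WordSplitter so that this sketch
# runs without the gem. It only mimics the shape of the yielded values.
class StandInSplitter
  def initialize(text)
    @text = text
  end

  def each
    # Split into runs of word characters, runs of whitespace,
    # and single punctuation characters.
    @text.scan(/\w+|\s+|[^\w\s]/) do |token|
      start_offset = Regexp.last_match.begin(0)
      type = case token
             when /\A\w+\z/ then :word
             when /\A\s+\z/ then :other
             else :punct
             end
      # Assumption: the end offset is exclusive.
      yield token, type, start_offset, start_offset + token.length
    end
  end
end

p words_only(StandInSplitter.new('My home is my castle.'))
# ["My", "home", "is", "my", "castle"]
```

With the gem installed, the same helper should accept `SRX::English::WordSplitter.new(sentence)` in place of the stand-in, provided the splitter yields values in the shape shown in the example above.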

LICENSE

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

FEEDBACK

apohllo@o2.pl
