Commit 01528ca

Describe new modules and classes
1 parent f08768d commit 01528ca

File tree

4 files changed: 10 additions, 0 deletions


lib/classifier-reborn/extensions/token_filter/stemmer.rb

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 
 module ClassifierReborn
   module TokenFilter
+    # This filter converts given tokens to their stemmed versions in the language.
     module Stemmer
       module_function
 

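The Stemmer body itself is elided from this diff, so the following self-contained sketch only illustrates the `module_function` filter shape the file uses. The `SketchTokenFilter` name, the `call` method, and the toy suffix-stripping rule are all assumptions standing in for the gem's actual stemming dependency.

```ruby
# Hypothetical sketch of a token filter in the same module_function style.
# The real Stemmer delegates to a stemming library; a toy rule that strips
# a trailing "s" keeps this example self-contained.
module SketchTokenFilter
  module Stemmer
    module_function

    # Receives an array of token strings and returns their "stemmed" forms.
    def call(tokens)
      tokens.map { |t| t.sub(/s\z/, '') }
    end
  end
end

puts SketchTokenFilter::Stemmer.call(%w[tokens filters word]).inspect
# => ["token", "filter", "word"]
```

`module_function` makes `call` invokable directly on the module, which is why the filter needs no instances.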
lib/classifier-reborn/extensions/token_filter/stopword.rb

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 
 module ClassifierReborn
   module TokenFilter
+    # This filter removes stopwords in the language from the given tokens.
     module Stopword
       STOPWORDS_PATH = [File.expand_path(File.dirname(__FILE__) + '/../../../../data/stopwords')]
 

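The real Stopword module loads its word lists from the `STOPWORDS_PATH` directory shown above; the sketch below is a hypothetical stand-in that uses a tiny inline set instead, just to show the filter's shape.

```ruby
# Hypothetical sketch of a stopword filter; the real module reads its
# word lists from files under STOPWORDS_PATH, while this stand-in uses
# a small hard-coded set so the example stays self-contained.
module SketchTokenFilter
  module Stopword
    STOPWORDS = %w[a an the of in].freeze

    module_function

    # Receives an array of token strings and drops any that are stopwords.
    def call(tokens)
      tokens.reject { |t| STOPWORDS.include?(t.downcase) }
    end
  end
end

puts SketchTokenFilter::Stopword.call(%w[the stem of a word]).inspect
# => ["stem", "word"]
```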
lib/classifier-reborn/extensions/tokenizer/token.rb

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,12 @@
 module ClassifierReborn
   module Tokenizer
     class Token < String
+      # The class can be created with one token string and extra attributes. E.g.,
+      #   t = ClassifierReborn::Tokenizer::Token.new 'Tokenize', stemmable: true, maybe_stopword: false
+      #
+      # Attributes available are:
+      #   stemmable:      true  Whether the token can be stemmed. This must be false for un-stemmable terms; otherwise it should be true.
+      #   maybe_stopword: true  Whether the token may be a stopword. This must be false for terms which can never be stopwords; otherwise it should be true.
       def initialize(string, stemmable: true, maybe_stopword: true)
         super(string)
         @stemmable = stemmable

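The constructor documented above can be exercised with a self-contained copy of the class. Only the initializer appears in this commit, so the `stemmable?`/`maybe_stopword?` readers below are assumptions added for illustration.

```ruby
# Self-contained sketch of the Token class shown in the diff: a String
# subclass that carries two extra flags. The predicate readers are
# hypothetical; only the constructor is part of the commit.
module Sketch
  class Token < String
    def initialize(string, stemmable: true, maybe_stopword: true)
      super(string)
      @stemmable = stemmable
      @maybe_stopword = maybe_stopword
    end

    # Assumed reader: whether the token can be stemmed.
    def stemmable?
      @stemmable
    end

    # Assumed reader: whether the token may be a stopword.
    def maybe_stopword?
      @maybe_stopword
    end
  end
end

t = Sketch::Token.new 'Tokenize', stemmable: true, maybe_stopword: false
puts t            # behaves like the plain string "Tokenize"
puts t.stemmable? # => true
```

Because `Token` inherits from `String`, it compares equal to the plain string it wraps, so existing string-based code keeps working while the extra flags ride along.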
lib/classifier-reborn/extensions/tokenizer/whitespace.rb

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 
 module ClassifierReborn
   module Tokenizer
+    # This tokenizes the given input as whitespace-separated terms.
+    # It mainly aims to tokenize sentences written with a space between words, as in English, French, and other languages.
     module Whitespace
       module_function
 

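The tokenizer body is elided from this diff, so the sketch below is a minimal stand-in for the documented behaviour: Ruby's `String#split` with no argument already splits on runs of whitespace. The `SketchTokenizer` name and `call` method are assumptions.

```ruby
# Hypothetical sketch of a whitespace tokenizer in the same
# module_function style. String#split with no argument splits on runs
# of whitespace, matching the behaviour the comment describes.
module SketchTokenizer
  module Whitespace
    module_function

    # Receives a sentence and returns its whitespace-separated terms.
    def call(str)
      str.split
    end
  end
end

puts SketchTokenizer::Whitespace.call("tokenize this  sentence").inspect
# => ["tokenize", "this", "sentence"]
```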