From d7240a519548e1dea3b642f0cfe71384e126fed5 Mon Sep 17 00:00:00 2001
From: neil
Date: Tue, 21 Aug 2018 15:08:58 +0900
Subject: [PATCH] Rewritten files for English manual

---
 en/command.adoc    |   75 ++-
 en/format.adoc     |   98 ++--
 en/output.adoc     |   42 +-
 en/quickstart.adoc |   13 +-
 en/validator.adoc  | 1152 ++++++++++++++++++++++++++------------------
 5 files changed, 784 insertions(+), 596 deletions(-)

diff --git a/en/command.adoc b/en/command.adoc
index b1e3a87..896093b 100644
--- a/en/command.adoc
+++ b/en/command.adoc
@@ -11,44 +11,36 @@ RedPen provides a simple command line tool called 'redpen' to check documents.
 [[usage-redpen]]
 ==== Using redpen

-We use the redpen command as follows.
+Use the redpen command as follows:

 [source,bash]
 ------------------------------
 $ redpen [options] input-files
 ------------------------------

-By default, input files are delimited by whitespace and then analysed.
-The redpen command supports the following options.
+You can specify more than one file. Separate the file names with single-byte spaces.

 [[options]]
 ==== Options

-The redpen command has the following options.
+The redpen command has the following options:

 [suppress]
-===== Specify the RedPen configuration file
+===== To specify the RedPen configuration file

 ----
 -c , --configuration
 ----

-RedPen CLI (redpen command) search the configuration file when users do not
-specify the configuration file with -c option.
-First RedPen loads ``redpen-conf.xml`` in the current directory.
-If ``redpen-conf.xml`` does not exist in the current directory,
-RedPen loads the localized setting file, ``redpen-conf-**{lang}**.xml`` in current directory,
-where {lang} is language code in ISO 639-1.
-The language codes are selected with the locale setting of the users computer.
-When there is no localized setting file, RedPen try to load the configuration file in ``$REDPEN_HOME/conf``.
+If you do not specify a configuration file with the -c option, RedPen first searches for and loads the configuration file ``redpen-conf.xml`` in the current directory. If ``redpen-conf.xml`` does not exist in the current directory, RedPen loads the localized setting file ``redpen-conf-**{lang}**.xml`` in the current directory, where {lang} is the language code as defined in ISO 639-1. The language codes are determined by the locale setting of the user’s computer. If there is no localized setting file, RedPen loads the configuration file in ``$REDPEN_HOME/conf``.

-===== Input file format [**default**: plain]
+===== To specify the input file format [**default**: plain]

 ----
 -f , --format
 ----

-This argument specifies the input format. Currently RedPen supports the following formats.
+This argument specifies the input format. Currently, RedPen supports the following formats:

 [options="header",]
 |====
@@ -63,8 +55,7 @@ This argument specifies the input format. Currently RedPen supports the followin
 |rest |reStructuredText format
 |====

-NOTE: When users do not specify the input format with `-f`, redpen command guess the format from file extensions.
-The followings table shows the list of file extensions, which redpen command understand.
+NOTE: If you do not specify the input format with `-f`, RedPen determines the format from the file extensions. The table below shows the list of file extensions recognized by RedPen.

 [options="header",]
 |====
@@ -79,55 +70,55 @@ The followings table shows the list of file extensions, which redpen command und
 |rest |reStructuredText
 |====

-===== Result format [**default**: plain]
+===== To specify the result format [**default**: plain]

 ----
 option:: -r , --result_format
 ----

-This argument determines the output format. Currently RedPen supports the following output formats.
+This option determines the output format. Currently, RedPen supports the following output formats:

 [options="header"]
 |====
 |Value |Description
-|plain |plain text format
-|plain2 |an alternate plain text format collated by sentence
+|plain |Plain text format
+|plain2 |Alternative plain text format with errors collated by sentence
 |xml |xml format
 |json |json format
-|json2 |an alternate json format collated by sentence
+|json2 |Alternative json format with errors collated by sentence
 |====

-===== Specify the limit of error number [**default**: 1]
+===== To specify the permitted number of errors [**default**: 1]

-The redpen command returns 0 when the number of found errors are less than the specified limit.
+The redpen command returns 0 if the number of errors is lower than the specified number.

 ----
 option:: -l , --limit
 ----

-===== Specify the language of error messages [**default**: depends on the locale settings of the machine]
+===== To specify the language of error messages [**default**: locale setting of the machine]

-For selecting language of errors, we specify the language code (en or ja).
+You can choose to display error messages in English (en) or Japanese (ja).

 ----
 option:: -L ,--lang
 ----

-===== Specify the input sentences
+===== To specify input sentences

-Commonly users specify input files, but sometimes making files is tidious. For testing purpose redpen command provides the parameter for input sentences.
+You can check an individual sentence rather than a whole file.

 ----
 option:: -s , --sentence
 ----

-===== Display help
+===== To display help

 ----
 -h, --help
 ----

-===== Show the redpen version
+===== To display the RedPen version

 ----
 --version
@@ -136,15 +127,15 @@ option:: -s , --sentence
 [[sample-server]]
 === RedPen server

-RedPen also provides the server. RedPen server provides not only UI but also practical REST API (see <> section).
-The following is an image of RedPen server UI.
+RedPen also provides a server function. RedPen server provides a UI and a practical REST API (see the <> section for details).
+The following is a screenshot of the RedPen server UI:

 image:redpen-ui.png[Image]

 [[usage-redpen-server]]
-==== Usage: redpen-server
+==== To use redpen-server

-We can start and stop the redpen server with the following command.
+You can start and stop RedPen server by using the following command:

 [source,bash]
 ----------------------------
@@ -154,17 +145,17 @@ $ redpen-server [start|stop]
 [[configuration]]
 ==== Configuration

-The redpen-server command is able to be configured with editing the variables in
-*redpen-server* file itself. The following table shows the configuration
-variables and the default values.
+You can configure the redpen-server command by editing the variables in the
+*redpen-server* file. The following table shows the configurable settings
+and their default values:

 [options="header",]
 |=======================================================================
 |Configuration |Default Value |Description
-|REDPEN_PORT |8080 |Specify Port number of RedPen server.
-|STOP_KEY |redpen.stop |RedPen server is able to stop with Stop key with http access. If you do not want to stop with stop key comment out the value.
-|REDPEN_CONF_FILE | |Specify default redpen config file.
-|REDPEN_LANGUAGE |Depends on locale settings |Specify the language of error messages from RedPen.
+|REDPEN_PORT |8080 |Specifies the port number of RedPen server.
+|STOP_KEY |redpen.stop |Specifies a stop key that allows RedPen server to be stopped via HTTP. If you do not want to allow this, comment out this setting.
+|REDPEN_CONF_FILE | |Specifies the default redpen config file.
+|REDPEN_LANGUAGE |Depends on locale setting |Specifies the language in which RedPen error messages are displayed.
 |=======================================================================

-The functionality of the RedPen server is described in the <> section.
+For more details on RedPen server functions, refer to the <> section.
diff --git a/en/format.adoc b/en/format.adoc
index 59fa50f..d290339 100644
--- a/en/format.adoc
+++ b/en/format.adoc
@@ -1,7 +1,7 @@
 [[formats]]
 == RedPen Input Formats

-RedPen supports several types of input formats:
+RedPen supports the following types of input format:

 - Plain Text
 - Markdown
@@ -15,14 +15,14 @@ RedPen supports several types of input formats:
 [[plain-text]]
 === Plain text

-Plain text supports a set of paragraphs. Paragraphs are separated by two
-new lines. For example, the following article has two paragraphs.
+Plain text consists of a series of paragraphs. Paragraphs are separated by two
+consecutive line breaks. For example, the following article has two paragraphs:

 ----
-This is a first paragraph. This paragraph is the introduction of this article.
+This is the first paragraph. This paragraph is the introduction of this article.
 It introduces the central issue discussed throughout the rest of the article.

-Second paragraph describes the details of the issue and attempts to present a solution.
+The second paragraph describes the details of the issue and attempts to present a solution.
 ----

 [[asciidoc]]
@@ -33,13 +33,13 @@ See the http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/[AsciiDoctor
 [[latex]]
 === LaTeX

-NOTE: RedPen does not supports Macros defined by writers.
+NOTE: RedPen does not support macros defined by writers.

 [[wiki-format]]
 === Wiki format

 RedPen supports a subset of Wiki syntax. Currently, the supported
-elements of Wiki syntax are as follows.
+elements of Wiki syntax are as follows:

 [[headings]]
 ==== Headings
@@ -49,50 +49,50 @@ To create a heading, add a line starting with **h[1234]**. The number after h re
 [[inline-formatting]]
 ==== Inline Formatting

-RedPen supports the following inline formatting.
+RedPen supports the following inline formatting:

 [[bold]]
 ==== Bold

 ----
-**this is a Bold sentence.**
+**This is a sentence in bold font.**
 ----

 [[italic]]
 ==== Italic

 ----
-//this is an italic sentence.//
+//This is a sentence in italic font.//
 ----

 [[underline]]
 ==== Underline

 ----
-__this is an underlined sentence.__
+__This is an underlined sentence.__
 ----

 [[strikethrough]]
 ==== Strikethrough

 ----
---this is a strikethrough sentence.--
+--This is a strikethrough sentence.--
 ----

 [[links]]
 ==== Links

-Links elements are included in Wiki formatted documents.
+Link elements are included in Wiki formatted documents.

 [[lists]]
 ==== Lists

-Wiki syntax supports two types of lists.
+Wiki syntax supports the following two types of list:

 [[bulleted-lists]]
 ===== Bulleted Lists

-To enter a bulleted list, start a line with an asterisk. The number of
+To add a bulleted list, start each line with an asterisk. The number of
 asterisks denotes the indent level of the list.

 ----
@@ -102,16 +102,16 @@ asterisks denotes the indent level of the list.
 ----

 [[numbered-list]]
-===== Numbered List
+===== Numbered Lists

-If you want to add numbered lists, use the hash/pound symbol (#) instead
-of the asterisk used by Bulleted Lists.
+To add a numbered list, use the hash/pound symbol (#) instead
+of the asterisk used for a bulleted list.

 [[comments]]
 ==== Comments

 To add a comment to the wiki source, add a ``[!-- ... --]`` block. The
-following shows a sample comment.
+following shows a sample comment:

 ----
 [!--
@@ -122,22 +122,22 @@ following shows a sample comment.
 [[paragraphs]]
 ==== Paragraphs

-Paragraphs are separated by two new lines. This syntax is the same as
+Paragraphs are separated by two consecutive line breaks. This syntax is the same as
 for plain text.

 [[markdown]]
 === Markdown

-RedPen currently supports the following Markdown elements.
+RedPen currently supports the following Markdown elements:

 [[headings-1]]
 ==== Headings

-Two styles of headings are supported.
+The following two styles of heading are supported:

 * Underlined headings

-First and second level headings can be specified using underlines.
+You can specify first and second level headings by using underlines, as shown below.

 ----
 First-level headings
@@ -145,13 +145,13 @@ First-level headings
 ----

 ----
-second-level headings
+Second-level headings
 ---------------------
 ----

 * Atx style headings

-1-6 hash or pound characters (#) at the beginning of a line.
+You can specify up to six levels of heading by adding 1-6 hash or pound characters (#) at the beginning of a line.

 For example:

@@ -164,35 +164,35 @@ For example:
 [[inline-formatting-1]]
 ==== Inline Formatting

-RedPen supports the following inline formatting.
+RedPen supports the following inline formatting:

 [[bold-1]]
 ===== Bold

-Wrap characters with double asterisks or underscores for bold. The
-following are samples of bold sentences.
+To display a sentence in bold font, enclose the sentence in double asterisks or double underscores. The
+following are examples of sentences in bold font:

 ----
-**this is a Bold sentence.**
-__this is also a Bold sentence.__
+**This is a sentence in bold font.**
+__This is also a sentence in bold font.__
 ----

 [[italic-1]]
 ===== Italic

-Wrap characters with a single asterisk or underscore for italics. The
-following are samples of italic sentences.
+To display a sentence in italic font, enclose the sentence in single asterisks or single underscores. The
+following are examples of sentences in italic font:

 ----
-*this is a italic syntax.*
-_this is also a italic syntax._
+*This is a sentence in italic font.*
+_This is also a sentence in italic font._
 ----

 [[links-1]]
 ==== Links

-To create a link, wrap square brackets around the link's label and
-parentheses around the URL. For example.
+To create a link, enclose the link text in square brackets and enclose
+the URL in parentheses, as shown below.

 ----
 [label](url)
@@ -201,15 +201,15 @@ parentheses around the URL. For example.
 [[lists-1]]
 ==== Lists

-The Markdown parser used by RedPen supports two types of lists -
-Bulleted lists and Numbered lists.
+The Markdown parser used by RedPen supports two types of list:
+bulleted and numbered.

 [[bulleted-lists-1]]
 ===== Bulleted Lists

-To create a bulleted list, start a line with an asterisk or a hyphen.
+To add a bulleted list, start each line with an asterisk or a hyphen.
 The lists are nested according to how many leading spaces there are. The
-following is a example of a bulleted list using asterisks.
+following is an example of a bulleted list using asterisks:

 ----
 * List
@@ -219,10 +219,10 @@ following is a example of a bulleted list using asterisks.
 ----

 [[numbered-list-1]]
-===== Numbered List
+===== Numbered Lists

-If you want to create a numbered list, use a number followed by a
-period, as in the following example.
+To add a numbered list, start each line with a number followed by a
+period, as shown below.

 ----
 1. List
@@ -232,24 +232,24 @@ period, as in the following example.
 [[paragraphs-1]]
 ==== Paragraphs

-Paragraphs are separated by two new lines. This syntax is the same as for plain text.
+Paragraphs are separated by two consecutive line breaks. This syntax is the same as for plain text.

 [[review-format]]
 === Re:VIEW format

-See the https://github.com/kmuto/review/blob/master/doc/format.md[Re:VIEW reference]
+For details, refer to the https://github.com/kmuto/review/blob/master/doc/format.md[Re:VIEW reference].

 [[java-properties]]
 === Java Properties

 Properties files or Resource Bundles are commonly used for internalization in Java.
-RedPen treats every property as a section, which can have one or more sentences. Comments and values, but not keys are validated.
+RedPen treats each property as a section, which can have one or more sentences. Comments and values are validated, but keys are not.

-See the https://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader)[Properties Javadoc] for more information on file format.
+For details on file format, refer to the https://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.Reader)[Properties Javadoc].

 [[restuucturedtext]]
 === reStructuredText

-NOTE: currently RedPen only supports basic notations
+NOTE: Currently, RedPen only supports basic notations.

-See A http://docutils.sourceforge.net/docs/user/rst/quickstart.html#structure[ReStructuredText Primer]
+For details, refer to the http://docutils.sourceforge.net/docs/user/rst/quickstart.html#structure[ReStructuredText Primer].
diff --git a/en/output.adoc b/en/output.adoc
index 355b345..9bbb990 100644
--- a/en/output.adoc
+++ b/en/output.adoc
@@ -3,52 +3,52 @@

 RedPen supports three basic output formats - **Plain text**, **XML**, and **JSON**.

-NOTE: From v1.10, RedPen supports2 error level (default is *error*). For the configuration, please refer to the <> section.
+NOTE: From v1.10, RedPen supports three different error levels (the default is *error*). For more details, refer to the <> section.

 [[plain-text]]
 === Plain text

-Plain text output format consists of the following lines.
+If you specify plain text as the output format, errors are output as follows:

 ----
 FILE_NAME:LINE_NUM: Validation[Error|Info|Warn][ERROR_TYPE], ERROR_MESSAGE at line: SENTENCE
 ----

-An alternate plain text form (plain2) prints each sentence, followed by
-all of the errors found in the sentence.
+An alternative plain text format (plain2) outputs each sentence followed by all of the errors found in that sentence.
+

 [[xml]]
 === XML

-The top section of the XML output format is *validation-result* element
-which contains multiple *error* sections. Each error section has the
-following sub-elements.
+If you specify xml as the output format, errors are output in XML with *validation-result* as the top-level element, which contains multiple *error* elements. Each error element has the following sub-elements:

 [option="header"]
 |====
 |Block | Optional | Description
-|`validator` | false | Validator name
-|`message` | false | Error message
-|`lineNum` | false | Line Number
-|`sentence` | false | Sentence containing error
-|`file` | true | File name
-|`level` | String | Error level (Info, Warn, Error)
+|`validator` | No | Validator name
+|`message` | No | Error message
+|`lineNum` | No | Line number
+|`sentence` | No | Sentence containing error
+|`file` | Yes | File name
+|`level` | Yes | Error level (Info, Warning, Error)
 |====

 [[json]]
 [suppress='UnexpandedAcronym']
 === JSON

+If you specify json as the output format, errors are output in JSON as a list of errors for each file.
+Each error consists of the following elements:
+
 [option="header"]
 |====
 |Block | Optional | Description
-|`validator` | false | Validator name
-|`message` | false | Error message
-|`lineNum` | false | Line Number
-|`sentence` | false | Sentence containing error
-|`file` | true | File name
-|`level` | String | Error level (Info, Warn, Error)
+|`validator` | No | Validator name
+|`message` | No | Error message
+|`lineNum` | No | Line number
+|`sentence` | No | Sentence containing error
+|`file` | Yes | File name
+|`level` | Yes | Error level (Info, Warning, Error)
 |====

-The alternative JSON output format (json2) collates each error by the
-sentence it relates to.
+An alternative JSON format (json2) outputs each sentence followed by all of the errors found in that sentence.
diff --git a/en/quickstart.adoc b/en/quickstart.adoc
index 918c1f4..9f3a894 100644
--- a/en/quickstart.adoc
+++ b/en/quickstart.adoc
@@ -2,22 +2,22 @@
 [suppress='WeakExpression']
 == RedPen Quickstart

-This quickstart guide is to help you get started with RedPen. Let's go through some of the basics.
+The purpose of this guide is to help you get started with RedPen.

 [[requirements]]
 [suppress]
 === Requirements

-RedPen requires the following software.
+RedPen requires the following software:

-* Java 1.8.40 or greater
+* Java 1.8.40 or later

 [[example-run]]
 === Example run

-First, download the RedPen package from
+First, download the RedPen package from the
 https://github.com/redpen-cc/redpen/releases/[release page], and then
-decompress the package with the following commands.
+decompress the package using the following commands:

 [source,bash]
 ----
@@ -25,8 +25,7 @@ $ tar xvf redpen-*-assembled.tar.gz
 $ cd redpen-*
 ----

-Then, run the redpen command with the supplied sample document and
-configuration file.
+Next, run the redpen command with the supplied sample document and configuration file.

 [source,bash]
 ----
diff --git a/en/validator.adoc b/en/validator.adoc
index 2e24afc..5e4c73d 100644
--- a/en/validator.adoc
+++ b/en/validator.adoc
@@ -1,897 +1,1095 @@
 [[validator]]
-== RedPen Supported Validators
+== RedPen Validators

-RedPen supports the following validators.
+This section describes the validators provided by RedPen.

-[[sentencelength]]
-=== SentenceLength
+[[commanumber]]
+=== CommaNumber

-SentenceLength validator checks the length of sentences in the input
-document. If the length of the sentence is greater than the specified
-maximum length, the validator generates a warning.
+CommaNumber checks the number of commas in a sentence.
+If a sentence contains more than the maximum number of commas,
+the validator generates a warning.

-[[properties]]
+[[properties-3]]
 ==== Properties

 [options="header"]
 |====
 |Property |Default Value |Description
-|``max_len`` |50 |Maximum length of sentence.
+|``max_num`` |3 |The maximum number of commas allowed in a sentence.
 |====

-[[supported-languages]]
+[[supported-languages-1]]
 ==== Supported languages

-SentenceLength can be applied to any language.
+CommaNumber can be applied to any language.

-[[invalidexpression]]
-=== InvalidExpression

-InvalidExpression validator checks if input sentences contain invalid
-expressions (words or phrases). If the input sentence contains invalid
-expressions, the validator generates a warning.
+[[contraction]]
+=== Contraction

-[[properties-1]]
-==== Properties
+Contraction checks for contractions in a document.
+If more than half of the verbs are written in non-contracted form,
+the validator generates a warning when it finds a contraction.
+
+[[supported-languages-8]]
+==== Supported languages
+
+Contraction can only be applied to English documents.

-[options="header"]
-|====
-|Property |Default Value |Description
-|``dict`` |None |File name of dictionary.
-|``list`` |None |List of invalid expressions delimited by commas.
-|====

-The dictionary is a set of words or expressions. The following is an
-example of a dictionary.
+[[doubled-conjunctive-particle-ga]]
+=== DoubledConjunctiveParticleGa
+
+DoubledConjunctiveParticleGa checks if the Japanese conjunctive particle *ga (が)* is used
+twice in the same sentence. For example, if the document contains the sentence below,
+the validator generates a warning because *ga* is used twice.

 ----
-like
-you know
-hey
-kidding
-...
+今日は早朝から出発したが、定刻通りではなかったが、無事会場に到着した。
 ----

-[[supported-languages-1]]
+
+[[supported-languages-doubled-conjunctive-particle-ga]]
 ==== Supported languages

-InvalidExpression can be applied to any language.
+DoubledConjunctiveParticleGa can only be applied to Japanese documents.

-[[invalidword]]
-=== InvalidWord

-InvalidWord validator checks if input sentences contain invalid words.
-If the input sentence contains invalid words, the validator generates a
-warning.
+[[doubledjoshi]]
+=== DoubledJoshi
+
+DoubledJoshi checks if the same joshi (a Japanese part of speech for linking
+words and phrases) is used more than once in a Japanese sentence.

-[[properties-2]]
+You can add joshi that you do not want to be flagged as doubled by creating a
+dictionary or word list. The dictionary is a file in dat or txt format that
+contains one joshi per line.
+
+[[properties-doubled-joshi]]
 ==== Properties

 [options="header"]
 |====
 |Property |Default Value |Description
-|``dict`` |None |File name of dictionary.
-|``list`` |None |List of invalid expressions delimited by commas.
+|``dict`` |None |The file name of the dictionary.
+|``list`` |None |A list of words to skip delimited by commas.
 |====

-The dictionary is a set of words. The following is an example of a dictionary.
+[[supported-languages-doubled-joshi]]
+==== Supported languages
+
+DoubledJoshi can only be applied to Japanese documents.
+
+
+[[doubledword]]
+=== DoubledWord
+
+DoubledWord checks if a word is used more than once in the same sentence.
+For example, if the document contains the following sentence, the validator
+generates a warning because *good* is used twice.

 ----
-like
-hey
-wow
-...
+This good item is very good.
 ----

-[[supported-languages]]
-==== Supported Languages
+You can add words that you do not want to be flagged as doubled words by
+creating a dictionary or word list. The dictionary is a file in dat or txt
+format that contains one word per line.

-InvalidWord can be any of languages (but the default dictionaries are
-supplied only for English and Japanese).
+[[properties-8]]
+==== Properties

-[[spacebeginningofsentencevalidator]]
-=== SpaceBeginningOfSentenceValidator
+[options="header"]
+|====
+|Property |Default Value |Description
+|``dict`` |None |The file name of the dictionary.
+|``list`` |None |A list of words to skip delimited by commas.
+|====

-This validator checks if there is a white
-space at the end of sentences (except for the last sentence of paragraph).
-If the input sentence does end with a white space, a warning is given.
+[[supported-languages-10]]
+==== Supported languages

-WARNING: SpaceBeginningOfSentenceValidator is deprecated.
+DoubledWord can be applied to Japanese and any language that uses a space to
+separate words. It cannot be applied to languages such as Chinese or Thai, for which RedPen has no tokenizer.

-[[supported-languages-2]]
+NOTE: Default dictionaries are provided only for English and Japanese.
+
+
+[[doublenegative]]
+=== DoubleNegative
+
+DoubleNegative checks if the document contains any double negative expressions.
+
+[[supported-languages-14]]
 ==== Supported languages

-SpaceBeginningOfSentenceValidator can be applied to any language.
+DoubleNegative can only be applied to English and Japanese documents.

-[[commanumber]]
-=== CommaNumber

-CommaNumber validator checks the number of commas in a sentence.
+[[duplicatedsection]]
+=== DuplicatedSection

-[[properties-3]]
+DuplicatedSection checks if the document contains sections that are identical
+or similar.
+
+NOTE: Because RedPen treats a text file as a single section, DuplicatedSection does not work with text files.
+
+[[supported-languages-12]]
+==== Supported languages
+
+DuplicatedSection can be applied to any language.
+
+
+[[emptysection]]
+=== EmptySection
+
+EmptySection checks if any sections in the document do not contain any
+paragraphs or sentences.
+
+[[properties-emptysection]]
 ==== Properties

 [options="header"]
 |====
 |Property |Default Value |Description
-|``max_num`` |3 |Maximum number of commas in a sentence.
+|``limit`` |5 |The hierarchical level at which to skip validating sections (with "1" being the top level of the document).
 |====

-[[supported-languages-1]]
+NOTE: EmptySection does not work with plain text files (.txt).
+
+[[supported-languages-emptysection]]
 ==== Supported languages

-CommaNumber can be applied to any language.
+EmptySection can be applied to any language.

-[[wordnumber]]
-=== WordNumber

-WordNumber validator checks the number of words in one setnece.
+[[endofsentence]]
+=== EndOfSentence
+
+EndOfSentence checks if the document contains any sentences that do not follow
+the American style of placing sentence-ending punctuation marks inside
+quotation marks.
+
+[[supported-languages-end-of-sentence]]
+==== Supported languages
+
+EndOfSentence can only be applied to English documents.
+
+
+[[frequentsentencestart]]
+=== FrequentSentenceStart
+
+FrequentSentenceStart checks if too many sentences start with the same
+sequence of words.

-[[properties-4]]
 ==== Properties

 [options="header"]
 |====
 |Property |Default Value |Description
-|``max_num`` |30 |Maximum number of words in a sentence.
+|``leading_word_limit`` |3 |The number of words to consider at the start of each sentence.
+|``percentage_threshold`` |25 |The maximum percentage of sentences that can start with the same words.
+|``min_sentence_count`` |5 |The minimum number of sentences required in the document for the validator to report errors.
 |====

-[[supported-languages-3]]
+[[supported-languages-15]]
 ==== Supported languages

-WordNumber can be applied to any languages except for several Asian
-languages (Chinese or Thai). RedPen does not have the tokenizer
-for the languages.
+FrequentSentenceStart can be applied to Japanese and any language that uses a
+space to separate words. It cannot be applied to languages such as Chinese or
+Thai, for which RedPen has no tokenizer.

-[[suggestexpression]]
-=== SuggestExpression

-SuggestExpression validator works in a similar way to the
-InvalidExpression validator. If the input sentence contains invalid
-expressions, this validator returns a warning suggesting the correct
-expression.
+[[gappedsection]]
+=== GappedSection

-[[properties-5]]
+GappedSection checks if there is a chapter, section or subsection missing in
+the logical structure of the document. In the example below, the validator
+generates a warning because Section 1.1 is expected between
+Chapter 1 and Subsection 1.1.1.
+
+----
+= Chapter 1
+...
+=== Subsection 1.1.1
+=== Subsection 1.1.2
+...
+----
+
+NOTE: GappedSection does not work with plain text files (.txt).
+
+[[supported-languages-gappedsection]]
+==== Supported languages
+
+GappedSection can be applied to any language.
+
+
+[[HankakuKana]]
+=== HankakuKana
+
+HankakuKana checks if the document contains any single-byte katakana
+characters (also called “half-width kana”).
+
+[[supported-languages-hankaku-kana]]
+==== Supported languages
+
+HankakuKana can only be applied to Japanese documents.
+
+
+[[hyphenation]]
+[suppress='WeakExpression']
+=== Hyphenation
+
+Hyphenation checks that words in the document are hyphenated according to
+dictionary usage.
+
+[[supported-languages-18]]
+==== Supported languages
+
+Hyphenation can only be applied to English documents.
+
+
+[[invalidexpression]]
+=== InvalidExpression
+
+InvalidExpression checks if the input document contains any invalid
+words or phrases listed in a pre-defined dictionary. If the input document
+contains an invalid expression, the validator generates a warning.
+
+[[properties-1]]
 ==== Properties

 [options="header"]
 |====
 |Property |Default Value |Description
-|``dict`` |None |File name of dictionary.
-|``map`` |None |List of pairs of elements. e.g. `{SVM,Support Vector Machine},{like,such as}`
+|``dict`` |None |The file name of the dictionary.
+|``list`` |None |A list of invalid expressions delimited by commas.
 |====

-The dictionary is a TSV file with two columns. First column contains the
-invalid expression, and the second column contains a suggested
-replacement expression.
+You can specify invalid expressions by creating a dictionary or word list.
+A dictionary is a file in dat or txt format that contains one expression per line.
+The following is an example of a dictionary listing:

 ----
-SVM Support Vector Machine
-LLVM Low Level Virtual Machine
+like
+you know
+hey
+kidding
 ...
 ----

-[[supported-languages-4]]
+[[supported-languages-1]]
 ==== Supported languages

-SuggestExpression can be any of languages but the default dictionaries
-are provided only for English and Japanese.
+InvalidExpression can be applied to any language.
+

 [[invalidsymbol]]
 [suppress='InvalidSymbol WeakExpression']
 === InvalidSymbol

-Some symbols or characters have alternate characters with the same role.
-For example question mark **? (0x003F)** has another unicode variation
-**?(0xFF1F)**. InvalidSymbol checks if input sentences contains invalid
-characters or symbols. The symbols settings are added
-into the character setting block int the configuration file.
-In this file, we write the symbols we should use in the document and their invalid
-counterparts. The details of these settings is described in the next section.
+Some symbols or characters have more than one representation.
+For example, the standard question mark in Unicode is **? (0x003F)**,
+but **？(0xFF1F)** also exists. InvalidSymbol checks if the input document
+contains any invalid characters or symbols. You can specify invalid
+symbols in the symbols block in the configuration file. For more details,
+refer to <>.

 [[supported-languages-2]]
 ==== Supported languages

-InvalidSymbol works for any langugages. See the settings of symbols in
-the <> section.
+InvalidSymbol can be applied to any language.

-[[symbolwithspace]]
-[suppress='WeakExpression']
-=== SymbolWithSpace

-Some symbols need space before or after them. For example, if we want to
-ensure a space before a left parentheses, we can add the preference to the symbol setting block (see <>)
+[[invalidword]]
+=== InvalidWord

-[[supported-languages-3]]
-==== Supported languages
+InvalidWord checks if the input document contains any invalid
+words listed in a pre-defined dictionary. If the input document
+contains an invalid word, the validator generates a warning.

-InvalidSymbol works for any language.
+[[properties-2]]
+==== Properties

-[[katakanaendhyphen]]
-[suppress='InvalidSymbol NumberFormat WeakExpression']
-=== KatakanaEndHyphen
+[options="header"]
+|====
+|Property |Default Value |Description
+|``dict`` |None |The file name of the dictionary.
+|``list`` |None |A list of invalid expressions delimited by commas.
+|====
 
-KatakanaEndHyphen validator checks the end hyphens of Katakana words in
-*Japanese* documents. Japanese Katakana words have variations in their
-end hyphen. For example, "computer" is written in Katakana as
-"コンピュータ" (without hyphen), and "コンピューター" (with hypen). This
-validator checks to ensure that Katakana words match the predefined
-standard. See JIS Z8301, G.6.2.2 b) G.3.
+You can add words by adding a dictionary or word list.
+A dictionary is a file in dat or txt format that contains one word per line.
+The following is an example of a dictionary listing:
 
-* a: Words of 3 characters or more cannot have an end hyphen.
-* b: Words of 2 characters or less can have an end hyphen.
-* c: A compound word should apply *a* and *b* to each component word.
-* d: In the cases from *a* to **c**, the length of a syllable which is
-represented by a hyphen is 1 except for Youon.
+----
+like
+hey
+wow
+...
+----
 
-[[supported-languages-4]]
-==== Supported languages
+[[supported-languages]]
+==== Supported languages
 
-KatakanaEndSymbol works only for Japanees texts.
+InvalidWord can be applied to any language that separates words using spaces
+(such as English or French).
+A default dictionary is supplied for English.
 
-[[katakanaspellcheck]]
-=== KatakanaSpellCheck
-KatakanaSpellCheck validator checks if Katakana words have variational written form.
-For example, if the Katakana word "インデックス" and the variational form "インデクス" exist within
-the same document, this validator will return a warning.
+[[japanese-ambiguous-noun-conjunction]]
+[suppress='WeakExpression']
+=== JapaneseAmbiguousNounConjunction
+
+JapaneseAmbiguousNounConjunction checks if the document contains an ambiguous
+noun conjunction pattern. The pattern is defined as a string of noun phrases
+joined by two or more instances of the particle **no (の)**, as shown in the example below.
+ +---- +弊社の経営方針の説明を受けた。 +---- +[[properties-japanese-ambiguous-noun-conjunction]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``dict`` |None |File name of dictionary. -|``min_ratio`` |0.2 |Threshold of the minimum similarity. KatakanaSpellCheck reports an error when there is a pair of words of which the similarity is more than the min_ratio. -|``min_freq`` |5 |Threshold of the minimum word frequency. KatakanaSpellCheck checks words of which frequencies are less than min_freq. +|``dict`` |None |The file name of the dictionary containing expressions to ignore. +|``list`` |None |A list of expressions to ignore, delimited by commas. |==== - -[[supported-languages-5]] +[[supported-language-japanese-ambiguous-noun-conjunction]] ==== Supported languages -KatakanaSpellCheck works only for Japanese texts. +JapaneseAmbiguousNounConjunction can only be applied to Japanese documents. -[[sectionlength]] -=== SectionLength +[[japanese-anchor-expression]] +=== JapaneseAnchorExpression -SectionLength validator checks the maximum number of words allowed in an -section. +JapaneseAnchorExpression checks that chapters and sections are marked in a +consistent style. -[[properties-6]] +[[properties-japanese-anchor-expression]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``max_num`` |1000 |Maximum number of characters in a section. +|``mode`` |numeric |The permitted style of anchor expression. This must be one of the following: "numeric", "numeric-zenkaku" or "kansuji". 
|====
-[[supported-languages]]
+The details of these styles are as follows:
+
+[options="header"]
+|====
+|Style | Sample
+|``numeric`` | 1章、2節 (single-byte number with kanji for chapter or section)
+|``numeric-zenkaku`` | １章、２節 (double-byte number with kanji for chapter or section)
+|``kansuji`` | 一章、二節 (kanji number with kanji for chapter or section)
+|====
+
+NOTE: RedPen ignores the style if the number follows the kanji for chapter or section (for example, 章1).
+
+[[supported-language-japanese-anchor-expression]]
 ==== Supported languages
 
-SectionLength works for any language.
+JapaneseAnchorExpression can only be applied to Japanese documents.
 
-[[paragraphnumber]]
-=== ParagraphNumber
-ParagraphNumber validator checks the maximum number of paragraphs
-allowed in one section.
+[[japanese-expression-variation]]
+=== JapaneseExpressionVariation
+
+JapaneseExpressionVariation checks for variations in the use of expressions
+in Japanese documents.
+The function of JapaneseExpressionVariation is similar to KatakanaSpellCheck,
+but whereas KatakanaSpellCheck only checks katakana,
+JapaneseExpressionVariation also checks hiragana and kanji. For example,
+you can check if a word is written in kanji and hiragana in the same document.
+You can prepare a dictionary or list of expressions to specifically check for.
 
-[[properties]]
+[[properties-japanese-expression-variation]]
 ==== Properties
 
 [options="header"]
 |====
 |Property |Default Value |Description
-|``max_num`` |5 |Maximum number of paragraphs in a section.
+|``dict`` |None |The file name of the dictionary.
+|``map`` |None |A list of pairs of expressions. For example: `{smart,スマート},{distributed,ディストリビューテッド}`
 |====
 
-[[supported-languages-1]]
+The dictionary is a tab-separated value (TSV) file in dat format consisting of
+two columns. Each column contains one of the pair of expressions that you
+want to check for.
+The following is an example of a dictionary listing:
+
+----
+SVM Support Vector Machine
+LLVM Low Level Virtual Machine
+...
+----
+
+[[supported-language-japanese-expression-variation]]
 ==== Supported languages
 
-ParagraphNumber works for any language.
+JapaneseExpressionVariation can only be applied to Japanese documents.
 
-[[paragraphstartwith]]
-=== ParagraphStartWith
-ParagraphStartWith validator checks to see if the characters at the
-beginning of paragraphs conforms to the correct style.
+[[japanese-joyo-kanji]]
+=== JapaneseJoyoKanji
 
-[[properties-7]]
+JapaneseJoyoKanji checks if the document contains any kanji that are not
+included in the official "joyo kanji" list, such as the following:
+
+----
+殆 (hotondo), 踵 (kakato), 迄 (made)
+----
+
+[[supported-language-japanese-joyo-kanji]]
+==== Supported languages
+
+JapaneseJoyoKanji can only be applied to Japanese documents.
+
+
+[[japanese-number-expression]]
+=== JapaneseNumberExpression
+
+JapaneseNumberExpression checks if the number expressions ending in "tsu"
+that are used in the document are consistent in style.
+
+[[properties-section-level]]
 ==== Properties
 
 [options="header"]
 |====
 |Property |Default Value |Description
-|``start_with`` |" " |Characters in the beginning of paragraphs.
+|``mode`` |numeric |The permitted style of number expression. This must be one of the following: "numeric", "numeric-zenkaku", "kansuji" or "hiragana".
 |====
 
-[[supported-languages-6]]
+The details of these styles are as follows:
+
+[options="header"]
+|====
+|Style | Sample
+|``numeric`` | 1つ、2つ (single-byte number with hiragana "tsu")
+|``numeric-zenkaku`` | １つ、２つ (double-byte number with hiragana "tsu")
+|``kansuji`` | 一つ、二つ (kanji number with hiragana "tsu")
+|``hiragana`` | ひとつ、ふたつ (all hiragana)
+|====
+
+[[supported-language-japanese-number-expression]]
 ==== Supported languages
 
-ParagraphStartWith works for any langugaes.
+JapaneseNumberExpression can only be applied to Japanese documents.
-[[spacebetweenalphabeticalword]] -[suppress='WeakExpression'] -=== SpaceBetweenAlphabeticalWord -SpaceBetweenAlphabeticalWord validator checks that alphabetic words are -surrounded with whitespace. This validator is used in non-latin -languages such as Japanese or Chinese. +[[japanesestyle]] +=== JapaneseStyle + +JapaneseStyle checks if the input document contains both the "de-aru" +and "desu-masu" styles. -[[properties-spacebetweenalphabeticalword]] +[[supported-languages-13]] +==== Supported languages + +JapaneseStyle can only be applied to Japanese documents. + + +[[javascript]] +=== JavaScript + +JavaScript executes any additional validators that you create in JavaScript. +For details on creating validators in JavaScript, refer to <>. + +[[properties-javascript]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``forbidden`` | false | Speces are enforce (false) or forbidden. -|``skip_before`` | "" | Skip errors when there is no space before the specifed characters (symbols). -|``skip_after`` | "" | Skip errors when there is no space after the specifed characters (symbols). +|``script_path`` |$REDPEN_HOME/js |The directory that contains the JavaScript files. You can set multiple ``script_path`` properties. |==== -[[supported-languages-spacebetweenalphabeticalword]] +[[supported-languages-javascript]] ==== Supported languages -SpaceBetweenAlphabeticalWord works for languages whose words are not -split by white spaces such as Japanese or Chinese. +JavaScript can be applied to any language. -[[contraction]] -=== Contraction -Contraction throws an error when contractions are used in a -document in which more than half of the verbs are written in -non-contracted form. 
+[[katakanaendhyphen]]
+[suppress='InvalidSymbol NumberFormat WeakExpression']
+=== KatakanaEndHyphen
 
-[[supported-languages-8]]
+KatakanaEndHyphen checks that the long vowel symbol ("end hyphen") at the end
+of katakana words in *Japanese* documents is used according to the standard
+JIS Z8301, G.6.2.2 b) G.3. For example, "computer" can be written in katakana
+as "コンピュータ" (without end hyphen) or "コンピューター" (with end hyphen).
+However, according to JIS Z8301, "コンピュータ" is correct. If the input
+document contains "コンピューター", the validator generates a warning.
+
+The basic rules of JIS Z8301 are as follows:
+
+* a: Words of 3 characters or more cannot have an end hyphen.
+* b: Words of 2 characters or less must have an end hyphen.
+* c: A compound word should apply *a* and *b* to each component word.
+* d: When counting characters for cases *a* to *c* above, the following
+characters are counted: +
+“ー”: Long vowel symbol, as in “テーパ” (taper) +
+“ン”: “n” sound, as in “ダンパ” (damper) +
+“ッ”: Small “tsu”, as in “ニッパ” (nipper) +
+And the following characters are not counted: +
+Small kana used in contracted sounds (yōon), such as “ャ” in “シャワー” (shower)
+
+==== Properties
+
+[options="header"]
+|====
+|Property |Default Value |Description
+|``list`` |None |A comma-separated list of words with end hyphens that should not be flagged by KatakanaEndHyphen.
+|====
+
+[[supported-languages-4]]
 ==== Supported languages
 
-Contraction works only for English texts.
+KatakanaEndHyphen can only be applied to Japanese documents.
 
-[[spelling]]
-=== Spelling
-Spelling validator throws an error if there are spelling mistakes in the
-input documents. This validator only works for English documents.
+[[katakanaspellcheck]]
+=== KatakanaSpellCheck
+
+KatakanaSpellCheck checks for similar katakana words in the same Japanese
+document that might be the result of misspelling. For example, depending on
+the *min_ratio* setting, if "パラメタ" and the similar "パラメータ" exist in
+the same document, the validator generates a warning.
 
-[[properties-spelling]]
 ==== Properties
 
 [options="header"]
 |====
 |Property |Default Value |Description
-|``dict`` |None |File name of known word dictionary.
-|``list`` |None |List of known words delimited by commas.
+|``dict`` |None |The file name of the dictionary.
+|``min_ratio`` |0.3 |The degree of similarity between words for KatakanaSpellCheck to check. The valid range is from 0.1 to 0.9. The smaller the value, the more the words must match.
+|``min_freq`` |5 |The maximum number of times a word can occur in the document for KatakanaSpellCheck to check. If a word occurs more times than the specified value, it is not checked.
 |====
 
-[[supported-languages-9]]
+[[supported-languages-5]]
 ==== Supported languages
 
-Spelling works only for English texts.
+KatakanaSpellCheck can only be applied to Japanese documents.
 
-[[doubledword]]
-=== DoubledWord
-DoubledWord validator throws an error if a word is used more than once
-in a sentence. For example, if an input document contains the following
-sentence, the validator will report an error since *good* is used twice.
+[[list-level]]
+=== ListLevel
 
------
-this good item is very good.
------
+ListLevel checks how deeply lists are nested. This validator generates a
+warning if a list has more levels than the value specified for ``max_level``.
 
-[[properties-8]]
+NOTE: ListLevel does not work with plain-text files (.txt).
+
+[[properties-list-level]]
 ==== Properties
 
 [options="header"]
 |====
 |Property |Default Value |Description
-|``dict`` |None |File name of skip list dictionary.
-|``list`` |None |List of skip words delimited by commas.
+|``max_level`` |5 |The maximum number of levels allowed in a list.
 |====
 
-[[supported-languages-10]]
+For example, if ``max_level`` is set to 5, the validator generates a warning
+for the list below because it has six levels.
+
+----
+* one
+** two
+*** three
+**** four
+***** five
+****** six
+----
+
+[[supported-languages-list-level]]
 ==== Supported languages
 
-DoubledWord works for any langages except for Chiense or other Asian
-languages.
+ListLevel can be applied to any language.
 
-NOTE: The default dictionaries are supplied only for Japanese and English.
 
-[[doubledjoshi]]
-=== DoubledJoshi
+[[long-kanji-chain]]
+=== LongKanjiChain
 
-DoubledJoshi throws an error if a Joshi (Japanese part-of-speech) is used more than once
-in a Japanese sentence.
+LongKanjiChain checks if the document contains long strings of kanji
+characters. If a string is longer than the specified maximum length,
+the validator generates a warning.
 
-[[properties-doubled-joshi]]
+
+[[properties-long-kanji-chain]]
 ==== Properties
 
 [options="header"]
 |====
 |Property |Default Value |Description
-|``dict`` |None |File name of skip list dictionary.
-|``list`` |None |List of skip words delimited by commas.
+|``max_len`` |5 |The maximum number of kanji characters allowed in succession.
 |====
 
-[[supported-languages-doubled-joshi]]
+[[supported-language-long-kanji-chain]]
 ==== Supported languages
 
-DoubledJoshi works only for Japanese
+LongKanjiChain can only be applied to Japanese documents.
 
-[[successiveword]]
-[suppress="SuccessiveWord"]
-=== SuccessiveWord
-SuccessiveWord validator throws an error if the same word is used twice
-in succession. For example, if an input document contains the following
-sentence, the validator will report an error since *is* is used twice in
-succession.
+[[numberformat]]
+[suppress='WeakExpression NumberFormat SymbolWithSpace']
+=== NumberFormat
 
------
-the item is is very good.
------
+NumberFormat checks that numbers in a document are delimited in three-digit
+blocks using commas (for example, “12,000” instead of “12000”).
+It also checks which symbol is used to delimit three-digit blocks and which
+symbol is used to denote a decimal point.
-[[supported-languages-11]] -==== Supported languages +[[properties-11]] +==== Properties -SuccessiveWord works for any langages except for Chiense or other Asian -languages. +[options="header"] +|==== +|Property |Default Value |Description +|``decimal_delimiter_is_comma`` | false |If false, the validator assumes that the decimal point is a period ("."). +If true, the validator assumes that the decimal point is a comma (","), as used in most European countries. -[[duplicatedsection]] -=== DuplicatedSection +|``ignore_years`` | true |If false, the validator assumes that all integers are numbers and checks accordingly for commas or periods. If true, the validator assumes that 4-digit integers (such as 2015 and 1998) are years and does not check them for commas or periods. -DuplicatedSection validator throws an error if there are section pairs -which have almost the same content. +|==== -[[supported-languages-12]] +[[supported-languages-19]] ==== Supported languages -DuplicatedSection works for any language. - -[[japanesestyle]] -=== JapaneseStyle +NumberFormat can be applied to documents written in European languages such +as English or French. -JapaneseStyle validator reports errors if the input file contains both -"dearu" and "desu-masu" style. -[[supported-languages-13]] -==== Supported languages +[[okurigana]] +=== Okurigana -JapaneseStyle works only for Japanese +Okurigana checks if the document contains any incorrect okurigana +(declensional kana endings for kanji used in Japanese). -[[doublenegative]] -=== DoubleNegative +[[supported-languages-okurigana]] +==== Supported languages -DoubleNegative validator reports errors when input sentence contains -double negative expression. +Okurigana can only be applied to Japanese documents. -[[supported-languages-14]] -==== Supported languages -DoubleNegative works only for English and Japanese texts. 
+[[paragraphnumber]] +=== ParagraphNumber -[[frequentsentencestart]] -=== FrequentSentenceStart +ParagraphNumber checks the number of paragraphs in a section. +If the number is greater than the specified maximum number, +the validator generates a warning. -This validator reports an error if too many sentences start with the -same sequence of words. +NOTE: RedPen treats a text file as a single section. +[[properties]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``leading_word_limit`` |3 |Number of words starting each sentence to consider. -|``percentage_threshold`` |25 |Maximum percentage of sentences that can start with the same words. -|``min_sentence_count`` |5 |Minimum number of sentences required for the validator to report errors. +|``max_num`` |5 |The maximum number of paragraphs allowed in a section. |==== -[[supported-languages-15]] +[[supported-languages-1]] ==== Supported languages -FrequentSentenceStart works for any language. +ParagraphNumber can be applied to any language. -[[unexpandedacronym]] -[suppress='WeakExpression'] -=== UnexpandedAcronym -This validator ensures that there are candidates for expanded versions -of acronyms somewhere in the document. +[[paragraphstartwith]] +=== ParagraphStartWith -That is, if there exists an acronym ABC in the document, then there must -also exist a sequence of capitalized words such as Axxx Bxx Cxxx. +ParagraphStartWith checks that paragraphs do not start with illegal characters. -[[properties-9]] +[[properties-7]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``min_acronym_length`` |3 |Minimum size for the acronym +|``start_with`` |" " |The character that cannot appear at the start of a paragraph. |==== -[[supported-languages-16]] +[[supported-languages-6]] ==== Supported languages -UnexpandedAcronym works only for English text. +ParagraphStartWith can be applied to any language. 
-[[wordfrequency]] -[suppress='WeakExpression'] -=== WordFrequency -This validator ensures that usage of specific words in the document -don't occur too frequently. It calculates the frequency that words are -used and compares them the a reference histogram of word frequency for -written English. +[[parenthesizedsentence]] +=== ParenthesizedSentence -Excessive deviation from normal usage generates a validation error. +ParenthesizedSentence checks if parenthesized phrases (such as this) are used +too frequently, are nested too deeply, or are too long. -[[properties-10]] +[[properties-12]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``deviation_factor`` | 3 | Permitted factor of deviation from the norm. So if a word is normally used 3% of the time, your document can use it up to 9% of the time. -|``min_word_count`` | 200 | Minimum number of words in a document before this validator starts to validate +|``max_nesting_level`` |2 |The maximum level to which parenthesized phrases may be nested (one parenthesized phrase inside another (such as this)). +|``max_count`` |1 |The maximum number of parenthesized phrases allowed in a sentence. +|``max_length`` |4 |The maximum number of words allowed in a parenthesized phrase. |==== -[[supported-languages-17]] +[[supported-languages-20]] ==== Supported languages -WordFrequency works only for English text. - -[[hyphenation]] -[suppress='WeakExpression'] -=== Hyphenation +ParenthesizedSentence can be applied to any language. -This validator ensures that sequences of words that are hyphenated in -the dictionary are hyphenated in your document. -[[supported-languages-18]] -==== Supported languages +[[sectionlength]] +=== SectionLength -Hyphenation works only for English text. +SectionLength checks the number of characters (letters, numbers and symbols) +in a section. If the number is greater than the specified maximum number, +the validator generates a warning. 
-[[numberformat]] -[suppress='WeakExpression NumberFormat SymbolWithSpace'] -=== NumberFormat +NOTE: RedPen treats a text file as a single section. -This validator ensures that numbers in a sentence are formatted using -commas (ie: 12,000 instead of 120000), and don't have excessive decimal -points. -[[properties-11]] +[[properties-6]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``decimal_delimiter_is_comma`` | false |Change the decimal delimiter from . to , (as in Europe) -|``ignore_years`` | true |Ignore 4 digit integers (2015, 1998) +|``max_num`` |1000 |The maximum number of characters allowed in a section. |==== -[[supported-languages-19]] +[[supported-languages]] ==== Supported languages -NumberFormat works for texts written in European languages such as -English or French. +SectionLength can be applied to any language. -[[parenthesizedsentence]] -=== ParenthesizedSentence -This validator generates errors if parenthesized sentences (such as -this) are used too frequently, or are nested too heavily. +[[section-level]] +=== SectionLevel -[[properties-12]] +SectionLevel checks the number of levels in the section hierarchy. +If the document contains more levels than the specified maximum number, +the validator generates a warning. + +[[properties-section-level]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``max_nesting_level`` |2 |The limit on how many parenthesized expressions are permitted -|``max_count`` |1 |The number of parenthesized expressions allowed -|``max_length`` |4 |The maximum number of words in a parenthesized expression +|``max_num`` |5 |The maximum number of levels allowed. |==== -[[supported-languages-20]] -==== Supported languages - -ParenthesizedSentence works for any langugages. - -[[weakexpression]] -=== WeakExpression - -This validator generates errors if sequences of words form what is -generally considered to be a "weak expression." 
 
-[[supported-languages-21]]
+[[supported-language-section-level]]
 ==== Supported languages
 
-WeakExpression works only for English.
+SectionLevel can be applied to any language.
 
-[[endofsentence]]
-=== EndOfSentenceSentence
-
-This validator generates errors if the style end of sentence is American style.
 
-[[supported-languages-end-of-sentence]]
-==== Supported languages
+[[sentencelength]]
+=== SentenceLength
 
-EndOfSentence works for English.
+SentenceLength checks the length of sentences in the input
+document. If the length of a sentence is greater than the specified
+maximum length, the validator generates a warning.
 
-[[HankakuKana]]
-=== HankakuKana
+[[properties]]
+==== Properties
 
-This validator generates errors if the Hankaku Kana characters are used in input document.
+[options="header"]
+|====
+|Property |Default Value |Description
+|``max_len`` |120 |The maximum length of a sentence (in characters).
+|====
 
-[[supported-languages-hankaku-kana]]
+[[supported-languages]]
 ==== Supported languages
 
-HanakakuKana works only for Japanese.
-
-[[okurigana]]
-=== Okurigana
-
-This validator generates errors if input sentence uses invalid Okurigana Style (Japanese).
+SentenceLength can be applied to any language.
 
-[[supported-languages-okurigana]]
-==== Supported languages
 
-Okurigana works for Japanese.
+[[spacebeginningofsentence]]
+=== SpaceBeginningOfSentence
 
-[[startwithcapitalcharacter]]
-=== StartWithCapitalLetter
+SpaceBeginningOfSentence checks if there is a space between any two adjacent sentences
+(except for the last sentence of a paragraph). If there is not, the validator generates a warning.
 
-This validator generates errors if input sentence start with a capital character.
+WARNING: SpaceBeginningOfSentence is now deprecated.
 
-[[supported-languages-startwithcapitalcharacter]]
+[[supported-languages-2]]
 ==== Supported languages
 
-This validator works for English or other european langugages.
+SpaceBeginningOfSentence can be applied to any language.
-[[voidsection]] -=== VoidSection -This validator generates errors if sections in input documents do not have any paragraphs or sentences. +[[spacebetweenalphabeticalword]] +[suppress='WeakExpression'] +=== SpaceBetweenAlphabeticalWord -WARNING: VoidSection is deprecated and removed in the future release. Please use EmptySection. +SpaceBetweenAlphabeticalWord checks that a single-byte alphabetic word that +appears in a double-byte language document (such as Japanese or Chinese) +is preceded and followed by a single-byte space. -[[properties-voidsection]] +[[properties-spacebetweenalphabeticalword]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``limit`` |5 |Skip validation to the sections smaller than specified level. +|``forbidden`` | false | If false, spaces are required. If true, spaces are not allowed. +|``skip_before`` | "" | Ignores cases where there is no space AFTER the specified character. +|``skip_after`` | "" | Ignores cases where there is no space BEFORE the specified character. |==== -[[supported-languages-voidsection]] +[[supported-languages-spacebetweenalphabeticalword]] ==== Supported languages -VoidSection works for any languages. +SpaceBetweenAlphabeticalWord can be applied to any language that does not +use spaces to separate words, such as Japanese or Chinese. -[[emptysection]] -=== EmptySection -This validator generates errors if sections in input documents do not have any paragraphs or sentences. +[[spelling]] +=== Spelling + +Spelling checks for spelling mistakes in a document. +You can add words that you do not want to be flagged as spelling mistakes +(such as product names) by creating a dictionary or word list. +The dictionary is a file in dat or txt format that contains one word per line. -[[properties-emptysection]] +[[properties-spelling]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``limit`` |5 |Skip validation to the sections smaller than specified level. 
+|``dict`` |None |The file name of the dictionary. +|``list`` |None |A list of words to skip delimited by commas. |==== -[[supported-languages-emptysection]] +[[supported-languages-9]] ==== Supported languages -EmptySection works for any languages. - -[[gappedsection]] -=== GappedSection +Spelling can only be applied to English documents. -This validator generates errors when the level of child sections (chapters) has the gap. -For example, The following is a misplaced section sample. ----- -= chapter 1 -... -=== section 1.1.1 -=== section 1.1.2 -... ----- +[[startwithcapitalcharacter]] +=== StartWithCapitalLetter -In the above example, chapter 1 should have section 1.1 before subsection 1.1.1. +StartWithCapitalLetter checks if sentences start with a capital character. -[[supported-languages-gappedsection]] +[[supported-languages-startwithcapitalcharacter]] ==== Supported languages -GappedSection works for any languages. +StartWithCapitalLetter can be applied to documents written in European +languages such as English or French. -[[long-kanji-chain]] -=== LongKanjiChain -This validator generates errors when input sentences has a words consist of too many Kanji characters. +[[successive-sentence]] +=== SuccessiveSentence + +SuccessiveSentence checks if the document contains two or more sentences in +succession which are identical or almost identical. For example, if the +document contains the paragraph below, the validator generates a warning +because the same sentence is used twice in succession. + +---- +The component is useful for testing. Especially for unit level testing. Especially for unit level testing. If necessary, we can apply it for higher level testing. +---- -In the above example, chapter 1 should have section 1.1 before subsection 1.1.1. 
-[[properties-long-kanji-chain]] +[[properties-successive-sentence]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``max_len`` |5 |The limit on how many characters are used in succession. +|``dist`` |3 |The degree of similarity between sentences. If the value of "dist" is 0, the sentences must be identical in order for the validator to generate a warning. +|``min_len`` |5 |The minimum number of words a sentence must contain in order to be checked. |==== -[[supported-language-long-kanji-chain]] +[[supported-language-successive-sentence]] ==== Supported languages -GappedSection works for Japanese text. +SuccessiveSentence can be applied to any language. -[[section-level]] -=== SectionLevel -This validator generates errors when input documents contains smaller sections than specified. +[[successiveword]] +[suppress="SuccessiveWord"] +=== SuccessiveWord + +SuccessiveWord checks if the same word is used twice in succession. +For example, if the document contains the sentence below, +the validator generates a warning because *is* is used twice in succession. -[[properties-section-level]] +---- +The item is is very good. +---- + +[[supported-languages-11]] +==== Supported languages + +SuccessiveWord can be applied to Japanese and any language that uses a space +to separate words. It cannot be applied to Chinese or other Asian languages. + + +[[suggestexpression]] +=== SuggestExpression + +SuggestExpression works in a similar way to InvalidExpression. +If the input document contains an invalid expression, +the validator generates a warning and suggests an alternative expression. + +[[properties-5]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``max_num`` |5 |The limit of the sub-section level. +|``dict`` |None |The file name of the dictionary. +|``map`` |None |A list of pairs of expressions. Each pair consists of an invalid expression separated by a comma from its suggested replacement. 
For example: `{SVM,Support Vector Machine},{like,such as}` |==== -[[supported-language-section-level]] +The dictionary is a tab-separated value (TSV) file in dat format consisting of +two columns. The first column contains the invalid expression, and the second +column contains a suggested replacement expression. +The following is an example of a dictionary listing: + +---- +SVM Support Vector Machine +LLVM Low Level Virtual Machine +... +---- + +[[supported-languages-4]] ==== Supported languages -SectionLevel works for any languages. +SuggestExpression can be applied to any language. +Default dictionaries are provided for English and Japanese. -[[japanese-ambiguous-noun-conjunction]] +[[symbolwithspace]] [suppress='WeakExpression'] -=== JapaneseAmbiguousNounConjunction +=== SymbolWithSpace -This validator generates errors when Japanese documents contains the ambiguous noun conjunction pattern. -The ambigous pattern is that two nouns are conjuncted with Joshi, **no (の)**. +SymbolWithSpace checks if symbols are preceded or followed by a space, +as appropriate. You can specify which symbols must (or must not) be preceded +or followed by a space in the *symbols* block in the configuration file. +For more details, refer to <>. -The following is a sample of this pattern. +[[supported-languages-3]] +==== Supported languages ----- -弊社の経営方針の説明を受けた。 ----- +SymbolWithSpace can be applied to any language. -[[supported-language-japanese-ambiguous-noun-conjunction]] -==== Supported languages -JapaneseAmbigousNounConjunction works for Japanese. +[[unexpandedacronym]] +[suppress='WeakExpression'] +=== UnexpandedAcronym -[[japanese-number-expression]] -=== JapaneseNumberExpression +UnexpandedAcronym checks that, if an acronym appears in a document, the +expanded version of the acronym also appears somewhere in the document. -JapaneseNumberExpression checks if the number expressions in the input text are in the consistent style. 
+For example, if the document contains the acronym ABC, it must also contain +a sequence of capitalized words whose initial letters spell ABC, such as Axxx Bxx Cxxx. -[[properties-section-level]] +[[properties-9]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``mode`` |numeric |Style of number expression. There is four types of styles ("numeric", "numeric-zenkaku", "kansuji", "hiragana"). +|``min_acronym_length`` |3 |The minimum length of acronyms to check. |==== -Each style expects the following number expression. - -[options="header"] -|==== -|Style | Sample -|``numeric`` | 1つ、2つ -|``numeric-zenkaku`` | 1つ、2つ -|``kansuji`` | 一つ、二つ -|``hiragana`` | ひとつ、ふたつ -|==== - -[[supported-language-japanese-number-expression]] -==== Supported languages - -JapaneseNumberExpression works only for Japanese text. - -[[japanese-joyo-kanji]] -=== JapaneseJoyoKanji - -This validator generates errors when Japanese documents contains a non-joyo kanji. - -The following is a sample of this pattern. - ----- -踵を返して出て行った。 ----- +[[supported-languages-16]] +==== Supported languages -[[supported-language-japanese-joyo-kanji]] -==== Supported languages +UnexpandedAcronym can only be applied to English documents. -JapaneseJoyoKanji works for Japanese. +[[voidsection]] +=== VoidSection -[[japanese-expression-variation]] -=== JapaneseExpressionVariation +VoidSection checks if any sections in the document do not contain any +paragraphs or sentences. -JapaneseExpressionVariation checks spelling variation of Japanese words. -The function is similar to KatakanaSpellCheck. This validator covers all types of Japanese words consist of not only Katakana but also Hiragana or Kanji. +WARNING: VoidSection is deprecated and will be removed in a future release. +Use EmptySection instead. -[[properties-japanese-expression-variation]] +[[properties-voidsection]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``dict`` |None |File name of dictionary.
-|``map`` |None |List of pairs of elements. e.g. `{smart,スマート},{distributed,ディストリビューテッド}` +|``limit`` |5 |The hierarchical level at which to skip validating sections (with "1" being the top level of the document). |==== -[[supported-language-japanese-number-expression]] +[[supported-languages-voidsection]] ==== Supported languages -JapaneseNumberExpression works only for Japanese text. +VoidSection can be applied to any language. -[[successive-sentence]] -=== SuccessiveSentence -SuccessiveSentence throws an error when it find almost the same sentences are in succession. This validator is useful to check the human error as follows. +[[weakexpression]] +=== WeakExpression + +WeakExpression checks if the document contains any words or phrases listed in +a pre-defined dictionary of weak expressions. ----- -The component is useful for testing. Especially for unit level testing. Especially for unit level testing. Of course we can apply it for higher level testing. ----- +[[supported-languages-21]] +==== Supported languages -In the above sample, the same sentences are used in succession. This is a human error. +WeakExpression can only be applied to English documents. -[[properties-successive-sentence]] +[[wordfrequency]] +[suppress='WeakExpression'] +=== WordFrequency + +WordFrequency checks that certain words do not appear too frequently in a +document. The validator generates a warning if a word appears too often with +reference to a histogram of word frequency for written English. + +[[properties-10]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``dist`` |3 |Threshold of minimum distance in https://en.wikipedia.org/wiki/Edit_distance[Edit Distance] -|``min_len`` |5 |Minimum sentence length to compute +|``deviation_factor`` | 3 | The permitted factor of deviation from the norm. This means that, if a word is normally used 3% of the time, your document can use it up to 9% of the time. 
+|``min_word_count`` | 200 | The minimum number of words in a document required for this validator to activate. |==== -[[supported-language-successive-sentence]] +[[supported-languages-17]] ==== Supported languages -SuccessiveSentence works for any languages. +WordFrequency can only be applied to English documents. -[[list-level]] -=== ListLevel -ListLevel checks the level of list items. This validator generates errors when input sections contain list items nested too deeply. +[[wordnumber]] +=== WordNumber -[[properties-list-leve]] +WordNumber checks the number of words in a sentence. +If a sentence contains more than the maximum number of words, +the validator generates a warning. + +[[properties-4]] ==== Properties [options="header"] |==== |Property |Default Value |Description -|``max_level`` |5 |The maximum level of list items +|``max_num`` |30 |The maximum number of words allowed in a sentence. |==== -The following example generates an error at the six list item if ``max_level`` is five. - ----- -* one -** two -*** three -**** four -***** five -****** six ----- - -[[supported-languages-list-level]] +[[supported-languages-wordnumber]] ==== Supported languages -ListLevel works for any languages. +WordNumber can be applied to any language that separates words +using spaces (such as English or French).
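The WordNumber check described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not RedPen's implementation. It assumes that words are separated by whitespace, so punctuation attached to a word (as in "good.") counts as part of that word. The function names `count_words` and `check_word_number` are invented for this example. The default of 30 mirrors the documented `max_num` default.

```python
from typing import List

# Hypothetical sketch of a WordNumber-style check (not RedPen's code).
# Assumption: words are separated by whitespace, so "good." counts as one word.

DEFAULT_MAX_NUM = 30  # mirrors the documented default of max_num


def count_words(sentence: str) -> int:
    """Count the whitespace-separated words in a sentence."""
    return len(sentence.split())


def check_word_number(sentences: List[str], max_num: int = DEFAULT_MAX_NUM) -> List[str]:
    """Return one warning per sentence that has more than max_num words."""
    warnings = []
    for index, sentence in enumerate(sentences, start=1):
        words = count_words(sentence)
        if words > max_num:  # exactly max_num words is still allowed
            warnings.append(
                f"sentence {index}: {words} words exceeds the maximum of {max_num}"
            )
    return warnings


if __name__ == "__main__":
    long_sentence = " ".join(["word"] * 31)  # 31 words, one over the default
    for warning in check_word_number(["The item is very good.", long_sentence]):
        print(warning)  # prints: sentence 2: 31 words exceeds the maximum of 30
```

Because the sketch splits on whitespace, it shares the limitation stated above: it would not produce meaningful counts for languages that do not separate words with spaces.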