User Guide (English)
KefirBB User Guide, version 1.2, English.
KefirBB is a Java library for text processing. It was initially developed for translating BBCode (Bulletin Board Code) to HTML, but its flexible configuration allows it to be used in other situations, for example XML-to-HTML translation or HTML filtration. It now also supports the Textile and Markdown markup languages. Actually, it is the most powerful and flexible Java library for BBCode parsing.
- Full support of Textile from TxStyle.
- Partial support of Markdown from Markdown.
- New pattern element tags for URL and email: <url/>, <email/>.
- Conditional tag if in templates.
- New pattern element tags for a beginning of line, an end of line and a blank line: <bol/>, <eol/>, <blankline/>.
- The ghost tags. The parser parses them but doesn't change the cursor position.
- Actions for variables.
- It is now possible to describe several patterns in one code.
- The package name was changed to org.kefirsf.bb.
- Added an ability to ignore case in codes.
- Better performance.
- Added a limit on code nesting to prevent java.lang.StackOverflowError.
- Added a configuration for HTML filtration.
To begin, add a dependency on the kefirbb library to your project. This is easy for Maven-based projects:
<dependency>
    <groupId>org.kefirsf</groupId>
    <artifactId>kefirbb</artifactId>
    <version>1.5</version>
</dependency>
Or, using Gradle:
compile 'org.kefirsf:kefirbb:1.5'
For other projects, download kefirbb-1.2.jar and put the library on the classpath of your application.
Text processing is done by objects that implement the interface org.kefirsf.bb.TextProcessor.
public interface TextProcessor {
    public CharSequence process(CharSequence source);
    public String process(String source);
    public StringBuilder process(StringBuilder source);
    public StringBuffer process(StringBuffer source);
}
As you can see, the interface contains a few simple methods. Each one takes text of a particular type, transforms it, and returns the transformed text as an object of the same type.
To get the standard TextProcessor for BBCode-to-HTML translation, use the factory org.kefirsf.bb.BBProcessorFactory.
TextProcessor processor = BBProcessorFactory.getInstance().create();
Now you can use it to translate your text.
assert "<b>text</b>".equals(processor.process("[b]text[/b]"));
The processor object is thread-safe, so you can use it from several threads at the same time.
KefirBB has a very flexible configuration. It can be used not only for BBCode-to-HTML translation but also for other text translations, for example filtering HTML written by users or escaping special characters in a text. You can also write a custom configuration for any other text translation.
KefirBB contains a configuration for HTML filtration. It is needed to prevent XSS attacks and layout problems when you allow users to enter HTML on your site and show it to other users.
TextProcessor processor = BBProcessorFactory.getInstance()
.createFromResource(ConfigurationFactory.SAFE_HTML_CONFIGURATION_FILE);
assert "<b>text</b>".equals(processor.process("<b onclick=\"alert('Attack!');\">text</b>"));
KefirBB fully supports the Textile markup language. The syntax description is available at TxStyle.
TextProcessor processor = BBProcessorFactory.getInstance()
.createFromResource(ConfigurationFactory.TEXTILE_CONFIGURATION_FILE);
assert "<p><b>text</b></p>".equals(processor.process("**text**"));
Since version 1.2, the library partially supports the Markdown markup language. The syntax description is available at Markdown Syntax. The current implementation doesn't fully support Markdown lists and blockquotes.
TextProcessor processor = BBProcessorFactory.getInstance()
.createFromResource(ConfigurationFactory.MARKDOWN_CONFIGURATION_FILE);
assert "<p><strong>text</strong></p>".equals(processor.process("**text**"));
KefirBB contains a special class, EscapeProcessor, which implements the TextProcessor interface and replaces special character sequences. Its constructor takes a parameter in which you pass the character sequences to replace together with their replacements.
TextProcessor processor = new EscapeProcessor(
        new String[][]{
                {"a", "4"},
                {"e", "3"},
                {"l", "1"},
                {"o", "0"}
        }
);
assert "4bcd3fghijk1mn0pqrstuvwxyz".equals(processor.process("abcdefghijklmnopqrstuvwxyz"));
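To make the idea of sequence replacement concrete, here is a minimal plain-Java sketch of a single left-to-right pass that substitutes each listed sequence. It is not KefirBB's implementation: the class name SequenceReplacer is hypothetical, and the matching order (the first listed sequence that matches at the cursor wins) is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SequenceReplacer {
    // Replacement pairs, kept in the order they were given.
    private final Map<String, String> replacements = new LinkedHashMap<>();

    public SequenceReplacer(String[][] pairs) {
        for (String[] pair : pairs) {
            // Empty sequences are skipped: they would match everywhere
            // without moving the cursor forward.
            if (!pair[0].isEmpty()) {
                replacements.put(pair[0], pair[1]);
            }
        }
    }

    public String process(String source) {
        StringBuilder result = new StringBuilder(source.length());
        int cursor = 0;
        while (cursor < source.length()) {
            boolean replaced = false;
            for (Map.Entry<String, String> entry : replacements.entrySet()) {
                if (source.startsWith(entry.getKey(), cursor)) {
                    // Emit the replacement and skip the matched sequence.
                    result.append(entry.getValue());
                    cursor += entry.getKey().length();
                    replaced = true;
                    break;
                }
            }
            if (!replaced) {
                // No sequence starts here: copy the character unchanged.
                result.append(source.charAt(cursor));
                cursor++;
            }
        }
        return result.toString();
    }
}
```

Because the text is scanned once and replacements are written straight to the output, already replaced text is never scanned again. In this sketch, that is what keeps a replacement such as & to &amp; from being applied to its own output.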
Escaping special XML characters is a very common task. It is needed whenever a text is put into XML or HTML. For this, KefirBB contains a special factory that creates an org.kefirsf.bb.EscapeProcessor object configured specifically for escaping XML characters.
TextProcessor processor = EscapeXmlProcessorFactory.getInstance().create();
assert "&lt;escape tag&gt;".equals(processor.process("<escape tag>"));
You can create a custom text processor configuration for your specific tasks. The configuration can be defined declaratively in an XML file or programmatically. There are several ways to use a custom configuration.
The first way is to name the configuration file kefirbb.xml, put it in the root of the classpath, and then use the standard factory.
TextProcessor processor = BBProcessorFactory.getInstance().create();
The factory first looks for a configuration at the path classpath*:kefirbb.xml. If it is not found, the factory uses the default configuration at the path classpath*:org/kefirsf/bb/default.xml. Some configuration parameters can be defined in the classpath resource files kefirbb.properties and kefirbb.properties.xml, which use the standard Java properties syntax of java.util.Properties.
The second way is to load a configuration from a classpath resource file.
TextProcessor processor = BBProcessorFactory.getInstance().createFromResource("my/package/config.xml");
This is needed when you have to use several different configurations or can't put your configuration file in the root of the classpath.
The third way is to load a configuration from a file in the file system.
TextProcessor processor = BBProcessorFactory.getInstance().create("config.xml");
or
TextProcessor processor = BBProcessorFactory.getInstance().create(new File("config.xml"));
The fourth way is to create a programmatic configuration.
Configuration configuration = new Configuration();
...
TextProcessor processor = BBProcessorFactory.getInstance().create(configuration);
You can also obtain a programmatic configuration from an XML configuration by using the factory org.kefirsf.bb.ConfigurationFactory.
A configuration file is an XML file that describes a text translation. The permanent address of the XML schema is http://kefirsf.org/kefirbb/schema/kefirbb-1.2.xsd. Use the configuration tag, without any attributes other than the namespace declarations, as the root element of your XML configuration file.
This is an example of a simple configuration for escaping XML special characters.
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns="http://kefirsf.org/kefirbb/schema"
               xsi:schemaLocation="http://kefirsf.org/kefirbb/schema http://kefirsf.org/kefirbb/schema/kefirbb-1.2.xsd">
    <code>
        <pattern>&amp;</pattern>
        <template>&amp;amp;</template>
    </code>
    <code>
        <pattern>&apos;</pattern>
        <template>&amp;apos;</template>
    </code>
    <code>
        <pattern>&lt;</pattern>
        <template>&amp;lt;</template>
    </code>
    <code>
        <pattern>&gt;</pattern>
        <template>&amp;gt;</template>
    </code>
    <code>
        <pattern>&quot;</pattern>
        <template>&amp;quot;</template>
    </code>
</configuration>
Codes are the main entities of a text translation. A code defines which text fragment must be converted and how it must be converted.
A code is defined by a code tag. It contains two mandatory tags:
- pattern: the pattern used to find a text fragment for conversion. Since version 1.1, you can define several patterns in a code;
- template: the template used to generate the new text.
A code tag also has two attributes:
- name: a code name;
- priority: a code priority (a larger value means a higher priority; 0 by default).
This is an example of a code tag:
<code name="bold">
    <pattern ignoreCase="true">[b]<var inherit="true"/>[/b]</pattern>
    <template>&lt;b&gt;<var/>&lt;/b&gt;</template>
</code>
A code tag can be defined either directly inside the configuration tag or inside a scope tag.
A pattern can contain text and the tags var, constant, junk, bol, eol, blankline, url and email. When text is processed, the processor finds text fragments by matching the elements inside the pattern tag of the code. You can also set the boolean attribute ignoreCase. If its value is true, matching text against the pattern ignores character case.
A var tag defines a variable and has the following attributes:
- name: a variable name, variable by default;
- parse: marks that the text of the variable must be processed, true by default;
- regex: a regular expression for parsing the variable; it is used only if parse=false;
- scope: defines the scope of codes used to process the text of the variable; used only if parse=true, ROOT by default;
- inherit: defines that the scope must be inherited from the outer code, false by default;
- transparent: marks that the variable must be visible outside the code, false by default;
- action: a variable action:
  - rewrite: rewrite the variable value;
  - append: append the text to the existing variable;
  - check: check that the variable text is the same as that of an existing variable;
- ghost: if true, the processor doesn't change the current cursor position, false by default.
A constant tag is used to describe constants in a pattern tag. It has the following attributes:
- value: a constant value;
- ignoreCase: if true, character case is ignored, true by default;
- ghost: if true, the processor doesn't change the current cursor position, false by default.
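The effect of the ignoreCase and ghost attributes on a constant can be illustrated with a small plain-Java sketch. It is not KefirBB code: the class GhostMatch and the method matchConstant are hypothetical names, and the convention of returning -1 on a failed match is an assumption made for this example.

```java
public class GhostMatch {
    /**
     * Tries to match the constant value at the given cursor position.
     * Returns the cursor position after the match, or -1 if there is no match.
     * A ghost match succeeds without moving the cursor.
     */
    public static int matchConstant(String text, int cursor, String value,
                                    boolean ignoreCase, boolean ghost) {
        // regionMatches compares value with the text starting at cursor,
        // optionally ignoring character case.
        if (!text.regionMatches(ignoreCase, cursor, value, 0, value.length())) {
            return -1;
        }
        return ghost ? cursor : cursor + value.length();
    }
}
```

In this sketch, a normal match of [b] at the start of the text [B]text with ignoreCase enabled moves the cursor to position 3, while the same match with ghost enabled leaves it at 0, so whatever element comes next sees the same text again.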
A junk tag ignores all characters until a terminator.
A bol tag indicates the beginning of a line. It was added because many markup languages rely on the beginning of a line. Be careful when you use a bol tag: the cursor position isn't changed and remains at the beginning of the line, so don't use a var tag with the same scope right after the bol. It will produce a stack overflow.
An eol tag matches the end of a line. It handles the end-of-line characters of all operating systems. It has one attribute:
- ghost: if true, the processor doesn't change the current cursor position, false by default.
A blankline tag matches a blank line. It has one attribute:
- ghost: if true, the processor doesn't change the current cursor position, false by default.
A url tag is used for parsing URL addresses. It has the following attributes:
- name: the name of the variable in which the URL address is stored, url by default;
- local: allows parsing local addresses;
- schemaless: allows parsing addresses without a schema; in this case the HTTP schema is used;
- ghost: if true, the processor doesn't change the current cursor position, false by default.
An email tag is used for parsing email addresses. It has the following attributes:
- name: the name of the variable in which the email address is stored, email by default;
- ghost: if true, the processor doesn't change the current cursor position, false by default.
A template tag can contain text and the tags var and if.
A var tag is used to substitute a variable value into the result text. It has the following attributes:
- name: the name of the variable to substitute;
- function: allows modifying the value of the variable before substitution. The supported functions are:
  - value: the variable value, used by default;
  - length: the length of the variable value text.
A conditional if tag has a name attribute and checks whether the variable was initialized, either earlier or in the current code. If the variable was initialized, the content of the if tag is put into the result text; otherwise it is not. An if tag can contain text and the tags var and if, the same way as a template tag.
A scope defines which codes can be used for text processing. By default, the scope named ROOT is used. Even if it is not defined in the configuration, it exists and contains all the codes defined directly in the configuration tag, outside any scope tag. This is useful for simple configurations with a few codes, where a developer doesn't need to worry about scopes.
A scope is defined by a scope tag placed inside the configuration tag. A scope tag has the following attributes:
- name: a scope name;
- parent: a parent scope; all the codes from the parent scope are included in the scope;
- ignoreText: indicates that all text that is not a code of the scope must be ignored;
- strong: the text may contain only codes of the scope;
- min: the minimum number of codes that must be in the text; the default, -1, means not defined;
- max: the maximum number of codes that can be in the text; the default, -1, means not defined.
The following tags are allowed inside a scope tag:
- code: a code tag;
- coderef: a reference to a code tag defined outside any scope tag. A coderef tag has a name attribute, which is the name of the code.
Parameters are predefined variables that can be used when generating text in templates, the prefix and the suffix. Parameters are defined inside a params tag by param tags, which have two attributes:
- name: a variable name;
- value: a variable value.
For example,
<params>
    <param name="music" value="Punk"/>
</params>
Parameters can also be defined in a separate file, kefirbb.properties or kefirbb.properties.xml, on the classpath. The file formats are defined by the class java.util.Properties.
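To illustrate the two file formats, here is a minimal sketch that reads the same parameter from plain and XML properties content using only java.util.Properties. The class name ParamFiles is hypothetical, and the key music is borrowed from the params example. The sketch only shows the standard formats, not how KefirBB itself locates or loads these files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class ParamFiles {
    /** Parses content in the plain key=value format (as used by a .properties file). */
    public static Properties fromPlain(String content) {
        Properties params = new Properties();
        try {
            params.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return params;
    }

    /** Parses content in the XML properties format (as used by a .properties.xml file). */
    public static Properties fromXml(String content) {
        Properties params = new Properties();
        try {
            params.loadFromXML(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return params;
    }

    public static void main(String[] args) {
        // The same parameter written in both formats.
        String plain = "music=Punk\n";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties><entry key=\"music\">Punk</entry></properties>";
        System.out.println(fromPlain(plain).getProperty("music"));
        System.out.println(fromXml(xml).getProperty("music"));
    }
}
```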
The prefix and suffix are put at the beginning and the end of the text. They are defined by the prefix and suffix tags, the same way as a template tag.
<prefix>&lt;!-- bbcodes begin --&gt;</prefix>
<suffix>&lt;!-- bbcodes end --&gt;</suffix>
A programmatic configuration is defined by the class org.kefirsf.bb.conf.Configuration. For example,
// Create configuration
Configuration cfg = new Configuration();
// Set the prefix and suffix
cfg.setPrefix(new Template(Arrays.asList(new Constant("["))));
cfg.setSuffix(new Template(Arrays.asList(new Constant("]"))));
// Configure default scope
Scope scope = new Scope(Scope.ROOT);
// Create code and add it to scope
Code code = new Code();
code.setPattern(new Pattern(Arrays.asList(new Constant("val"))));
code.setTemplate(new Template(Arrays.asList(new NamedValue("value"))));
Set<Code> codes = new HashSet<Code>();
codes.add(code);
scope.setCodes(codes);
// Set scope to configuration
cfg.setRootScope(scope);
// Set the parameter
Map<String, Object> params = new HashMap<String, Object>();
params.putAll(cfg.getParams());
params.put("value", "KefirBB");
cfg.setParams(params);
// Test the configuration
TextProcessor processor = BBProcessorFactory.getInstance().create(cfg);
assert "[KefirBB]".equals(processor.process("val"));