create a rules file for n-grams of token #2

allan-simon · 2014-02-15T19:36:02Z

A rules file will be a list of rules (ordered?), described here with a loose BNF grammar (I need to review my language theory lessons...). Note that for now I am not specifying how it will be written (XML, JSON, whatever).

 rule   -> anchor tokens
 anchor -> TOKEN
 tokens -> TOKEN tokens
         | TOKEN
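As a minimal sketch only, the grammar could map onto an in-memory structure like the one below. It assumes Python 3.9+, a hypothetical Rule class, and TOKENs stored as dicts from key to a set of accepted values; the actual file format (XML, JSON, ...) is still undecided.

```python
from dataclasses import dataclass

# Hypothetical in-memory form of a TOKEN: key -> set of accepted values.
Token = dict[str, set[str]]


@dataclass
class Rule:
    """rule -> anchor tokens, where tokens -> TOKEN tokens | TOKEN."""

    anchor: Token
    tokens: list[Token]

    def __post_init__(self) -> None:
        # "tokens" always expands to at least one TOKEN.
        if not self.tokens:
            raise ValueError("a rule needs at least one TOKEN after its anchor")

    def all_tokens(self) -> list[Token]:
        # The anchor first, then the other TOKENs in rule order.
        return [self.anchor, *self.tokens]


# Hypothetical keys ("lemma", "pos") used only for illustration.
rule = Rule(anchor={"lemma": {"be"}}, tokens=[{"pos": {"VERB", "AUX"}}])
print(len(rule.all_tokens()))  # 2
```

The check in __post_init__ mirrors the grammar: since tokens always expands to at least one TOKEN, a rule is an anchor plus one or more TOKENs.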

TODO: complete the TOKEN description in BNF or XSLT form

The anchor is the TOKEN that needs to be matched in order to trigger the rule.

If a TOKEN is present in a rule, it must match a token in the input data.
A TOKEN is a set of key -> values pairs (note the plural: a key can list several accepted values).
A rule is considered either matched or not matched (an implementation MAY support other states, such as 'partially matched').
A rule MUST be considered matched only if every TOKEN in it is matched with a token in the input data. Otherwise, the rule MAY be considered not matched (MAY rather than MUST, because an implementation can support 'partially matched').
A TOKEN that does not specify a key matches any value of that key in an input token (i.e., a TOKEN does not need to specify every key an input token has).
A TOKEN cannot match an input token that is already matched by another TOKEN.
A TOKEN matches an input token if, for every key of the TOKEN, ONE of its values equals the value of that key in that input token.
A TOKEN MAY have a special key id, which should be unique among the TOKENs of a rule (only so that one TOKEN can reference another inside the same rule). The id key MUST NOT be used for any other purpose, and it MUST be 0 or anchor for the TOKEN that represents the anchor of the rule.
A TOKEN MAY have a special key proximity whose value is a list of pairs (if the key is not used as described here, it MUST NOT be present), each pair made of:
- from: the id of another TOKEN
- distance: the distance between the current TOKEN and the one referenced by from, starting at 1 / -1 for adjacent tokens. The possible values are listed below (the syntax is chosen so that a value is easy to parse with a simple split, a conversion to int, and a one-byte read):
  - an integer: positive means that the from token should be "before" the current one, negative means that it should be "after"
  - [optional] * means any distance is valid
  - [optional] X+ means X or more (as an absolute value)
  - [optional] X- means X or less (as an absolute value)
  - [optional] X|Y means between X and Y inclusive
  - [optional] X,Y,Z means either X, Y or Z
  - if the reader does not understand a value, it MUST treat it as meaning any distance
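A possible reading of the matching rules, as a sketch only. It assumes input tokens are dicts with one value per key, TOKENs are dicts from key to a set of accepted values, and the special keys id and proximity are skipped during value comparison; token_matches and rule_matches are hypothetical names, and proximity constraints and the anchor trigger are left out. Because a TOKEN cannot reuse an input token, a greedy first-fit assignment can miss a valid match, so the search backtracks:

```python
# Assumption: these keys carry metadata and are not compared as values.
SPECIAL_KEYS = {"id", "proximity"}


def token_matches(rule_token: dict, input_token: dict) -> bool:
    # Every non-special key of the TOKEN must have ONE of its values equal to
    # the input token's value for that key; keys the TOKEN omits match anything.
    return all(
        input_token.get(key) in values
        for key, values in rule_token.items()
        if key not in SPECIAL_KEYS
    )


def rule_matches(rule_tokens: list, input_tokens: list) -> bool:
    # Backtracking: give each TOKEN a distinct input token, undoing choices
    # that leave a later TOKEN without a candidate.
    def assign(i: int, used: frozenset) -> bool:
        if i == len(rule_tokens):
            return True
        for j, tok in enumerate(input_tokens):
            if j not in used and token_matches(rule_tokens[i], tok):
                if assign(i + 1, used | {j}):
                    return True
        return False

    return assign(0, frozenset())


sentence = [{"pos": "DET"}, {"pos": "NOUN"}]
# First-fit would give DET to the first TOKEN and fail; backtracking succeeds.
print(rule_matches([{"pos": {"DET", "NOUN"}}, {"pos": {"DET"}}], sentence))  # True
```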
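One way the distance values could be parsed, sketched under a few assumptions: the distance d is taken as position(current) - position(from), so a positive d means the from token comes before; X|Y is read as a signed range, since only X+ and X- are described as absolute values; parse_distance is a hypothetical name.

```python
from typing import Callable


def parse_distance(spec: str) -> Callable[[int], bool]:
    """Return a predicate on d = position(current) - position(from)."""

    def any_distance(d: int) -> bool:
        return True

    spec = spec.strip()
    try:
        if spec == "*":
            return any_distance
        if "," in spec:  # X,Y,Z: one of the listed distances
            allowed = {int(part) for part in spec.split(",")}
            return lambda d: d in allowed
        if "|" in spec:  # X|Y: inclusive range (assumed signed)
            low, high = sorted(int(part) for part in spec.split("|"))
            return lambda d: low <= d <= high
        if spec.endswith("+"):  # X+: X or more, absolute value
            bound = abs(int(spec[:-1]))
            return lambda d: abs(d) >= bound
        if spec.endswith("-") and len(spec) > 1:  # X-: X or less, absolute value
            bound = abs(int(spec[:-1]))
            return lambda d: abs(d) <= bound
        exact = int(spec)  # plain signed integer
        return lambda d: d == exact
    except ValueError:
        # A value the reader does not understand means "any distance".
        return any_distance
```

Looking only at the last byte for + and - keeps "-1" (a signed integer) distinct from "1-" (1 or less).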


allan-simon · 2014-02-15T19:55:02Z

Additional idea:

Maybe, in addition to knowing whether the rule is matched, we need to know which input token was matched with which TOKEN in the rule, so that we can do more with the result.
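A sketch of that idea: a backtracking matcher that returns the assignment instead of a boolean. It assumes input tokens are dicts with one value per key, TOKENs are dicts from key to a set of accepted values, and the keys id and proximity are not compared; match_rule and token_matches are hypothetical names.

```python
from typing import Optional

SPECIAL_KEYS = {"id", "proximity"}  # assumption: not compared as values


def token_matches(rule_token: dict, input_token: dict) -> bool:
    # ONE accepted value per specified key must equal the input token's value.
    return all(
        input_token.get(key) in values
        for key, values in rule_token.items()
        if key not in SPECIAL_KEYS
    )


def match_rule(rule_tokens: list, input_tokens: list) -> Optional[dict[int, int]]:
    """Return {rule TOKEN index: input token index} for a full match, or None."""
    assignment: dict[int, int] = {}

    def assign(i: int) -> bool:
        if i == len(rule_tokens):
            return True
        for j, tok in enumerate(input_tokens):
            # An input token already used by another TOKEN is skipped.
            if j not in assignment.values() and token_matches(rule_tokens[i], tok):
                assignment[i] = j
                if assign(i + 1):
                    return True
                del assignment[i]  # undo and try the next candidate
        return False

    return dict(assignment) if assign(0) else None


print(match_rule([{"pos": {"NOUN"}}, {"pos": {"DET"}}], [{"pos": "DET"}, {"pos": "NOUN"}]))
# {0: 1, 1: 0}
```

Returning None rather than an empty dict keeps "no match" distinct from a match; a 'partially matched' state, if implemented, could instead return the largest partial assignment.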
