Create a rules file for n-grams of tokens #2

Open
allan-simon opened this issue Feb 15, 2014 · 1 comment

Comments

@allan-simon
Owner

A rules file will be a list of rules (ordered?). Below is a loose BNF grammar (I need to review my language theory lessons...). Note that for now I don't specify how it is going to be written (XML, JSON, whatever).

 rule   -> anchor tokens
 anchor -> TOKEN
 tokens -> TOKEN tokens
         | TOKEN

TODO: complete the TOKEN description in a BNF or XSLT way

The anchor is the token that needs to be matched in order to trigger the rule.
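To make the grammar and the anchor concrete, here is a minimal Python sketch that loads a rules file. The serialization is still undecided, so the JSON layout used here (a list of objects, each with an `anchor` TOKEN and a non-empty `tokens` list, every TOKEN mapping a key to a list of values) is a hypothetical choice, as are the names `Rule`, `parse_token`, `parse_rule` and `parse_rules_file`. It also assumes the grammar requires at least one TOKEN besides the anchor.

```python
import json
from dataclasses import dataclass

# A TOKEN maps each key to a *set* of acceptable values ("key -> valueS").
Token = dict[str, set[str]]


@dataclass
class Rule:
    anchor: Token        # anchor -> TOKEN
    tokens: list[Token]  # tokens -> TOKEN tokens | TOKEN (at least one)


def parse_token(raw: dict) -> Token:
    # Hypothetical layout: every key maps to a JSON list of values.
    return {key: set(values) for key, values in raw.items()}


def parse_rule(raw: dict) -> Rule:
    tokens = [parse_token(t) for t in raw["tokens"]]
    if not tokens:
        raise ValueError("a rule needs at least one TOKEN besides the anchor")
    return Rule(anchor=parse_token(raw["anchor"]), tokens=tokens)


def parse_rules_file(text: str) -> list[Rule]:
    # The file is a list of rules; file order is kept in case it matters.
    return [parse_rule(r) for r in json.loads(text)]
```

For example, `parse_rules_file('[{"anchor": {"lemma": ["be"]}, "tokens": [{"pos": ["ADJ", "ADV"]}]}]')` yields one `Rule` whose anchor is `{"lemma": {"be"}}`.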

  • if a TOKEN is present, it must match a token in the input data
  • a TOKEN is a set of key -> valueS (note the S on value: each key maps to a set of values)
  • a rule can be considered as matched or not matched (one MAY implement other values such as 'partially matched')
  • a rule MUST be considered as matched only if all the TOKENs in it are matched with tokens in the input data; otherwise the rule MAY be considered as not matched (since one may have implemented 'partially matched')
  • a TOKEN that does not specify a key is considered as 'matching' that key in an input token (i.e. a TOKEN does not need to specify all the keys a token has)
  • a TOKEN cannot match a token in the input data that is already matched by another TOKEN
  • a TOKEN is considered as matched if each of its keys has ONE of its values equal to the value of that key in one of the tokens in the input data
  • a TOKEN MAY have a special key id, which should be unique among the TOKENs of a rule (just so that one TOKEN can reference another inside a rule); the key id MUST NOT be used for any other purpose, and it MUST be 0 or anchor for the TOKEN that represents the anchor of that rule
  • a TOKEN MAY have a special key proximity whose value is a list of pairs made of the following (if it is not used as described, it MUST NOT be present):
    • from: the id of another TOKEN
    • distance: the distance (starting at 1 / -1) between the current TOKEN and the one referenced by from, with the following possible values (note: the syntax is chosen so that it is easy to parse with a simple split, a conversion to int and reading one byte):
      • an integer: positive means that the from TOKEN should be "before", negative means that the from TOKEN should be "after"
      • [optional] * means any distance is valid
      • [optional] X+ means X or more (as an absolute value)
      • [optional] X- means X or less (as an absolute value)
      • [optional] X|Y means between X and Y, both included
      • [optional] X,Y,Z means either X, Y or Z
      • if the reader does not understand a value, it MUST treat it as meaning any distance
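The distance syntax above can be handled with a `split` and a check on the last character. The following Python sketch turns a distance string into a predicate over a signed distance. `parse_distance` is a hypothetical name, and the sketch assumes that a distance d is position(current) - position(from) (so d > 0 means the from TOKEN comes before), that X|Y and X,Y,Z use signed values, and that X+ and X- compare absolute values as stated.

```python
from typing import Callable

DistancePredicate = Callable[[int], bool]


def parse_distance(spec: str) -> DistancePredicate:
    """Turn a distance string into a predicate over a signed distance d.

    Assumption: d = position(current) - position(from), so d > 0 means
    the `from` TOKEN comes before the current one.
    """
    spec = spec.strip()
    try:
        if spec == "*":  # any distance is valid
            return lambda d: True
        if "," in spec:  # X,Y,Z: one of the listed distances
            allowed = {int(x) for x in spec.split(",")}
            return lambda d: d in allowed
        if "|" in spec:  # X|Y: between X and Y, bounds included
            low, high = sorted(int(x) for x in spec.split("|"))
            return lambda d: low <= d <= high
        if spec.endswith("+"):  # X+: |d| >= X
            bound = abs(int(spec[:-1]))
            return lambda d: abs(d) >= bound
        if spec.endswith("-"):  # X-: |d| <= X
            bound = abs(int(spec[:-1]))
            return lambda d: abs(d) <= bound
        exact = int(spec)  # plain integer, sign gives the direction
        return lambda d: d == exact
    except ValueError:
        pass
    # Anything the reader does not understand means "any distance".
    return lambda d: True
```

Because a value that fails to parse falls through to "any distance", a reader that skips the optional forms still behaves as the last bullet requires.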
@allan-simon
Owner Author

Additional idea:

Maybe, in addition to knowing whether the rule is matched, we need to know which input token has been matched with which TOKEN in the rule, so that we can do more with that result.
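One way to get that information is to have the matcher return the assignment itself instead of a boolean. The Python sketch below follows the matching rules from the first comment: a TOKEN matches an input token when every key it specifies is present with one of the accepted values, each input token is used by at most one TOKEN, and the rule is matched only when every TOKEN is matched. It uses backtracking so that an early choice for one TOKEN does not block a later one. The names, the convention that `patterns[0]` is the anchor, and skipping the `id` and `proximity` keys during value comparison (proximity is not checked at all here) are assumptions for illustration.

```python
from typing import Optional

Token = dict[str, set[str]]  # rule-side TOKEN: key -> acceptable values
InputToken = dict[str, str]  # input-side token: key -> one value
SPECIAL_KEYS = {"id", "proximity"}  # assumption: not compared to input values


def token_matches(pattern: Token, token: InputToken) -> bool:
    """Every key the TOKEN specifies must be present in the input token with
    one of the accepted values; keys the TOKEN does not mention are ignored."""
    return all(
        key in token and token[key] in values
        for key, values in pattern.items()
        if key not in SPECIAL_KEYS
    )


def match_rule(patterns: list[Token], data: list[InputToken]) -> Optional[dict[int, int]]:
    """Return {TOKEN index: input token index} if every TOKEN is matched by a
    distinct input token, else None. patterns[0] is taken to be the anchor."""
    assignment: dict[int, int] = {}
    used: set[int] = set()

    def backtrack(i: int) -> bool:
        if i == len(patterns):
            return True  # every TOKEN has an input token
        for j, token in enumerate(data):
            if j not in used and token_matches(patterns[i], token):
                assignment[i] = j
                used.add(j)
                if backtrack(i + 1):
                    return True
                used.discard(j)  # undo and try the next input token
                del assignment[i]
        return False

    return dict(assignment) if backtrack(0) else None
```

Returning `None` for "not matched" and a mapping otherwise keeps the boolean answer available while also exposing which input token each TOKEN consumed.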
