Adding does not properly group #18
Oh interesting, I didn't know that. I guess I can have a precedence system and surround a pattern with a non-capturing group in the … I'm on it.
Thanks for posting this btw!
@Alan-Chen99 try version …
Thanks! I think "operations" also need to assign precedence. For example:

```python
import re
from regexfactory import *

r = str(Or("a", "b") + Or("x", "y"))
print(repr(r))
print(re.search(r, "a"))
```

outputs
The precedence of a raw regex also needs to be lower:

```python
import re
from regexfactory import *

r = str(RegexPattern("^") + RegexPattern("a|b"))
print(repr(r))
print(re.search(r, "-b"))
```

outputs
I think two things need to be done:
Could you provide examples of the regex you would ideally want it to generate in specific scenarios? That will help me implement the functionality you're looking for. On another note: RegexPattern is not a class intended to be instantiated directly. Until I develop a parser system that turns raw regex strings into a tree of RegexFactory objects, I don't intend to provide any support for "raw" regex strings. Without that parser, I don't want to restructure the parent class, RegexPattern, to take on a whole new role as a raw-regex pattern class, because it doesn't make sense for its children to inherit that raw-string functionality. However, the example you sent with the Ors being concatenated does look a little wonky, so I will look into that in the next few days.
Looking at the Or example more closely, it looks like the b and x patterns are being merged into a single Or option. So instead of compiling to a two-character pattern with two options per character, I think it might be interpreted as "a" or "bx" or "y", i.e. three cases. I need to confirm this, but if I'm right, this shouldn't happen, and I think I just need to put a group around Ors specifically when they get concatenated, rather than creating a precedence system for operations, since I don't entirely understand how that would work.
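The merged-option reading above, and the proposed fix of grouping an Or only when it is concatenated, can be sketched in plain Python. `Pat` and `Alt` are hypothetical stand-ins, not regexfactory classes, and the sketch assumes the current concatenation simply joins the rendered strings without a group:

```python
import re


class Pat:
    """Hypothetical minimal pattern node holding its rendered regex."""

    is_alternation = False

    def __init__(self, regex: str) -> None:
        self.regex = regex

    def _as_operand(self) -> str:
        # Only an alternation can "leak" into its neighbours, so only it gets
        # a non-capturing group when placed in a concatenation.
        return f"(?:{self.regex})" if self.is_alternation else self.regex

    def __add__(self, other: "Pat") -> "Pat":
        return Pat(self._as_operand() + other._as_operand())


class Alt(Pat):
    """Hypothetical Or-like node: rendered bare, grouped only when concatenated."""

    is_alternation = True

    def __init__(self, *options: str) -> None:
        super().__init__("|".join(re.escape(option) for option in options))


# Plain string concatenation reproduces the suspected problem.
naive = "a|b" + "x|y"
fixed = (Alt("a", "b") + Alt("x", "y")).regex

print(naive)                            # a|bx|y
print(fixed)                            # (?:a|b)(?:x|y)
print(bool(re.fullmatch(naive, "a")))   # True: "a" is a whole alternative
print(bool(re.fullmatch(fixed, "a")))   # False: two characters are required
print(bool(re.fullmatch(fixed, "bx")))  # True
```

An `Alt` used on its own still renders as `a|b`, so top-level patterns do not pick up an unnecessary group.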
I made a PR that assigns a `_precedence` to a RegexPattern: the precedence of the "root node" if one were to parse the underlying regex. It generates some excessive parentheses, for example …

For reference, this is the precedence scheme described in Emacs's `rx.el`:

```elisp
;; The `rx--translate...' functions below return (REGEXP . PRECEDENCE),
;; where REGEXP is a list of string expressions that will be
;; concatenated into a regexp, and PRECEDENCE is one of
;;
;;  t    -- can be used as argument to postfix operators (eg. "a")
;;  seq  -- can be concatenated in sequence with other seq or higher (eg. "ab")
;;  lseq -- can be concatenated to the left of rseq or higher (eg. "^a")
;;  rseq -- can be concatenated to the right of lseq or higher (eg. "a$")
;;  nil  -- can only be used in alternatives (eg. "a\\|b")
;;
;; They form a lattice:
;;
;;           t          highest precedence
;;           |
;;          seq
;;         /   \
;;      lseq   rseq
;;         \   /
;;          nil         lowest precedence
```

IMO we should just use an extra group to keep things simpler.
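A minimal Python sketch of the same idea, collapsing the lattice to three levels: `lseq` and `rseq` are folded into `seq`, and anchors are not modelled. `Prec`, `Node`, and `lit` are hypothetical names, not regexfactory's API or the PR's code:

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class Prec(IntEnum):
    """Simplified precedence levels; a higher value binds more tightly."""

    ALT = 0   # "a|b": only safe as a whole alternative (rx.el's nil)
    SEQ = 1   # "ab", "a*": safe inside a concatenation (rx.el's seq)
    ATOM = 2  # "a", "(?:...)": safe as the operand of a quantifier (rx.el's t)


@dataclass(frozen=True)
class Node:
    """Hypothetical pattern node: rendered regex plus its root precedence."""

    regex: str
    prec: Prec

    def at_least(self, needed: Prec) -> str:
        # Add a non-capturing group only when this node binds too loosely.
        return self.regex if self.prec >= needed else f"(?:{self.regex})"

    def __add__(self, other: "Node") -> "Node":
        # Both operands of a concatenation must be SEQ or tighter.
        return Node(self.at_least(Prec.SEQ) + other.at_least(Prec.SEQ), Prec.SEQ)

    def __or__(self, other: "Node") -> "Node":
        # "|" is the loosest operator, so its operands never need a group.
        return Node(f"{self.regex}|{other.regex}", Prec.ALT)

    def many(self) -> "Node":
        # "*" applies to one atom. The result is only SEQ because Python
        # rejects a directly repeated quantifier such as "a**".
        return Node(self.at_least(Prec.ATOM) + "*", Prec.SEQ)


def lit(text: str) -> Node:
    # A single (escaped) character is an atom; longer literals are sequences.
    return Node(re.escape(text), Prec.ATOM if len(text) == 1 else Prec.SEQ)


ab_or = lit("a") | lit("b")
xy_or = lit("x") | lit("y")
print((ab_or + xy_or).regex)          # (?:a|b)(?:x|y)
print((lit("a") + lit("b")).regex)    # ab
print(lit("ab").many().regex)         # (?:ab)*
print(lit("a").many().many().regex)   # (?:a*)*
```

Compared with always adding a group, this only groups an operand that binds too loosely, so `lit("a") + lit("b")` stays `ab`; the cost is tracking a precedence on every node.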
Doesn't a literal str currently represent a "raw" regex?
Yeah, I suppose concatenating a literal string would represent a "raw" regex, but what I meant was that I don't intend to be responsible for the behavior of regexfactory when users concatenate literal strings, because there are so many edge cases and headaches that I don't want to deal with. Now that I think about it, we should raise an exception when we try to concatenate a non-RegexPattern object (i.e. I think then I'd also add …
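Raising on a non-pattern operand could look roughly like this. `Pattern` is a hypothetical stand-in rather than regexfactory's `RegexPattern`; the sketch relies on Python's standard binary-operator protocol, where `__add__` returning `NotImplemented` leads to a `TypeError` when the other operand does not handle the addition either:

```python
class Pattern:
    """Hypothetical stand-in for a RegexPattern-like base class."""

    def __init__(self, regex: str) -> None:
        self.regex = regex

    def __add__(self, other: object) -> "Pattern":
        # Returning NotImplemented (rather than pasting str(other) in) makes
        # Python try other.__radd__ and raise TypeError if that also declines.
        if not isinstance(other, Pattern):
            return NotImplemented
        return Pattern(self.regex + other.regex)


print((Pattern("a") + Pattern("b")).regex)   # ab

try:
    Pattern("a") + "b|c"                     # raw string on the right
except TypeError as exc:
    print("rejected:", exc)

try:
    "b|c" + Pattern("a")                     # raw string on the left
except TypeError as exc:
    print("rejected:", exc)
```

No `__radd__` is defined, so a raw string on the left is rejected by `str` itself rather than silently accepted.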
Yeah, by that I presume you mean putting a group around the Ors when they get concatenated together? If so, that's what I wanted to do to begin with: …
Actually, I meant '(?:^)ab'. I thought one needed to use the diamond above, but regex in Python seems to work differently than in Elisp: it seems that I can treat "^" just like a normal char. Not sure though; it's marked as having lower precedence than concatenation according to https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_08, so if we follow that, "^ab" is invalid and we need to write '(?:^)ab'.
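Python's `re` is not a POSIX ERE engine, so the POSIX precedence table is only a guide here. A quick standard-library check (no regexfactory involved) is consistent with treating `^` like a normal character in Python, and shows that in the `"^" + "a|b"` example it is the alternation that needs a group:

```python
import re

# In Python's re, "^" can sit directly in front of a literal; wrapping it in
# a non-capturing group does not change what matches.
for pattern in (r"^ab", r"(?:^)ab"):
    print(pattern, bool(re.search(pattern, "abc")), bool(re.search(pattern, "xab")))
# ^ab True False
# (?:^)ab True False

# The grouping that does matter is around the alternation: "^a|b" means
# "^a" OR "b", so it finds the "b" in "-b".
print(re.search(r"^a|b", "-b") is not None)       # True
print(re.search(r"^(?:a|b)", "-b") is not None)   # False
```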
If you wanted to match the string "^ab" literally, you can use the pattern …
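For building a literal pattern from an arbitrary string, Python's standard library provides `re.escape`; this is plain `re`, not necessarily the pattern the reply above suggested:

```python
import re

# re.escape backslash-escapes metacharacters such as "^".
literal = re.escape("^ab")
print(literal)                             # \^ab
print(bool(re.fullmatch(literal, "^ab")))  # True
print(bool(re.fullmatch("^ab", "^ab")))    # False: unescaped "^" is an anchor
```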
results in
which is incorrect