To create anything, creates Chinese characters #13
Comments
Thank you for reporting this! Project Fare turns regular expressions into automata by applying the algorithms of dk.brics.automaton and xeger. Unfortunately, I don't have an answer to your question, as Project Fare is really a port of those Java projects. You may use a different pattern, or use a different engine to turn the regular expression into an automaton. As an example, you can use the Rex engine.
I'm running into this issue too. I don't suppose either of you figured out why we're getting Chinese characters? Thanks!
@edwardskrod, this is a wild guess, but I'm wondering whether it's because of some encoding issue that slipped through while I was porting the code from the Java version. I'd be curious to find out whether the Java version behaves the same way, which means I'd first have to install Java and a Java-capable IDE on my machine so I can compare against the original.
@moodmosaic We were able to run the .jar from https://code.google.com/archive/p/xeger/downloads using the following regex: ^([^ A-Z]+)$. It looks like the issue is in the original Java code as well. I believe this is because they are using the Unicode character set to generate characters. Interestingly enough, the generated string will validate when you call Regex.Validate and pass the generated string and the regex.
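The validation point above can be reproduced without Fare. The following is a minimal sketch, assuming the check mentioned was equivalent to .NET's Regex.IsMatch (the class and method names here are hypothetical, chosen only for illustration). It shows that a string of CJK characters is a perfectly valid match for the pattern, since the pattern only excludes spaces and A-Z.

```csharp
using System.Text.RegularExpressions;

public static class GeneratedStringCheck
{
    // Pattern quoted in the comment above: one or more characters,
    // none of which is a space or an uppercase ASCII letter.
    public const string Pattern = "^([^ A-Z]+)$";

    // Returns true when the candidate string satisfies the pattern.
    // Regex.IsMatch is used here as an assumed stand-in for the
    // validation call mentioned in the thread.
    public static bool Matches(string candidate)
    {
        return Regex.IsMatch(candidate, Pattern);
    }
}
```

Calling GeneratedStringCheck.Matches("☐缨坏庶袗") returns true, while "HELLO" and "hello world" are rejected, so the odd-looking output is correct with respect to the regex even though it is not readable.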
@edwardskrod, thank you for trying this also in the Java version. At this point, I think we could deviate from the Java version, and perhaps use ASCII, or some Unicode subset that will generate readable characters.
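One way the "use ASCII where possible" idea could look is sketched below. This is not Fare's code: ReadableCharPicker and Pick are hypothetical names, and the sketch assumes the generator has a character range [min, max] in hand each time it needs to emit a character. It narrows that range to printable ASCII (U+0020 to U+007E) when the two overlap, and otherwise falls back to the full range.

```csharp
using System;

public static class ReadableCharPicker
{
    private const int AsciiLo = 0x20; // ' '
    private const int AsciiHi = 0x7E; // '~'

    // Picks a character from [min, max], preferring printable ASCII
    // whenever the allowed range overlaps it. Hypothetical helper,
    // not part of Fare or xeger.
    public static char Pick(Random random, char min, char max)
    {
        if (min > max)
        {
            throw new ArgumentException("min must not exceed max");
        }

        // Intersect the allowed range with printable ASCII.
        int lo = Math.Max((int)min, AsciiLo);
        int hi = Math.Min((int)max, AsciiHi);

        if (lo <= hi)
        {
            // Random.Next excludes the upper bound, hence the + 1.
            return (char)random.Next(lo, hi + 1);
        }

        // No overlap with ASCII: fall back to the full allowed range.
        return (char)random.Next(min, max + 1);
    }
}
```

Because the picked character always stays inside the original range, a string built this way should still match the pattern it was generated from. The trade-off is that output is no longer spread across everything the pattern allows.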
@moodmosaic I'll take a look. However, my first priority is to figure out how to solve the Stack Overflow exception generated by certain regular expressions. I'll let you know if I figure it out.
@edwardskrod, sounds good.
I'm trying to get a random string. Anything is acceptable. So I came up with this regular expression:
.*
That literally means any character, any number of times.
However, I ran this line of code about a thousand times, and not even once did I get a plain English character. The output was all gibberish and Chinese characters.
new Fare.Xeger(".*").Generate(); // ran this a thousand times; I expected output like "A坏_#@LA", but got output like "☐缨坏庶袗"
Why is it behaving that way? Does it have an internal preference for Unicode characters? Does it ignore ASCII characters on string generation, unless explicitly forced via the regular expression pattern?
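A possible explanation that needs no built-in preference at all: if each character is drawn uniformly from the whole range the pattern allows (an assumption, not something verified against the Fare or xeger source), then for .* that range is every UTF-16 code unit, and printable ASCII is a tiny slice of it. The sketch below, with hypothetical names, simply counts how much of a range falls inside a given block.

```csharp
using System;

public static class CharRangeStats
{
    // Number of code units that the ranges [min, max] and
    // [blockLo, blockHi] have in common.
    public static int Overlap(int min, int max, int blockLo, int blockHi)
    {
        int lo = Math.Max(min, blockLo);
        int hi = Math.Min(max, blockHi);
        return Math.Max(0, hi - lo + 1);
    }

    // Share of [min, max] that falls inside [blockLo, blockHi],
    // as a fraction between 0 and 1.
    public static double Share(int min, int max, int blockLo, int blockHi)
    {
        return (double)Overlap(min, max, blockLo, blockHi) / (max - min + 1);
    }
}
```

For the full range 0x0000 to 0xFFFF, printable ASCII (0x0020 to 0x007E) covers 95 of 65,536 code units, roughly 0.14%, while the CJK Unified Ideographs block (0x4E00 to 0x9FFF) alone covers 20,992, roughly 32%. Under the uniform-draw assumption, a short random string would almost never contain an ASCII character and would very often contain Chinese characters, which is consistent with the output reported here.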