To create anything, creates Chinese characters #13

Closed
Nefcanto opened this issue Jun 7, 2015 · 7 comments

Comments

Nefcanto commented Jun 7, 2015

I'm trying to get a random string. Anything is acceptable, so I came up with this regular expression:

.*

That literally means any character, any number of times.

However, I ran this line of code about a thousand times and never once got a plain English character. The output was all gibberish and Chinese characters.

new Fare.Xeger(".*").Generate(); // run about a thousand times; I expected output like "A坏_#@LA", but got strings like "☐缨坏庶袗"

Why is it behaving that way? Does it have an internal preference for non-ASCII Unicode characters? Does it ignore ASCII characters during string generation unless they are explicitly forced via the regular expression pattern?

@moodmosaic
Owner

Thank you for reporting this!

Project Fare turns regular expressions into automata by applying the algorithms of dk.brics.automaton and xeger.

Unfortunately, I don't have an answer to your question, as Project Fare is really a port of the above Java projects.

You may use a different pattern, or a different engine to convert the regular expression into an automaton. For example, you can use the Rex engine.

@edwardskrod

I'm running into this issue too. I don't suppose either of you figured out why we're getting Chinese characters? Thanks!

@moodmosaic
Owner

@edwardskrod, this is a wild guess, but I wonder whether it's caused by some encoding issue that slipped through while I was porting the code from the Java version.

I'd be curious to find out whether the Java version behaves the same way. To compare against the original, I'd first have to install Java and a Java-capable IDE on my machine.

@edwardskrod

@moodmosaic We were able to run the .jar from https://code.google.com/archive/p/xeger/downloads using the following Regex: ^([^ A-Z]+)$

It looks like the issue is in the original Java code as well.

I believe this is because they are using the Unicode character set to generate characters.

Interestingly enough, the generated string will validate when you call Regex.Validate and pass the generated string and the regex.
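
To put rough numbers on the Unicode explanation above, here is a minimal sketch. It assumes the generator draws each character uniformly from the full 16-bit char range (0x0000 to 0xFFFF), which this thread suggests but does not confirm. The class and method names are hypothetical and are not part of Fare.

```csharp
using System;

public static class CharRangeOdds
{
    // Fraction of the 65,536 possible 16-bit char values that fall in [first, last].
    // Assumption: every char value is equally likely to be picked.
    public static double Share(int first, int last)
    {
        return (last - first + 1) / 65536.0;
    }

    public static void Main()
    {
        // Printable ASCII runs from space (0x20) through tilde (0x7E): 95 values.
        double printableAscii = Share(0x20, 0x7E);

        // The CJK Unified Ideographs block spans U+4E00 to U+9FFF: 20,992 values.
        double cjkIdeographs = Share(0x4E00, 0x9FFF);

        Console.WriteLine("Printable ASCII share: " + printableAscii);
        Console.WriteLine("CJK ideograph share:   " + cjkIdeographs);
    }
}
```

Under that assumption, printable ASCII makes up about 0.14% of the range while the CJK Unified Ideographs block alone makes up about 32%, so output dominated by Chinese characters would be expected rather than a sign of a preference.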

d9bd5502

@moodmosaic
Owner

@edwardskrod, thank you for trying this also in the Java version. At this point, I think we could deviate from the Java version, and perhaps use ASCII, or some Unicode subset that will generate readable characters.
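
Until the library itself changes, one possible workaround is to restrict the pattern to readable characters. The sketch below assumes the Fare NuGet package is referenced and that Xeger exposes a constructor taking a pattern and a System.Random. The pattern, the length bound, and the class name are illustrative choices, not something agreed in this thread.

```csharp
using System;
using Fare; // assumption: the Fare NuGet package is referenced by the project

public static class PrintableAsciiSample
{
    // Printable ASCII only: space (0x20) through tilde (0x7E), 1 to 16 characters.
    // The explicit upper bound keeps generated strings short.
    public const string Pattern = "[ -~]{1,16}";

    public static string Generate(Random random)
    {
        return new Xeger(Pattern, random).Generate();
    }

    public static void Main()
    {
        var random = new Random();
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(Generate(random));
        }
    }
}
```

A narrower class such as [A-Za-z0-9] would work the same way if punctuation is not wanted.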

edwardskrod commented Oct 31, 2017

@moodmosaic I'll take a look. However, my first priority is to figure out how to solve the Stack Overflow exception generated by certain regular expressions. I'll let you know if I figure it out.

@moodmosaic
Owner

@edwardskrod, sounds good.
