Add regex support #13

Merged: 12 commits into axllent:develop on May 12, 2024

rasa
Contributor

@rasa rasa commented May 10, 2024

Fixes #12.

I tried to write logic that converts regexes such as (abc|defg) to defg, but I gave up, as it's non-trivial. The cost/benefit is not worth it :). Instead, I just note the issue in the readme.

@axllent
Owner

axllent commented May 10, 2024

I'm sorry (and not to be rude...), but I'm really struggling to see the point of this functionality. Can you please explain the use-case for searching a regular expression?

Let me rephrase that: Can you please give me an example (or examples) of what someone may use a regular expression for?

@rasa
Contributor Author

rasa commented May 10, 2024

> Can you please give me an example (or examples) of what someone may use a regular expression for?

Certainly. Here are some examples:

  1. .*word.* - find word anywhere in the key (word.* and .*word also work)
  2. ^.{0,10}word - find word preceded by at most 10 characters, i.e. near the start of the key (how wireguard-vanity-address currently works)
  3. word1.*word2 - find two words, in that order, anywhere in the key. The first word may be the hostname; the second could be the OS, its location, whether it's a server or just a peer, etc.
  4. (word1|word2).*(word1|word2) - find two words, in any order, anywhere in the key (the same word twice, e.g. word1.*word1, will also match)
  5. ^word[/+A-Z0-9] - find lowercase word at the beginning of the key, followed by a non-lowercase character as a delimiter, so word stands out more clearly. See also "Tip: Getting a totally "vain" address" (warner/wireguard-vanity-address#22)
  6. ^[s5][o0][l1]ar - find 'solar' or the visually similar 's01ar', per "Match visually-ambiguous characters for more matches & longer strings" (warner/wireguard-vanity-address#25)
  7. ^[s5][i1][z2][a4][b86][l7][e3].* - find 'sizable' in leet speak
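As a rough illustration of how the patterns above behave, here is a minimal sketch using Go's standard `regexp` package. The `matchesKey` helper and the sample strings are made up for this example (they are not real WireGuard keys and not part of this PR), and it ignores any case-folding the tool itself may apply.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesKey is a hypothetical helper: it reports whether pattern matches key.
// MustCompile panics on an invalid pattern, which is acceptable here because
// every pattern below is a fixed literal.
func matchesKey(pattern, key string) bool {
	return regexp.MustCompile(pattern).MatchString(key)
}

func main() {
	// Made-up key fragments, chosen only to exercise each pattern.
	samples := []struct{ pattern, key string }{
		{`^.{0,10}word`, "abcwordXYZ"},
		{`^.{0,10}word`, "0123456789ABword"},
		{`(word1|word2).*(word1|word2)`, "xxword2yyword1"},
		{`^word[/+A-Z0-9]`, "wordXabc"},
		{`^word[/+A-Z0-9]`, "wordyabc"},
		{`^[s5][o0][l1]ar`, "s01arQ9"},
		{`^[s5][i1][z2][a4][b86][l7][e3].*`, "5124873xyz"},
	}
	for _, s := range samples {
		fmt.Printf("%-34s %-18s %v\n", s.pattern, s.key, matchesKey(s.pattern, s.key))
	}
}
```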

Since each letter added to the search term increases the search time exponentially, we want to give the user maximum flexibility in finding the term, or terms, they are looking for.
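To put a number on that growth, here is a back-of-the-envelope sketch. It assumes a case-sensitive match on a fixed prefix and that each base64 character of a key is roughly uniformly distributed over 64 symbols; `expectedAttempts` is a hypothetical helper for illustration, not the project's `keygen.CalculateProbability`.

```go
package main

import "fmt"

// expectedAttempts returns 64^n, the approximate number of random keys that
// must be generated before one starts with a given n-character base64 prefix
// (case-sensitive). Hypothetical helper; valid for n <= 10, since 64^11
// overflows uint64.
func expectedAttempts(n int) uint64 {
	attempts := uint64(1)
	for i := 0; i < n; i++ {
		attempts *= 64
	}
	return attempts
}

func main() {
	// Each extra character multiplies the expected work by 64.
	for n := 1; n <= 6; n++ {
		fmt.Printf("prefix length %d: about 1 in %d\n", n, expectedAttempts(n))
	}
}
```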

See also warner/wireguard-vanity-address#23

@rasa rasa marked this pull request as draft May 10, 2024 14:00
@rasa
Contributor Author

rasa commented May 10, 2024

I changed this to draft, as I think we should add some of the above to the readme. If you've questioned its usefulness, I'm sure others will too.

Also, I think it's important to stop users from creating regex patterns that can never match. For example, aa$ will never match, because every key ends with the = padding character and the pattern omits it.
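The reason is that a 32-byte key encoded with standard base64 is always 44 characters long and ends in a single `=`. A small sketch of that, with `encodeKey` and `matchesKey` as hypothetical helpers and an all-zero byte array standing in for a real key:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
)

// encodeKey base64-encodes a 32-byte key using the standard alphabet with
// padding. Hypothetical helper for illustration only.
func encodeKey(raw [32]byte) string {
	return base64.StdEncoding.EncodeToString(raw[:])
}

// matchesKey reports whether pattern matches key (hypothetical helper).
func matchesKey(pattern, key string) bool {
	return regexp.MustCompile(pattern).MatchString(key)
}

func main() {
	var raw [32]byte // all-zero placeholder, not a real key
	key := encodeKey(raw)

	fmt.Println("length:", len(key))
	fmt.Println("last character:", key[len(key)-1:])
	// A pattern anchored at the end must account for the trailing "=".
	fmt.Println("aa$ matches:", matchesKey(`aa$`, key))
	fmt.Println("=$ matches:", matchesKey(`=$`, key))
}
```

Because the final character is always `=`, any end-anchored pattern whose last literal is something else can be rejected before the search starts.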

Note that defer runs when the surrounding function returns, not at the end of the enclosing block.
See https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01
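A small sketch of that gotcha: deferred calls registered inside a loop body do not run when the iteration ends, only when the surrounding function returns, and then in last-in, first-out order. The `deferOrder` function is made up for this illustration.

```go
package main

import "fmt"

// deferOrder records the order in which loop bodies and deferred calls run.
// Hypothetical function for illustration only.
func deferOrder() []string {
	var events []string
	func() {
		for i := 0; i < 2; i++ {
			// Registered now, but only executed when this anonymous
			// function returns, not at the end of the loop iteration.
			defer func(n int) {
				events = append(events, fmt.Sprintf("deferred %d", n))
			}(i)
			events = append(events, fmt.Sprintf("body %d", i))
		}
		events = append(events, "loop done")
	}()
	return events
}

func main() {
	for _, e := range deferOrder() {
		fmt.Println(e)
	}
}
```

If a mutex were unlocked with defer inside such a loop, it would stay locked across all iterations, so an explicit Unlock (or wrapping the body in its own function) is needed for per-iteration locking.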
Owner

@axllent axllent left a comment

I definitely would not do an if len(..) check on either map, as that has to count both on every calculation, nor would I use a separate mutex for the string and regex word maps.

I would do something like:

	// Allow only one routine at a time to avoid
	// "concurrent map iteration and map write"
	c.mapMutex.Lock()
	defer c.mapMutex.Unlock()
	for w, count := range c.WordMap {
		if count == 0 {
			continue
		}
		completed = false
		if strings.HasPrefix(matchKey, w) {
			c.WordMap[w] = count - 1
			cb(Pair{Private: k.String(), Public: pub})
		}
	}

	for w, count := range c.RegexpMap {
		if count == 0 {
			continue
		}
		completed = false
		if w.MatchString(matchKey) {
			c.RegexpMap[w] = count - 1
			cb(Pair{Private: k.String(), Public: pub})
		}
	}
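For readers who want to run the idea, here is a self-contained sketch of the single-mutex approach suggested above. The `Cruncher` type, `newCruncher`, and `Check` are stand-ins invented for this example; the real project's types, callback, and key handling differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"sync"
)

// Cruncher is a hypothetical stand-in for the PR's type. One mutex guards
// both maps, so no goroutine iterates a map while another writes to it.
type Cruncher struct {
	mapMutex  sync.Mutex
	WordMap   map[string]int
	RegexpMap map[*regexp.Regexp]int
}

// newCruncher builds a Cruncher where every term may match up to limit times.
func newCruncher(words, patterns []string, limit int) *Cruncher {
	c := &Cruncher{WordMap: map[string]int{}, RegexpMap: map[*regexp.Regexp]int{}}
	for _, w := range words {
		c.WordMap[w] = limit
	}
	for _, p := range patterns {
		c.RegexpMap[regexp.MustCompile(p)] = limit
	}
	return c
}

// Check counts how many terms matchKey satisfies and reports whether every
// counter had already reached zero before this call.
func (c *Cruncher) Check(matchKey string) (matched int, completed bool) {
	c.mapMutex.Lock()
	defer c.mapMutex.Unlock()
	completed = true
	for w, count := range c.WordMap {
		if count == 0 {
			continue
		}
		completed = false
		if strings.HasPrefix(matchKey, w) {
			c.WordMap[w] = count - 1
			matched++
		}
	}
	for re, count := range c.RegexpMap {
		if count == 0 {
			continue
		}
		completed = false
		if re.MatchString(matchKey) {
			c.RegexpMap[re] = count - 1
			matched++
		}
	}
	return matched, completed
}

func main() {
	c := newCruncher([]string{"abc"}, []string{`^x.*z`}, 1)
	var wg sync.WaitGroup
	for _, key := range []string{"abcdef", "xyz", "nomatch"} {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			matched, _ := c.Check(k)
			fmt.Printf("%s: %d match(es)\n", k, matched)
		}(key)
	}
	wg.Wait()
}
```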

@axllent
Owner

axllent commented May 11, 2024

Also, in main.go, I would bypass the estimation for regexes entirely, as it cannot be calculated, and validate the regex with regexp.Compile() rather than calling MustCompile():

		if stripped != sword {
			regex, err := regexp.Compile(sword)
			if err != nil {
				fmt.Printf("Invalid regular expression: %s\n", sword)
				os.Exit(2)
			}
			c.RegexpMap[regex] = options.LimitResults
			fmt.Printf("Cannot calculate probability for a regular expression: %s\n", sword)
		} else {
			c.WordMap[sword] = options.LimitResults
			probability := keygen.CalculateProbability(stripped, options.CaseSensitive)
			estimate64 := int64(speed) * probability
			estimate := time.Duration(estimate64)

			fmt.Printf("Probability for \"%s\": 1 in %s (approx %s per match)\n",
				word, keygen.NumberFormat(probability), keygen.HumanizeDuration(estimate))
		}
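The difference matters because `regexp.MustCompile` panics on a bad pattern, while `regexp.Compile` returns an error that can be reported cleanly. A minimal sketch of that validation step, where `compileTerm` is a hypothetical helper and not the project's code:

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// compileTerm validates a user-supplied pattern and returns a descriptive
// error instead of panicking. Hypothetical helper for illustration only.
func compileTerm(term string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(term)
	if err != nil {
		return nil, fmt.Errorf("invalid regular expression %q: %w", term, err)
	}
	return re, nil
}

func main() {
	// The second pattern is deliberately malformed (unclosed group).
	for _, term := range []string{`^word[/+A-Z0-9]`, `(abc|defg`} {
		if _, err := compileTerm(term); err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("ok: %s\n", term)
	}
}
```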

@rasa rasa marked this pull request as ready for review May 11, 2024 17:48
@rasa
Contributor Author

rasa commented May 11, 2024

@axllent It's ready for review. Let me know your thoughts. Happy to make any changes you deem worthy.

@axllent axllent merged commit 6ff2d42 into axllent:develop May 12, 2024
@axllent
Owner

axllent commented May 12, 2024

That's awesome, thanks @rasa - I did some testing and it works as I'd expect 👍

@axllent
Owner

axllent commented May 12, 2024

This has been merged and released in 0.0.9. I also just did a manual change to the README which resolves your other PR.

Thanks again for your hard work @rasa!

@rasa rasa deleted the feat-add-regexs branch May 12, 2024 19:48