Use concurrency to check mangled passwords #8

rafasc · 2020-05-05T01:34:11Z

This is part of an attempt to improve crunchy's performance.

This PR focuses on improving the mangled password checks by using concurrency to speed up the search.

$ benchstat before after 
name                   old time/op  new time/op  delta
ValidatePassword-8      41.3s ± 1%   29.7s ± 1%  -28.12%  (p=0.000 n=8+8)
FoundInDictionaries-8   11.8s ± 1%    4.4s ± 1%  -62.24%  (p=0.000 n=8+8)

Hashing will be next.

coveralls · 2020-05-05T01:40:28Z

Coverage increased (+0.3%) to 93.902% when pulling 10c7756 on rafasc:ra/goroutine-mangled-check into 98c5f9f on muesli:master.

muesli · 2020-05-15T15:38:02Z

crunchy_test.go

 	pws = []struct {
 		pw       string
 		expected error
 		rating   uint
 	}{
+		// include values from pass in tests


What's the purpose of including these values here (again)?

Having a reference to a password known to be valid is useful for benchmarks, because a valid password never exits early and has to go through every check we perform.

We could search the pws slice for a valid password when we need one, or adopt a convention that the first password of the slice is valid. I thought being explicit would make things clearer, i.e. pass.valid vs pws[0].pw.

The values of the pass struct are then inserted into the pws slice to make sure they remain valid. If a future change caused pass.valid to become invalid, the tests would fail, indicating that the benchmark numbers are not trustworthy.
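A minimal sketch of that arrangement, for illustration only. The `validate` function, `errTooShort`, and the password value are hypothetical stand-ins rather than crunchy's actual code, and the `rating` field from the real table is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooShort and validate are hypothetical stand-ins for crunchy's error
// values and validator; only the table layout mirrors the PR.
var errTooShort = errors.New("password is too short")

func validate(pw string) error {
	if len(pw) < 8 {
		return errTooShort
	}
	return nil
}

// pass names the inputs that benchmarks rely on (value is made up).
var pass = struct {
	valid string
}{
	valid: "d1924ce3d0510b2b2b4604c99453e2e1",
}

// pws is the table the tests iterate over.
var pws = []struct {
	pw       string
	expected error
}{
	// include values from pass in tests
	{pass.valid, nil},
	{"short", errTooShort},
}

// checkTable fails as soon as any entry, including pass.valid,
// stops producing its expected result.
func checkTable() error {
	for _, tc := range pws {
		if err := validate(tc.pw); !errors.Is(err, tc.expected) {
			return fmt.Errorf("validate(%q) = %v, want %v", tc.pw, err, tc.expected)
		}
	}
	return nil
}

func main() {
	fmt.Println("table check:", checkTable())
}
```

Because `pass.valid` is itself a row in `pws`, a change that makes it invalid shows up as a test failure instead of silently producing benchmarks that exit early.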

muesli · 2020-05-15T15:48:13Z

crunchy.go

-			if dist := smetrics.WagnerFischer(word, revpw, 1, 1, 1); dist <= v.options.MinDist {
-				return &DictionaryError{ErrMangledDictionary, word, dist}
+			select {
+			case queue <- struct{}{}:


I'm not sure I fully understand the purpose of this queue. Couldn't we just keep launching a couple of worker threads in a pool and feed them with all the words from the wordlist? I think it may turn out a bit more readable. Let me know if you already tried that but this queue simply outperforms it 😃

I tried some approaches, e.g. using x/sync/errgroup, and this was the simplest way I found to bound the concurrency while still being able to cancel on the first error found.

The queue is just a semaphore that keeps us from spawning more than njobs goroutines simultaneously.

The benchmark should run with the same input each time in order to obtain a meaningful average. Let's also benchmark the cases where hash verification is enabled, and use a valid password as input, since those cover more checks. CheckHIBP is intentionally left out of this benchmark, as network factors could skew the results.

Indexing only happens once per validator. Let's benchmark indexing independently so we can ignore its duration in other benchmarks.

In a future commit we will improve this function by converting it to use goroutines. Introducing the benchmark now is convenient, as it provides a baseline for comparison.
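One way such benchmarks could be laid out, as a sketch under stated assumptions: `validator`, `indexWords`, and `found` are hypothetical stand-ins for crunchy's validator, its dictionary indexing, and foundInDictionaries. The indexing cost is measured on its own, and the lookup benchmark calls `b.ResetTimer()` after setup so that indexing does not count toward its result.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// validator is a hypothetical stand-in for crunchy's Validator.
type validator struct {
	index map[string]struct{}
}

// indexWords plays the role of the one-off dictionary indexing step.
func (v *validator) indexWords(words []string) {
	v.index = make(map[string]struct{}, len(words))
	for _, w := range words {
		v.index[strings.ToLower(w)] = struct{}{}
	}
}

// found plays the role of foundInDictionaries.
func (v *validator) found(pw string) bool {
	_, ok := v.index[strings.ToLower(pw)]
	return ok
}

var words = []string{"password", "letmein", "qwerty"}

// benchmarkIndex measures indexing on its own.
func benchmarkIndex(b *testing.B) {
	for i := 0; i < b.N; i++ {
		v := &validator{}
		v.indexWords(words)
	}
}

// benchmarkFound indexes once, resets the timer, and then looks up the
// same input on every iteration.
func benchmarkFound(b *testing.B) {
	v := &validator{}
	v.indexWords(words)
	b.ResetTimer() // exclude the indexing above from the measurement
	for i := 0; i < b.N; i++ {
		v.found("d1924ce3d0510b2b")
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	fmt.Println("index:", testing.Benchmark(benchmarkIndex))
	fmt.Println("found:", testing.Benchmark(benchmarkFound))
}
```

In a real test file these would be exported as `BenchmarkXxx(b *testing.B)` functions and run with `go test -bench`; the `main` function here only keeps the sketch self-contained.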

muesli · 2020-10-21T04:45:04Z

Sorry @rafasc, I have never seen any notification that you responded to my comments 😞 ...and so I completely missed your latest changes.

rafasc added 4 commits May 16, 2020 20:45

Benchmark dictionary indexing independently

c6a7cae

Add benchmark for foundInDictionaries

35e27d8

Use concurrency to check mangled passwords

f5855f9

rafasc force-pushed the ra/goroutine-mangled-check branch from 10c7756 to f5855f9 · May 16, 2020 19:48
