Speed up fuzzy search #2639

Merged · 8 commits into haskell:master on Feb 19, 2022

Conversation

@Bodigrim (Contributor) commented on Jan 25, 2022:

My intuition is that this approach should be faster, but I do not have data to back such a claim. @pepeiborra, any chance to test it with a huge list of completions?

@pepeiborra (Collaborator) commented:

Thanks @Bodigrim (and sorry to have dragged you into another project :) )

I have a benchmark in the Sigma codebase for this completions fuzzy search, but I will need a few days to find the time. Probably over the weekend.

@Bodigrim (Contributor, Author) commented:

I've probably broken something, but it's still surprising that GHC 9.0.1 on Ubuntu and GHC 9.2.1 on macOS pass, while GHC 9.2.1 on Ubuntu fails.

@Bodigrim (Contributor, Author) commented on Jan 27, 2022:

https://github.com/haskell/haskell-language-server/runs/4959105656?check_suite_focus=true#step:7:9685 says

Exception: Prelude.head: empty list

So helpful! Cannot wait for GHC 9.4 with stack traces for head.

@Bodigrim (Contributor, Author) commented:

@pepeiborra this is finally ready for review.

@pepeiborra (Collaborator) commented:

Benchmarked using the completions experiment of the ghcide bench suite on the Sigma codebase. The results show that the new code allocates much less and is up to 2X as fast. Nice job!

BEFORE

name        | success | samples | startup | setup | userTime | delayedTime | totalTime
----------- | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
completions | True    | 100     | 1m25s   | 0.00s | 1m08s    | 0.05s       | 1m08s    
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats 
 641,948,658,384 bytes allocated in the heap
  74,949,086,032 bytes copied during GC
  10,809,313,048 bytes maximum residency (14 sample(s))
     645,310,696 bytes maximum slop
           10308 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       979 colls,   978 par   1089.292s  27.336s     0.0279s    0.5449s
  Gen  1        14 colls,    12 par   351.536s  17.144s     1.2245s    7.6997s

  Parallel GC work balance: 84.43% (serial 0%, perfect 100%)

  TASKS: 215 (6 bound, 209 peak workers (209 total), using -N40)

  SPARKS: 46800 (46573 converted, 0 overflowed, 0 dud, 0 GC'd, 227 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  718.015s  (117.626s elapsed)
  GC      time  1440.828s  ( 44.479s elapsed)
  EXIT    time    0.046s  (  0.005s elapsed)
  Total   time  2158.890s  (162.111s elapsed)

  Alloc rate    894,060,716 bytes per MUT second

  Productivity  33.3% of total user, 72.6% of total elapsed

AFTER

name        | success | samples | startup | setup | userTime | delayedTime | totalTime
----------- | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
completions | True    | 100     | 1m20s   | 0.00s | 43.60s   | 0.05s       | 43.67s   
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats 
 351,299,429,720 bytes allocated in the heap
  47,676,026,872 bytes copied during GC
  10,489,733,720 bytes maximum residency (13 sample(s))
     642,772,392 bytes maximum slop
           10003 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       757 colls,   756 par   487.501s  12.327s     0.0163s    0.5824s
  Gen  1        13 colls,    11 par   294.023s  15.454s     1.1888s    7.6197s

  Parallel GC work balance: 84.19% (serial 0%, perfect 100%)

  TASKS: 218 (6 bound, 212 peak workers (212 total), using -N40)

  SPARKS: 46800 (46670 converted, 0 overflowed, 0 dud, 0 GC'd, 130 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  404.382s  (104.802s elapsed)
  GC      time  781.524s  ( 27.781s elapsed)
  EXIT    time    0.073s  (  0.008s elapsed)
  Total   time  1185.980s  (132.593s elapsed)

  Alloc rate    868,731,964 bytes per MUT second

  Productivity  34.1% of total user, 79.0% of total elapsed

Review comment on lines +40 to +67:
go !totalScore !currScore !currPOff !currSOff
  -- The pattern has been matched in full.
  | currPOff >= pTotal
  = Just totalScore
  -- There is not enough of the string left to match the rest of the pattern;
  -- equivalent to (sOff + sLen - currSOff) < (pOff + pLen - currPOff).
  | currSOff > currPOff + sDelta
  = Nothing
  -- This is slightly broken for non-ASCII:
  -- 1. If the code units constituting a single pattern code point are found as
  -- parts of different code points in the string, it counts as a match. Unless
  -- you use a ton of emojis as identifiers, such false positives should not be
  -- a big deal, and anyway HLS does not currently support such use cases,
  -- because it uses code point and UTF-16 code unit positions interchangeably.
  -- 2. Case conversion is not applied to non-ASCII code points, because one
  -- would have to call T.toLower (not T.map toLower), reallocating the string
  -- in full, which is too much of a performance penalty for fuzzy search.
  -- Again, HLS does not attempt to do justice to Unicode: proper Unicode text
  -- matching requires `unicode-transforms` and friends.
  -- Altogether we sacrifice correctness for the sake of performance, which is
  -- the right trade-off for fuzzy search.
  | pByte <- TA.unsafeIndex pArr currPOff
  , sByte <- TA.unsafeIndex sArr currSOff
  -- The first code unit (currPOff == pOff) must match exactly; subsequent ones
  -- match up to ASCII case.
  , pByte == sByte || (currPOff /= pOff && pByte == toLowerAscii sByte)
  = let curr = currScore * 2 + 1 in
    go (totalScore + curr) curr (currPOff + 1) (currSOff + 1)
  | otherwise
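
For context, the ASCII-only case folding referenced in the comment above can be as small as the following sketch. This is illustrative only: the real toLowerAscii helper lives elsewhere in the patch, and Word16 is assumed here because text-1.x stores UTF-16 code units.

import Data.Word (Word16)

-- Lower-case a single UTF-16 code unit, but only within the ASCII range;
-- anything else is returned unchanged, matching the "ASCII-only case
-- conversion" trade-off described in the comment above.
toLowerAscii :: Word16 -> Word16
toLowerAscii w
  | w >= 0x41 && w <= 0x5A = w + 0x20  -- 'A'..'Z'  ->  'a'..'z'
  | otherwise              = w
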
pepeiborra (Collaborator) commented:

How do we know that the new implementation produces the same scores? I would like to see a unit test here, or a property test.

It's ok for the test suite to depend on the fuzzy package.
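
A minimal sketch of what such a property test could look like, assuming tasty-quickcheck and the fuzzy package. The newFuzzyMatch wrapper is a hypothetical stand-in for the reworked matcher, and the comparison is restricted to ASCII input because the two implementations treat non-ASCII case folding differently.

{-# LANGUAGE OverloadedStrings #-}
module FuzzyMatchSpec (tests) where

import qualified Data.Text as T
import qualified Text.Fuzzy as Fuzzy
import Test.Tasty
import Test.Tasty.QuickCheck

-- Hypothetical stand-in for the new matcher under test.
newFuzzyMatch :: T.Text -> T.Text -> Maybe Int
newFuzzyMatch = error "wire up to the reworked matcher"

tests :: TestTree
tests = testProperty "agrees with Text.Fuzzy on ASCII input" $
  \(ASCIIString pat) (ASCIIString cand) ->
    -- Compare scores against the reference implementation from the fuzzy
    -- package, with caseSensitive = False and no pre/post decoration.
    let reference = Fuzzy.score <$> Fuzzy.match (T.pack pat) (T.pack cand) "" "" id False
    in reference === newFuzzyMatch (T.pack pat) (T.pack cand)

(The exact property may need loosening, given the difference in case handling discussed below.)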

@Bodigrim (Contributor, Author) replied:

Well, even the old implementation did not fully match the fuzzy package, because of a different approach to case matching. What's the best place to add unit tests?

pepeiborra (Collaborator) replied:

The ghcide test suite.

@Bodigrim (Contributor, Author) commented on Feb 4, 2022:

@pepeiborra just a gentle reminder about my questions above.

@pepeiborra (Collaborator) left a review comment:

I have pushed a test suite.

@pepeiborra added the "merge me" label (used to trigger pull request merge) on Feb 19, 2022.
The mergify bot merged commit 847ad94 into haskell:master on Feb 19, 2022.
@Bodigrim deleted the fuzzy-search branch on February 19, 2022.