Avoid inserting duplicate records in typerefs (fixes #61) #67
It would also be good to have a comparison with using
I tried the approach with a UNIQUE constraint and didn't find a noticeable difference in performance between it and the master branch. `git diff --patch master..HEAD`:

```diff
diff --git a/src/HieDb/Create.hs b/src/HieDb/Create.hs
index a29838d..3f28842 100644
--- a/src/HieDb/Create.hs
+++ b/src/HieDb/Create.hs
@@ -161,6 +161,7 @@ initConn (getConn -> conn) = do
\, ec INTEGER NOT NULL \
\, FOREIGN KEY(id) REFERENCES typenames(id) DEFERRABLE INITIALLY DEFERRED \
\, FOREIGN KEY(hieFile) REFERENCES mods(hieFile) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED \
+ \, CONSTRAINT uniqtyperef UNIQUE (id, hieFile, depth, sl, sc, el, ec) ON CONFLICT IGNORE \
\)"
execute_ conn "CREATE INDEX IF NOT EXISTS typeref_id ON typerefs(id)"
execute_ conn "CREATE INDEX IF NOT EXISTS typerefs_mod ON typerefs(hieFile)"
diff --git a/src/HieDb/Utils.hs b/src/HieDb/Utils.hs
index 0b5c073..596e4bf 100644
--- a/src/HieDb/Utils.hs
+++ b/src/HieDb/Utils.hs
@@ -56,7 +56,7 @@ addTypeRef (getConn -> conn) hf arr ixs sp = go 0
Nothing -> pure ()
Just occ -> do
let ref = TypeRef occ hf d sl sc el ec
- execute conn "INSERT INTO typerefs VALUES (?,?,?,?,?,?,?)" ref
+ execute conn "INSERT OR IGNORE INTO typerefs VALUES (?,?,?,?,?,?,?)" ref
let next = go (d+1)
case arr A.! i of
HTyVarTy _ -> pure ()
```
I think we should add the UNIQUE constraint anyway so we can detect violations of this property.
(Branch updated from c4562a1 to 225c08d.)
Would this require a schema version bump?
I tried this, and the issue is that with the UNIQUE constraint the indexing time doubles. It's not as bad as on master, but still much worse than without it. Not sure if it's worth it.
Not strictly, though I guess it would be good if all the existing databases used by HLS were rebuilt so they don't violate the property.
OK, I guess checking in Haskell should be sufficient.
```haskell
indexed <- get
when (Set.notMember (occ, d) indexed) $ do
let isTypeIndexed = ISet.member (fromIntegral occ) (IMap.findWithDefault ISet.empty depth indexed)
```
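The bookkeeping in the new line, an `IntMap` from depth to an `IntSet` of type ids, can be sketched in isolation as below. This is a minimal pure sketch, not the hiedb code: the names `Indexed`, `isIndexedAt`, `markIndexed` and `dedupRefs` are hypothetical, the real `addTypeRef` threads this state through a monadic traversal, and `occ` is assumed to have already been converted to `Int`.

```haskell
import qualified Data.IntMap.Strict as IMap
import qualified Data.IntSet as ISet
import Data.List (foldl')

-- Hypothetical state: for each depth, the set of type ids (the `occ`
-- values) that already have a typerefs row at that depth.
type Indexed = IMap.IntMap ISet.IntSet

-- Has type id `occ` already been recorded at `depth`?
isIndexedAt :: Int -> Int -> Indexed -> Bool
isIndexedAt depth occ indexed =
  ISet.member occ (IMap.findWithDefault ISet.empty depth indexed)

-- Record type id `occ` at `depth`, merging with any ids already there.
markIndexed :: Int -> Int -> Indexed -> Indexed
markIndexed depth occ = IMap.insertWith ISet.union depth (ISet.singleton occ)

-- Keep only the first occurrence of each (depth, occ) pair, preserving order.
-- A row is emitted only when the pair is NOT yet indexed.
dedupRefs :: [(Int, Int)] -> [(Int, Int)]
dedupRefs = reverse . snd . foldl' step (IMap.empty, [])
  where
    step (indexed, acc) (depth, occ)
      | isIndexedAt depth occ indexed = (indexed, acc)
      | otherwise = (markIndexed depth occ indexed, (depth, occ) : acc)
```

For example, `dedupRefs [(0,42),(1,7),(0,42),(1,7),(1,42)]` keeps `[(0,42),(1,7),(1,42)]`: the same id at a different depth is still recorded. In a monadic traversal the guard corresponds to `unless isTypeIndexed $ do ...`; writing `when isTypeIndexed` would instead emit rows only for pairs that were already seen.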
I think you can use `i` to ensure this still works on 32-bit machines.
Sorry, I don't follow. I think we have to use `occ` (which corresponds to the type's id looked up from the typenames table). This is unique across multiple indexed hie files (an autoincremented id in SQLite), whereas `i` corresponds to the index of a type within a single hie file. When I tried it, duplicate rows started being created again; not sure why.
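One plausible explanation for the duplicates when keying on `i` is sketched below. It assumes that several distinct local type indices can resolve to the same typenames id (for example, two different applications of the same type constructor). Because each row stores `occ` rather than `i`, deduplicating on `i` lets rows with an identical `(depth, occ)` through. The table `occOf` and the functions `emitRows`, `byLocalIndex` and `byOcc` are hypothetical illustrations, not hiedb code.

```haskell
import qualified Data.Array as A
import qualified Data.Set as Set
import Data.List (foldl')

-- Hypothetical per-file table: local type index `i` -> global typenames id `occ`.
-- Indices 0 and 2 resolve to the same id, as do indices 1 and 3.
occOf :: A.Array Int Int
occOf = A.listArray (0, 3) [42, 7, 42, 7]

-- Visit (depth, i) pairs and emit (depth, occ) rows, skipping a visit whose
-- key has already been seen. `key` decides what counts as "already seen".
emitRows :: Ord k => ((Int, Int) -> k) -> [(Int, Int)] -> [(Int, Int)]
emitRows key = reverse . snd . foldl' step (Set.empty, [])
  where
    step (seen, acc) v@(depth, i)
      | key v `Set.member` seen = (seen, acc)
      | otherwise = (Set.insert (key v) seen, (depth, occOf A.! i) : acc)

-- Keyed by local index: 0 and 2 differ, so both rows survive with the same occ.
byLocalIndex :: [(Int, Int)] -> [(Int, Int)]
byLocalIndex = emitRows id

-- Keyed by global id: the second visit maps to the same (depth, occ) and is dropped.
byOcc :: [(Int, Int)] -> [(Int, Int)]
byOcc = emitRows (\(depth, i) -> (depth, occOf A.! i))
```

With the visits `[(1,0),(1,2),(1,1),(1,3)]`, `byLocalIndex` produces `[(1,42),(1,42),(1,7),(1,7)]` while `byOcc` produces `[(1,42),(1,7)]`.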
But we COULD maybe switch `Int64` -> `Int` when looking up type indices here, and then this would work?
Lines 172 to 173 in 404b4fa:

```haskell
addArr :: HieDb -> A.Array TypeIndex HieTypeFlat -> IO (A.Array TypeIndex (Maybe Int64))
addArr (getConn -> conn) arr = do
```

Is there a need for `Int64` to represent type IDs? If so, I guess that would also break on 32-bit machines.
You are correct about `i` not working. But I think SQLite indices are `Int64`; not much we can do about it.
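The 32-bit concern comes from converting the `Int64` id with `fromIntegral`, which silently wraps when the target type is narrower. The sketch below uses `Int32` to stand in for `Int` on a 32-bit machine and shows a checked alternative using `toIntegralSized` from `Data.Bits`; the helper names `wrapTo32`, `toIntChecked` and `intWidth` are hypothetical.

```haskell
import Data.Bits (finiteBitSize, toIntegralSized)
import Data.Int (Int32, Int64)

-- Unchecked narrowing: `fromIntegral` keeps only the low 32 bits,
-- so large ids silently wrap around.
wrapTo32 :: Int64 -> Int32
wrapTo32 = fromIntegral

-- Checked conversion: Nothing if the id does not fit in an Int
-- on the current machine.
toIntChecked :: Int64 -> Maybe Int
toIntChecked = toIntegralSized

-- Width of Int on the current machine (64 on typical 64-bit GHC builds).
intWidth :: Int
intWidth = finiteBitSize (0 :: Int)
```

Here `wrapTo32 (2^32 + 5)` is `5`, and `toIntegralSized (2^40 :: Int64) :: Maybe Int32` is `Nothing`. On a 64-bit build every `Int64` id converts losslessly to `Int`; on a 32-bit target only ids above 2^31 - 1 would be affected.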
It turns out that when switching to IntMap/IntSet I forgot to flip the boolean condition:
Bump the schema version and I think this should be good to go.
Bumped. Is that just to force users to regenerate their hiedb databases?
Hey @wz1000
@jhrcek feel free to prepare a release after updating the changelogs. Thanks!
First naive attempt at solving #61.
Here's a quick comparison of this PR with master:
I built the current master of haskell-language-server (to get a bunch of .hie files to index):

```sh
cabal clean && cabal build all --ghc-options=-fwrite-ide-info
```

I indexed that directory with the hiedb binary:
You can see that the difference on the HLS codebase is not that significant.
But the difference is much more significant on our work codebase, which uses a lot more deriving: