Make the scripts in scorePages faster. #15
Comments
I think this has to be prioritized before other enhancements.
Absolutely! For a start, the wikiwoods.dm file should just be loaded once. At the moment, it gets loaded every time findBestPear is called -- and even worse, every time a pear is looked at in scorePages (so 3 more times). On my machine, it takes around 2s to load, so that's already 8s gone... :(
@minimalparts the wikiwoods.dm file is created manually (using some tool), right? What is your opinion about converting it into an SQLite table and querying it?
Yes, absolutely!
Same issue with the doc.dists files. See for example http://aurelieherbelot.net/pears-demo/pearone/doc.dists.txt. But I have no idea... can we also convert those to SQLite and have them downloadable from a website?
Actually, I'm talking rubbish: wikiwoods.dm is only loaded once in scorePages. But that's also totally unnecessary, because scorePages then recalculates the distribution of the query, which has already been done in findBestPears. Who wrote this thing? ;-) I guess what we want is: load wikiwoods.dm once when launching the application, calculate the query's distribution (mkQueryDist) once in findBestPears, and load only the doc.dists files in scorePages.
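To make that concrete, here is a rough sketch of the "load once, compute once" flow. It is not the project's actual code: the names `load_space`, `mk_query_dist` and `score_pages` are hypothetical, the file format (one word per line followed by whitespace-separated floats) is an assumption, and summing word vectors is only a stand-in for whatever mkQueryDist really does.

```python
from functools import lru_cache
import math


@lru_cache(maxsize=None)
def load_space(path):
    """Parse the vector file once per path; repeated calls return the cached dict.

    Assumed format: "<word> <float> <float> ..." on each line.
    """
    space = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            space[parts[0]] = tuple(float(x) for x in parts[1:])
    return space


def mk_query_dist(query, space):
    """Stand-in for mkQueryDist: sum the vectors of the query words found in the space."""
    vectors = [space[w] for w in query.lower().split() if w in space]
    if not vectors:
        return None
    return tuple(sum(column) for column in zip(*vectors))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_pages(query_dist, doc_dists):
    """Rank documents against a query distribution computed earlier, without reloading the space."""
    scored = ((cosine(query_dist, dist), url) for url, dist in doc_dists.items())
    return sorted(scored, reverse=True)
```

The point is only that the expensive parse happens once at startup, and that the scoring step receives the query distribution as an argument instead of rebuilding it.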
PR #22 introduces an SQLite database for wikiwoods. Let's see how this goes.
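For anyone following along, a minimal sketch of what such a conversion could look like with Python's standard `sqlite3` module. This is not the schema from PR #22: the table name `vectors`, the helper names `build_db` and `lookup`, and the assumed .dm layout (a word followed by whitespace-separated floats on each line) are all illustrative guesses.

```python
import sqlite3


def build_db(dm_path, db_path):
    """Copy each "<word> <floats...>" line of the .dm file into an indexed SQLite table."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS vectors "
            "(word TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )
        with open(dm_path, encoding="utf-8") as f:
            rows = (line.split(None, 1) for line in f)
            conn.executemany(
                "INSERT OR REPLACE INTO vectors VALUES (?, ?)",
                ((r[0], r[1].strip()) for r in rows if len(r) == 2),
            )
    conn.close()


def lookup(conn, word):
    """Fetch a single word's vector without reading the whole space into memory."""
    row = conn.execute(
        "SELECT vector FROM vectors WHERE word = ?", (word,)
    ).fetchone()
    return [float(x) for x in row[0].split()] if row else None
```

With the word as primary key, a query only needs one indexed lookup per query word, so the roughly 2s load of the full file would no longer be paid on every call.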
Right now 'scoreDocs' and 'runScript' are taking around 13 seconds and 24 seconds respectively.