Make the scripts in scorePages faster. #15

Open
stultus opened this issue Sep 7, 2015 · 7 comments

Comments

@stultus
Collaborator

stultus commented Sep 7, 2015

Right now, 'scoreDocs' and 'runScript' take around 13 seconds and 24 seconds, respectively.

@nandajavarma
Collaborator

I think this has to be prioritized before other enhancements.

@minimalparts
Copy link
Owner

Absolutely! For a start, the wikiwoods.dm file should be loaded only once. At the moment, it gets loaded every time findBestPear is called and, even worse, every time a pear is looked at in scorePages (so 3 more times). On my machine, it takes around 2s to load, so that's already 8s gone (4 loads at ~2s each)... :(
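One way to get "load it once" behaviour, sketched under assumptions: `load_dm` is a hypothetical helper (not a function from this project), and it assumes wikiwoods.dm holds one word per line followed by whitespace-separated floats. Caching the parsed result with `functools.lru_cache` means findBestPear and scorePages could both call it, and only the first call would pay the ~2s parse.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_dm(path):
    """Parse a .dm file once; later calls with the same path reuse the cached dict.

    Hypothetical format: each line is a word followed by whitespace-separated floats.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Demo with a tiny stand-in file (the real wikiwoods.dm is far larger).
with tempfile.NamedTemporaryFile("w", suffix=".dm", delete=False) as tmp:
    tmp.write("cat 0.1 0.2\ndog 0.3 0.4\n")
    demo_path = tmp.name

first = load_dm(demo_path)   # reads and parses the file
second = load_dm(demo_path)  # served from the cache, no disk read
print(first is second, load_dm.cache_info().hits)  # True 1
os.remove(demo_path)
```

Since every caller gets the same dict object back, callers should treat it as read-only; mutating it would silently affect every later lookup.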

@stultus
Collaborator Author

stultus commented Sep 7, 2015

@minimalparts, the wikiwoods.dm file is created manually (using some tool), right? What is your opinion about converting it into an SQLite table and querying it?
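A minimal sketch of the SQLite idea, using Python's built-in `sqlite3` module. The table name `vectors`, its columns, and storing each vector as a space-separated string are assumptions for illustration, not necessarily what the project ends up doing; the input lines are assumed to use a "word v1 v2 ..." layout.

```python
import sqlite3


def dm_to_sqlite(dm_lines, conn):
    """Copy .dm-style lines ("word v1 v2 ...") into a table keyed by word."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vectors (word TEXT PRIMARY KEY, vec TEXT NOT NULL)"
    )
    rows = []
    for line in dm_lines:
        parts = line.split()
        if parts:
            rows.append((parts[0], " ".join(parts[1:])))
    conn.executemany("INSERT OR REPLACE INTO vectors (word, vec) VALUES (?, ?)", rows)
    conn.commit()


def lookup(conn, word):
    """Return the vector for one word, or None if the word is not in the table."""
    row = conn.execute("SELECT vec FROM vectors WHERE word = ?", (word,)).fetchone()
    if row is None:
        return None
    return [float(x) for x in row[0].split()]


# ":memory:" keeps the demo self-contained; a real setup would use a database
# file on disk (any file name here would be hypothetical).
conn = sqlite3.connect(":memory:")
dm_to_sqlite(["cat 0.1 0.2", "dog 0.3 0.4"], conn)
print(lookup(conn, "dog"), lookup(conn, "unicorn"))  # [0.3, 0.4] None
```

Because `word` is the primary key, SQLite keeps an index on it, so each query word is fetched with an indexed lookup instead of parsing the whole matrix into memory on every search.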

@minimalparts
Owner

Yes, absolutely!

@minimalparts
Owner

Same issue with the doc.dists files. See for example http://aurelieherbelot.net/pears-demo/pearone/doc.dists.txt. But I have no idea how best to handle those... Can we also convert them to SQLite and make them downloadable from a website?

@minimalparts
Owner

Actually, I'm talking rubbish: wikiwoods.dm is only loaded once in scorePages, but even that is totally unnecessary, because it recalculates the query's distribution, which has already been done in findBestPears. Who wrote this thing? ;-)

I guess what we want is:

1. Load wikiwoods.dm when launching the application.
2. Calculate the query's distribution (mkQueryDist) once, in findBestPears.
3. Load the doc.dists files in scorePages.
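The proposed data flow could look roughly like this. All names here (`mk_query_dist`, `score_pages`, `find_best_pears`, `VECTORS`, `PEARS`) are hypothetical stand-ins for mkQueryDist, scorePages and findBestPears, and the query distribution is assumed to be a plain sum of word vectors compared by cosine similarity; the real scoring may differ, and pear selection is omitted for brevity. The point is that vectors are loaded once, the query distribution is computed once, and the page scorer receives it as an argument instead of recomputing it.

```python
import math


def mk_query_dist(query, vectors):
    """Hypothetical stand-in for mkQueryDist: sum the vectors of known query words."""
    dims = len(next(iter(vectors.values()), []))
    total = [0.0] * dims
    for word in query.lower().split():
        vec = vectors.get(word)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_pages(query_dist, doc_dists):
    """Rank one pear's pages against a query distribution computed elsewhere."""
    scores = {url: cosine(query_dist, dist) for url, dist in doc_dists.items()}
    return sorted(scores, key=scores.get, reverse=True)


def find_best_pears(query, vectors, pears):
    """Compute the query distribution once and reuse it for every pear."""
    query_dist = mk_query_dist(query, vectors)
    return {name: score_pages(query_dist, dists) for name, dists in pears.items()}


# Loaded once at startup (tiny stand-in data instead of wikiwoods.dm / doc.dists).
VECTORS = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
PEARS = {"pearone": {"http://a.example": [0.9, 0.1], "http://b.example": [0.1, 0.9]}}

print(find_best_pears("Cat", VECTORS, PEARS))
# {'pearone': ['http://a.example', 'http://b.example']}
```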

@stultus
Collaborator Author

stultus commented Sep 9, 2015

PR #22 introduces an SQLite database for wikiwoods. Let's see how this goes.

3 participants