Avoid 10K cap on numbers of results #49

Closed
anjackson opened this issue Nov 5, 2019 · 10 comments

@anjackson
Contributor

For some pages, we hit OutbackCDX's default limit=10000 on the number of results, e.g. https://beta.webarchive.org.uk/wayback/archive/*/https://www.theguardian.com/uk which stops in 2018.

How can we change pywb to use a larger limit?

@anjackson
Contributor Author

Also, this may be a separate thing, but I notice there's no 'collapsing' happening even when instances are pretty close together (i.e. every snapshot has its own line with a (1) at the end). Not sure this is set up right?

@ikreymer
Contributor

ikreymer commented Nov 7, 2019

Looks like the limit is 100,000 if no other limit is specified, for both prefix and exact queries.
I can add a config.yaml setting, query_limit: 50000, to override it. Or should it support different limits for exact and prefix?

Will look at collapsing, think that's a separate issue.

@anjackson
Contributor Author

anjackson commented Nov 7, 2019

That configuration setting would work great!

ikreymer added a commit to webrecorder/pywb that referenced this issue Nov 7, 2019
ikreymer added a commit to webrecorder/pywb that referenced this issue Nov 7, 2019
@ikreymer
Contributor

ikreymer commented Nov 8, 2019

Configuration setting added! Included in webrecorder/pywb:2.4.0-rc1 Docker image.

@anjackson
Contributor Author

anjackson commented Nov 11, 2019

Tried this and it didn't appear to work. How should I configure it?

EDIT: I put query_limit: 100000 at the top level of the config. Presumably it works with any query backend? Or is it only the XmlQuery/OpenSearch API?

@ikreymer
Contributor

Yes, it should work for any backend. I've tested smaller limits, but not larger ones. Just in case, does it work if you set it to something small, e.g. 10?

@ikreymer
Contributor

Actually, it looks like it does not currently work on the XmlQuery backend, as that requires slightly different semantics, not just adding &limit=10000. Will fix soon.
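
For reference, here is a minimal sketch of what "just adding &limit=" amounts to for a plain CDX-server-style query URL. It is illustrative only and is not pywb's actual code: the helper name `with_limit`, the `localhost:8080` endpoint and the `myindex` collection name are all assumptions. As noted in the comment above, the XmlQuery backend needs different handling, which this sketch does not cover.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_limit(query_url: str, limit: int) -> str:
    """Return query_url with its `limit` query parameter set to `limit`.

    Hypothetical helper for illustration; not part of pywb or OutbackCDX.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    parts = urlsplit(query_url)
    # Drop any existing limit so the parameter is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "limit"
    ]
    params.append(("limit", str(limit)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Assumed endpoint and collection name, for illustration only.
base = "http://localhost:8080/myindex?url=https%3A%2F%2Fwww.theguardian.com%2Fuk"
print(with_limit(base, 100000))
```

Replacing an existing `limit` rather than appending a second one keeps the query unambiguous if the configured URL template already carries a limit.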

@ikreymer
Contributor

Oops, didn't mean to close this yet. Re-opening so you can double-check that it's working, @anjackson.

@anjackson
Contributor Author

Looks like it's fixed, based on https://beta.webarchive.org.uk/wayback/archive/*/https://www.theguardian.com/uk right now. Seems a bit slow but I think that might be something wholly separate.
