Avoid 10K cap on numbers of results #49
Comments
Similarly, prefix searches seem to top out at 100,000? https://beta.webarchive.org.uk/wayback/archive/*?url=http://www.bl.uk/&matchType=prefix But if I query https://beta.webarchive.org.uk/wayback/archive/cdx?url=http://www.bl.uk/&matchType=prefix&limit=1000000 directly, I get 145,793?!
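For anyone wanting to reproduce the comparison above, here is a minimal sketch of how the CDX query could be built with an explicit limit. It only uses the `url`, `matchType` and `limit` parameters and the endpoint quoted in this thread; the helper names `build_cdx_query` and `count_cdx_lines` are made up for illustration, and the line counting assumes the endpoint returns one result per line of plain text.

```python
from urllib.parse import urlencode


def build_cdx_query(cdx_endpoint, url, match_type="exact", limit=None):
    """Build a CDX query URL (hypothetical helper, not part of pywb)."""
    params = {"url": url, "matchType": match_type}
    # Only send a limit when one is given, so the server default applies otherwise.
    if limit is not None:
        params["limit"] = limit
    return f"{cdx_endpoint}?{urlencode(params)}"


def count_cdx_lines(body):
    """Count non-empty lines, assuming one CDX result per line."""
    return sum(1 for line in body.splitlines() if line.strip())


# Endpoint and parameters taken from the comment above.
query = build_cdx_query(
    "https://beta.webarchive.org.uk/wayback/archive/cdx",
    "http://www.bl.uk/",
    match_type="prefix",
    limit=1000000,
)
print(query)
```

Fetching `query` and passing the response text to `count_cdx_lines` should give the number of results to compare against the calendar view, if the one-result-per-line assumption holds.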
Also, this may be a separate thing, but I notice there's no 'collapsing' happening even when instances are pretty close together (i.e. every snapshot has its own line with a (1) at the end). Not sure this is set up right?
Looks like the limit is 100,000 if no other limit is specified, for both prefix and exact. Will look at collapsing, think that's a separate issue.
That configuration setting would work great!
…t for both exact and prefix queries, addresses ukwa/ukwa-pywb#49 (#518)
Configuration setting added! Included in
Tried this and it didn't appear to work. How should I configure it? EDIT: I put
Yes, it should work for any backend. I've tested smaller limits, but not larger ones. Just in case, does it work if you set it to be small, e.g. 10?
Actually, looks like it does not currently work on the XmlQuery backend, as that requires slightly different semantics, not just adding
Oops, didn't mean to close this yet. Re-opening so you can double-check that it's working @anjackson
Looks like it's fixed, based on https://beta.webarchive.org.uk/wayback/archive/*/https://www.theguardian.com/uk right now. Seems a bit slow but I think that might be something wholly separate.
For some pages, we hit OutbackCDX's default limit=10000 for the number of results, e.g. https://beta.webarchive.org.uk/wayback/archive/*/https://www.theguardian.com/uk which stops in 2018. How can we change pywb to use a larger limit?