wb-manager collection indexing error #709

Closed
despens opened this issue Apr 12, 2022 · 1 comment

despens commented Apr 12, 2022

Describe the bug

Running wb-manager reindex <collection> on a collection crawled with browsertrix (--newContext page) prints thousands of lines to the console with the Python error

'list' object has no attribute 'items'

The index.cdxj file is generated anyway, but it is unclear whether it is complete.

Steps to reproduce the bug

Drop WARC files from a browsertrix crawl into a collection and run the reindexing command shown above.

Expected behavior

All WARC files should be reindexed without error.

Environment

pywb 2.6.6 (installed via pip)
Python 3.8.10
Ubuntu 20.04

Additional context

This only seems to happen with WARC files from a browsertrix crawl. If a collection contains only WARC files from other sources, no error message appears. If a collection contains both browsertrix and non-browsertrix WARCs, only the browsertrix WARCs cause this error.

ikreymer added a commit that referenced this issue Apr 15, 2022
…indexer

if json parsing errors occur, log to stderr
fixes #709 in a better way
@ikreymer
Member

The message can mostly be ignored in this case, but there is a better fix in 2.6.7. The issue is that the POST-to-GET conversion of JSON arguments didn't correctly handle lists (in pywb; cdxj-indexer already handled them). This is fixed in 2.6.7, which also prints a clearer error if the JSON is not parseable.
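
For readers wondering how this class of error arises, here is a minimal sketch, not pywb's actual code. It assumes a converter that flattens a JSON request body into query-string pairs; the function names `flatten_naive`, `json_to_pairs` and `body_to_query` are hypothetical. The naive version calls `.items()` on whatever was parsed, which fails when the top-level value (or a nested one) is a list. The second version walks dicts and lists and logs unparseable bodies to stderr.

```python
import json
import sys
from urllib.parse import urlencode


def flatten_naive(parsed):
    # Hypothetical naive converter: assumes the parsed JSON is a dict.
    # A list has no .items(), so this raises AttributeError.
    return [(key, str(value)) for key, value in parsed.items()]


def json_to_pairs(value, name=""):
    # Hypothetical list-aware converter: recurse into dicts and lists,
    # emitting one (name, value) pair per scalar leaf.
    pairs = []
    if isinstance(value, dict):
        for key, item in value.items():
            pairs.extend(json_to_pairs(item, key))
    elif isinstance(value, list):
        for item in value:
            pairs.extend(json_to_pairs(item, name))
    else:
        # Strings are kept as-is; other scalars use their JSON spelling.
        text = value if isinstance(value, str) else json.dumps(value)
        pairs.append((name, text))
    return pairs


def body_to_query(body):
    # Convert a JSON body to a query string; on a parse failure,
    # report to stderr and return an empty query instead of raising.
    try:
        parsed = json.loads(body)
    except ValueError as exc:
        print("JSON parse error: {}".format(exc), file=sys.stderr)
        return ""
    return urlencode(json_to_pairs(parsed))
```

With this sketch, `flatten_naive(json.loads('[{"a": 1}]'))` raises the `AttributeError` quoted in the report, while `body_to_query('[{"a": 1}, {"b": "x"}]')` returns a usable query string. How pywb 2.6.7 names or orders the resulting parameters is not shown here.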
