Describe the bug
Using wb-manager <collection> reindex on a collection crawled with browsertrix (--newContext page) prints thousands of lines to the console with the Python error:
'list' object has no attribute 'items'
The index.cdxj file is generated anyway, but it is unclear whether it is complete.
Steps to reproduce the bug
Drop WARC files from a browsertrix crawl into a collection and run the reindexing command shown above.
Expected behavior
All warc files should be reindexed without error.
Environment
pywb 2.6.6 (installed via pip)
Python 3.8.10
Ubuntu 20.04
Additional context
This only seems to happen when a crawl is made via browsertrix. If a collection contains only WARC files from other sources, no error message appears. If a collection contains both browsertrix and non-browsertrix WARCs, only the browsertrix WARCs cause this error to appear.
The message is mostly ignorable in this case, but there will be a better fix in 2.6.7. The issue is that the post-to-get conversion of JSON arguments didn't correctly handle lists in pywb (although it did in cdxj-indexer). This will be fixed in 2.6.7, which will also print a better error if the request body is not parseable.
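To illustrate the kind of failure described above, here is a minimal sketch. It is not the actual pywb or cdxj-indexer code; the function names naive_json_to_query and tolerant_json_to_query are hypothetical. It only shows how a JSON-to-query conversion that assumes a top-level object produces this exact error on a top-level list, and one possible way to handle lists.

```python
import json
from urllib.parse import urlencode


def naive_json_to_query(body: str) -> str:
    """Hypothetical converter that assumes the JSON body is an object."""
    data = json.loads(body)
    # If data is a list, .items() raises:
    # AttributeError: 'list' object has no attribute 'items'
    return urlencode({key: str(value) for key, value in data.items()})


def tolerant_json_to_query(body: str) -> str:
    """Hypothetical converter that also walks lists and nested objects."""
    pairs = []

    def walk(value, name):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, key)
        elif isinstance(value, list):
            # List elements inherit the name of the enclosing key.
            for child in value:
                walk(child, name)
        else:
            pairs.append((name, "" if value is None else str(value)))

    walk(json.loads(body), "")
    return urlencode(pairs)


if __name__ == "__main__":
    body = '[{"a": 1}, {"b": "x y"}]'
    try:
        naive_json_to_query(body)
    except AttributeError as err:
        print("naive:", err)
    print("tolerant:", tolerant_json_to_query(body))
```

The sketch flattens nested values into plain key/value pairs for simplicity; the real conversion in pywb may name and order parameters differently.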