Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AttributeError: 'WarcIndexer' object has no attribute 'records' #8

Open
martinvahi opened this issue Feb 9, 2017 · 0 comments
Open

Comments

@martinvahi
Copy link

martinvahi commented Feb 9, 2017

warc_librarian@acstorage3334:/media/pi/Sinine230GiBUSB/warc_librarian $ Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "./warcproxy.py", line 112, in run
    http_response = parse_http_response(record)
  File "./warcproxy.py", line 24, in parse_http_response
    remainder = message.feed(record.content[1])
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 576, in feed
    text = HTTPMessage.feed(self, text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 97, in feed
    text = self.feed_headers(text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 191, in feed_headers
    line, text = self.feed_line(text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 159, in feed_line
    text = str(self.buffer[pos:])
MemoryError

ERROR:tornado.application:Uncaught exception POST /load-warc (::1)
HTTPRequest(protocol='http', host='warc', method='POST', uri='/load-warc', version='HTTP/1.1', remote_ip='::1', headers={'Origin': 'http://warc', 'Content-Length': '102', 'Accept-Language': 'en-us;q=0.750', 'Accept-Encoding': 'gzip, deflate', 'Host': 'warc', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'User-Agent': 'Mozilla/5.0 (X11; Linux) AppleWebKit/538.15 (KHTML, like Gecko) Chrome/18.0.1025.133 Safari/538.15 Midori/0.5', 'Connection': 'Keep-Alive', 'X-Requested-With': 'XMLHttpRequest', 'Referer': 'http://warc/static/list.html', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1346, in _when_complete
    callback()
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1367, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "./warcproxy.py", line 344, in post
    index_status = self.warc_proxy.load_warc_file(path)
  File "./warcproxy.py", line 142, in load_warc_file
    self.indices[path] = indexer.records
AttributeError: 'WarcIndexer' object has no attribute 'records'
ERROR:tornado.access:500 POST /load-warc (::1) 30.11ms

The ~560MiB sized WARC-file that probably was used, when this happened, MIGHT be available from
http://temporary.softf1.com/2017/bugs/www.clausewitz.com-2017-02-09-8df72096-00000.warc.gz
It might have happened with some other WARC-file, I'm not totally sure, but the referenced one also fails to load for what ever reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant