This repository has been archived by the owner on Aug 8, 2024. It is now read-only.

Unable to pickle parsed output #27

Open
evan-burke opened this issue Mar 22, 2019 · 0 comments
evan-burke commented Mar 22, 2019

I'm trying to do multiprocess/distributed processing of Apache logs. My setup uses pickle to serialize the data passed between the scheduler and worker processes.

However, deserializing the parsed output fails. In my case the culprits are the time_received_tz_datetimeobj and time_received_utc_datetimeobj values, for input lines like:

```python
import pickle

import apache_log_parser

mylist = [
    '157.55.39.31 - - [21/Mar/2019:07:56:41 +0000] "GET / HTTP/1.1" 200 6878 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '40.77.167.37 - - [21/Mar/2019:07:59:11 +0000] "GET / HTTP/1.1" 301 469 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

logparser = apache_log_parser.make_parser('%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"')

parsed_logline = logparser(mylist[0])
data = pickle.dumps(parsed_logline)  # serializing succeeds
pickle.loads(data)  # deserializing raises an error
```

(This is with Python 3.6.6 and apache_log_parser 1.7.0.)
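The failure is consistent with a documented requirement of the `datetime` module: a `tzinfo` subclass must have an `__init__` that can be called with no arguments, otherwise its instances can be pickled but possibly not unpickled. The sketch below reproduces that failure mode with a hypothetical `OffsetTZ` class built from a string like `"+0000"`. It is an assumption that apache_log_parser's timezone class behaves this way; the class here is illustrative, not taken from the library.

```python
import datetime
import pickle


class OffsetTZ(datetime.tzinfo):
    """Hypothetical fixed-offset tzinfo built from a string like '+0000'.

    Its __init__ requires an argument, which is what breaks unpickling.
    """

    def __init__(self, offset_str):
        sign = -1 if offset_str.startswith("-") else 1
        hours, minutes = int(offset_str[1:3]), int(offset_str[3:5])
        self._offset = sign * datetime.timedelta(hours=hours, minutes=minutes)
        self._name = offset_str

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return datetime.timedelta(0)

    def tzname(self, dt):
        return self._name


stamp = datetime.datetime(2019, 3, 21, 7, 56, 41, tzinfo=OffsetTZ("+0000"))

# Pickling succeeds: the default tzinfo.__reduce__ records the class and its
# __dict__, but no constructor arguments.
payload = pickle.dumps(stamp)

# Unpickling calls OffsetTZ() with no arguments, so __init__ raises TypeError.
try:
    pickle.loads(payload)
    unpickled_ok = True
except TypeError as exc:
    unpickled_ok = False
    print("unpickling failed:", exc)
```

This mirrors the report: `pickle.dumps` works and only `pickle.loads` fails.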

I can work around this in my own code by converting datetimes whose tzinfo prints as '0000' to UTC:

```python
import datetime


def to_utc(datetimeobj):
    # Swap the parser's '0000' tzinfo for datetime.timezone.utc, which pickles cleanly.
    if str(datetimeobj.tzinfo) == "'0000'":
        return datetimeobj.astimezone(datetime.timezone.utc)
    return datetimeobj


parsed_logline['time_received_tz_datetimeobj'] = to_utc(parsed_logline['time_received_tz_datetimeobj'])
parsed_logline['time_received_utc_datetimeobj'] = to_utc(parsed_logline['time_received_utc_datetimeobj'])
```
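A slightly more general variant of this workaround could handle any offset, not just `'0000'`, by replacing the tzinfo with the standard library's `datetime.timezone`, which pickles cleanly. In the sketch below, `normalize_tz`, `normalize_record`, and `StandInTZ` are hypothetical names; `StandInTZ` only mimics an unpicklable parser timezone, and the sketch assumes each timestamp has a fixed UTC offset, as Apache `%t` values do.

```python
import datetime
import pickle


def normalize_tz(value):
    """Return `value` with its tzinfo swapped for an equivalent datetime.timezone.

    Hypothetical helper: assumes a fixed UTC offset. Non-datetime values and
    naive datetimes are returned unchanged.
    """
    if not isinstance(value, datetime.datetime) or value.tzinfo is None:
        return value
    # replace() keeps the wall-clock time and the same offset, so the instant
    # is unchanged; only the tzinfo object is swapped.
    return value.replace(tzinfo=datetime.timezone(value.utcoffset()))


def normalize_record(record):
    """Apply normalize_tz to every value of a parsed log-line dict."""
    return {key: normalize_tz(val) for key, val in record.items()}


class StandInTZ(datetime.tzinfo):
    """Hypothetical stand-in: __init__ needs an argument, so it won't unpickle."""

    def __init__(self, minutes):
        self._offset = datetime.timedelta(minutes=minutes)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return datetime.timedelta(0)


record = {
    "remote_host": "157.55.39.31",
    "time_received_tz_datetimeobj": datetime.datetime(
        2019, 3, 21, 7, 56, 41, tzinfo=StandInTZ(0)
    ),
}
clean = normalize_record(record)
restored = pickle.loads(pickle.dumps(clean))
print(restored["time_received_tz_datetimeobj"].isoformat())
```

Unlike `to_utc`, this keeps the original offset instead of converting to UTC, so `-0500` timestamps stay at `-05:00`.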

But this seems better handled in the parser itself. I'm not sure, though, whether doing so would break compatibility with other Python versions.
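On the parser side, one option would be to give the timezone class a `__reduce__` method that records its constructor argument, so pickle rebuilds it as `OffsetTZ("+0000")` instead of calling `OffsetTZ()` with no arguments. Another would be to construct `datetime.timezone` objects directly, though `datetime.timezone` was only added in Python 3.2. The sketch below shows the first option on a hypothetical `OffsetTZ` class; the class and its string argument are assumptions, not apache_log_parser's actual code.

```python
import datetime
import pickle


class OffsetTZ(datetime.tzinfo):
    """Hypothetical fixed-offset tzinfo built from a string like '+0000' or '-0500'."""

    def __init__(self, offset_str):
        sign = -1 if offset_str.startswith("-") else 1
        hours, minutes = int(offset_str[1:3]), int(offset_str[3:5])
        self._offset = sign * datetime.timedelta(hours=hours, minutes=minutes)
        self._name = offset_str

    def __reduce__(self):
        # Tell pickle to rebuild the object as OffsetTZ(offset_str), instead of
        # the default tzinfo behaviour of calling OffsetTZ() with no arguments.
        return (self.__class__, (self._name,))

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return datetime.timedelta(0)

    def tzname(self, dt):
        return self._name


stamp = datetime.datetime(2019, 3, 21, 7, 56, 41, tzinfo=OffsetTZ("-0500"))
restored = pickle.loads(pickle.dumps(stamp))
print(restored.isoformat(), restored.tzname())
```

This keeps the parser's own tzinfo class (and its `tzname`) intact for existing callers while making the parsed output safe to send between processes.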
