import sys
import ast
import json
###########################################################
# to_json.py
#
# Purpose:
# The Python sample code suggested by the Twitter Developer website dumps tweets to disk
# in a format that is not valid JSON (e.g., single-quoted strings and True instead of true
# for boolean values), which suggests each line is the repr of a Python dict.
# This script cleans such data files and converts them into valid JSON files.
#
# Requirement:
# Python3.7+
#
# Example:
# gunzip -c coronavirus_12-27-2020.gz | python3 to_json.py >> coronavirus_12-27-2020.json
#
# Author:
# Qiushi Bai (baiqiushi@gmail.com)
#
# TODO:
# The datetime-typed attributes (e.g., created_at) in the generated JSON are not in ISO format,
# so some consumers (e.g., MySQL) might not accept them directly.
# Later, we might need to convert those values to a uniform format.
###########################################################
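if __name__ == '__main__':
    for line in sys.stdin:
        # Skip blank lines, which ast.literal_eval cannot parse.
        if not line.strip():
            continue
        # Each line is a Python dict literal; parse it safely without eval().
        tweet_dict = ast.literal_eval(line)
        # Re-serialize as valid JSON, one object per line.
        tweet_json = json.dumps(tweet_dict)
        print(tweet_json)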
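
# Sketch for the TODO above (not part of the original script): one possible way to
# rewrite created_at in ISO 8601 form before dumping. It assumes the timestamps use
# the layout "Sun Dec 27 08:15:30 +0000 2020"; the format string and the helper names
# to_iso and normalize_created_at are hypothetical and should be checked against the
# actual data. Only the top-level created_at is handled, not nested objects.

```python
from datetime import datetime

# Assumption: timestamps look like "Sun Dec 27 08:15:30 +0000 2020".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_iso(created_at):
    """Return the ISO 8601 form of a timestamp in the assumed layout."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT).isoformat()


def normalize_created_at(tweet_dict):
    """Hypothetical helper: rewrite a top-level 'created_at' in place, if present."""
    value = tweet_dict.get("created_at")
    if isinstance(value, str):
        try:
            tweet_dict["created_at"] = to_iso(value)
        except ValueError:
            # Leave values that do not match the assumed layout untouched.
            pass
    return tweet_dict
```

# Possible usage: call normalize_created_at(tweet_dict) on the parsed dict before
# passing it to json.dumps.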