18 May 08:57

echen102

7cf5329

Release v1.9

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 5/15/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.9)

Number of Tweets : 129,911,732

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. tweets	% total Tweets
English	en	84,930,677	65.38%
Spanish	es	14,686,543	11.31%
Indonesian	in	4,438,377	3.42%
French	fr	3,947,201	3.04%
Portuguese	pt	3,779,380	2.91%
Japanese	ja	3,135,378	2.41%
(undefined)	und	2,895,932	2.22%
Thai	th	2,796,427	2.15%
Italian	it	1,669,494	1.29%
Turkish	tr	1,378,430	1.06%
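
The percentage column can be re-derived from the counts, which is a quick way to sanity-check a row. The following is a minimal sketch rather than the authors' own tooling: the function name `share_percent` and the dictionary layout are illustrative assumptions, and only three rows of the v1.9 table are copied in.

```python
# Counts copied from the v1.9 table above (three rows only, for brevity).
TOTAL_TWEETS = 129_911_732
LANGUAGE_COUNTS = {
    "en": 84_930_677,
    "es": 14_686_543,
    "tr": 1_378_430,
}


def share_percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Hypothetical helper output: ISO code -> share of all Tweets in the release.
shares = {iso: share_percent(n, TOTAL_TWEETS) for iso, n in LANGUAGE_COUNTS.items()}

for iso, pct in shares.items():
    print(f"{iso}\t{pct:.2f}%")
```

Rounding conventions may differ slightly from those used to produce the published table, so a difference of 0.01 in the last digit is not necessarily an error.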

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC
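
When analyzing Tweet volume over time, it can help to flag timestamps that fall inside one of the gaps listed above. The sketch below encodes the table as UTC intervals under a few assumptions that are not stated in the release notes: intervals are treated as half-open (start included, end excluded), "24:00" is read as midnight of the following day, and the 3/2/2020 entry is left out because it has no time window. The names `utc`, `KNOWN_GAPS`, and `in_known_gap` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc(month, day, hour):
    """Build a UTC datetime in 2020; hour 24 rolls over to the next day."""
    base = datetime(2020, month, day, tzinfo=timezone.utc)
    return base + timedelta(hours=hour)


# (start, end) pairs transcribed from the Known Gaps table for v1.9.
# The 3/2/2020 "intermittent connectivity" entry has no time window and is omitted.
KNOWN_GAPS = [
    (utc(2, 1, 4), utc(2, 1, 9)),
    (utc(2, 8, 6), utc(2, 8, 7)),
    (utc(2, 22, 21), utc(2, 22, 24)),
    (utc(2, 23, 0), utc(2, 23, 24)),
    (utc(2, 24, 0), utc(2, 24, 4)),
    (utc(2, 25, 0), utc(2, 25, 3)),
    (utc(5, 14, 7), utc(5, 14, 8)),
]


def in_known_gap(ts):
    """Return True if the timezone-aware timestamp falls inside a listed gap."""
    return any(start <= ts < end for start, end in KNOWN_GAPS)


print(in_known_gap(utc(2, 23, 12)))  # midday on a fully missing day
print(in_known_gap(utc(3, 1, 0)))    # a date with no listed gap
```

Timestamps passed to `in_known_gap` need to be timezone-aware, since comparing naive and aware datetimes raises a `TypeError` in Python.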

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

11 May 09:40

echen102

v1.8

6a797c7

Release v1.8

This release contains Tweet IDs collected from 1/21/20 - 5/08/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.8)

Number of Tweets : 123,113,914

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. tweets	% total Tweets
English	en	80,698,556	65.55%
Spanish	es	13,848,449	11.25%
Indonesian	in	4,196,591	3.41%
French	fr	3,762,601	3.06%
Portuguese	pt	3,451,196	2.80%
Japanese	ja	2,897,046	2.35%
Thai	th	2,754,627	2.24%
(undefined)	und	2,711,649	2.20%
Italian	it	1,615,916	1.31%
Turkish	tr	1,308,989	1.06%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

04 May 09:38

echen102

v1.7

fb1837b

Release v1.7

This release contains Tweet IDs collected from 1/21/20 - 5/01/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.7)

Number of Tweets : 115,929,358

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. tweets	% total Tweets
English	en	76,245,404	65.77%
Spanish	es	13,000,982	11.21%
Indonesian	in	4,012,099	3.46%
French	fr	3,554,454	3.07%
Portuguese	pt	3,148,855	2.72%
Thai	th	2,686,041	2.32%
Japanese	ja	2,538,512	2.19%
(undefined)	und	2,517,028	2.17%
Italian	it	1,550,723	1.34%
Turkish	tr	1,241,016	1.07%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

27 Apr 07:57

echen102

v1.6

d099e68

Release v1.6

This release contains Tweet IDs collected from 1/21/20 - 4/24/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.6)

Number of Tweets : 109,013,655

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. tweets	% total Tweets
English	en	71,984,701	66.03%
Spanish	es	12,149,916	11.15%
Indonesian	in	3,826,448	3.51%
French	fr	3,340,808	3.06%
Portuguese	pt	2,928,843	2.69%
Thai	th	2,630,420	2.41%
(undefined)	und	2,327,240	2.13%
Japanese	ja	2,156,385	1.98%
Italian	it	1,484,474	1.36%
Turkish	tr	1,165,210	1.07%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

21 Apr 04:27

echen102

v1.5

bcff17d

Release v1.5

This release contains Tweet IDs collected from 1/21/20 - 4/17/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.5)

Number of Tweets : 101,771,227

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. tweets	% total Tweets
English	en	67,427,185	66.25%
Spanish	es	11,254,540	11.06%
Indonesian	in	3,591,884	3.53%
French	fr	3,124,414	3.07%
Portuguese	pt	2,715,462	2.67%
Thai	th	2,577,166	2.53%
(undefined)	und	2,113,795	2.08%
Japanese	ja	1,867,601	1.84%
Italian	it	1,419,867	1.40%
Turkish	tr	1,092,512	1.07%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

13 Apr 07:12

echen102

v1.4

8db2401

Release v1.4

This release contains Tweet IDs collected from 1/21/20 - 4/3/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.4)

Number of Tweets : 94,671,486

Language Breakdown

Language	ISO	No. tweets	% total Tweets
English	en	63,111,114	66.66%
Spanish	es	10,348,964	10.93%
Indonesian	in	3,351,221	3.54%
French	fr	2,918,003	3.08%
Portuguese	pt	2,474,677	2.61%
Thai	th	2,355,393	2.49%
(undefined)	und	1,915,551	2.02%
Japanese	ja	1,639,825	1.73%
Italian	it	1,362,672	1.44%
Turkish	tr	1,011,417	1.07%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

12 Apr 02:37

echen102

v1.3

21e0f64

Release v1.3

This release contains Tweet IDs collected from 1/21/20 - 4/3/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.3)

Number of Tweets : 87,209,465

Language Breakdown

Language	ISO	No. tweets	% total Tweets
English	en	58,456,856	67.03%
Spanish	es	9,368,223	10.74%
Indonesian	in	3,091,193	3.54%
French	fr	2,681,635	3.07%
Thai	th	2,254,162	2.58%
Portuguese	pt	2,231,807	2.56%
(undefined)	und	1,701,784	1.95%
Japanese	ja	1,462,367	1.68%
Italian	it	1,301,795	1.49%
Turkish	tr	911,543	1.05%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

01 Apr 06:16

echen102

v1.2

bdde3f3

Release v1.2

This release contains Tweet IDs collected from 1/21/20 - 3/21/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.2)

Number of Tweets : 72,403,796

Language Breakdown

Language	ISO	No. tweets	% total Tweets
English	en	49,525,165	68.40%
Spanish	es	7,467,220	10.31%
Indonesian	in	2,296,629	3.17%
French	fr	2,164,654	2.99%
Thai	th	1,927,905	2.66%
Portuguese	pt	1,740,967	2.40%
(undefined)	und	1,307,121	1.81%
Japanese	ja	1,276,425	1.76%
Italian	it	1,183,317	1.63%
Turkish	tr	688,860	0.95%

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

23 Mar 20:24

echen102

v1.1

cc7c770

Release v1.1

This release contains Tweet IDs collected from 1/21/20 - 3/12/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.1)

Number of Tweets : 63,616,072

Language Breakdown

Language	ISO	No. tweets	% total Tweets
English	en	44,482,496	69.92%
Spanish	es	6,087,308	9.57%
Indonesian	in	1,844,037	2.90%
French	fr	1,800,318	2.83%
Thai	th	1,687,309	2.65%
Portuguese	pt	1,278,662	2.01%
Japanese	ja	1,223,646	1.92%
Italian	it	1,113,001	1.75%
(undefined)	und	1,110,165	1.75%
Turkish	tr	570,744	0.90%

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

18 Mar 01:00

echen102

v1.0

be09c5e

Release v1.0

This release contains Tweet IDs collected from 3/5/20 - 3/12/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.0)

Number of Tweets : 8,919,411

Language Breakdown

Language	ISO	No. tweets	% total Tweets
English	en	5,508,304	61.76%
Spanish	es	1,167,172	13.09%
French	fr	388,481	4.36%
Thai	th	352,902	3.96%
Italian	it	219,572	2.46%
(undefined)	und	208,908	2.34%
Indonesian	in	201,821	2.26%
Portuguese	pt	169,599	1.90%
Japanese	ja	145,985	1.64%
Turkish	tr	134,173	1.50%

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Releases: echen102/COVID-19-TweetIDs