Releases: echen102/COVID-19-TweetIDs

Release v1.9

18 May 08:57

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 5/15/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.9)

Number of Tweets : 129,911,732

Language breakdown of top 10 most prevalent languages :

Language ISO No. tweets % total Tweets
English en 84,930,677 65.38%
Spanish es 14,686,543 11.31%
Indonesian in 4,438,377 3.42%
French fr 3,947,201 3.04%
Portuguese pt 3,779,380 2.91%
Japanese ja 3,135,378 2.41%
(undefined) und 2,895,932 2.22%
Thai th 2,796,427 2.15%
Italian it 1,669,494 1.29%
Turkish tr 1,378,430 1.06%
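The percentage column can be rechecked from the raw counts. The following is a minimal sketch, assuming each percentage is the language's tweet count divided by the release's "Number of Tweets" and rounded to two decimals; the helper name percent_share is hypothetical and not part of the repository.

```python
def percent_share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


TOTAL_V1_9 = 129_911_732  # "Number of Tweets" reported for v1.9

# Two rows copied from the v1.9 table: ISO code -> tweet count.
counts = {"en": 84_930_677, "tr": 1_378_430}

for iso, count in counts.items():
    print(iso, f"{percent_share(count, TOTAL_V1_9):.2f}%")
```

Under that assumption the two rows come out as 65.38% and 1.06%, matching the table. The ten listed languages do not sum to 100% because only the most prevalent languages are shown.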

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues
5/14/2020 7:00 - 8:00 UTC
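When analyzing tweet volume over time, it can help to flag timestamps that fall inside a known gap. The sketch below encodes the rows above as UTC intervals. It assumes each interval is half-open (start included, end excluded), which the table does not specify, and it keeps 3/2/2020 in a separate set because no time range is given for it. The names KNOWN_GAPS, INTERMITTENT_DATES, gap_interval and in_known_gap are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# (date, start hour, end hour) in UTC, copied from the Known Gaps table.
KNOWN_GAPS = [
    (date(2020, 2, 1), 4, 9),
    (date(2020, 2, 8), 6, 7),
    (date(2020, 2, 22), 21, 24),
    (date(2020, 2, 23), 0, 24),
    (date(2020, 2, 24), 0, 4),
    (date(2020, 2, 25), 0, 3),
    (date(2020, 5, 14), 7, 8),
]

# Listed without a time range ("Intermittent Internet Connectivity Issues").
INTERMITTENT_DATES = {date(2020, 3, 2)}


def gap_interval(day: date, start_hour: int, end_hour: int):
    """Convert one table row into (start, end) UTC datetimes; 24 means next midnight."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return midnight + timedelta(hours=start_hour), midnight + timedelta(hours=end_hour)


def in_known_gap(ts: datetime) -> bool:
    """True if a timezone-aware timestamp falls inside a listed gap (assumed half-open)."""
    ts = ts.astimezone(timezone.utc)
    return any(start <= ts < end for start, end in (gap_interval(*g) for g in KNOWN_GAPS))


print(in_known_gap(datetime(2020, 2, 23, 12, 0, tzinfo=timezone.utc)))
```

Timestamps must be timezone-aware; a naive datetime would be interpreted in the local time zone by astimezone.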

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.8

11 May 09:40

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 5/08/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.8)

Number of Tweets : 123,113,914

Language breakdown of top 10 most prevalent languages :

Language ISO No. tweets % total Tweets
English en 80,698,556 65.55%
Spanish es 13,848,449 11.25%
Indonesian in 4,196,591 3.41%
French fr 3,762,601 3.06%
Portuguese pt 3,451,196 2.80%
Japanese ja 2,897,046 2.35%
Thai th 2,754,627 2.24%
(undefined) und 2,711,649 2.20%
Italian it 1,615,916 1.31%
Turkish tr 1,308,989 1.06%
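To compare the statistics across releases programmatically, each table row can be split into its four fields. This is a rough sketch that assumes rows follow the whitespace-separated layout shown above (language name, ISO code, comma-grouped count, percentage); LanguageRow and parse_row are hypothetical names.

```python
from typing import NamedTuple


class LanguageRow(NamedTuple):
    language: str
    iso: str
    tweets: int
    percent: float


def parse_row(line: str) -> LanguageRow:
    """Parse one row such as 'Thai th 2,754,627 2.24%'."""
    # Split from the right so a multi-word language name would stay in one field.
    language, iso, tweets, percent = line.rsplit(None, 3)
    return LanguageRow(language, iso, int(tweets.replace(",", "")), float(percent.rstrip("%")))


print(parse_row("Thai th 2,754,627 2.24%"))
```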

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.7

04 May 09:38

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 5/01/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.7)

Number of Tweets : 115,929,358

Language breakdown of top 10 most prevalent languages :

Language ISO No. tweets % total Tweets
English en 76,245,404 65.77%
Spanish es 13,000,982 11.21%
Indonesian in 4,012,099 3.46%
French fr 3,554,454 3.07%
Portuguese pt 3,148,855 2.72%
Thai th 2,686,041 2.32%
Japanese ja 2,538,512 2.19%
(undefined) und 2,517,028 2.17%
Italian it 1,550,723 1.34%
Turkish tr 1,241,016 1.07%

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.6

27 Apr 07:57

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 4/24/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.6)

Number of Tweets : 109,013,655

Language breakdown of top 10 most prevalent languages :

Language ISO No. tweets % total Tweets
English en 71,984,701 66.03%
Spanish es 12,149,916 11.15%
Indonesian in 3,826,448 3.51%
French fr 3,340,808 3.06%
Portuguese pt 2,928,843 2.69%
Thai th 2,630,420 2.41%
(undefined) und 2,327,240 2.13%
Japanese ja 2,156,385 1.98%
Italian it 1,484,474 1.36%
Turkish tr 1,165,210 1.07%

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.5

21 Apr 04:27

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 4/17/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.5)

Number of Tweets : 101,771,227

Language breakdown of top 10 most prevalent languages :

Language ISO No. tweets % total Tweets
English en 67,427,185 66.25%
Spanish es 11,254,540 11.06%
Indonesian in 3,591,884 3.53%
French fr 3,124,414 3.07%
Portuguese pt 2,715,462 2.67%
Thai th 2,577,166 2.53%
(undefined) und 2,113,795 2.08%
Japanese ja 1,867,601 1.84%
Italian it 1,419,867 1.40%
Turkish tr 1,092,512 1.07%

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.4

13 Apr 07:12

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 4/3/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020

Statistics Summary (v1.4)

Number of Tweets : 94,671,486

Language Breakdown

Language ISO No. tweets % total Tweets
English en 63,111,114 66.66%
Spanish es 10,348,964 10.93%
Indonesian in 3,351,221 3.54%
French fr 2,918,003 3.08%
Portuguese pt 2,474,677 2.61%
Thai th 2,355,393 2.49%
(undefined) und 1,915,551 2.02%
Japanese ja 1,639,825 1.73%
Italian it 1,362,672 1.44%
Turkish tr 1,011,417 1.07%

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.3

12 Apr 02:37

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 4/3/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.3)

Number of Tweets : 87,209,465

Language Breakdown

Language ISO No. tweets % total Tweets
English en 58,456,856 67.03%
Spanish es 9,368,223 10.74%
Indonesian in 3,091,193 3.54%
French fr 2,681,635 3.07%
Thai th 2,254,162 2.58%
Portuguese pt 2,231,807 2.56%
(undefined) und 1,701,784 1.95%
Japanese ja 1,462,367 1.68%
Italian it 1,301,795 1.49%
Turkish tr 911,543 1.05%

Known Gaps

Date Time
2/1/2020 4:00 - 9:00 UTC
2/8/2020 6:00 - 7:00 UTC
2/22/2020 21:00 - 24:00 UTC
2/23/2020 0:00 - 24:00 UTC
2/24/2020 0:00 - 4:00 UTC
2/25/2020 0:00 - 3:00 UTC
3/2/2020 Intermittent Internet Connectivity Issues

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.2

01 Apr 06:16

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 3/21/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.2)

Number of Tweets : 72,403,796

Language Breakdown

Language ISO No. tweets % total Tweets
English en 49,525,165 68.40%
Spanish es 7,467,220 10.31%
Indonesian in 2,296,629 3.17%
French fr 2,164,654 2.99%
Thai th 1,927,905 2.66%
Portuguese pt 1,740,967 2.40%
(undefined) und 1,307,121 1.81%
Japanese ja 1,276,425 1.76%
Italian it 1,183,317 1.63%
Turkish tr 688,860 0.95%

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.1

23 Mar 20:24

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 3/12/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.1)

Number of Tweets : 63,616,072

Language Breakdown

Language ISO No. tweets % total Tweets
English en 44,482,496 69.92%
Spanish es 6,087,308 9.57%
Indonesian in 1,844,037 2.90%
French fr 1,800,318 2.83%
Thai th 1,687,309 2.65%
Portuguese pt 1,278,662 2.01%
Japanese ja 1,223,646 1.92%
Italian it 1,113,001 1.75%
(undefined) und 1,110,165 1.75%
Turkish tr 570,744 0.90%

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v1.0

18 Mar 01:00

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 3/5/20 - 3/12/20.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.0)

Number of Tweets : 8,919,411

Language Breakdown

Language ISO No. tweets % total Tweets
English en 5,508,304 61.76%
Spanish es 1,167,172 13.09%
French fr 388,481 4.36%
Thai th 352,902 3.96%
Italian it 219,572 2.46%
(undefined) und 208,908 2.34%
Indonesian in 201,821 2.26%
Portuguese pt 169,599 1.90%
Japanese ja 145,985 1.64%
Turkish tr 134,173 1.50%

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.