Releases: echen102/COVID-19-TweetIDs
Release v1.9
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 5/15/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
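Because only Tweet IDs are released, most workflows start by reading the IDs and grouping them into batches before looking the Tweets up again. The following is a minimal sketch of that step, not the authors' tooling. It assumes one Tweet ID per line, and the batch size of 100 is a placeholder; both `read_ids` and `batched` are hypothetical helper names introduced here.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty Tweet IDs, assuming one ID per input line."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items (size is an assumption)."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# In-memory stand-in for an ID file; real file names are not assumed here.
sample_lines = ["1\n", "2\n", "\n", "3\n"]
sample_batches = list(batched(read_ids(sample_lines), size=2))
print(sample_batches)
```

IDs are kept as strings so that no precision is lost if they later pass through tools that treat large integers as floating-point numbers.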
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020
Statistics Summary (v1.9)
Number of Tweets : 129,911,732
Language breakdown of the 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 84,930,677 | 65.38% |
Spanish | es | 14,686,543 | 11.31% |
Indonesian | in | 4,438,377 | 3.42% |
French | fr | 3,947,201 | 3.04% |
Portuguese | pt | 3,779,380 | 2.91% |
Japanese | ja | 3,135,378 | 2.41% |
(undefined) | und | 2,895,932 | 2.23% |
Thai | th | 2,796,427 | 2.15% |
Italian | it | 1,669,494 | 1.29% |
Turkish | tr | 1,378,430 | 1.06% |
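The percentage column can be reproduced from the counts and the reported total. The sketch below is illustrative only: it copies three rows from the v1.9 table and assumes the percentages are rounded to two decimal places. The names `TOTAL`, `COUNTS`, and `share` are introduced here for the example.

```python
# Total and per-language counts copied from the v1.9 statistics summary.
TOTAL = 129_911_732
COUNTS = {
    "en": 84_930_677,
    "es": 14_686_543,
    "in": 4_438_377,
}


def share(count: int, total: int) -> float:
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {iso: share(count, TOTAL) for iso, count in COUNTS.items()}
for iso, pct in shares.items():
    print(f"{iso}: {pct:.2f}%")
```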
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
5/14/2020 | 7:00 - 8:00 UTC |
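When working with the IDs, it can be useful to check whether a given moment falls inside one of these gaps. The sketch below assumes the IDs are Snowflake IDs, whose upper bits encode milliseconds since Twitter's Snowflake epoch, and encodes only two of the gaps from the table as an illustration. `tweet_time`, `in_known_gap`, and `KNOWN_GAPS` are hypothetical names, and treating each gap's end as exclusive is an assumption.

```python
from datetime import datetime, timezone

# Snowflake epoch in milliseconds (2010-11-04 UTC).
TWITTER_EPOCH_MS = 1288834974657

# Two gaps copied from the table above, as (start, end) in UTC.
# The end bound is treated as exclusive, which is an assumption.
KNOWN_GAPS = [
    (datetime(2020, 2, 1, 4, tzinfo=timezone.utc),
     datetime(2020, 2, 1, 9, tzinfo=timezone.utc)),
    (datetime(2020, 5, 14, 7, tzinfo=timezone.utc),
     datetime(2020, 5, 14, 8, tzinfo=timezone.utc)),
]


def tweet_time(tweet_id: int) -> datetime:
    """Recover the UTC creation time encoded in a Snowflake Tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def in_known_gap(moment: datetime) -> bool:
    """Return True if `moment` falls inside one of the listed gaps."""
    return any(start <= moment < end for start, end in KNOWN_GAPS)
```

A gap means that Tweets from that window may be missing from the collection, so a time series built from the IDs should not read a dip in these windows as a drop in activity.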
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.8
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 5/08/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020
Statistics Summary (v1.8)
Number of Tweets : 123,113,914
Language breakdown of the 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 80,698,556 | 65.55% |
Spanish | es | 13,848,449 | 11.25% |
Indonesian | in | 4,196,591 | 3.41% |
French | fr | 3,762,601 | 3.06% |
Portuguese | pt | 3,451,196 | 2.80% |
Japanese | ja | 2,897,046 | 2.35% |
Thai | th | 2,754,627 | 2.24% |
(undefined) | und | 2,711,649 | 2.20% |
Italian | it | 1,615,916 | 1.31% |
Turkish | tr | 1,308,989 | 1.06% |
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.7
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 5/01/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020
Statistics Summary (v1.7)
Number of Tweets : 115,929,358
Language breakdown of the 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 76,245,404 | 65.77% |
Spanish | es | 13,000,982 | 11.21% |
Indonesian | in | 4,012,099 | 3.46% |
French | fr | 3,554,454 | 3.07% |
Portuguese | pt | 3,148,855 | 2.72% |
Thai | th | 2,686,041 | 2.32% |
Japanese | ja | 2,538,512 | 2.19% |
(undefined) | und | 2,517,028 | 2.17% |
Italian | it | 1,550,723 | 1.34% |
Turkish | tr | 1,241,016 | 1.07% |
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.6
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 4/24/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020
Statistics Summary (v1.6)
Number of Tweets : 109,013,655
Language breakdown of the 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 71,984,701 | 66.03% |
Spanish | es | 12,149,916 | 11.15% |
Indonesian | in | 3,826,448 | 3.51% |
French | fr | 3,340,808 | 3.06% |
Portuguese | pt | 2,928,843 | 2.69% |
Thai | th | 2,630,420 | 2.41% |
(undefined) | und | 2,327,240 | 2.13% |
Japanese | ja | 2,156,385 | 1.98% |
Italian | it | 1,484,474 | 1.36% |
Turkish | tr | 1,165,210 | 1.07% |
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.5
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 4/17/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020
Statistics Summary (v1.5)
Number of Tweets : 101,771,227
Language breakdown of the 10 most prevalent languages :
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 67,427,185 | 66.25% |
Spanish | es | 11,254,540 | 11.06% |
Indonesian | in | 3,591,884 | 3.53% |
French | fr | 3,124,414 | 3.07% |
Portuguese | pt | 2,715,462 | 2.67% |
Thai | th | 2,577,166 | 2.53% |
(undefined) | und | 2,113,795 | 2.08% |
Japanese | ja | 1,867,601 | 1.84% |
Italian | it | 1,419,867 | 1.40% |
Turkish | tr | 1,092,512 | 1.07% |
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.4
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 4/3/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv:cs.SI/2003.07372, 2020
Statistics Summary (v1.4)
Number of Tweets : 94,671,486
Language Breakdown
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 63,111,114 | 66.66% |
Spanish | es | 10,348,964 | 10.93% |
Indonesian | in | 3,351,221 | 3.54% |
French | fr | 2,918,003 | 3.08% |
Portuguese | pt | 2,474,677 | 2.61% |
Thai | th | 2,355,393 | 2.49% |
(undefined) | und | 1,915,551 | 2.02% |
Japanese | ja | 1,639,825 | 1.73% |
Italian | it | 1,362,672 | 1.44% |
Turkish | tr | 1,011,417 | 1.07% |
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.3
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 4/3/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020
Statistics Summary (v1.3)
Number of Tweets : 87,209,465
Language Breakdown
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 58,456,856 | 67.03% |
Spanish | es | 9,368,223 | 10.74% |
Indonesian | in | 3,091,193 | 3.54% |
French | fr | 2,681,635 | 3.07% |
Thai | th | 2,254,162 | 2.58% |
Portuguese | pt | 2,231,807 | 2.56% |
(undefined) | und | 1,701,784 | 1.95% |
Japanese | ja | 1,462,367 | 1.68% |
Italian | it | 1,301,795 | 1.49% |
Turkish | tr | 911,543 | 1.05% |
Known Gaps
Date | Time |
---|---|
2/1/2020 | 4:00 - 9:00 UTC |
2/8/2020 | 6:00 - 7:00 UTC |
2/22/2020 | 21:00 - 24:00 UTC |
2/23/2020 | 0:00 - 24:00 UTC |
2/24/2020 | 0:00 - 4:00 UTC |
2/25/2020 | 0:00 - 3:00 UTC |
3/2/2020 | Intermittent Internet Connectivity Issues |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.2
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 3/21/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020
Statistics Summary (v1.2)
Number of Tweets : 72,403,796
Language Breakdown
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 49,525,165 | 68.40% |
Spanish | es | 7,467,220 | 10.31% |
Indonesian | in | 2,296,629 | 3.17% |
French | fr | 2,164,654 | 2.99% |
Thai | th | 1,927,905 | 2.66% |
Portuguese | pt | 1,740,967 | 2.40% |
(undefined) | und | 1,307,121 | 1.81% |
Japanese | ja | 1,276,425 | 1.76% |
Italian | it | 1,183,317 | 1.63% |
Turkish | tr | 688,860 | 0.95% |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.1
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 1/21/20 - 3/12/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020
Statistics Summary (v1.1)
Number of Tweets : 63,616,072
Language Breakdown
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 44,482,496 | 69.92% |
Spanish | es | 6,087,308 | 9.57% |
Indonesian | in | 1,844,037 | 2.90% |
French | fr | 1,800,318 | 2.83% |
Thai | th | 1,687,309 | 2.65% |
Portuguese | pt | 1,278,662 | 2.01% |
Japanese | ja | 1,223,646 | 1.92% |
Italian | it | 1,113,001 | 1.75% |
(undefined) | und | 1,110,165 | 1.75% |
Turkish | tr | 570,744 | 0.90% |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.
Release v1.0
The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
This release contains Tweet IDs collected from 3/5/20 - 3/12/20.
Please refer to the README for more details regarding data, data organization and data usage agreement.
Data Usage Agreement
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service and cite the following manuscript:
Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020
Statistics Summary (v1.0)
Number of Tweets : 8,919,411
Language Breakdown
Language | ISO | No. tweets | % total Tweets |
---|---|---|---|
English | en | 5,508,304 | 61.76% |
Spanish | es | 1,167,172 | 13.09% |
French | fr | 388,481 | 4.36% |
Thai | th | 352,902 | 3.96% |
Italian | it | 219,572 | 2.46% |
(undefined) | und | 208,908 | 2.34% |
Indonesian | in | 201,821 | 2.26% |
Portuguese | pt | 169,599 | 1.90% |
Japanese | ja | 145,985 | 1.64% |
Turkish | tr | 134,173 | 1.50% |
Inquiries
If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.