Release v1.0

echen102 released this 18 Mar 01:00
· 167 commits to master since this release

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2). Collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 3/5/20 to 3/12/20.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.

Data Usage Agreement

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Emily Chen, Kristina Lerman, Emilio Ferrara. #COVID-19: The First Public Coronavirus Twitter Dataset. arXiv preprint, March 18, 2020

Statistics Summary (v1.0)

Number of Tweets: 8,919,411

Language Breakdown

Language      ISO code   No. of Tweets   % of total Tweets
English       en         5,508,304       61.76%
Spanish       es         1,167,172       13.09%
French        fr           388,481        4.36%
Thai          th           352,902        3.96%
Italian       it           219,572        2.46%
(undefined)   und          208,908        2.34%
Indonesian    in           201,821        2.26%
Portuguese    pt           169,599        1.90%
Japanese      ja           145,985        1.64%
Turkish       tr           134,173        1.50%
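
The percentages in the table above can be checked against the reported total. The following is a minimal sketch, not part of the release itself: the counts are copied from this page, and the names TOTAL_TWEETS, LANGUAGE_COUNTS and percent_of_total are illustrative assumptions. It also derives how many Tweets fall outside the ten languages listed.

```python
# Counts copied from the v1.0 statistics summary on this page.
TOTAL_TWEETS = 8_919_411

# Keys are the language codes shown in the table ("und" = undefined).
LANGUAGE_COUNTS = {
    "en": 5_508_304,
    "es": 1_167_172,
    "fr": 388_481,
    "th": 352_902,
    "it": 219_572,
    "und": 208_908,
    "in": 201_821,
    "pt": 169_599,
    "ja": 145_985,
    "tr": 134_173,
}


def percent_of_total(count, total=TOTAL_TWEETS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Share of each listed language, keyed by language code.
shares = {code: percent_of_total(n) for code, n in LANGUAGE_COUNTS.items()}

# Tweets covered by the listed languages, and the remainder in all others.
listed_total = sum(LANGUAGE_COUNTS.values())
other_total = TOTAL_TWEETS - listed_total

for code, share in shares.items():
    print(f"{code:>3}: {LANGUAGE_COUNTS[code]:>9,} ({share:.2f}%)")
print(f"other: {other_total:,} ({percent_of_total(other_total):.2f}%)")
```

Because the listed languages do not sum to the full total, the remainder printed on the last line corresponds to Tweets in languages not shown in the table.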

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.
If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.