<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>What is Twitter, a Social Network or a News Media? - WWW'10</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="Author" content="Haewoon Kwak" />
<link rel="stylesheet" href="css/main.css" type="text/css" media="screen" />
<link rel="stylesheet" href="css/uswds.min.css" type="text/css" media="screen" />
</head>
<body>
<div
class="usa-banner"
aria-label="A website of the Advanced Networking Lab, KAIST">
<header class="usa-banner__header">
<div class="usa-banner__inner">
<div class="grid-col-auto">
<img
aria-hidden="true"
class="usa-banner__header-flag"
src="images/anlab_logo_sq.png"
alt=""
/>
</div>
<div class="grid-col-fill tablet:grid-col-auto" aria-hidden="true">
<p class="usa-banner__header-text">
A website of the Advanced Networking Lab, KAIST.
<a class="usa-link text-no-underline" href="https://an.kaist.ac.kr" target="_blank">Visit homepage.</a>
</p>
</div>
</div>
</header>
</div>
<div id="main">
<H1>What is Twitter, a Social Network or a News Media?</H1>
<P><a href="http://an.kaist.ac.kr/~haewoon">Haewoon Kwak</a>, <a href="http://an.kaist.ac.kr/~chlee">Changhyun Lee</a>, <a href="http://an.kaist.ac.kr/~hosung">Hosung Park</a>, and <a href="http://an.kaist.ac.kr/~sbmoon">Sue Moon</a><br/>
<i>Proceedings of the 19th International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC (USA)</i>
<br/></P>
<div class="abstract">
Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing.
<br/>
<br/>
We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [Newman03].
In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found the two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap between the influence inferred from the number of followers and that inferred from the popularity of one's tweets.
We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation.
We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature.
A closer look at retweets reveals that any retweeted tweet reaches an average of 1,000 users, regardless of how many followers the author of the original tweet has. Once retweeted, a tweet gets retweeted almost instantly on the next hops, signifying fast diffusion of information after the first retweet.
<br/><br/>
To the best of our knowledge this work is the first quantitative study on the entire Twittersphere
and information diffusion on it.
</div>
<p>[<a href="https://an.kaist.ac.kr/~haewoon/papers/2010-www-twitter.pdf">PDF</a> (4.8MB)]
<div class="bibtex"><p><FONT face="courier">
@inproceedings{Kwak10www,<br/>
author = {Kwak, Haewoon and Lee, Changhyun and Park, Hosung and Moon, Sue},<br/>
title = "{W}hat is {T}witter, a social network or a news media?",<br/>
booktitle = {WWW '10: Proceedings of the 19th international conference on World wide web},<br/>
year = {2010},<br/>
isbn = {978-1-60558-799-8},<br/>
pages = {591--600},<br/>
location = {Raleigh, North Carolina, USA},<br/>
doi = {10.1145/1772690.1772751},<br/>
publisher = {ACM},<br/>
address = {New York, NY, USA},<br/>
}
</FONT></p>
</div>
<H2>Slides</H2>
<div style="width:425px" id="__ss_3922095"><strong style="display:block;margin:12px 0 4px"><a href="https://www.slideshare.net/haewoon/what-is-twitter-a-social-network-or-a-news-media-3922095" title="What is Twitter, a Social Network or a News Media? ">What is Twitter, a Social Network or a News Media? </a></strong><div style="padding:5px 0 12px">View more <a href="https://www.slideshare.net/">presentations</a> from <a href="https://www.slideshare.net/haewoon">Haewoon Kwak</a>.</div></div>
<H2>Data</H2>
<div class="important">
Due to Twitter's new Terms of Service, we can no longer share data containing tweets.<br/>
(For more information, read RWW's article <a href="https://readwrite.com/how_recent_changes_to_twitters_terms_of_service_mi/">"How Recent Changes to Twitter's Terms of Service Might Hurt Academic Research"</a>.)<br/>
</div>
<H3>Social graph</H3>
<ul>
<li><b>Download</b><br/>
<div class="important">
* Now we offer direct download links: <a href="https://github.com/ANLAB-KAIST/traces/releases/tag/twitter_rv.net">[GitHub]</a>
</div>
<s>
<!--<a href="http://an.kaist.ac.kr/~haewoon/twitter_rv.tar_gz.torrent">-->
twitter_rv.tar.gz.torrent (34KB)
<!--</a> -->
or
<!--<a href="http://an.kaist.ac.kr/~haewoon/twitter_rv.zip.torrent">-->
twitter_rv.zip.torrent (26KB)
<!--</a>-->
(# of seeds >= 4) <br/>
</s>
twitter_rv.tar.gz, 6,475,352,982 bytes, MD5: c31b4c2d6f3ae325e516e78b499c46f8<br/>
twitter_rv.zip, 4,859,337,443 bytes, MD5: 5f2399aac71c604ac5a100fb6ca7e297<br/>
----<br/>
twitter_rv.net, 26,172,280,241 bytes, MD5: 9c0f7983a523edd1b753af68c5acc4bd<br/>
<li>Format<br/>
<div class="source">USER \t FOLLOWER \n</div>
<div class="note">
* USER and FOLLOWER are represented by numeric IDs (integers). <br/>
* These numeric IDs are the same as the numeric IDs that Twitter uses.<br/>
* Therefore, you can access the profile of user 12 via http://api.twitter.com/1/users/show.xml?user_id=<b>12</b>.<br/>
* For details, see the <a href="http://apiwiki.twitter.com/">Twitter API Page</a>
</div>
<li>Example
<div class="source">
12 13<br/>
12 14<br/>
12 15<br/>
16 17<br/>
</div>
<div class="note">
* Users 13, 14 and 15 are followers of user 12.<br/>
* User 17 is a follower of user 16.
</div>
</ul>
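<div class="note">
The format above lends itself to line-by-line processing, which matters because twitter_rv.net is about 26 GB. The following is a minimal Python sketch, not the code used for the paper. It computes the MD5 of a downloaded file for comparison with the values listed above, and it counts followers per user. The function names (md5_of_file, iter_edges, follower_counts) are hypothetical, and the parser assumes one whitespace-separated USER FOLLOWER pair per line.
</div>
<pre>
```python
import hashlib
from collections import Counter
from typing import Iterable, Iterator, Tuple


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (user, follower) pairs from 'USER FOLLOWER' lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        yield int(fields[0]), int(fields[1])


def follower_counts(edges: Iterable[Tuple[int, int]]) -> Counter:
    """Count how many followers each USER has."""
    counts = Counter()
    for user, _follower in edges:
        counts[user] += 1
    return counts


if __name__ == "__main__":
    # The four example lines from this page.
    sample = ["12\t13\n", "12\t14\n", "12\t15\n", "16\t17\n"]
    print(follower_counts(iter_edges(sample)))
```
</pre>
<div class="note">
For the real file, passing an open file object such as open("twitter_rv.net") to iter_edges streams the edges without loading the file into memory. The Counter still holds one entry per distinct user, so counting over the full graph needs memory proportional to the number of users.
</div>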
<h3>Mapping table from numeric ID to screen name</h3>
<ul style="text-decoration: line-through;">
<li><b>Download</b><br/>
<a href="http://an.kaist.ac.kr/~haewoon/release/numeric2screen.tar.gz">numeric2screen.tar.gz</a>
<li>Format<br/>
<div class="source">Numeric \t Screen_name \n</div>
<div class="note">
* You can use this data to map from numeric ID to screen name.<br/>
<s>* Writers of tweets released by Yang and Leskovec (Helpful other websites 1.) are recorded as screen name<br/></s>
</div>
</ul>
<h3>Restricted user profiles (> 10,000 followers)</h3>
<ul>
<li><b>Download</b><br/>
<a href="http://an.kaist.ac.kr/~haewoon/celebrities_profiles.txt">celebrities_profiles.txt (3.0MB)</a> (Save Link as...)
</li>
<li>Format</li>
<div class="source">
numeric_id \t
verified \t
profile_sidebar_fill_color \t
profile_text_color \t
followers_count \t <br/>
protected \t
location \t
profile_background_color \t
utc_offset \t
statuses_count \t <br/>
description \t
friends_count \t
profile_link_color \t
profile_image_url \t
notifications \t <br/>
profile_background_image_url \t
screen_name \t
profile_background_tile \t
favourites_count \t
name \t <br/>
url \t
created_at \t
time_zone \t
profile_sidebar_border_color \t
following \t <br/>
gender (inferred by name) \n
</div>
<div class="note">
* All fields except gender are returned by the user method (<a href="http://apiwiki.twitter.com/Twitter-REST-API-Method:-users%C2%A0show">users/show</a>) of the Twitter API.<br/>
* For a description of each field, see the <a href="http://apiwiki.twitter.com/Return-Values">Return Values</a> page in the Twitter API Wiki.<br/>
* The last field, gender, is inferred from the name. It can be m, f, or ?. <br/>
* "For U.S. births in 2008, the top 1000 names represent about 74 percent of all names." For details, see <a href="http://www.ssa.gov/OACT/babynames/limits.html">Popular Baby Names in Social Security Online</a>.
</div>
<li>Example</li>
<div class="source">
12 False EADEAA 333333 895829 False San Francisco 8B542B -28800 4209 Creator, Chairman and co-founder of Twitter 574 9D582E http://s3.amazonaws.com/twitter_production/profile_images/54668082/Picture_2_normal.png False http://static.twitter.com/images/themes/theme8/bg.gif jack False 614 Jack Dorsey None Tue Mar 21 20:50:14 +0000 2006 Pacific Time (US & Canada) D9B17E False m
</div>
</ul>
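<div class="note">
The 26 tab-separated fields listed above can be mapped to names with a small parser. The Python sketch below is an illustration under stated assumptions: each profile occupies exactly one line, no field contains a tab, and missing values appear as the literal text None (as the url field does in the example). The names FIELD_NAMES, INT_FIELDS and parse_profile are hypothetical, and the demo row is synthetic apart from the four values taken from the example above.
</div>
<pre>
```python
from typing import Dict, Optional

# Field order as documented in the Format section above.
FIELD_NAMES = [
    "numeric_id", "verified", "profile_sidebar_fill_color",
    "profile_text_color", "followers_count", "protected", "location",
    "profile_background_color", "utc_offset", "statuses_count",
    "description", "friends_count", "profile_link_color",
    "profile_image_url", "notifications", "profile_background_image_url",
    "screen_name", "profile_background_tile", "favourites_count", "name",
    "url", "created_at", "time_zone", "profile_sidebar_border_color",
    "following", "gender",
]

# Fields converted to int when their text is a valid integer.
INT_FIELDS = [
    "numeric_id", "followers_count", "utc_offset", "statuses_count",
    "friends_count", "favourites_count",
]


def parse_profile(line: str) -> Optional[Dict[str, object]]:
    """Split one tab-separated profile line into a dict keyed by field name.

    Returns None when the line does not have exactly len(FIELD_NAMES) fields.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        return None
    profile: Dict[str, object] = dict(zip(FIELD_NAMES, values))
    for name in INT_FIELDS:
        try:
            profile[name] = int(str(profile[name]))
        except ValueError:
            pass  # keep the original text, for example "None"
    return profile


if __name__ == "__main__":
    # Synthetic row: only these four values come from the example on this page.
    values = {name: "None" for name in FIELD_NAMES}
    values.update(numeric_id="12", followers_count="895829",
                  screen_name="jack", gender="m")
    row = "\t".join(values[name] for name in FIELD_NAMES) + "\n"
    profile = parse_profile(row)
    print(profile["screen_name"], profile["followers_count"], profile["gender"])
```
</pre>
<div class="note">
Returning None for rows with an unexpected field count keeps the parser from silently shifting columns if a description happens to contain a tab.
</div>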
<H2>Frequently Asked Questions</H2>
<H3>About torrent</H3>
<ul>
<li><b>I cannot download the torrent. I cannot find any seed distributing the social graph.</b><br/>
We keep the number of seeds above 4. <br/>
If you cannot find any seed, the cause may be your network configuration, such as a firewall at your university.<br/>
If the problem persists, please email me (haewoon_AT_an.kaist.ac.kr). <br/>
We will provide an HTTP download link for you.<br/>
</li>
</ul>
<H3>About crawling</H3>
<ul>
<li><b>How can I crawl a social graph of Twitter? </b><br/>
Twitter offers a rich Application Programming Interface (API).<br/>
Using the two social graph methods (<a href="http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-friends%C2%A0ids">friends/ids</a>, <a href="http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-followers%C2%A0ids">followers/ids</a>), you can access the entire social graph without authentication.
</li>
<li><b>But... I can send only 150 requests per hour.</b><br/>
By default, Twitter limits API requests to 150 per hour, <br/>
but you can send up to 20,000 requests per hour (per IP) once you are registered on the <i>whitelist.</i> <br/>
For details, see the <a href="http://apiwiki.twitter.com/Rate-limiting">Whitelisting section of this page</a>.
</li>
<li><b>Can I get more information about users in the social graph?</b><br/>
For every user, you can access the public profile, such as name and bio, via the user method (<a href="http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-users%C2%A0show">users/show</a>), using the numeric user ID from the social graph.
</li>
</ul>
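<div class="note">
To see why the whitelist matters, the rate limits quoted above can be turned into a rough crawl-time estimate. The Python sketch below assumes one API request per user, a single IP address, and the 41.7 million users reported in the abstract. These assumptions give a lower bound, since a user's full follower list may need more than one request. The function name crawl_hours is hypothetical.
</div>
<pre>
```python
def crawl_hours(num_users: int, requests_per_hour: int,
                requests_per_user: int = 1) -> int:
    """Whole hours needed to issue all requests at a fixed hourly rate."""
    total_requests = num_users * requests_per_user
    # Integer ceiling division avoids floating-point rounding.
    return (total_requests + requests_per_hour - 1) // requests_per_hour


if __name__ == "__main__":
    users = 41_700_000  # user profiles reported in the abstract
    for label, rate in [("default", 150), ("whitelisted", 20_000)]:
        hours = crawl_hours(users, rate)
        print(f"{label}: {hours} hours, about {hours / 24:.0f} days")
```
</pre>
<div class="note">
Under these assumptions the default limit works out to 278,000 hours (more than 30 years), while a whitelisted IP needs 2,085 hours, or roughly 87 days.
</div>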
<!--
<ol>
<li>there are no privacy issues with numeric ID instead of screen name or name</li>
<li>those who have more than 10,000 followers are mainly celebrities</li>
<li>tweets mentioning trending topics are certainly publicy avaialable</li>
<li>PageRank is simply one of the topological charcteristics computed from the link structure.
</ol>
-->
<H2>Helpful other websites</H2>
<ol>
<li><a href="http://snap.stanford.edu/data/twitter7.html">SNAP: Network datasets - 476 million tweets by J. Yang and J. Leskovec (No longer available)</a></li>
<li><a href="http://twitter.mpi-sws.org/">MPI-SWS - User accounts, social graph, and tweets by Cha et al.</a></li>
<li><a href="http://140kit.com/">140kit</a></li>
</ol>
</div>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-2910420-5");
pageTracker._trackPageview();
} catch(err) {}</script>
</body>
</html>