-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
151 lines (137 loc) · 7.75 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
<!DOCTYPE html>
<html>
<head lang="en">
<meta charset="UTF-8">
<title>Simpson's Paradox</title>
<!-- bower:css -->
<link rel="stylesheet" href="bower_components/bootstrap/dist/css/bootstrap.css" />
<!-- endbower -->
<!-- bower:js -->
<script src="bower_components/jquery/dist/jquery.min.js"></script>
<script src="bower_components/d3/d3.min.js"></script>
<script src="bower_components/twgl.js/dist/twgl-full.min.js"></script>
<script src="bower_components/bootstrap/dist/js/bootstrap.js"></script>
<script src="bower_components/underscore/underscore-min.js"></script>
<!-- endbower -->
<link rel="stylesheet" href="cometChart/cometChart.css">
<link rel="stylesheet" href="cometChart3D/cometChart3D.css">
<script src="cometChart/cometChartFunctions.js"></script>
<script src="cometChart/cometChart.js"></script>
<script src="cometChart3D/cometChart3D.js"></script>
<script src="vectorVis/vectorVis.js"></script>
<script src="vectorVisComponents/vectorVisComponents.js"></script>
</head>
<body>
<div class="container">
<div class="col-md-12">
<div class="text-center">
<h1>Visualizing Simpson's Paradox</h1>
<b class="text-center">Allen Korenevsky</b>
</div>
<h2>
Description
</h2>
<p>
This project is an attempt to improve visualizing if data is subject to Simpson's paradox.
Simpson's paradox occurs when grouped data has an opposite trend from the total aggregate. This occurs when
there's a lurking variable present, like data weights. Most visualizations fail to show data values and their
weights at the same time.
</p>
<p>
The first attempt is an improvement of the comet chart. The comets in the chart represent data at two different
time periods. The tip of the comet is the start and the wider tail is the end time. Blue and orange colors mean
that the data weight is either heavily skewed one side or another. In the below comet chart, many items are grey
on the left side because it's using a log scale.
</p>
<p>
The comet chart has two major pitfalls. First, it can only show two date differences. Second, and most importantly,
it's difficult which direction small comets are pointing. For example, the black aggregate comet is so small it's
essentially a triangle. Both of these shortcomings can be fixed by making the chart 3D, since the depth can allow
an arbitrary number of time steps and small comets can be rotated to determine their position. The 3D representation
removed the comets however sine the direction of time flow is more obvious with a dedicated axis.
</p>
<p>
The comet chart on the left is a prototype version that one of the creators sent me. She told me the full version wasn't
available for distribution. The data set involves new born baby's weights, broken up by state. The only interactivity
is seeing more information about the data facets by hovering over each comet. The full version described in the paper
has much more interactivity enabled.
The 3D comet chart to the right is the exact same data set. Dragging the mouse left and right will rotate the data,
and holding the <code>y</code> key allows tilting up and down as well.
Clicking the data set drop down toggles which data is being viewed. The second and third data sets are unemployment
data for different market segments. The sharp color and position difference signals that the market segment weights
- the market size in this case - are changing and can skew aggregate results.
</p>
<p>
The 3D comet chart is made with WebGL. It's essentially three components, each with a separate shader. The axes
are the first component; they are simply a few vertices with indices connecting them into the familiar 3D cartesian
plot. The data lines are the next component, where each line has a custom color depending on how much the weighted
category has changed. For the employment data, I added support for an arbitrary number of time steps in the data.
Finally, there's the axis labels. Each label is a textured plane carefully positioned over the proper axis. The
texture is generated using a <code>Canvas</code> that has text set on it. WebGL can take a canvas directly as
a texture, making text manipulation very easy.
</p>
<p>
The bottom two visualizations were my attempt on visualizing Simpson's paradox data that doesn't have a time component.
It's data from a case in Berkley in the 1960s where more men then women were admitted overall, but for most departments
women were accepted in higher numbers. The lurking variable is the amount of women and that applied to each department
is different. Women applied to more selective departments so while they were accepted to them more than men they were
still rejected more overall.
</p>
<p>
The Vector Visualization colors lines pink or blue according to if more women or men were admitted to a department.
The large line is the aggregate however, so overall most departments admitted more women. The components was an attempt
to replicate <a href="data/simpson_paradox.jpg">this</a> image but with multiple categories instead of just two.
Lining up the vectors head to tail however didn't produce an interesting visualization.
</p>
<p>
This project allowed me to cultivate a good software development base for future WebGL work. I learned how to use
Bower to install and manage client side packages and bower to automatically include them in the main HTML file.
I also set up a reliable system to import shaders from external files. It's surprisingly difficult to store shaders
to use with WebGL; many examples store the shaders in script tags on the main HTML page. This makes managing shaders
unruly as more and more get added. Most importantly for me, I lose syntax highlighting if shaders are not in a separate
.glsl file.
</p>
<h2>
Resources
</h2>
<p>
The idea for this project comes from the paper <a href="http://research.google.com/pubs/pub42901.html">
Visualizing Statistical Mix Effects and Simpson's Paradox</a> which pioneered the comet chart, along with formalizing
the issue and giving ideas about future work.
I use <a href="http://twgljs.org/">twgl</a> as a thin wrapper around WebGL. It makes the API less verbose and
includes helpful additions like libraries to multiply 4x4 matricies and generate common geometric primitives.
Finally, <a href="http://webglfundamentals.org/webgl/lessons/webgl-text-texture.html">this</a> tutorial helped me
to get text in my WebGL application.
</p>
</div>
<div class="col-md-6">
<h2>Comet Chart</h2>
<div id="chart"></div>
</div>
<div id="cc3dParent" class="col-md-6">
<h2>3D Comet Chart</h2>
<div class="btn-group">
<button type="button" class="btn btn-default">Data Set</button>
<button type="button" class="btn btn-default dropdown-toggle" data-toggle="dropdown" aria-expanded="false">
<span class="caret"></span>
<span class="sr-only">Toggle Dropdown</span>
</button>
<ul class="dropdown-menu" role="menu">
<li><a id="weightData" href="javascript:void(0)">Weight Data</a></li>
<li><a id="employmentData" href="javascript:void(0)">Employment Data</a></li>
<li><a id="employmentData2" href="javascript:void(0)">Employment Data (Extra time step)</a></li>
</ul>
</div>
<canvas id="chart3D"></canvas>
</div>
<div class="col-md-12">
<h2>Vector Visualization</h2>
<div id="vectorVis"></div>
</div>
<div class="col-md-12">
<h2>Vector Visualization Components</h2>
<div id="vectorVisComponents"></div>
</div>
</div>
</body>
</html>