<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>jacob hoover vigly</title>
<description>Postdoc at MIT, Department of Brain and Cognitive Sciences.
</description>
<link>https://jahoo.github.io/</link>
<atom:link href="https://jahoo.github.io/feed.xml" rel="self" type="application/rss+xml"/>
<pubDate>Thu, 03 Oct 2024 09:53:48 -0400</pubDate>
<lastBuildDate>Thu, 03 Oct 2024 09:53:48 -0400</lastBuildDate>
<generator>Jekyll v4.3.1</generator>
<item>
<title>The Cost of Information</title>
<description><p>I just submitted the final version of my dissertation to McGill. Here is a link to it:</p>
<ul>
<li><a href="/assets/dissertation.pdf"><em>The Cost of Information: Looking beyond Predictability in Language Processing</em></a></li>
</ul>
</description>
<pubDate>Fri, 02 Aug 2024 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2024/08/02/dissertation.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2024/08/02/dissertation.html</guid>
<category>dissertation</category>
</item>
<item>
<title>surprisal and KL</title>
<description>\[\global\def\colorKL{\color{4fa283}}
\global\def\colorR{\color{ec8c62}}
\global\def\R{\colorR\mathrm{R}}\]
<p>Consider any setting where a distribution over some latent variable \(Z\) changes when conditioning on some outcome \(\breve u\) of an observable random variable. The change can be quantified as <em>KL divergence</em>, \(\operatorname{\colorKL KL}(p_{Z\mid \breve u}\|p_{Z})\). This divergence can be decomposed into <em>surprisal</em> of \(\breve u\) minus another term, which I’ll call \(\R\):</p>
\[\begin{aligned}
\operatorname{\colorKL KL}(p_{Z\mid \breve u}\|p_{Z})
&amp;&amp; = &amp;&amp; -\log p(\breve u)
&amp;&amp; - &amp;&amp; \mathop{\mathbb{E}}_{p_{Z\mid \breve u}}[-\log p(\breve u\mid z)]\\
\operatorname{\colorKL KL}(\operatorname{posterior}\|\operatorname{prior})
&amp;&amp; = &amp;&amp; \operatorname{surprisal}
&amp;&amp; - &amp;&amp;
\underbrace{\mathop{\mathbb{E}}_{\operatorname{posterior}}[-\log \operatorname{lik}]}_{\operatorname{\colorR R}}
\end{aligned}\]
<p>Since KL is nonnegative, R can take values between 0 and the surprisal. Put another way, surprisal upper-bounds the amount by which the distribution changes. Note that if surprisal is large but R is nearly as large, KL is small—that is, despite the observation carrying a large amount of information, it results in only a small change in the distribution.</p>
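<p>The identity above is easy to check numerically. Here is a minimal Python sketch (my own illustration, with arbitrary made-up values, not part of the derivation): it computes the posterior from a three-outcome prior and likelihood, then verifies that KL(posterior‖prior) equals surprisal minus R.</p>
<pre><code>import math

# Hypothetical example: prior over Z, and likelihood p(u|z) for one observed u.
prior = [0.5, 0.3, 0.2]   # p(z)
lik   = [0.9, 0.1, 0.4]   # p(u | z)

p_u  = sum(pz * pu for pz, pu in zip(prior, lik))      # marginal p(u)
post = [pz * pu / p_u for pz, pu in zip(prior, lik)]   # posterior p(z | u)

surprisal = -math.log(p_u)                               # -log p(u)
R  = sum(q * -math.log(pu) for q, pu in zip(post, lik))  # E_post[-log lik]
kl = sum(q * math.log(q / pz) for q, pz in zip(post, prior))

# KL(posterior || prior) equals surprisal - R, up to floating-point error:
assert math.isclose(kl, surprisal - R)
</code></pre>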
<h3 id="interactive-illustration">Interactive illustration</h3>
<p>Manipulate prior and likelihood sliders below to see posterior and resulting surprisal partition:</p>
<iframe width="100%" height="1509" frameborder="0" src="https://observablehq.com/embed/@postylem/kl-and-surprisal?cells=viewof+showOtherKL%2Cplot1_1%2Cplot1_2%2Cviewof+dim%2Cviewof+useLogInput%2Cviewof+allowZeroes%2Cinput1%2Cviewof+scale_prior%2Cviewof+scale_likelihood%2Cviewof+applyScaleLikelihood1%2Cmodification_plots%2Cviewof+whetherPlotLogSpace%2Cviewof+maxUnits%2Cviewof+base"></iframe>
</description>
<pubDate>Tue, 10 Oct 2023 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2023/10/10/surprisal-and-KL.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2023/10/10/surprisal-and-KL.html</guid>
<category>note</category>
</item>
<item>
<title>Plausibility of Sampling for Processing</title>
<description><p>I just posted a preprint:</p>
<p><a href="https://osf.io/qjnpv">🔗 <em>The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing</em></a>.</p>
<p>This work is a collaboration with <a href="https://people.linguistics.mcgill.ca/~morgan/">Morgan Sonderegger</a>, <a href="http://colala.berkeley.edu/people/piantadosi/">Steve Piantadosi</a>, and <a href="https://todonnell.github.io/">Tim O’Donnell</a>. It is based on the well-documented observation that for humans, the difficulty of processing a given item of linguistic input depends on how predictable it is in context—more surprising words take longer to process. However, most existing theories of processing cannot simply and directly predict this behavior. What algorithm might be capable of explaining this phenomenon?</p>
<p>In this work, we focus on a class of algorithms whose runtime does naturally scale in surprisal—those that involve repeatedly sampling from the prior. Our first contribution is to show that <strong>simple examples of such algorithms predict runtime to increase superlinearly with surprisal, and also predict variance in runtime to increase.</strong> These two predictions stand in contrast with literature on surprisal theory (<a href="https://www.aclweb.org/anthology/N01-1021">Hale, 2001</a>; <a href="https://doi.org/10.1016/j.cognition.2007.05.006">Levy, 2008</a>) which argues that the expected processing cost should increase linearly with surprisal, and makes no prediction about variance.</p>
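<p>To get some intuition for why repeated sampling from the prior gives runtimes that scale this way, consider a toy caricature (my own, much cruder than the models in the paper): an algorithm that draws guesses from the prior until it produces the observed word takes a geometric number of draws with success probability p, so its expected runtime is 1/p = exp(surprisal) and its variance is (1 - p)/p^2, both of which grow superlinearly in surprisal.</p>
<pre><code>import math
import random

def guessing_runtime(p, rng):
    # keep drawing from the "prior" until the observed outcome (prob. p) comes up
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials

rng = random.Random(0)
for p in [0.5, 0.1, 0.01]:
    surprisal = -math.log(p)
    runs = [guessing_runtime(p, rng) for _ in range(20000)]
    mean = sum(runs) / len(runs)
    # mean runtime tracks 1/p = exp(surprisal), i.e. superlinear in surprisal
    print(f"surprisal={surprisal:.2f}  mean runtime={mean:.1f}  1/p={1 / p:.1f}")
</code></pre>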
<p>In the second part of this paper, we conduct an empirical study of the relationship between surprisal and reading time, using a collection of modern language models to estimate surprisal, and fitting Generalized Additive Models of the relationship. We find that with better language models, reading time increases superlinearly in surprisal, and also that variance increases. These results are consistent with the predictions of sampling-based algorithms.</p>
<hr />
<p><br /></p>
<dl>
<dt><em>update 2023-07</em></dt>
<dd>Published in the journal <em>Open Mind</em> (2023) 7: 350–391. [<a href="https://doi.org/10.1162/opmi_a_00086">🔗 open access</a>]</dd>
</dl>
</description>
<pubDate>Fri, 21 Oct 2022 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2022/10/21/plausibility-sampling-processing.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2022/10/21/plausibility-sampling-processing.html</guid>
<category>note</category>
<category>paper</category>
</item>
<item>
<title>LaTeX for Linguistics tutorial</title>
<description><p>I’m leading a short workshop on \(\LaTeX{}\) for Linguistics today. Resources are</p>
<ul>
<li>in an <a href="https://www.overleaf.com/read/qvdscvjbtjxr">Overleaf document here</a></li>
<li>and <a href="https://github.com/postylem/latex-tutorial">on GitHub here</a></li>
</ul>
</description>
<pubDate>Wed, 05 Oct 2022 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2022/10/05/LaTeX-tutorial.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2022/10/05/LaTeX-tutorial.html</guid>
<category>note</category>
</item>
<item>
<title>Density of transformed random variable</title>
<description><!-- Note to self, remove the whole assets/transform-pdf/ dir from this website if you ever get around to making this post actually generate from markdown instead of this hacky version -->
<div>
<iframe src="/assets/transform-pdf/q/notebooks/transform-pdf.html" width="100%" height="9150" frameborder="none">
</iframe>
</div>
</description>
<pubDate>Fri, 02 Sep 2022 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2022/09/02/transform-pdf.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2022/09/02/transform-pdf.html</guid>
<category>note</category>
</item>
<item>
<title>Rejection sampling</title>
<description><!-- Note to self, remove the html file and the whole /site_libs/ dir from this website if you ever get around to making this post actually generate from markdown instead of this hacky version -->
<p><em>Rejection sampling</em> is an algorithm for estimating one distribution by drawing samples from another, accepting or rejecting each sample in a principled way. In this note I explore the algorithm with some simulations, and also show how a different, similar algorithm can be seen as a special case of the general version (because it wasn’t at all obvious to me at first how the two were related).</p>
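<p>For concreteness, here is a minimal sketch of the general algorithm in Python (my own toy example, not the code from the embedded notebook): sampling from the target density f(x) = 2x on [0, 1] using a Uniform(0, 1) proposal with envelope constant M = 2.</p>
<pre><code>import random

M = 2.0  # envelope constant: f(x) is at most M * g(x) for g = Uniform(0, 1)

def target_pdf(x):
    return 2.0 * x   # f(x) = 2x on [0, 1]; the proposal only produces such x

def rejection_sample():
    while True:
        x = random.random()          # draw x from the proposal g
        u = random.random()          # uniform acceptance threshold
        if target_pdf(x) / M >= u:   # accept with probability f(x) / (M * g(x))
            return x

random.seed(0)
samples = [rejection_sample() for _ in range(20000)]
mean = sum(samples) / len(samples)   # should be near E[X] = 2/3
</code></pre>
<p>On average a fraction 1/M of the proposals are accepted, so a tighter envelope wastes fewer draws.</p>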
<div>
<iframe src="/assets/rejection-sampling-expo.html" width="100%" height="6150" frameborder="none">
</iframe>
</div>
</description>
<pubDate>Mon, 29 Aug 2022 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2022/08/29/rejection-sampling-expo.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2022/08/29/rejection-sampling-expo.html</guid>
<category>note</category>
</item>
<item>
<title>Linguistic Dependencies and Statistical Dependence</title>
<description><p>At <a href="https://2021.emnlp.org/">EMNLP</a> (virtually) I presented work (with <a href="https://aclanthology.org/people/w/wenyu-du/">Wenyu Du</a>, <a href="https://scholar.google.it/citations?user=DJon7w4AAAAJ&amp;hl">Alessandro Sordoni</a>, and <a href="https://scholar.google.com/citations?user=iYjXhYwAAAAJ&amp;hl">Timothy J. O’Donnell</a>) titled <em>Linguistic Dependencies and Statistical Dependence</em>.</p>
<div style="text-align: center;"><img width="400" src="/assets/2021-11-07-EMNLP-dependency-dependence-fig.png" /></div>
<p>In this work, we compared <em>linguistic dependency</em> trees to dependency trees representing <em>statistical dependence</em> between words, which we extracted from mutual information estimates using pretrained language models. We found that the accuracy of the extracted trees was only about as high as that of a simple linear baseline that connects adjacent words, even with strong controls. We also found considerable differences between pretrained LMs.</p>
<ul>
<li>
<p>Paper is <a href="http://dx.doi.org/10.18653/v1/2021.emnlp-main.234">here</a>.</p>
</li>
<li>Poster is <a href="/assets/pdfs/2021.10.11.EMNLP.poster.pdf">here</a>.</li>
<li>Talk slides are <a href="/assets/pdfs/2021.10.11.EMNLP.talk-slides.pdf">here</a>.</li>
<li>Code is available <a href="https://github.com/mcqll/cpmi-dependencies">here</a>.</li>
</ul>
</description>
<pubDate>Sun, 07 Nov 2021 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2021/11/07/EMNLP-dependency-dependence.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2021/11/07/EMNLP-dependency-dependence.html</guid>
<category>presentation</category>
<category>paper</category>
</item>
<item>
<title>Nivre's parsing examples animated</title>
<description><p>I made a simple flipbook animation of the deterministic dependency parsing examples from <a href="https://doi.org/10.1162/coli.07-056-R1-07-027">Nivre (2008)</a>, using <a href="https://ctan.org/pkg/beamer"><code class="language-plaintext highlighter-rouge">beamer</code></a>, because apparently I’m living in 2006.</p>
<div style="text-align: center;"><a href="/assets/2021-04-16-deterministic-dependency-parsing.pdf"><img width="600" src="/assets/2021-04-16-deterministic-dependency-parsing-example.png" /></a></div>
<p><br /></p>
<p>The <a href="/assets/2021-04-16-deterministic-dependency-parsing.pdf">PDF here</a> is meant to be viewed in presentation mode / single-page view. The LaTeX code is <a href="/assets/2021-04-deterministic-dependency-parsing.zip">in this zip file</a>, in case that’s useful to anyone.</p>
</description>
<pubDate>Fri, 16 Apr 2021 00:00:00 -0400</pubDate>
<link>https://jahoo.github.io/2021/04/16/deterministic-dependency-parsing.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2021/04/16/deterministic-dependency-parsing.html</guid>
<category>note</category>
</item>
<item>
<title>Simplest least-squares in Julia</title>
<description><div>
<iframe src="/assets/simplest_linear_regression_example.html" width="100%" height="1800" frameborder="none">
</iframe>
</div>
</description>
<pubDate>Mon, 11 Jan 2021 00:00:00 -0500</pubDate>
<link>https://jahoo.github.io/2021/01/11/simplest_linear_regression_example.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2021/01/11/simplest_linear_regression_example.html</guid>
<category>note</category>
</item>
<item>
<title>Training Tensor Trains</title>
<description><p>I worked on a project with Jonathan Palucci exploring the trainability of a certain simple kind of <a href="https://tensornetwork.org/">tensor network</a>, called Tensor Trains (also known as Matrix Product States).</p>
<div style="text-align: center;"><img width="400" src="/assets/2020-12-22-training-tensor-trains-fig2.png" /></div>
<p>There is a general correspondence between tensor networks and graphical models; in particular, when restricted to non-negative parameters, Matrix Product States are equivalent to Hidden Markov Models. <a href="https://arxiv.org/abs/1907.03741">Glasser <em>et al</em>. 2019</a> discussed this correspondence and proved theoretical results about these non-negative models, as well as about similar real- and complex-valued tensor trains, supplementing the theory with evidence from numerical experiments. In this project, we re-implemented models from their paper, and also implemented time-homogeneous versions of their models.
We replicated some of their results for non-homogeneous models, adding a comparison with homogeneous models on the same data. We found evidence that homogeneity decreases the models’ ability to fit non-sequential data, but preliminarily observed that on sequential data (for which the assumption of homogeneity is justified), homogeneous models achieved an equally good fit with far fewer parameters. Surprisingly, we also found that the more powerful non-time-homogeneous positive MPS performs identically to a time-homogeneous HMM.</p>
<p>📊 Poster –&gt; <a href="/assets/pdfs/2020.12.15.tensor-trains-poster.pdf">here (PDF)</a>.</p>
<p>📄 Writeup –&gt; <a href="/assets/pdfs/2020.12.22.tensor-trains-writeup.pdf">here (PDF)</a>.</p>
<p>💻 Code –&gt; <a href="https://github.com/postylem/tensor_network_project">on GitHub</a>.</p>
</description>
<pubDate>Tue, 22 Dec 2020 00:00:00 -0500</pubDate>
<link>https://jahoo.github.io/2020/12/22/training-tensor-trains.html</link>
<guid isPermaLink="true">https://jahoo.github.io/2020/12/22/training-tensor-trains.html</guid>
<category>presentation</category>
</item>
</channel>
</rss>