<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!-- START: two lines up starts content from: inc/top.default.html -->
<!-- END: this line ends content from: inc/top.default.html -->
<!-- START: this line starts content from: inc/head.default.html -->
<head>
<link rel="stylesheet" href="../../ioccc.css">
<link href="https://fonts.googleapis.com/css2?family=Outfit:wght@100..900&display=swap" rel="stylesheet">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>2019/mills - Most in need to be tweeted</title>
<link rel="icon" type="image/x-icon" href="../../favicon.ico">
<meta name="description" content="2019 IOCCC entry mills - Most in need to be tweeted">
<meta name="keywords" content="IOCCC, 2019, IOCCC 2019, IOCCC entry, mills, Most in need to be tweeted">
</head>
<!-- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -->
<!-- !!! DO NOT MODIFY THIS FILE - This file is generated by a tool !!! -->
<!-- !!! DO NOT MODIFY THIS FILE - This file is generated by a tool !!! -->
<!-- !!! DO NOT MODIFY THIS FILE - This file is generated by a tool !!! -->
<!-- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -->
<!-- END: this line ends content from: inc/head.default.html -->
<!-- -->
<!-- This web page was formed via the tool: bin/readme2index.sh -->
<!-- The content of main section of this web page came from: 2019/mills/README.md -->
<!-- -->
<!-- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -->
<!-- !!! Do not modify this web page, instead modify the file: 2019/mills/README.md !!! -->
<!-- !!! Do not modify this web page, instead modify the file: 2019/mills/README.md !!! -->
<!-- !!! Do not modify this web page, instead modify the file: 2019/mills/README.md !!! -->
<!-- !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! -->
<!-- Markdown content was converted into HTML via the tool: bin/md2html.sh -->
<!-- START: this line starts content from: inc/body.default.html -->
<body>
<!-- END: this line ends content from: inc/body.default.html -->
<!-- START: this line starts content from: inc/topbar.default.html -->
<div class="theader">
<nav class="topbar">
<div class="container">
<div class="logo">
<a href="../../index.html" class="logo-link">
IOCCC
</a>
</div>
<div class="topbar-items">
<div class="item">
<span class="item-header">
Entries
</span>
<div class="sub-item">
<div class="outfit-font">
<a href="../../years.html" class="sub-item-link">
Winning entries
</a>
</div>
<div class="outfit-font">
<a href="../../authors.html" class="sub-item-link">
Winning authors
</a>
</div>
<div class="outfit-font">
<a href="../../location.html" class="sub-item-link">
Location of authors
</a>
</div>
<div class="outfit-font">
<a href="../../bugs.html" class="sub-item-link">
Bugs and (mis)features
</a>
</div>
<div class="outfit-font">
<a href="../../faq.html#fix_an_entry" class="sub-item-link">
Fixing entries
</a>
</div>
<div class="outfit-font">
<a href="../../faq.html#fix_author" class="sub-item-link">
Updating author info
</a>
</div>
</div>
</div>
<div class="item">
<span class="item-header">
Status
</span>
<div class="sub-item">
<div class="outfit-font">
<a href="../../news.html" class="sub-item-link">
News
</a>
</div>
<div class="outfit-font">
<a href="../../status.html" class="sub-item-link">
Contest status
</a>
</div>
<div class="outfit-font">
<a href="../../next/index.html" class="sub-item-link">
Rules and guidelines
</a>
</div>
<div class="outfit-font">
<a href="../../markdown.html" class="sub-item-link">
Markdown guidelines
</a>
</div>
<div class="outfit-font">
<a href="../../SECURITY.html" class="sub-item-link">
Security policy
</a>
</div>
</div>
</div>
<div class="item">
<span class="item-header">
FAQ
</span>
<div class="sub-item">
<div class="outfit-font">
<a href="../../faq.html" class="sub-item-link">
Frequently Asked Questions
</a>
</div>
<div class="outfit-font">
<a href="../../quick-start.html#enter" class="sub-item-link">
Enter the IOCCC
</a>
</div>
<div class="outfit-font">
<a href="../../faq.html#compiling" class="sub-item-link">
Compiling entries
</a>
</div>
<div class="outfit-font">
<a href="../../faq.html#running_entries" class="sub-item-link">
Running entries
</a>
</div>
<div class="outfit-font">
<a href="../../faq.html#help" class="sub-item-link">
How to help
</a>
</div>
</div>
</div>
<div class="item">
<span class="item-header">
About
</span>
<div class="sub-item">
<div class="outfit-font">
<a href="../../index.html" class="sub-item-link">
Home page
</a>
</div>
<div class="outfit-font">
<a href="../../about.html" class="sub-item-link">
About the IOCCC
</a>
</div>
<div class="outfit-font">
<a href="../../judges.html" class="sub-item-link">
The Judges
</a>
</div>
<div class="outfit-font">
<a href="../../thanks-for-help.html" class="sub-item-link">
Thanks for the help
</a>
</div>
<div class="outfit-font">
<a href="../../contact.html" class="sub-item-link">
Contact us
</a>
</div>
</div>
</div>
</div>
</div>
</nav>
<div class="header-mobile-menu">
<noscript>
<a href="../../nojs-menu.html" class="topbar-js-label">
Please Enable JavaScript
</a>
</noscript>
<button id="header-open-menu-button" class="topbar-mobile-menu">
<img
src="../../png/hamburger-icon-open.png"
alt="hamburger style menu icon - open state"
width=48
height=48>
</button>
<button id="header-close-menu-button" class="hide-content">
<img
src="../../png/hamburger-icon-closed.png"
alt="hamburger style menu icon - closed state"
width=48
height=48>
</button>
<div id="mobile-menu-panel" class="hide-content">
<div class="mobile-menu-container">
<div class="mobile-menu-wrapper">
<div class="mobile-menu-item">
Entries
</div>
<div class="mobile-submenu-wrapper">
<a class="mobile-submenu-item" href="../../years.html">
Winning entries
</a>
<a class="mobile-submenu-item" href="../../authors.html">
Winning authors
</a>
<a class="mobile-submenu-item" href="../../location.html">
Location of authors
</a>
<a class="mobile-submenu-item" href="../../bugs.html">
Bugs and (mis)features
</a>
<a class="mobile-submenu-item" href="../../faq.html#fix_an_entry">
Fixing entries
</a>
<a class="mobile-submenu-item" href="../../faq.html#fix_author">
Updating author info
</a>
<a class="mobile-submenu-item" href="../../thanks-for-help.html">
Thanks for the help
</a>
</div>
</div>
<div class="mobile-menu-wrapper">
<div class="mobile-menu-item">
Status
</div>
<div class="mobile-submenu-wrapper">
<a class="mobile-submenu-item" href="../../news.html">
News
</a>
<a class="mobile-submenu-item" href="../../status.html">
Contest status
</a>
<a class="mobile-submenu-item" href="../../next/index.html">
Rules and guidelines
</a>
<a class="mobile-submenu-item" href="../../markdown.html">
Markdown guidelines
</a>
<a class="mobile-submenu-item" href="../../SECURITY.html">
Security policy
</a>
</div>
</div>
<div class="mobile-menu-wrapper">
<div class="mobile-menu-item">
FAQ
</div>
<div class="mobile-submenu-wrapper">
<a class="mobile-submenu-item" href="../../faq.html">
Frequently Asked Questions
</a>
<a class="mobile-submenu-item" href="../../quick-start.html#enter">
Enter the IOCCC
</a>
<a class="mobile-submenu-item" href="../../faq.html#compiling">
Compiling entries
</a>
<a class="mobile-submenu-item" href="../../faq.html#running_entries">
Running entries
</a>
<a class="mobile-submenu-item" href="../../faq.html#help">
How to help
</a>
</div>
</div>
<div class="mobile-menu-wrapper">
<div class="mobile-menu-item">
About
</div>
<div class="mobile-submenu-wrapper">
<a class="mobile-submenu-item" href="../../index.html">
Home page
</a>
<a class="mobile-submenu-item" href="../../about.html">
About the IOCCC
</a>
<a class="mobile-submenu-item" href="../../judges.html">
The Judges
</a>
<a class="mobile-submenu-item" href="../../contact.html">
Contact us
</a>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
var headerOpenMenuButton = document.getElementById("header-open-menu-button");
var headerCloseMenuButton = document.getElementById("header-close-menu-button");
var mobileMenuPanel = document.getElementById("mobile-menu-panel");
headerOpenMenuButton.addEventListener("click", () => {
headerOpenMenuButton.classList.remove("topbar-mobile-menu");
headerOpenMenuButton.classList.add("hide-content");
headerCloseMenuButton.classList.remove("hide-content");
headerCloseMenuButton.classList.add("topbar-mobile-menu");
mobileMenuPanel.classList.remove("hide-content");
mobileMenuPanel.classList.add("topbar-mobile-panel");
});
headerCloseMenuButton.addEventListener("click", () => {
headerCloseMenuButton.classList.remove("topbar-mobile-menu");
headerCloseMenuButton.classList.add("hide-content");
mobileMenuPanel.classList.add("hide-content");
mobileMenuPanel.classList.remove("topbar-mobile-panel");
headerOpenMenuButton.classList.add("topbar-mobile-menu");
headerOpenMenuButton.classList.remove("hide-content");
});
</script>
<!-- END: this line ends content from: inc/topbar.default.html -->
<!-- START: this line starts content from: inc/header.default.html -->
<div class="header">
<a href="../../2011/zucker/index.html">
<img src="../../png/ioccc.png"
alt="IOCCC image by Matt Zucker"
width=300
height=110>
</a>
<h1>The International Obfuscated C Code Contest</h1>
<h2>2019/mills - Most in need to be tweeted</h2>
<h3>Machine Learning on text</h3>
</div>
<!-- END: this line ends content from: inc/header.default.html -->
<!-- START: this line starts content from: inc/navbar.mid.html -->
<div class="navbar">
<a class="Left" href="../lynn/index.html">← 2019/lynn</a>
<a class="Left" href="../index.html">↑ 2019 ↑</a>
<a class="Left" href="../poikola/index.html">2019/poikola →</a>
<a class="Right" href="https://github.com/ioccc-src/winner/blob/master/2019/mills/prog.c">C code</a>
<a class="Right" href="https://github.com/ioccc-src/winner/blob/master/2019/mills/Makefile">Makefile</a>
<a class="Right" href="#inventory">Inventory</a>
<a class="Right" href="https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.ioccc.org%2F2019%2Fmills%2Findex.html">✓</a>
</div>
<!-- END: this line ends content from: inc/navbar.mid.html -->
<!-- START: this line starts content from: inc/before-content.default.html -->
<div class="content" id="content">
<!-- END: this line ends content from: inc/before-content.default.html -->
<!-- START: this line starts content for HTML phase 20 by: bin/output-index-author.sh via bin/md2html.sh -->
<!-- START: this line starts content generated by: bin/output-index-author.sh -->
<h2 id="author">Author:</h2>
<ul>
<li>Name: <a href="../../authors.html#Christopher_Mills">Christopher Mills</a><br>
Location: <a href="../../location.html#US">US</a> - <em>United States of America</em> (<em>United States</em>)</li>
</ul>
<!-- END: next line ends content generated by: bin/output-index-author.sh -->
<!-- END: this line ends content for HTML phase 20 by: bin/output-index-author.sh via bin/md2html.sh -->
<!-- START: this line starts content for HTML phase 21 by: bin/pandoc-wrapper.sh via bin/md2html.sh -->
<!-- BEFORE: 1st line of markdown file: 2019/mills/README.md -->
<h2 id="to-build">To build:</h2>
<pre><code> make</code></pre>
<h3 id="bugs-and-misfeatures">Bugs and (Mis)features:</h3>
<p>The current status of this entry is:</p>
<blockquote>
<p><strong>STATUS: INABIAF - please DO NOT fix</strong></p>
</blockquote>
<p>For more detailed information see <a href="../../bugs.html#2019_mills">2019/mills in bugs.html</a>.</p>
<h2 id="to-use">To use:</h2>
<pre><code> make cpclean
 # Let this run for about an hour and then kill it:
./prog Shakespeare.txt</code></pre>
<h2 id="try">Try:</h2>
<pre><code> ./try.sh</code></pre>
<p>However, as the binary model files used to produce the output are in an
implementation-specific format, your mileage may vary.</p>
<h2 id="judges-remarks">Judges’ remarks:</h2>
<p>Can a machine learn?</p>
<p>Some say so.</p>
<p>But can a machine learn to write like Shakespeare? Can it write rules and
guidelines for the IOCCC?</p>
<p>You decide. :-)</p>
<p><strong>Historic note</strong>: The award title’s use of the word “<strong>tweeted</strong>”
should be regarded as an IOCCC anachronism. The
maximum size of a <em>tweet</em> has changed since this entry won the IOCCC.
Moreover, the <strong>IOCCC uses Mastodon</strong>
instead of whatever someone (and especially those who appear to
have poor impulse control) chooses to call the platform where people
used to tweet.</p>
<p>See the
FAQ on “<a href="../../faq.html#try_mastodon">Mastodon</a>”.</p>
<h2 id="authors-remarks">Author’s remarks:</h2>
<h3 id="welcome-to-omlet">Welcome to OMLET! 🍳:</h3>
<p>OMLET is the <em>Obfuscated Machine Learning Environment Toolkit</em>, a
micro-framework for experimenting with <a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">recurrent neural networks</a> (RNN).
OMLET lets you build, train and evaluate <a href="https://en.wikipedia.org/wiki/Deep_learning">deep neural networks</a> (DNN). Why
spend hours reading documentation and megabytes of disk space on a
full-featured DNN framework like <a href="https://www.tensorflow.org">TensorFlow</a> or <a href="http://torch.ch">Torch</a> when you can have
full RNN functionality in less than 4 KB?</p>
<p>OMLET has the following features:</p>
<ul>
<li>User-programmable network configurations and hyperparameters.</li>
<li>Support for various types of recurrent and feed-forward neural networks,
including vanilla RNNs, LSTMs and GRUs with depths of up to 99 layers.</li>
<li>No limit on parameter size (except for those imposed by the system).</li>
<li>Training and inference modes, with periodic checkpointing.</li>
<li>Advanced <a href="https://arxiv.org/abs/1412.6980">Adam</a> optimizer with <a href="https://www.fast.ai/2018/07/02/adam-weight-decay/">weight decay</a> for simplified
training.</li>
<li>Hyperparameter support for batch size, learning rate schedule, weight
decay and gradient clipping.</li>
<li>Easily extensible (requires some expertise in the C programming language).</li>
<li>Friendly markdown documentation.</li>
</ul>
<p>OMLET is based on <a href="https://karpathy.ai">Andrej Karpathy’s</a> <a href="https://github.com/karpathy/char-rnn">character-level language model</a>
as described in his blog post <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/"><em>The Unreasonable Effectiveness of Recurrent
Neural Networks</em></a>. I’ve included a small sample dataset to use for
training, but you can have even more fun by downloading some larger datasets:</p>
<ul>
<li><a href="https://cs.stanford.edu/people/karpathy/char-rnn/linux_input.txt">Linux kernel source</a></li>
<li><a href="https://cs.stanford.edu/people/karpathy/char-rnn/warpeace_input.txt"><em>War and Peace</em></a></li>
<li><a href="https://github.com/ryanmcdermott/trump-speeches">Donald Trump’s 2016 campaign speeches</a></li>
</ul>
<h3 id="getting-started-with-omlet">Getting started with OMLET</h3>
<p>OMLET has three operating modes:</p>
<ul>
<li><em>Training</em>, which trains a network from scratch.</li>
<li><em>Continuation</em>, which trains starting from an existing checkpoint.</li>
<li><em>Inference</em>, which uses an existing trained network to make predictions.</li>
</ul>
<p>For your first OMLET experiment, we will try training a simple single-layer
RNN to write some <a href="https://en.wikipedia.org/wiki/William_Shakespeare">Shakespeare</a> plays. Start by typing</p>
<pre><code> make</code></pre>
<p>After it builds, train it using</p>
<pre><code> ./prog Shakespeare.txt</code></pre>
<p>This will immediately start printing gibberish, e.g.</p>
<pre><code> ./prog Shakespeare.txt</code></pre>
<p>produces:</p>
<pre><code> sins ohennAu
T-teooclelp tiThoWy
g
nlakuafy
e
sselW usnsofueB Aoee pasfUsuslhe ooM ot Wou moy
me neltAl -no IoyI mhuyakse inT-l chu ghenn ffo? fnsoe yhyye
ue nnfrlass heUthole saounlcesyee pee
t,
T0:0% 3.210888
o',,vU
An ,hTf lnm Far rur:s moilt WoEgrv wonds mith Aog thernw
Rni So
co Nnd :
For an bImy pgafoun:
Wf'r hom wortiverita
int fod mous Eheledet,
Tho he theket nonS wnu-ang dorlaMSp
nrocWiSe tflg 'o.
T0:0% 2.995950
d whecedhencrysesil yr bn,
we hh y thiwt
hut fithlot,
Fmdy s he alt
Vh th no dh foud bobt werw:s Aotnf Fhwi't whe, eusu
lhh thele wewcond ary soupfy wind tDont couc ths:
er fucwald oncli hen bos, f
T0:1% 2.878945</code></pre>
<p>The gibberish is the network’s attempt to write a play. So far, it’s not very
successful! Between the chunks of gibberish are training progress reports
that look like</p>
<pre><code> T0:1% 2.878945</code></pre>
<p>This tells you that we are 1% through training epoch 0, and that the training
<a href="https://en.wikipedia.org/wiki/Loss_function">loss</a> is about 2.88. As the training continues, the training loss will
drop and the generated snippets will improve quickly:</p>
<pre><code> ses, kuth
LAs of the wish,
As, I nos you,
Yov not to nalll,
Tr tot wonds.
First Sondy, llt lrte, our, tw.
First.
BRUTUS:
Helsting the kith gops of hoch Whay, fars surd what to,
The cownens golt te.
T0:12% 2.259114
eplerrotrur tandans one wiok thy or thach and cullice ded yourssting
And wours:
Whed ur surt.
SINENIUS:
On we lain bith
rerytund: tich lon hyivetetgor.
VOLUONIA:
He brich nom dove worthan then wise,
T0:13% 2.254926</code></pre>
<p>It’s already started to figure out things about Shakespeare plays – how to
spell short English words, how long lines tend to be, and that characters take
turns speaking with their names capitalized. The training loss has dropped to
2.25 and the improvement is noticeable.</p>
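<p>If you want to track the loss without wading through the snippets, one
approach (a sketch, assuming a POSIX shell; the log file name is arbitrary)
is to save the output and filter for the progress reports, which all match the
<code>T&lt;epoch&gt;:&lt;percent&gt;% &lt;loss&gt;</code> shape shown above:</p>
<pre><code> # keep a full log while watching the run
 ./prog Shakespeare.txt | tee train.log
 # later, extract only the training (T) and validation (V) reports
 grep -E '^[TV][0-9]+:[0-9]+% ' train.log</code></pre>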
<p>Eventually, we will finish with the training data set and move on to a
validation cycle:</p>
<pre><code> y fath onother,
I sucess,
For I me the west crare.
TRANIO:
Whow and'd have not to had you you one in my lapteny she very ame come me a gut and shourd aghir you as ignested; shend to make I strem
To h
T0:99% 1.924063
notnce gaud and is nicked thou day, ha the dusing you disaid: in thim, you things in ere thee thus erile Iht that tare theme my hast thesp thou shay: thou not eaten-or-ho-bess resing: I the but had d
T0:99% 1.923128
V0:0% 1.678907
V0:9% 1.700527
V0:18% 1.733179
V0:27% 1.714891
V0:36% 1.716672
V0:45% 1.782946
V0:54% 1.835629
V0:63% 1.876108
V0:72% 1.906814
V0:81% 1.924096
V0:90% 1.954492
V0:99% 1.969287
serfs you'll alliencseard:
We
got you? before
I say.
Farstred dentlentecaly, sir, I it one bosticield
All me the backnour mino,
Whith capitaned mid! but stell the ifvemion
Willerity.
First Cumfol of
T1:0% 1.885619</code></pre>
<p>Validation cycles are used to test the network to see if it has learned to
generalize – how well it performs on data it hasn’t seen before (as opposed to
the training data that the network will see many times as it trains). Progress
on the validation set is also displayed with a validation progress report that
looks like</p>
<pre><code> V0:36% 1.716672</code></pre>
<p>which means we are 36% of the way through the validation for epoch 0, and the
validation loss is about 1.72. Comparing the validation loss and the
training loss will give you an idea of how well the network is learning and
can let you know if the network is <a href="https://en.wikipedia.org/wiki/Overfitting">overfitting or underfitting</a>.</p>
<p>As part of the training process, the data set, which for OMLET is a
file you gave on the command line, is divided into
<a href="https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets">training and validation sets</a> (by default, 95% of the data is used for
training, but like most OMLET parameters, you can change this at compile-time).</p>
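<p>For example, a 90%/10% split corresponds to adding this compile-time flag
(the <code>TR</code> hyperparameter is described below; how extra <code>-D</code>
flags reach the compiler depends on the entry’s <code>Makefile</code>, so treat
this as a sketch):</p>
<pre><code> -DTR=0.90</code></pre>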
<p>For examples of input text files, try:</p>
<pre><code> make text</code></pre>
<p>and then examine the resulting <code>IOCCC-hints.output.txt</code>,
<code>Shakespeare.output.txt</code>, <code>IOCCC-Rules-Guidelines.output.txt</code> and
<code>Eugene_Onegin.output.txt</code> temporary files.</p>
<p>At the end of the validation run, OMLET writes out a checkpoint file with a
name like <code>cp01_1.970</code>. This saves the state of the run at the start of
epoch 1, after computing a validation loss of 1.970. The checkpoint is
helpful if you need to stop and restart training. You can stop training by
typing <code>Control-C</code>.</p>
<p>You can continue training from a previous checkpoint by providing the
checkpoint file name as the second parameter, for example:</p>
<pre><code> ./prog Shakespeare.txt cp01_1.970</code></pre>
<p>After the validation cycle finishes, OMLET begins the next epoch by restarting
training at the beginning of the training set. Training continues indefinitely,
until you stop it with <code>Control-C</code>. You should monitor the checkpoints to
see that the validation loss continues to drop. If it rises, the network has
probably started to overfit on the training data.</p>
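<p>Since each checkpoint file name encodes both the epoch and its validation
loss, a plain directory listing is enough to watch the trend (a sketch; the
name format is the default <code>cp%02d_%.3f</code> described under <code>PF</code>
below):</p>
<pre><code> # checkpoint names sort by epoch; the suffix is the validation loss
 ls cp*</code></pre>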
<p>Once you’ve driven the validation loss as low as it will go, you can use
OMLET to run the network in inference mode, which uses the frozen checkpoint
parameters to generate data. Inference mode takes the checkpoint file on
<em>standard input</em> (not on the command line) and hence must be run with a
command like</p>
<pre><code> ./prog < cp55_1.807</code></pre>
<p>Running it produces an infinite amount of generated output, until you hit
<code>Control-C</code> to stop it.</p>
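<p>If you only want a fixed-size sample (say, one 280-character tweet’s worth)
instead of an endless stream, a sketch using the example checkpoint name from
above (<code>head -c</code> is available on most systems):</p>
<pre><code> # head closes the pipe after 280 characters, which ends the run
 ./prog &lt; cp55_1.807 | head -c 280</code></pre>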
<p>Note that if you decide to change networks or use a different input file, you
will want to delete all the checkpoint files because the format depends on
both the network and the input – using the wrong checkpoint is likely to
cause a crash.</p>
<h3 id="experimenting-with-different-networks">Experimenting with different networks</h3>
<p>The default network for OMLET (the one you get if you type <code>make</code> with the
executable name <code>prog</code>) is the simplest recurrent neural network. It looks
like</p>
<pre><code> h = tanh(Wxh * x + Whh * h' + Bh)
y = Why * h + By</code></pre>
<p>where</p>
<ul>
<li><code>x</code> is the <em>input vector</em></li>
<li><code>y</code> is the <em>output vector</em></li>
<li><code>h</code> is the <em>hidden state vector</em></li>
<li><code>h'</code> is the previous value of <code>h</code></li>
<li><code>Wxh</code>, <code>Whh</code>, and <code>Why</code> are <em>weight matrices</em></li>
<li><code>Bh</code> and <code>By</code> are <em>bias vectors</em>.</li>
<li><code>tanh(3)</code> is the <a href="https://en.wikipedia.org/wiki/Hyperbolic_function">hyperbolic tangent function</a></li>
</ul>
<p>The <code>W</code>’s and <code>B</code>’s are the trainable parameters of the network, and the
process of training is optimizing the values of these parameters to minimize
the loss of the network across the training set.</p>
<p>It is the presence of the hidden state vector that allows the RNN to
“remember” the past. We can see what would happen if we removed this hidden
state. If you type</p>
<pre><code> make lin1</code></pre>
<p>OMLET will create an <a href="https://en.wikipedia.org/wiki/ADALINE">ADALINE</a> network that does</p>
<pre><code> y = Wxy * x + By</code></pre>
<p>This is a simple <a href="https://en.wikipedia.org/wiki/Linear_map">linear</a> <a href="https://en.wikipedia.org/wiki/Feedforward_neural_network">feed-forward network</a>. You can run it with</p>
<pre><code> ./lin1 Shakespeare.txt</code></pre>
<p>The linear network won’t be able to get past the gibberish stage, because it
lacks history:</p>
<pre><code> ./lin1 Shakespeare.txt</code></pre>
<p>produces:</p>
<pre><code> UERond w,
Gir:
KINof s, mesther s thouth.
E:
KINTret, at fu,
GOMy t, as sth kesewit sooos atse ang k, ck,
Sotheouserivesthecowhet been's, t he, h nre; t and, har wiread of pincer cedst sur has, ut:
T14:67% 2.465115
UKESpan,
NGaromy soreate e m esewfoure pamitherarjulthengeoly tl.
NG s at e! w.
WAllinoully?
Wamisw ofilem:
I'delandinarrstath har aksubly s cath Whern t Is, weciss:
GLat s; llde.
Y aterit dsthence
T14:67% 2.465404</code></pre>
<p>It is able to guess at what character is likely to follow the current one
(by doing a <a href="https://en.wikipedia.org/wiki/Linear_regression">linear regression</a>), but it lacks any history beyond that to
guide it.</p>
<p>You might be wondering about the role of the <code>tanh(3)</code> function in the RNN.
<code>tanh(3)</code> acts as an <a href="https://en.wikipedia.org/wiki/Activation_function">activation function</a> which adds <a href="https://en.wikipedia.org/wiki/Nonlinear_system">nonlinearity</a>
to the network and allows it to solve complicated problems. Without
nonlinearity, all of the linear functions would fold together into a single
matrix-vector multiply and you’d effectively regress to the linear network
above. Alas, even adding a nonlinearity to the feed-forward network
(creating a <a href="https://en.wikipedia.org/wiki/Perceptron">perceptron</a>) does not improve the performance because we
still lack the history provided by the hidden state vector (although if you
want to try it yourself, you can do so with <code>make per1</code> – note that’s not <code>perl</code>
the language but <code>per</code> with the digit 1).</p>
<h3 id="going-deeper">Going deeper:</h3>
<p>We can try to improve the RNN’s performance by stacking RNN modules atop each
other:</p>
<pre><code> h1 = RNN(h1', x)
h2 = RNN(h2', h1)
y = Why * h2 + By</code></pre>
<p>with <code>RNN(h, x)</code> defined as above. Each RNN module has its own set of
parameters and its own hidden state vector. This will improve the network’s
performance, at the cost of a much larger parameter space.</p>
<hr style="width:10%;text-align:left;margin-left:0">
<p><strong>IMPORTANT NOTE</strong>:</p>
<p>Since OMLET uses the system stack for network storage,
larger networks may cause OMLET to crash (typically with a message like
<code>Segmentation fault</code>) unless the system stack size is first increased.
The exact command for doing so depends on your shell and your system’s
hard limits. On sh/ksh/bash shells, you can view the hard limit
with <code>ulimit -Hs</code> and set it with <code>ulimit -s 65532</code> (replacing 65532
with the actual hard limit). On csh/tcsh shells, you can view
the hard limit with <code>limit -h stacksize</code> and set it with
<code>limit stacksize 65532</code> (replacing 65532 with the actual hard limit).</p>
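<p>For example, on bash:</p>
<pre><code> ulimit -Hs        # show the hard stack limit
 ulimit -s 65532   # raise the soft limit (use your actual hard limit)</code></pre>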
<hr style="width:10%;text-align:left;margin-left:0">
<p>You can try the deeper network by doing</p>
<pre><code> make rnn2</code></pre>
<p>(or even <code>make rnn3</code> if you want a three-layer RNN) and train it with</p>
<pre><code> ./rnn2 Shakespeare.txt</code></pre>
<p>The additional depth should allow the network to make better predictions (it
can represent more complicated history), but it may take a long time to train –
both because the network (being larger) now requires more time to train and
because of the <a href="https://en.wikipedia.org/wiki/Vanishing_gradient_problem">vanishing and exploding gradient problem</a>, which might
keep it from ever reaching its potential.</p>
<h3 id="lstms-and-grus">LSTMs and GRUs</h3>
<p>RNNs are particularly hard to train because they are trained using
<a href="https://en.wikipedia.org/wiki/Backpropagation_through_time">backpropagation through time</a>. The RNN is trained by effectively
converting it into a non-recurrent network by making many copies of it
and propagating the hidden state through the copies. During training,
backpropagation through the many clones of the network repeatedly scales the
gradient, worsening the exploding and vanishing gradient problem.</p>
<p><a href="https://en.wikipedia.org/wiki/Long_short-term_memory">Long Short Term Memory</a> networks (also called LSTMs) were developed to
solve this problem. Christopher Olah gives a good description of them
in his <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">blog post</a>. You can build a two-layer LSTM by doing</p>
<pre><code> make lstm2</code></pre>
<p>and train it with</p>
<pre><code> ./lstm2 Shakespeare.txt</code></pre>
<p>The LSTM is much easier to train because it explicitly decides how to update
its hidden state via “gates”. These gates are called</p>
<ul>
<li><em>input gate</em>, which decides what part of the input to pay attention to</li>
<li><em>forget gate</em>, which decides what part of the hidden state to forget</li>
<li><em>output gate</em>, which decides what part of the hidden state is used to
produce the output</li>
</ul>
<p>The basic LSTM equations are</p>
<pre><code> f = sigmoid(Wxf * x + Whf * h' + Bf)
i = sigmoid(Wxi * x + Whi * h' + Bi)
o = sigmoid(Wxo * x + Who * h' + Bo)
c = f * c' + i * tanh(Wxc * x + Whc * h' + Bc)
h = o * tanh(c)</code></pre>
<p>where</p>
<ul>
<li><code>x</code> is the input vector</li>
<li><code>h</code> is the hidden state (and the output to the next layer)</li>
<li><code>c</code> is the <em>cell state</em> which represents the “memory” of the LSTM</li>
<li><code>h'</code> and <code>c'</code> are the previous values of <code>h</code> and <code>c</code> respectively</li>
<li><code>f</code> is the <em>forget gate</em> that tells the LSTM what portion of the hidden
state to forget</li>
<li><code>i</code> is the <em>input gate</em> that tells the LSTM what portion of the input
vector to pay attention to</li>
<li><code>o</code> is the <em>output gate</em> that tells the LSTM what portion of the cell
state to use to generate the hidden state</li>
<li><code>Wxf</code>, <code>Whf</code>, <code>Wxi</code>, <code>Whi</code>, <code>Wxo</code>, <code>Who</code>, <code>Wxc</code>, and <code>Whc</code> are trainable
parameter matrices</li>
<li><code>Bf</code>, <code>Bi</code>, <code>Bo</code>, and <code>Bc</code> are trainable bias vectors</li>
<li><code>tanh(3)</code> is the hyperbolic tangent function</li>
<li><code>sigmoid</code> is the <a href="https://en.wikipedia.org/wiki/Logistic_function">logistic function</a></li>
</ul>
<p>There are several LSTM variants (see <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">C. Olah’s blog post</a> for more
examples). One important one is the <a href="https://en.wikipedia.org/wiki/Gated_recurrent_unit">gated recurrent unit</a>. GRUs are
simplified versions of the LSTM which combine the gates, meaning they
require fewer learned parameters. This allows them to train faster than a
generic LSTM. You can build a two-layer GRU with</p>
<pre><code> make gru2</code></pre>
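<p>and, following the same naming pattern as the other targets, train it with</p>
<pre><code> ./gru2 Shakespeare.txt</code></pre>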
<h3 id="building-your-own-networks">Building your own networks</h3>
<p>The OMLET <code>Makefile</code> comes with one-, two- and three-layer RNNs, LSTMs and
GRUs, along with simpler feed-forward networks like multi-layer perceptrons and
a linear network. This isn’t the limit of OMLET’s power – you can create your
own networks by modifying the <code>Makefile</code>. Networks are passed in on the
compiler’s command-line by using <code>-D</code> directives. The network is defined by
a <code>-DNW='...'</code> command which consists of a series of comma-separated
assignments. For example, the simple one-layer RNN could be defined like</p>
<pre><code> -DNW=' x = I(n), hp = I(128), \
h = C(hp, T(A(L(128, x), L(128, hp)))), \
y = L(n, h)'</code></pre>
<p>The network declares <code>x</code> as an input vector (there must be a declaration for
<code>x</code>). It is declared as <code>I(n)</code>, an input vector of size <code>n</code>, where <code>n</code> is
the number of characters in the input alphabet (OMLET computes this from
the input file at the start of training). OMLET will present each
input character to the network as a <a href="https://en.wikipedia.org/wiki/One-hot">one-hot</a> vector.</p>
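<p>For instance, with a hypothetical three-character alphabet of <code>a</code>,
<code>b</code> and <code>c</code> (so <code>n</code> is 3), the one-hot encodings would be</p>
<pre><code> x('a') = (1, 0, 0)
 x('b') = (0, 1, 0)
 x('c') = (0, 0, 1)</code></pre>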
<p>The second declaration, <code>hp</code>, declares the previous hidden state vector (what
we called <code>h'</code> above). We declare this to be of size 128 – an arbitrary
choice. A larger state vector can (theoretically) carry more state, but at
a cost of larger parameter matrices and longer training time. You can
experiment with increasing the hidden vector size and see how the results change.</p>
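<p>For example, a 256-element hidden state only requires changing the sizes in
the network definition (a sketch derived from the one-layer RNN above):</p>
<pre><code> -DNW=' x = I(n), hp = I(256), \
h = C(hp, T(A(L(256, x), L(256, hp)))), \
y = L(n, h)'</code></pre>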
<p>The third line is the core of the RNN. It sets <code>h</code>, the hidden vector output,
to be the sum of two linear elements specified by <code>L</code>. The <code>L</code> function
takes two parameters – the output vector size (which must match the size of
<code>h</code>) and the input vector. <code>L</code> computes <code>y = W * x + B</code> where each <code>L</code>
has its own <code>W</code> (weight) and <code>B</code> (bias) training parameters. Both <code>x</code> and
<code>hp</code> are sent through <code>L</code>, and the results are passed through the <code>A</code> function,
which does vector addition. That result is passed through <code>T</code>, which does
element-wise <code>tanh(3)</code> activation.</p>
<p>Next, we wrap the whole thing with the <code>C</code> function. <code>C</code> connects <code>hp</code>
with <code>h</code>, causing the new value of <code>h</code> to be passed to the <code>hp</code> vector on
the next iteration of the algorithm (allowing the RNN to retain state in <code>h</code>).</p>
<p>Finally, the whole result is passed through another instance of <code>L</code>, this time
producing a vector of size <code>n</code>, holding the log
<a href="https://en.wikipedia.org/wiki/Likelihood_function">likelihood</a> of each possible output character. This is assigned to <code>y</code>, which is the output of
the network (and hence must also be declared).</p>
<p>OMLET will take the <code>y</code> result and pass it through the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax</a> function,
which converts the log probabilities into a probability distribution. In
inference mode, this is used to select the next character to emit. In
training mode, this is used to generate the loss which is backpropagated.</p>
<p>As an example of a more complicated network, we can look at a two-layer
GRU network:</p>
<pre><code> -DHS=128, \
-DNW=' x = I(n), \
y = L(n, MD(MD(x)))' \
-DBK=' hp = I(HS), \
z = S(A(L(HS, x), L(HS, hp))), \
r = S(A(L(HS, x), L(HS, hp))), \
c = T(A(L(HS, x), L(HS, hp))), \
zc = OG(1, -1, z), \
h = C(hp, A(M(zc, hp), M(z, c))), \
y = h'</code></pre>
<p>We are using a few new tricks here – first, we are defining <code>HS</code> as the size
of the hidden and cell vectors. There’s nothing special about this name; it’s
just convenient to specify it so we don’t have a bunch of constants in the
code. Second, the network itself is very simple – it declares <code>x</code> and has
the matrix that converts the <code>HS</code>-sized hidden vector back to the <code>n</code>-sized
alphabet vector… but it now calls <code>MD</code>, which is the user-definable
module (here we are using it twice, to have two cascaded GRU blocks). The
<code>MD</code> function performs the sub-network defined by the <code>BK</code> compile-time
parameter (specified in the <code>-DBK='...'</code> setting). This sub-module again
takes an <code>x</code> parameter and produces a <code>y</code> output. Inside it, we declare
<code>hp</code> and <code>h</code>, the previous and current state vectors, plus equations for the
various GRU gates (these use <code>S</code> for the sigmoid activation function).
One final new call is <code>OG</code> which does offset and gain, performing
<code>y = offset + gain * x</code> where <code>offset</code> (the first parameter) and <code>gain</code> (the
second) are constants. We are using this here to compute <code>(1 - z)</code> for the
GRU’s linear interpolator.</p>
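<p>Putting the pieces together, since <code>zc</code> is <code>1 - z</code>, the
<code>h</code> assignment above implements the GRU’s linear interpolation between the
previous state and the candidate state:</p>
<pre><code> h = (1 - z) * hp + z * c</code></pre>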
<p>The full set of available function blocks follows:</p>
<ul>
<li><code>I(s)</code>: declares a vector (input or state) of size <code>s</code></li>
<li><code>L(s, x)</code>: learnable linear function <code>y = W * x + B</code> with an output vector
size of <code>s</code></li>
<li><code>CM(x)</code>: learnable element-wise gain function <code>y = W * x</code></li>
<li><code>A(a, b)</code>: element-wise add: <code>y = a + b</code></li>
<li><code>M(a, b)</code>: element-wise multiply: <code>y = a * b</code></li>
<li><code>S(x)</code>: sigmoid activation function <code>y = sigmoid(x)</code></li>
<li><code>T(x)</code>: hyperbolic tangent activation function <code>y = tanh(x)</code></li>
<li><code>C(xp, x)</code>: copy <code>x</code> to <code>xp</code> in the next time step (propagate through
time)</li>
<li><code>OG(o, g, x)</code>: apply a constant offset and gain: <code>y = o + g * x</code></li>
<li><code>MD(x)</code>: apply the sub-network specified by <code>BK</code></li>
</ul>
<p>Note: even if you don’t use <code>MD</code> in your network, you should still define
<code>BK</code> by adding <code>-DBK='y=x'</code> to the command line, otherwise you will get a
compile-time error.</p>
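<p>As a worked example, here is a hypothetical two-layer perceptron (not one of
the stock <code>Makefile</code> targets) built only from the blocks listed above;
since it never uses <code>MD</code>, it needs the dummy <code>BK</code> definition:</p>
<pre><code> -DNW=' x = I(n), \
h = T(L(128, x)), \
y = L(n, h)' \
-DBK='y=x'</code></pre>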
<h3 id="hyperparameters">Hyperparameters</h3>
<p>OMLET has a large number of training and inference parameters which can be
changed by the user. All of these are set by <code>-D</code> on the compile command line.
The list of hyperparameters follows:</p>
<ul>
<li><code>TP</code>: Temperature parameter for use in inference mode. This divides the
log probabilities before softmax. A low temperature makes the model choose
safer but more boring choices. A high temperature takes more risks but
makes more mistakes. Default is 1.0, which uses the computed
probabilities.</li>
<li><code>N</code>: Batch size. This is the number of times the RNN is unrolled, so it
controls how far back in the past the RNN can see. The default is 50.
Larger batch sizes update the weights less frequently and allow the RNN to
see farther back in time, but at a cost of proportionally more memory.</li>
<li><code>TR</code>: The percentage of batches in the input data set that will be used
for training. The default value of 0.95 sets this as 95%.</li>
<li><code>LR</code>: The initial learning rate. The default is 0.002.</li>
<li><code>LE</code>: The epoch where the learning rate will start decaying. Defaults to
epoch 10.</li>
<li><code>LD</code>: Learning rate decay, per epoch (after <code>LE</code> epochs). The learning
rate is scaled by this number. Default is 0.97.</li>
<li><code>WD</code>: Weight-decay parameter, to promote <a href="https://en.wikipedia.org/wiki/Regularization_(mathematics)">regularization</a>. The default
is 0.00008.</li>
<li><code>RS</code>: The random scale for weight initialization. Weight parameters will
be initialized to be between <code>-RS</code> to <code>+RS</code>. The default is 0.15.</li>
<li><code>CL</code>: Clamp value for gradients. Gradients will be limited to the range
<code>-CL</code> to <code>+CL</code>. Default is 5.</li>
<li><code>B1</code>: Momentum mean parameter for Adam optimizer. Set to 0.9.</li>
<li><code>B2</code>: Momentum variance parameter for Adam optimizer. Set to 0.999.</li>
<li><code>EP</code>: Epsilon parameter for Adam, to provide numerical stability. Set to
0.00000001.</li>
<li><code>DI</code>: How often to print a training or validation progress message and
inference snippet. The default prints every 100 training batches.</li>
<li><code>SL</code>: The number of characters to print when doing an inference snippet.
Default is 200.</li>
<li><code>PF</code>: Format string for the checkpoint filename. The default is
<code>"cp%02d_%.3f"</code> which includes the epoch and validation loss. You may
wish to add a subdirectory to the name to keep checkpoint files out of
the current directory.</li>
</ul>
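<p>As a final sketch tying several hyperparameters together, a build that
samples more adventurously (<code>TP</code>), prints 280-character snippets
(<code>SL</code>) and reports more often (<code>DI</code>) might pass flags like
these; as before, how the <code>-D</code> flags reach the compiler depends on the
entry’s <code>Makefile</code>:</p>
<pre><code> -DTP=1.2 -DSL=280 -DDI=50</code></pre>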
<!--
Copyright © 1984-2024 by Landon Curt Noll. All Rights Reserved.
You are free to share and adapt this file under the terms of this license:
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
For more information, see:
https://creativecommons.org/licenses/by-sa/4.0/
-->
<!-- AFTER: last line of markdown file: 2019/mills/README.md -->
<!-- END: this line ends content for HTML phase 21 by: bin/pandoc-wrapper.sh via bin/md2html.sh -->
<!-- START: this line starts content for HTML phase 22 by: bin/output-index-inventory.sh via bin/md2html.sh -->
<!-- START: this line starts content generated by: bin/output-index-inventory.sh -->
<div id="inventory">
<h1 id="inventory-for-2019mills">Inventory for 2019/mills</h1>
</div>
<h2 id="primary-files">Primary files</h2>
<ul>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/prog.c">prog.c</a> - entry source code</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/Makefile">Makefile</a> - entry Makefile</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/prog.orig.c">prog.orig.c</a> - original source code</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/try.sh">try.sh</a> - script to try entry</li>
</ul>
<h2 id="secondary-files">Secondary files</h2>
<ul>
<li><a href="2019_mills.tar.bz2">2019_mills.tar.bz2</a> - download entry tarball</li>
<li><a href="Eugene_Onegin.cp11_1.188">Eugene_Onegin.cp11_1.188</a> - training data from Eugene Onegin poetry</li>
<li><a href="Eugene_Onegin.txt.gz">Eugene_Onegin.txt.gz</a> - compressed Eugene Onegin poetry</li>
<li><a href="IOCCC-Rules-Guidelines.cp98_0.175">IOCCC-Rules-Guidelines.cp98_0.175</a> - training data from IOCCC 1984-2019 rules guidelines</li>
<li><a href="IOCCC-Rules-Guidelines.txt.gz">IOCCC-Rules-Guidelines.txt.gz</a> - compressed IOCCC rules and guidelines 1984-2019</li>
<li><a href="IOCCC-hints.cp09_1.809">IOCCC-hints.cp09_1.809</a> - training data from IOCCC hints files text for 1984-2018</li>
<li><a href="IOCCC-hints.txt.gz">IOCCC-hints.txt.gz</a> - compressed IOCCC hints files for 1984-2018</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/README.md">README.md</a> - markdown source for this web page</li>
<li><a href="Shakespeare.cp04_1.633">Shakespeare.cp04_1.633</a> - training data from Shakespeare text</li>
<li><a href="Shakespeare.txt.gz">Shakespeare.txt.gz</a> - compressed Shakespeare text</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/.entry.json">.entry.json</a> - entry summary and manifest in JSON</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/.gitignore">.gitignore</a> - list of files that should not be committed under git</li>
<li><a href="https://github.com/ioccc-src/winner/blob/master/2019/mills/.path">.path</a> - directory path from top level directory</li>
<li><a href="index.html">index.html</a> - this web page</li>
</ul>
<hr style="width:10%;text-align:left;margin-left:0">
<h4>
Jump to: <a href="#">top</a>
</h4>
<!-- END: next line ends content generated by: bin/output-index-inventory.sh -->
<!-- END: this line ends content for HTML phase 22 by: bin/output-index-inventory.sh via bin/md2html.sh -->