% PhD thesis
\documentclass[12pt,twoside,openright]{report}
\usepackage{etex}
\usepackage[all]{xy}
\usepackage{pb-diagram}
\usepackage{amsfonts, amssymb, amsmath}
\usepackage{tikz}
\usepackage{bussproofs}
\usepackage{epsfig}
\PassOptionsToPackage{hyphens}{url}
\usepackage{hyperref}
\usepackage{latexsym,url}
\usepackage{txfonts}
\usepackage{undertilde}
\usepackage[pdf]{pstricks}
\usepackage{pstricks-add, pst-grad, pst-plot}
\usepackage[tiling]{pst-fill}
\usepackage{pstcol}
\usepackage{pdfpages}
\usepackage{xltxtra}
\setmainfont{Gothica Textura Prescissa}
\usetikzlibrary{matrix,arrows}
\usetikzlibrary{backgrounds}
\usetikzlibrary{decorations.pathmorphing}
\usetikzlibrary{decorations.pathreplacing}
\usetikzlibrary{decorations.shapes}
\usetikzlibrary{decorations.markings}
\usetikzlibrary{calc}
% PsTricks defaults
\psset{linewidth=0.3pt,dimen=middle}
\psset{xunit=.70cm,yunit=0.70cm}
\tikzset{l/.style={font=\fontsize{8}{8}\selectfont}}
\tikzset{m/.style={font=\fontsize{6}{6}\selectfont}}
\tikzset{
electron/.style={postaction=decorate, decoration={markings, mark=at position 0.55 with {\arrow{triangle 45}}}}, photon/.style={decorate, decoration={coil, aspect=0}}
}
\definecolor{salmon}{rgb}{1,0.47,0.425}
%\definecolor{purple}{rgb}{0.5,0,0.5}
\textwidth = 6.5 in
\textheight = 9 in
\oddsidemargin = 0.0 in
\evensidemargin = 0.0 in
\topmargin = 0.0 in
\headheight = 0.0 in
\headsep = 0.0 in
\parskip = 0.1in
\parindent = 0.0in
% theorem environments
\newtheorem{thm}{Theorem}
\newtheorem{definition}[thm]{Definition}
\newtheorem{defn}{Definition}
\newtheorem{lemma}{Lemma}
%famous sets
\newcommand{\RR}{\mathbb{R}}
\newcommand{\QQ}{\mathbb{Q}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\CC}{\mathbb{C}}
\newcommand{\ZZ}{\mathbb{Z}}
%famous categories
\newcommand{\Cob}{\mathrm{Cob}}
\newcommand{\Hilb}{\mathrm{Hilb}}
\newcommand{\Set}{\mathrm{Set}}
\newcommand{\Fin}{\mathrm{Fin}}
\newcommand{\Cat}{{\rm Cat}}
\newcommand{\Tang}{{\rm Tang}}
\newcommand{\Lie}{{\rm Lie}}
\newcommand{\Grp}{{\rm Grp}}
\newcommand{\CommRing}{{\rm CommRing}}
\newcommand{\Diff}{{\rm Diff}}
\newcommand{\Rep}{{\rm Rep}}
\newcommand{\Vect}{{\rm Vect}}
\newcommand{\Braid}{{\rm Braid}}
\newcommand{\Rel}{{\rm Rel}}
\newcommand{\Super}{{\rm Super}}
\newcommand{\Th}{\mathrm{Th}}
%arrows
\newcommand{\maps}{\colon}
\newcommand{\iso}{\cong}
\newcommand{\isoto}{\stackrel{\sim}{\to}}
\newcommand{\To}{\Rightarrow}
%homs
\newcommand{\lHom}{\vdash}
\newcommand{\lhom}{\multimap}
\newcommand{\rhom}{\leftspoon}
\renewcommand{\hom}{{\rm hom}}
\newcommand{\HOM}[2]{{#1 \lhom #2}}
%products
\newcommand{\tensor}{\otimes}
\newcommand{\x}{\times}
\newcommand{\OTIMES}{{\boldmath$\otimes$ \,}}
\newcommand{\TENSOR}{{\boldmath$\hat{\otimes}$ \,}}
\newcommand{\CIRC}{{\boldmath$\circ$ \,}}
\newcommand{\LAMBDA}{{\boldmath$\lambda$ \,}}
\newcommand{\ab}[1]{\langle #1 \rangle} % "angle brackets"
%morphisms, rules, constants
\newcommand{\id}{{\rm i}}
\newcommand{\Id}{{\rm id}}
\newcommand{\ev}{{\rm ev}}
\newcommand{\coev}{{\rm coev}}
\newcommand{\eval}{{\rm eval}}
\newcommand{\app}{{\rm app}}
\newcommand{\assoc}{{\rm assoc}}
\newcommand{\unassoc}{{\rm unassoc}}
\newcommand{\braid}{{\rm braid}}
\newcommand{\Left}{{\rm left}}
\newcommand{\Right}{{\rm right}}
\newcommand{\unright}{{\rm unright}}
\newcommand{\unleft}{{\rm unleft}}
\newcommand{\Tensor}{{\rm tensor}}
\newcommand{\swap}{{\rm swap}}
\newcommand{\term}{{\rm term}}
\newcommand{\cp}{{\rm cp}}
\newcommand{\vp}{{\rm vp}}
\newcommand{\cut}{{\circ}}
\newcommand{\op}{{\rm op}}
\newcommand{\name}[1]{\ulcorner \! #1 \! \urcorner}
\newcommand{\inv}{${}^{-1}$}
%data types and programs
\newcommand{\plus}{{\rm plus}}
\newcommand{\Times}{{\rm times}}
\newcommand{\double}{{\rm double}}
\newcommand{\increment}{{\rm increment}}
\newcommand{\Day}{{\rm day}}
\newcommand{\Tuesday}{{\rm Tuesday}}
\newcommand{\integer}{{\rm integer}}
\newcommand{\duplicate}{{\rm duplicate}}
\newcommand{\curry}{{\rm curry}}
\newcommand{\uncurry}{{\rm uncurry}}
\newcommand{\compose}{{\rm compose}}
%chemical elements
\renewcommand{\H}{H}
\renewcommand{\O}{O}
%formatting commands
\newcommand{\apos}{{\boldmath${}'$}}
\newcommand{\cent}[1]{\begin{center} #1 \end{center}}
\newcommand{\al}[1]{\begin{align} #1 \end{align}}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\newcommand {\tuple}[1]{\langle #1 \rangle}
\newcommand{\di}[1]{\[\begin{diagram}#1\end{diagram}\]}
\newcommand{\xeq}[1]{{\displaystyle \mathop{=}_{#1}}}
\renewcommand{\text}{\mbox}
\newcommand{\QUOTE}[1]{\begin{quote} {\tt #1 } \end{quote}}
\newcommand{\mbold}[1]{\mbox{\bf #1}}
\newcommand{\too}{\Rightarrow}
\newcommand{\isotoo}{\; \tilde{\Rightarrow} \;}
\newcommand{\et}{\hspace{-0.08in}{\bf .}\hspace{0.1in}}
\newcommand{\Hom}{\mbox{Hom}}
\newcommand{\FinSet}{\mbox{FinSet}}
\newcommand{\Aut}{\mbox{Aut}}
\newcommand{\Span}{\mbox{Span}}
\newcommand{\Ob}{\mbox{Ob}}
\newcommand{\Mor}{\mbox{Mor}}
\newcommand{\Cocont}{\mbox{Cocont}}
\newcommand{\MatN}{\ensuremath{\mbox{Mat(\NN)}}}
\newcommand{\nCob}{n\mbox{Cob}}
\newcommand{\Tr}{\mbox{Tr}}
\newcommand{\comod}{\textnormal{Comod} }
\newcommand{\rev}{\textnormal{rev} }
\newcommand{\comodf}{\textnormal{Comod}_f }
\newcommand{\coalg}{ \textnormal{Coalg} }
\newcommand{\comonadic}{ \textnormal{Comonadic} }
\newcommand{\cocent}{ \textnormal{Cocent} }
\renewcommand{\mod}{ {\textnormal{Mod}} }
\newcommand{\vact}{ \V {\textnormal{-Act}} }
\newcommand{\catvv}{ \textnormal{Cat} / \negthinspace / \V }
\newcommand{\cotr} { \textnormal{Cotract} }
\newcommand{\vactv}{ {\mathcal{V}} {\textnormal{-Act}} / \V }
\newcommand{\catvf}{ \cat / {\V_f} }
\newcommand{\catvectf}{ \cat / {\vectf} }
\newcommand{\catv}{ \cat / \V }
\newcommand{\vactvv}{ \vact / \negthinspace / \V }
\newcommand{\Ach}{\ensuremath {A_{\textnormal{ch}}} }
\newcommand{\ench}{\ensuremath {\textnormal{End}}^{\vee} }
\newcommand{\coend}{\ensuremath {\textnormal{coend}} }
\newcommand{\V}{ {\mathcal{V}} }
\newcommand{\W}{ {\mathcal{W}} }
\newcommand{\A}{ {\mathcal{A}} }
%\newcommand{\B}{ {\mathcal{B}} }
\newcommand{\C}{ {\mathcal{K}} }
\newcommand{\D}{ {\mathcal{D}} }
\newcommand{\K}{ {\mathcal{K}} }
\renewcommand{\L}{ {\mathcal{L}} }
\newcommand{\M}{ {\mathcal{M}} }
\newcommand{\U}{ {\mathcal{U}} }
\newcommand{\X}{ {\mathcal{X}} }
\newcommand{\comoda}{ S }
\newcommand{\comodb}{ T }
\newcommand{\cat}{ {\textnormal{Cat}} }
\newcommand{\sq}{ {\textnormal{Sq}} }
\newcommand{\comp}{ \C }
%\newcommand{\gray}{ {\textnormal{Gray}} }
\newcommand{\obj}{ {\textnormal {obj}} }
\newcommand{\vect}{ {\textnormal{Vect}} }
\newcommand{\vectf}{ {\textnormal{Vect}}_f }
\newcommand{\E}{ {\textnormal{E}} }
\newcommand{\comon}{ {\textnormal{Comon}} }
\newcommand{\mon}{ {\textnormal{Mon}} }
\newcommand{\vopcat}{ \V^{\textnormal{op}} {\textnormal{-Cat}} }
\newcommand{\vopcatop}{ \vopcat^{\textnormal{op}} }
\newcommand{\vcat}{ \V {\textnormal{-Cat}} }
\newcommand{\set}{ {\textnormal{Set}} }
\newcommand{\disc}{ {\mathbb{D}} }
\newcommand{\Vop}{ {\V^{\textnormal{op}}} }
\newcommand{\desc}{ {\textnormal{Desc}} }
\newcommand{\lbr}{ [ \negthinspace [ }
\newcommand{\rbr}{ ] \negthinspace ] }
\newcommand{\psmon}{ \textnormal{PsMon} }
\newcommand{\brpsmon}{ \textnormal{BrPsMon} }
\newcommand{\sympsmon}{ \textnormal{SymPsMon} }
\newcommand{\balpsmon}{ \textnormal{BalPsMon} }
\newcommand{\cone}{ \textnormal{Cone} }
\newcommand{\wmonhom}{ \textnormal{WMon} }
\newcommand{\brwmonhom}{ \textnormal{BrWMon} }
\newcommand{\syllwmonhom}{ \textnormal{SyllWMon} }
\newcommand{\balwmonhom}{ \textnormal{BalWMon} }
\newcommand{\symwmonhom}{ \textnormal{SymWMon} }
%\newcommand{\assoc}{{\sf{a}}}
%\newcommand{\leftunit}{{\sf{l}}}
%\newcommand{\rightunit}{{\sf{r}}}
%\newcommand{\mqn}{ M_q(n)}
\newcommand{\acirc}{ {\stackrel{\mbox{\circle{2}}}{a}} }
\newcommand{\lcirc}{ {\stackrel{\mbox{\circle{2}}}{l}} }
\newcommand{\rcirc}{ {\stackrel{\mbox{\circle{2}}}{r}} }
\newcommand{\ctr}[1]{\begin{center} #1 \end{center}}
\newcommand{\oo}{\infty}
\newcommand{\alx}[1]{\begin{align*} #1 \end{align*}}
\renewcommand{\L}{\mathfrak{L}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\prf}{\noindent {\em Proof.} }
\newcommand{\QED}{\hfill $\square$}
\newcommand{\dom}{\mbox{\rm dom}}
\newcommand{\bin}{\mbox{bin}}
\newcommand{\Iota}{\mbox{{\tiny Iota}}}
\newcommand{\floor}[1]{\lfloor #1 \rfloor}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\newcommand{\fdom}{\mbox{\rm \footnotesize{dom}}}
\newcommand{\fbin}{\mbox{\rm \footnotesize{bin}}}
\newcommand{\COM}[2]{{#1 \choose #2}}
\newcommand{\Prob}{{\rm Prob}}
\newcommand{\Halt}{{\rm Halt}}
\newcommand{\ttime}{{\rm time}}
\newtheorem{fact}[thm]{Fact}
\newtheorem{lem}[thm]{Lemma}
\newtheorem{cor}[thm]{Corollary}
\newtheorem{example}[thm]{Example}
\newtheorem{sch}[thm]{Scholium}
\newcommand{\myproof}{\noindent {\em Proof.} }
\newcommand{\Space}{\mbox{Space}}
%pictures of cobordisms
\newcommand{\multc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (1.5,2.5)
.. controls (1.5,1.1) and (.4,1.6) .. (.5,0)
.. controls (.4,-.25) and (-.4,-.25) .. (-.5,0)
.. controls (-.4,1.6) and (-1.5,1.1) .. (-1.5,2.5)
-- (-.5,2.5)
.. controls (-.6,1.5)and (0.6,1.5) .. (.5,2.5)
-- (1.5,2.5);
\shadedraw (-1,2.5) ellipse (.5 and .2);
\shadedraw (1,2.5) ellipse (.5 and .2);
\end{scope}
\draw[dashed] (0.5,0) arc (0:180:.5 and .2);
}
\newcommand{\comultc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (1.5,0)
.. controls (1.5,1.4) and (.4,.9) .. (.5,2.5)
-- (-0.5,2.5)
.. controls (-.4,.9) and (-1.5,1.4) .. (-1.5,0)
.. controls (-1.4,-.25) and (-.6,-.25) .. (-.5,0)
.. controls (-.6,1) and (0.6,1) .. (.5,0)
.. controls (.6,-.25) and (1.4,-.25) .. (1.5,0);
\shadedraw (0,2.5) ellipse (.5 and .2);
\end{scope}
\draw[dashed] (1.5,0) arc (0:180:.5 and .2);
\draw[dashed] (-.5,0) arc (0:180:.5 and .2);
}
\newcommand{\identc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (.5,0) -- (.5,2.5) -- (-.5,2.5) -- (-.5,0)
.. controls (-.4,-.25) and (.4,-.25) .. (.5,0);
\shadedraw (0,2.5) ellipse (.5 and .2);
\end{scope}
\draw[dashed] (0.5,0) arc (0:180:.5 and 0.2);
}
\newcommand{\medidentc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (-.5,2) -- (-.5,0)
.. controls (-.4,-.25) and (.4,-.25) .. (.5,0)
-- (.5,2) -- (-.5, 2);
\shadedraw (0,2) ellipse (.5 and .2);
\end{scope}
\draw[dashed] (0.5,0) arc (0:180:.5 and 0.2);
}
\newcommand{\smallidentc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (-.5,1) -- (-.5,0)
.. controls (-.4,-.25) and (.4,-.25) .. (.5,0)
-- (.5,1) -- (-.5, 1);
\shadedraw (0,1) ellipse (.5 and .2);
\end{scope}
\draw[dashed] (0.5,0) arc (0:180:.5 and 0.2);
}
\newcommand{\birthc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (-.5,0)
.. controls (-.5,.9) and (0.5,.9) .. (.5,0)
.. controls (.4,-.25) and (-.4,-.25) .. (-.5,0);
\end{scope}
\draw[dashed] (0.5,0) arc (0:180:.5 and 0.2);
}
\newcommand{\deathc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (-.5,1)
.. controls (-.5,.1) and (0.5,.1) .. (.5,1)
-- (-.5,1);
\shadedraw (0,1) ellipse (.5 and .2);
\end{scope}
}
\newcommand{\ucrossc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (.5,0) -- (-1.5,2.5) -- (-.5,2.5) -- (1.5,0)
.. controls (1.4,-.25) and (.6,-.25) .. (.5,0);
\shadedraw (-.5,0) -- (1.5,2.5) -- (.5,2.5) -- (-1.5,0)
.. controls (-1.4,-.25) and (-.6,-.25) .. (-.5,0);
\shadedraw (-1,2.5) ellipse (.5 and .2);
\shadedraw (1,2.5) ellipse (.5 and .2);
\end{scope}
\draw[dashed] (-1.5,2.5) -- (.5,0);
\draw[dashed] (-.5,2.5) -- (1.5,0);
\draw[dashed](1.5,0) arc (0:180:.5 and 0.2);
\draw[dashed](-.5,0) arc (0:180:.5 and 0.2);
}
\newcommand{\zagc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (1.5,0) .. controls (1.6,2) and (-1.6,2) .. (-1.5,0)
.. controls (-1.4,-.25) and (-.6,-.25) .. (-.5,0)
.. controls (-.6,.8) and (0.6,.8) .. (.5,0)
.. controls (.6,-.25) and (1.4,-.25) .. (1.5,0);
\end{scope}
\draw [dashed] (1.5, 0) arc (0:180:.5 and 0.2);
\draw [dashed] (-.5, 0) arc (0:180:.5 and 0.2);
}
\newcommand{\zigc}{
\begin{scope}[left color=gray, right color=white]
\shadedraw (1.5,2) .. controls (1.6,0) and (-1.6,0) .. (-1.5,2)
-- (-.5,2)
.. controls (-.6,1.2) and (0.6,1.2) .. (.5,2)
-- (1.5,2);
\shadedraw (1,2) ellipse (.5 and .2);
\shadedraw (-1,2) ellipse (.5 and .2);
\end{scope}
}
% Defining this here since I do not use tac.cls
\newcommand{\pushright}[1]{#1}
\def\endproof{\pushright{$\Box$}%\vrule width5pt height 5pt depth 0pt}%
\penalty-700 \par\addvspace{\medskipamount}}
% "pushright" is a command from tac.cls
\begin{document}
\thispagestyle{empty}
\begin{center}
\begin{tikzpicture}
\fill (0,0) rectangle (\textwidth, 0.3);
\end{tikzpicture}
{\em The Department of Computer Science \\ The University of Auckland \\ New Zealand \\[240pt]}
\begin{tikzpicture}
\fill (0,0) rectangle (\textwidth, 0.3);
\end{tikzpicture}\\
\vspace{1.2em}{\fontsize{36}{36}\selectfont \bf Physic\$ and Computation}\\~\\
\begin{tikzpicture}
\fill (0,0) rectangle (\textwidth, 0.3);
\end{tikzpicture}\\
\end{center}
{\hfill \em \fontsize{16}{16}\selectfont
\begin{tabular}{rr}
& Michael Stay \\
& January 2015 \\ & \\
Supervisors: & \\
& Cristian Calude\\
& John C. Baez \\
\end{tabular}
}\\
\begin{tabular}{cp{50pt}p{300pt}}
\includegraphics[width=75pt]{shield.eps} & &
\vspace{-50pt} \textsc{\fontsize{16}{16}\selectfont A thesi\$ submitted in partial fulfillment of the requirement\$ of Doctor of Philosophy in Computer Science}
\end{tabular}\\
\pagebreak
\renewcommand{\thepage}{\roman{page}}
\chapter*{Abstract}
Thi\$ thesi\$ i\$ an exploration of some place\$ where idea\$ in computer science and physic\$ share a common mathematical structure. The first part of the thesi\$ deal\$ with partition function\$ in physic\$ and algorithmic information theory. In physic\$, partition function\$ are used to encode information about statistical system\$ in thermal equilibrium; in algorithmic information theory, they are used to encode information about the probability that a Turing machine will halt given a random program. We derive analogue\$ of Maxwell'\$ relation\$ in the algorithmic setting and consider thermodynamic cycle\$ such a\$ the Carnot cycle or Stoddard cycle. We also show that given a program and a probability $P$, we can effectively compute a time after which the probability that the program will eventually halt i\$ les\$ than $P$. Thi\$ idea of a time cutoff i\$ reminiscent of a high-energy cutoff in renormalization.
The second part of the thesi\$ review\$ symmetric monoidal closed categorie\$ and bicategorie\$. We begin with an expository chapter on symmetric monoidal closed categorie\$ and demonstrate how they form a broad generalization of the Curry-Howard isomorphism, including string diagram\$ in physic\$, cobordism\$ in topology, multiplicative intuitionistic linear logic, and the simply-typed lambda calculu\$ in computer science. We then go up one dimension and present the complete definition of a special kind of symmetric monoidal closed bicategory called a compact closed bicategory. We emphasize the combinatorial aspect\$ and prove that given a 2-category $T$ with finite product\$ and weak pullback\$, the bicategory $\Span_2(T)$ of object\$ of $T$, span\$, and isomorphism classe\$ of map\$ of span\$ i\$ compact closed. A\$ a corollary, certain bicategorie\$ of "resistor network\$" are compact closed.
\vfill
\pagebreak
\chapter*{Acknowledgement\$}
Thank\$ to Google for funding and 20\% time. Thank\$ to Cristian Calude and John Baez for year\$ of friendship, good advice, patience, and faith in me. Thank\$ most of all to my wife Miriam, for loving me through everything.
\vfill
\pagebreak
\setcounter{page}{9}
\tableofcontents
\vfill
\pagebreak
\renewcommand{\thepage}{\arabic{page}}
\setcounter{page}{1}
\chapter*{Introduction}
\addcontentsline{toc}{chapter}{Introduction}
There i\$ a nice analogy between the set of possible arrangement\$ of ga\$ particle\$ in a piston and the set of halting program\$ written in a given programming language. In Chapter I.1, we talk about "observables" in each case: just a\$ we can ask for the volume of the convex hull of the particle\$, their total kinetic energy, or how many particle\$ there are, we can ask for the length of a program, it\$ runtime, or the amount of memory it use\$. We associate thermodynamic conjugate variable\$ to each observable: just a\$ pressure i\$ conjugate to volume, we can talk about the "algorithmic pressure" that i\$ conjugate to the size of the program. We single out one observable property of a program to play the role of internal energy and then define the entropy of a distribution on program\$ and analogue\$ of Maxwell'\$ equation\$. Finally, by considering loop\$ in the pressure/volume space, we describe an "algorithmic heat engine".
Algorithmic entropy i\$ a special case of the entropy a\$ studied in statistical mechanic\$. A Gibb\$ ensemble i\$ a probability measure that maximize\$ the entropy subject to constraint\$ on the average value\$ of some observable\$. In most work on algorithmic entropy, the relevant observable i\$ the length of a program; however, the full structure of thermodynamic\$ only appear\$ when we consider multiple observable\$. We focu\$ on the log of the runtime $E$, the length $V$, and the output $N$. The Gibb\$ ensemble i\$ of the form
\[ p = \frac{1}{Z} e^{-\beta E(x) - \gamma V(x) - \delta N(x)} \]
for certain $\beta, \gamma, \delta,$ where
\[ Z = \sum_{x \in X} e^{-\beta E(x) - \gamma V(x) - \delta N(x)} \]
i\$ called the `partition function' of the ensemble and $X$ i\$ the domain of some universal Turing machine $U$. The partition function reduce\$ to the halting probability for $U$ when $\beta = \delta = 0$ and $\gamma = \ln 2$.
We derive an algorithmic analogue of the basic thermodynamic relation
\[ dE = TdS - PdV + \mu dN, \]
where $S$ i\$ the entropy, $T = 1/\beta, P = \gamma/\beta,$ and $\mu = -\delta/\beta$ are algorithmic version\$ of temperature, pressure, and chemical potential, respectively. Starting from thi\$ relation, we derive analogue\$ of Maxwell'\$ equation\$ and consider thermodynamic cycle\$ like the Carnot cycle or Stoddard cycle.
The halting probability of a universal Turing machine i\$ uncomputable. However, in Chapter I.2, we show that we can effectively bound the probability that a particular program halt\$. Consider an $N$-bit program $p$ that run\$ for a long time without stopping---say, $2^{2N}$ step\$. There i\$ a program not much longer than $N$ bit\$ that will run $p$ and, if it ever halt\$, will output the number of step\$ $t$ required for $p$ to halt. Thi\$ i\$ a short program of length $N + c$ that produce\$ a large number $t$ whose length i\$ greater than $2N$, so $t$ must be nonrandom. The density of nonrandom number\$ near $t$ goe\$ to zero a\$ $t$ goe\$ to infinity, so the density of time\$ at which the program might halt also goe\$ to zero. By choosing a computably enumerable distribution on natural number\$, we turn densitie\$ into probabilities: given an $N$-bit program $p$ and a natural number $k,$ we show that we can effectively compute a critical time $t_0$ such that the probability that $p$ halt\$ after running at least $t_0$ step\$ i\$ les\$ than $2^{-k}.$ Manin \cite{ManinRenorm2} referenced the published version of thi\$ chapter and showed that thi\$ critical time i\$ formally equivalent to a high-energy cutoff when renormalizing quantum field theory.
It i\$ well-known that Wick rotation turn\$ equation\$ describing thermal system\$ into equation\$ describing quantum system\$. By replacing the energy scale $kT$ in a classical partition function by the imaginary energy $i\hbar/t$, we get a quantum partition function. Consider a large collection of harmonic oscillator\$ at temperature $T$. The relative probability of finding any given oscillator with energy $E$ i\$ $\exp(-E/k_B T)$, where $k_B$ i\$ Boltzmann'\$ constant. The average value of an observable $Q$ i\$, up to a normalizing constant,
\[ \sum_j Q_j e^{-E_j / (k_B T)}.\]
Now consider a single quantum harmonic oscillator in a superposition of basi\$ state\$, evolving for a time $t$ under a Hamiltonian $H$. The relative phase change of the basi\$ state with energy $E$ i\$ $\exp(-E it/ \hbar),$ where $\hbar$ i\$ Planck'\$ constant. The probability amplitude that a uniform superposition of state\$ $|\psi\rangle = \sum_j |j\rangle$ evolve\$ to an arbitrary superposition $|Q\rangle = \sum_j Q_j |j\rangle$ i\$, up to a normalizing constant,
\begin{align*}
& \langle Q|e^{-iHt/\hbar}|\psi\rangle \\
= & \sum_j Q_j e^{-E_j it/ \hbar}\langle j|j\rangle \\
= & \sum_j Q_j e^{-E_j it/ \hbar}.
\end{align*}
Feynman'\$ path integral formulation of quantum mechanic\$ consider\$ a sum over path\$ $\gamma$ rather than a sum over state\$, each weighted by a phase $e^{-iS(\gamma)/\hbar}$, where $S(\gamma)$ i\$ the classical action of the path. When we move from quantum mechanic\$ to quantum field theory, the partition function sum\$ over diagram\$ rather than path\$. Feynman diagram\$ form a category: there i\$ a trivial diagram for any set of particle\$ where they do not interact at all, and we can compose any two diagram\$ where the output particle\$ of one diagram match the input particle\$ of the next. Thi\$ category i\$ equipped with certain structure that also appear\$ in programming language\$, linear logic, and topology.
The shared structure i\$ called a "symmetric monoidal closed category"; the contribution of Chapter II.1 of thi\$ thesi\$ i\$ an exposition of existing work on symmetric monoidal closed categorie\$ and the extension of the appropriate version of the Curry-Howard isomorphism to Feynman diagram\$ and Hilbert space\$.
A category consist\$ of a collection of "objects" and, for each pair $(x, y)$ of object\$, a set of "morphisms" from $x$ to $y$. Morphism\$ are composable if the target of one matche\$ the source of the next; every object ha\$ an identity morphism from the object to itself; and composition i\$ associative and unital. Feynman diagram\$ form a category in which object\$ are particle type\$ and morphism\$ are graph\$ of interaction\$ between them. We interpret Feynman diagram\$ in the category of Hilbert space\$ and linear transformation\$; that i\$, we associate to each particle a Hilbert space of state\$ and to each diagram a linear transformation that tell\$ how those state\$ change over time. In computer science, we have a category of data type\$ with typed function\$ between them; in the Haskell community, thi\$ category i\$ called "Hask". In linear logic, we have categorie\$ whose object\$ are proposition\$ and whose morphism\$ are constructive proof\$. In topology, we have categorie\$ whose object\$ are manifold\$ of a certain dimension and whose morphism\$ are cobordisms.
A "monoidal" category let\$ u\$ pair up object\$ and morphism\$. We can juxtapose Feynman diagram\$ to get a new diagram and take the tensor product of Hilbert space\$ to get a new Hilbert space. Programming language\$ usually provide some way of combining data type\$ into a new one; Python and Scala, for instance, have "tuple" type constructor\$. In linear logic, the assertion that both the proposition\$ P and Q hold i\$ itself a proposition. In topology, the disjoint union of two manifold\$ i\$ a manifold. A "braided" monoidal category let\$ u\$ move object\$ past each other using an isomorphism called the "braiding". A "symmetric" monoidal category i\$ one in which braiding twice i\$ the identity.
A symmetric monoidal "closed" category ha\$ a notion of a "function type" or "implication" or "time reversal". Given two Haskell data type\$ {\tt P} and {\tt Q}, the data type {\tt P -> Q} describe\$ typed function\$ between them. In linear logic, given two proposition\$ $P$ and $Q$, the assertion that $P$ implie\$ $Q$ i\$ a proposition. In topology, given any cobordism from a manifold $P$ to a manifold $Q$, we can "bend" the input around and make it an output, getting a cobordism from the empty manifold to $Q$ and the reverse orientation of $P$.
\begin{center}
\begin{tikzpicture}
\identc
\node [font=\fontsize{18}{18}\selectfont] at (1.5,1) {$\mapsto$};
\begin{scope}[xshift=4cm]
\zagc
\end{scope}
\end{tikzpicture}
\end{center}
Similarly, given a Feynman diagram from a set $P$ of particle\$ to a set $Q$, we can "bend" the input set around and make it an output, getting a diagram from photon\$ to $Q$ and the antiparticle\$ of $P$:
\begin{center}
\begin{tikzpicture}
\coordinate (A) at (1,3);
\coordinate (D) at (1,0);
\draw[electron] (A) -- node [left, pos=0.5] {$e$} (D);
\node [font=\fontsize{18}{18}\selectfont] at (2.5,1.5) {$\mapsto$};
\coordinate (E) at (4,3);
\coordinate (F) at (4,1);
\coordinate (G) at (3,0);
\coordinate (H) at (5,0);
\draw[photon] (E) -- (F);
\draw[electron] (G) -- node [above left, pos=0.5] {$e$} (F);
\draw[electron] (F) -- node [above right, pos=0.5] {$e$} (H);
\end{tikzpicture}
\end{center}
Given any linear transformation from a Hilbert space $P$ to a Hilbert space $Q$, we can use the notion of "gate teleportation" \cite{GC} to encode the transformation into a quantum state in $P^* \otimes Q.$ When the function type can be expressed in term\$ of a dual object and the tensor product, a\$ with Feynman diagram\$, the category of cobordism\$, or Hilbert space\$, we call the category "compact closed".
Feynman diagram\$ were designed to represent quantum system\$ interacting. We assign a Hilbert space of quantum state\$ to each particle such that the pairing and braiding operation\$ are preserved: if we assign the space $U$ to one particle and $V$ to another, then we have to assign $U \otimes V$ to the pair of particle\$. We assign a linear transformation to each vertex that tell\$ how the quantum state\$ evolve. Thi\$ kind of structure-preserving map i\$ called a "braided monoidal functor"; every monoidal functor also preserve\$ dual\$, so the entire structure of a compact closed category i\$ preserved.
We can consider such functor\$ from other compact closed categorie\$ into the category Hilb of Hilbert space\$ and linear transformation\$. For example, we can think of manifold\$ a\$ modeling empty curved space, and cobordism\$ a\$ modeling spacetime. A braided monoidal functor from the category of manifold\$ and cobordism\$ to Hilb assign\$ a Hilbert space of quantum state\$ to space and a linear transformation to spacetime, giving a toy model of quantum gravity. Thi\$ toy model doe\$ not include matter; it only talk\$ about topological change\$ in space over time, so the model i\$ called a "topological quantum field theory".
The structure of a compact closed category can be generalized to "bicategories", where in addition to morphism\$ between object\$ we have 2-morphism\$ between morphism\$. In Chapter II.2, we lay out the complete definition of a compact closed bicategory (the part\$ of which have not appeared together in a single place before) and then prove that variou\$ useful bicategorie\$ are compact closed. We take special note of bicategorie\$ of "span\$".
In particular, given a category $T$ with pullback\$, we can define a bicategory $\Span(T)$ whose
\begin{itemize}
\item object\$ are those of $T$,
\item morphism\$ from $A$ to $B$ are "span\$" consisting of an "apex" object $C$ and an ordered pair of morphism\$ $(f:C\to A, g:C \to B)$ from $T$, and
\item 2-morphism\$ are morphism\$ from $C$ to $C'$ such that the obviou\$ diagram commutes.
\end{itemize}
If $T$ i\$ the category of finite set\$, then we can think of the subset of $C$ consisting of those element\$ $c$ such that $f(c) = a$ and $g(c) = b$ a\$ being a "matrix element" at row $a$ and column $b$. If we simply count these subset\$, we get a matrix of natural number\$, and the pullback correspond\$ to matrix multiplication. By using categorie\$ $T$ other than the category of finite set\$, we get a vast generalization of linear algebra. We conclude Chapter II.2 with a proof that when we take the monoidal tricategory of span\$ described by Hoffnung \cite{Hoffnung} and mod out by 3-isomorphism\$, we get a compact closed bicategory.
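To make the matrix picture concrete, here i\$ a small Python sketch (a hypothetical illustration, with made-up helper name\$): each span of finite set\$ i\$ recorded by the list of leg-value pair\$ of it\$ apex element\$, composition i\$ computed a\$ a pullback, and the count matrix of the composite i\$ checked against the product of the count matrice\$.
\begin{verbatim}
from itertools import product

def count_matrix(A, B, legs):
    # Count matrix of a span A <-f- C -g-> B.  `legs` lists the pair
    # (f(c), g(c)) for each element c of the apex C; entry (a, b) counts
    # the apex elements lying over row a and column b.
    return {(a, b): sum(1 for pair in legs if pair == (a, b))
            for a, b in product(A, B)}

def compose(legs1, legs2):
    # Pullback composite of spans A <- C1 -> B and B <- C2 -> D: the new
    # apex is the set of pairs (c1, c2) with g1(c1) = f2(c2), and its
    # legs are f1(c1) and g2(c2).
    return [(a, d) for (a, b1) in legs1 for (b2, d) in legs2 if b1 == b2]

A, B, D = [0, 1], [0, 1, 2], [0, 1]
span1 = [(0, 0), (0, 1), (1, 1), (1, 2), (1, 2)]   # a 5-element apex
span2 = [(0, 0), (1, 0), (1, 1), (2, 1)]           # a 4-element apex

M1  = count_matrix(A, B, span1)
M2  = count_matrix(B, D, span2)
M12 = count_matrix(A, D, compose(span1, span2))

# Composing spans by pullback multiplies the count matrices.
for a in A:
    for d in D:
        assert M12[(a, d)] == sum(M1[(a, b)] * M2[(b, d)] for b in B)
\end{verbatim}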
Compact closed bicategorie\$ are interesting to u\$ because they open up at least three area\$ for future research. First, Lawvere showed that we can think of categorie\$ with product\$ a\$ a kind of programming language. A\$ programmer\$, we can write a Java interface that describe\$ the operation\$ on a monoid (identity element and multiplication) and write test\$ to check the relation\$ (associativity and unit laws). Thi\$ interface, together with the test\$, i\$ a presentation of a category with product\$, called "the Lawvere theory for monoid\$". Every implementation of the interface describe\$ a functor from the Lawvere theory for monoid\$ into the category Set and vice versa. The tao of categorification suggest\$ there should be a "higher Lawvere theory of symmetric monoidal closed categorie\$"; the fact that the currying adjunction between tensor and the internal hom i\$ an isomorphism of profunctor\$ mean\$ that thi\$ higher theory ought to be a compact closed bicategory and take model\$ in Prof.
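Here i\$ a hypothetical Python sketch of the interface-and-test\$ idea: the abstract clas\$ play\$ the role of the interface (the presentation of the theory), an implementation i\$ a model of it in Set, and the test\$ check the monoid law\$ on sample element\$.
\begin{verbatim}
from abc import ABC, abstractmethod

class Monoid(ABC):
    # The "interface": an identity element and a binary multiplication.
    @abstractmethod
    def unit(self): ...
    @abstractmethod
    def mult(self, a, b): ...

class ListMonoid(Monoid):
    # One "implementation", i.e. one model of the theory in Set.
    def unit(self):
        return []
    def mult(self, a, b):
        return a + b

def check_monoid_laws(m, samples):
    # The "tests": the unit laws and associativity, checked on samples.
    e = m.unit()
    for a in samples:
        assert m.mult(e, a) == a and m.mult(a, e) == a
    for a in samples:
        for b in samples:
            for c in samples:
                assert m.mult(m.mult(a, b), c) == m.mult(a, m.mult(b, c))

check_monoid_laws(ListMonoid(), [[], [1], [1, 2]])
\end{verbatim}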
Second, Lambek and Scott showed that simply-typed lambda calculu\$ form\$ a cartesian closed category, but only if we {\em ignore the proces\$ of computation.} Thi\$ i\$ only possible because lambda calculu\$ i\$ confluent: it doe\$ not matter in what order the rewrite rule\$ are applied. Concurrent calculi like Milner'\$ pi calculu\$ are not confluent; once we can expres\$ contention for resources---like a deposit to and a withdrawal from the same bank account---it matter\$ a great deal in which order the rewrite\$ occur. Thi\$ suggest\$ that we need to explicitly account for rewrite\$ using 2-morphism\$ in a bicategorical setting. The higher theory of a symmetric monoidal closed category above provide\$ many of the piece\$ we need for the pi calculus: the unit object can be the zero proces\$, the tensor product can be concurrency, and the internal hom should be involved in putting a proces\$ under a prefix. The 2-morphism\$ for adjunction\$ in thi\$ bicategory drawn a\$ string diagram\$ look amazingly like synchronization on a name.
Finally, thi\$ bicategorical approach should also fit better with physics: rewrite\$ are processe\$ that occur over time like particle interaction\$ do. Extended topological quantum field theorie\$ should be functor\$ between compact closed bicategorie\$, so it i\$ not unreasonable to expect a nice quantum interpretation of some kind of linear pi calculus.
\setcounter{chapter}{0}
\part{Algorithmic information theory}
\chapter{Algorithmic thermodynamics}
The first mathematical structure we will examine that appear\$ in both computer science and physic\$ i\$ the partition function. Partition function\$ encode how likely it i\$ to find a system in a given state. From the partition function, we can compute the "entropy", the number of bit\$ required to pick out the particular state the system i\$ in. In statistical mechanic\$, the bit\$ describe the position\$ and momenta of a collection of particle\$; in algorithmic information theory, the bit\$ describe a program.
Algorithmic entropy can be seen as a special case of entropy as studied in statistical mechanics. This viewpoint allows us to apply many techniques developed for use in thermodynamics to the subject of algorithmic information theory. In particular, suppose we fix a universal prefix-free Turing machine and let $X$ be the set of programs that halt for this machine. Then we can regard $X$ as a set of `microstates', and treat any function on $X$ as an `observable'. For any collection of observables, we can study the Gibbs ensemble that maximizes entropy subject to constraints on expected values of these observables. We illustrate this by taking the log runtime, length, and output of a program as observables analogous to the energy $E$, volume $V$ and number of molecules $N$ in a container of gas. The conjugate variables of these observables allow us to define quantities which we call the `algorithmic temperature' $T$, `algorithmic pressure' $P$ and `algorithmic potential' $\mu$, since they are analogous to the temperature, pressure and chemical potential. We derive an analogue of the fundamental thermodynamic relation $d E = TdS - P d V + \mu d N$, and use it to study thermodynamic cycles analogous to those for heat engines. We also investigate the values of $T, P$ and $\mu$ for which the partition function converges. At some points on the boundary of this domain of convergence, the partition function becomes uncomputable. Indeed, at these points the partition function itself has nontrivial algorithmic entropy.
\section{Introduction}\label{intro}
Many authors \cite{BGLVZ, Chaitin1975, FT1982, Kolmogorov1965, LevinZvonkin, Solomonoff1964, Szilard1929, Tadaki2008} have discussed the analogy between algorithmic entropy and entropy as defined in statistical mechanics: that is, the entropy of a probability measure $p$ on a set $X$. It is perhaps insufficiently appreciated that algorithmic entropy can be seen as a \textit{special case} of the entropy as defined in statistical mechanics. We describe how to do this in Section \ref{entropy}.
This allows all the basic techniques of thermodynamics to be imported to algorithmic information theory. The key idea is to take $X$ to be some version of `the set of all programs that eventually halt and output a natural number', and let $p$ be a Gibbs ensemble on $X$. A
Gibbs ensemble is a probability measure that maximizes entropy subject to constraints on the mean values of some observables --- that is, real-valued functions on $X$.
In most traditional work on algorithmic entropy, the relevant observable is the length of the program. However, much of the interesting structure of thermodynamics only becomes visible when we consider several observables. When $X$ is the set of programs that halt and output a natural number, some other important observables include the output of the program and logarithm of its runtime. So, in Section \ref{thermodynamics} we illustrate how ideas from thermodynamics can be applied to algorithmic information theory using these three observables.
To do this, we consider a Gibbs ensemble of programs which maximizes entropy subject to constraints on:
\begin{itemize}
\item
$E$, the expected value of the logarithm of the program's runtime (which we treat as analogous to the energy of a container of gas),
\item
$V$, the expected value of the length of the program (analogous to the volume of the container), and
\item
$N$, the expected value of the program's output (analogous to the number of molecules in the gas).
\end{itemize}
This measure is of the form
\[ p = \frac{1}{Z} e^{-\beta E(x) -\gamma V(x) - \delta N(x)} \]
for certain numbers $\beta, \gamma, \delta$, where the normalizing factor
\[ Z = \sum_{x \in X} e^{-\beta E(x) -\gamma V(x) - \delta N(x)} \]
is called the `partition function' of the ensemble. The partition function reduces to Chaitin's number $\Omega$ when $\beta = 0$, $\gamma = \ln 2$ and $\delta = 0$. This number is uncomputable \cite{Chaitin1975}. However, we show that the partition function $Z$ is computable when $\beta > 0$, $\gamma \ge \ln 2$, and $\delta \ge 0$.
We derive an algorithmic analogue of the basic thermodynamic relation
\[ dE = T dS - P dV + \mu dN . \]
Here:
\begin{itemize}
\item
$S$ is the entropy of the Gibbs ensemble,
\item
$T = 1/\beta$ is the `algorithmic temperature' (analogous to the temperature of a container of gas). Roughly speaking, this counts how many times you must double the runtime in order to double the number of programs in the ensemble while holding their mean length and output fixed.
\item
$P = \gamma/\beta$ is the `algorithmic pressure' (analogous to pressure). This measures the tradeoff between runtime and length. Roughly speaking, it counts how much you need to decrease the mean length to increase the mean log runtime by a specified amount, while holding the number of programs in the ensemble and their mean output fixed.
\item
$\mu = -\delta/\beta$ is the `algorithmic potential' (analogous to chemical potential). Roughly speaking, this counts how much the mean log runtime increases when you increase the mean output while holding the number of programs in the ensemble and their mean length fixed.
\end{itemize}
Starting from this relation, we derive analogues of Maxwell's relations and consider thermodynamic cycles such as the Carnot cycle or Stoddard cycle. For this we must introduce concepts of `algorithmic heat' and `algorithmic work'.
\begin{center}
\includegraphics[scale=0.15, angle=0.3]{piston.eps}
\end{center}
Charles Babbage described a computer powered by a steam engine;
we describe a heat engine powered by programs! We admit that the significance of this line of thinking remains a bit mysterious. However, we hope it points the way toward a further synthesis of algorithmic information theory and thermodynamics. We call this hoped-for synthesis `algorithmic thermodynamics'.
\section{Related Work}
Li and Vit\'anyi use the term `algorithmic thermodynamics' for describing physical states using a universal prefix-free Turing machine $U$. They look at the smallest program $p$ that outputs a description $x$ of a particular microstate to some accuracy, and define the physical entropy to be
\[ S_A(x) = (k \ln 2)(K (x) + H_x), \]
where $K(x) = |p|$ and $H_x$ embodies the uncertainty in the actual state given $x$. They summarize their own work and subsequent work by others in chapter eight of their book \cite{LV}. Whereas they consider $x=U(p)$ to be a microstate, we consider $p$ to be the microstate and $x$ the value of the observable $U$. Then their observables $O(x)$ become observables of the form $O(U(p))$ in our model.
Tadaki \cite{Tadaki2002} generalized Chaitin's number $\Omega$ to a function
$\Omega^D$ and showed that the value of this function is compressible by a factor of exactly $D$ when $D$ is computable. Calude and Stay \cite{CSNatural2006} pointed out that this generalization was formally equivalent to the partition function of a statistical mechanical system where temperature played the role of the compressibility factor, and studied various observables of such a system. Tadaki \cite{Tadaki2008} then explicitly constructed a system with that partition function: given a total length $E$ and number of programs
$N,$ the entropy of the system is the log of the number of $E$-bit strings in $\dom(U)^N.$ The temperature is
\[ \frac{1}{T} = \left.\frac{\Delta E}{\Delta S}\right|_N. \]
In a follow-up paper \cite{Tadaki2009}, Tadaki showed that various other quantities like the free energy shared the same compressibility properties as $\Omega^D$. In this thesis, we consider multiple variables, which is necessary for thermodynamic cycles, chemical reactions, and so forth.
Manin and Marcolli \cite{MM2009} derived similar results in a broader context and studied phase transitions in those systems. Manin \cite{ManinRenorm1, ManinRenorm2}
also outlined an ambitious program to treat the infinite runtimes one finds in undecidable problems as singularities to be removed through the process of renormalization. In a manner reminiscent of hunting for the proper definition of the
"one-element field" $F_{un},$ he collected ideas from many different places and considered how they all touch on this central theme. While he mentioned a runtime cutoff as being analogous to an energy cutoff, the renormalizations he presented are uncomputable. In this thesis, we take the log of the runtime as being analogous to the energy; the randomness described by Chaitin and Tadaki then arises as the infinite-temperature limit.
\section{Algorithmic Entropy}\label{entropy}
To see algorithmic entropy as a special case of the entropy of a probability measure, it is useful to follow Solomonoff
\cite{Solomonoff1964} and take a Bayesian viewpoint. In Bayesian probability theory, we always start with a probability measure called a `prior', which describes our assumptions about the situation at hand before we make any further observations. As we learn more, we may update this prior. This approach suggests that we should define the entropy of a probability measure
\textit{relative to another probability measure} --- the prior.
A probability measure $p$ on a finite set $X$ is simply a function $p
\maps X \to [0,1]$ whose values sum to 1, and its entropy is defined as follows:
\[ S(p) = -\sum_{x \in X} p(x) \ln p(x) .\]
But we can also define the entropy of $p$ relative to another probability measure $q$:
\[ S(p,q) = -\sum_{x \in X} p(x) \ln\frac{p(x)}{q(x)} \raisebox{0.4ex}{.}\]
This {\bf relative entropy} has been extensively studied and goes by various other names, including `Kullback--Leibler divergence' \cite{KL}
and `information gain' \cite{Renyi}.
The term `information gain' is nicely descriptive. Suppose we initially assume the outcome of an experiment is distributed according to the probability measure $q$. Suppose we then repeatedly do the experiment and discover its outcome is distributed according to the measure $p$. Then the information gained is $S(p,q)$.
Why? We can see this in terms of coding. Suppose $X$ is a finite set of signals which are randomly emitted by some source. Suppose we wish to encode these signals as efficiently as possible in the form of bit strings. Suppose the source emits the signal $x$ with probability $p(x)$, but we erroneously believe it is emitted with probability $q(x)$. Then $S(p,q)/\ln 2$ is the expected extra message-length per signal that is required if we use a code that is optimal for the measure $q$ instead of a code that is optimal for the true measure, $p$.
The ordinary entropy $S(p)$ is, up to a constant, just the relative entropy in the special case where the prior assigns an equal probability to each outcome. In other words:
\[ S(p) = S(p,q_0) + S(q_0) \]
when $q_0$ is the so-called `uninformative prior', with $q_0(x) =
1/|X|$ for all $x \in X$.
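As a quick numerical check of this last identity, here is a short Python sketch (purely illustrative, with made-up numbers) implementing the two definitions above verbatim:
\begin{verbatim}
import math

def entropy(p):
    # S(p) = -sum_x p(x) ln p(x)
    return -sum(px * math.log(px) for px in p.values() if px > 0)

def relative_entropy(p, q):
    # S(p, q) = -sum_x p(x) ln(p(x)/q(x)), exactly as defined above
    return -sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Check S(p) = S(p, q0) + S(q0) for the uninformative prior q0(x) = 1/|X|.
X = ['a', 'b', 'c', 'd']
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
q0 = {x: 1.0 / len(X) for x in X}
assert abs(entropy(p) - (relative_entropy(p, q0) + entropy(q0))) < 1e-12
\end{verbatim}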
We can also define relative entropy when the set $X$ is countably infinite. As before, a probability measure on $X$ is a function $p
\maps X \to [0,1]$ whose values sum to 1. And as before, if $p$ and
$q$ are two probability measures on $X$, the entropy of $p$ relative to $q$ is defined by
\begin{equation}
\label{relative_entropy}
S(p,q) = -\sum_{x \in X} p(x) \, \ln\frac{p(x)}{q(x)} \raisebox{0.4ex}{.}
\end{equation}
But now the role of the prior becomes more clear, because there is no probability measure that assigns the same value to each outcome!
In what follows we will take $X$ to be --- roughly speaking --- the set of all programs that eventually halt and output a natural number. As we shall see, while this set is countably infinite, there are still some natural probability measures on it, which we may take as priors.
To make this precise, we recall the concept of a universal prefix-free Turing machine. In what follows we use {\bf string} to mean a bit string, that is, a finite, possibly empty, list of 0's and 1's. If $x$ and $y$ are strings, let $x||y$ be the concatenation of $x$ and $y.$ A \textbf{prefix} of a string $z$ is a substring beginning with the first letter, that is, a string $x$ such that $z = x||y$ for some $y$. A \textbf{prefix-free} set of strings is one in which no element is a prefix of any other. The \textbf{domain} $\dom(M)$ of a Turing machine $M$ is the set of strings that cause $M$ to eventually halt. We call the strings in $\dom(M)$ \textbf{programs}. We assume that when $M$ halts on the program
$x$, it outputs a natural number $M(x)$. Thus we may think of the machine $M$ as giving a function $M \maps \dom(M) \to \N$.
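The prefix-free condition is easy to pin down in code; here is a hypothetical Python helper, just to make the definition concrete:
\begin{verbatim}
def is_prefix_free(strings):
    # True if no string in the set is a prefix of a different string.
    return not any(x != y and y.startswith(x) for x in strings for y in strings)

assert is_prefix_free({'0', '10', '110', '111'})
assert not is_prefix_free({'0', '01'})   # '0' is a prefix of '01'
\end{verbatim}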
A \textbf{prefix-free Turing machine} is one whose halting programs form a prefix-free set. A prefix-free machine $U$ is {\bf universal}
if for any prefix-free Turing machine $M$ there exists a constant
$c$ such that for each string $x$, there exists a string $y$ with
\[ U(y) = M(x) \; \mbox{ and } \; |y| < |x| + c. \]
Let $U$ be a universal prefix-free Turing machine. Then we can define some probability measures on $X = \dom(U)$
as follows. Let
\[ |\cdot | \maps X \to \N \]
be the function assigning to each bit string its length. Then, for any constant $\gamma \ge \ln 2$, there is a probability measure
$p$ given by
\[ p(x) = \frac{1}{Z} e^{-\gamma |x|}. \]
Here the normalization constant $Z$ is chosen to make the numbers
$p(x)$ sum to 1:
\[ Z = \sum_{x \in X} e^{-\gamma |x|} .\]
This sum converges because the domain of $U$ is prefix-free: the Kraft inequality gives $\sum_{x \in X} 2^{-|x|} \le 1$, and $e^{-\gamma |x|} \le 2^{-|x|}$ whenever $\gamma \ge \ln 2$.
It is worth noting that for computable real numbers $\gamma \ge \ln 2$, the normalization constant $Z$ is uncomputable \cite{Tadaki2002}. Indeed, when $\gamma = \ln 2$, $Z$ is Chaitin's famous number $\Omega$. We return to this issue in Section \ref{computability}.
Let us assume that each program prints out some natural number as its output. Thus we have a function
\[ N \maps X \to \N \]
where $N(x)$ equals $i$ when program $x$ prints out the number $i$. We may use this function to `push forward' $p$ to a probability measure
$q$ on the set $\N$. Explicitly:
\[
q(i) = \displaystyle {\sum_{x \in X : N(x) = i}} e^{-\gamma |x|} \; .
\]
In other words, if $i$ is some natural number, $q(i)$ is the probability that a program randomly chosen according to the measure
$p$ will print out this number.
Given any natural number $n$, there is a probability measure
$\delta_n$ on $\N$ that assigns probability 1 to this number:
\[ \delta_n(m) = \left\{ \begin{array}{cl} 1 & \textrm{if } m = n \\
0 & \textrm{otherwise.}
\end{array} \right.
\]
We can compute the entropy of $\delta_n$ relative to $q$:
\begin{equation}
\label{relative.entropy}
\begin{array}{ccl} S(\delta_n,q) &=&
\displaystyle{ -\sum_{i \in \N} \delta_n(i) \,
\ln \frac{\delta_n(i)}{q(i)}} \\
\\ &=& \displaystyle{ -\ln \left( \sum_{x \in X \colon N(x) = n}
e^{-\gamma |x|} \right) + \ln Z .} \\
\end{array}
\end{equation}
Since the quantity $\ln Z$ is independent of the number $n$, and uncomputable, it makes sense to focus attention on the other part of the relative entropy:
\[ \displaystyle{ -\ln \left( \sum_{x \in X \colon N(x) = n}
e^{-\gamma |x|} \right) .}
\]
If we take $\gamma = \ln 2$, this is precisely the \textbf{algorithmic entropy} \cite{Chaitin1976,LevinZvonkin} of the number $n$. So, up to the additive constant $\ln Z$, we have seen that \textit{algorithmic entropy is a special case of relative entropy}.
One way to think about entropy is as a measure of surprise: if you can predict what comes next --- that is, if you have a program that can compute it for you --- then you are not surprised. For example, the first 2000 bits of the binary fraction for 1/3 can be produced with this short Python program:
\begin{center}
{\tt print "01" * 1000}
\end{center}
But if the number is complicated, if every bit is surprising and unpredictable, then the shortest program to print the number does not do any computation at all! It just looks something like
\begin{center}
{\tt print "101000011001010010100101000101111101101101001010"}
\end{center}
Levin's coding theorem \cite{Levin1974} says that the difference between the algorithmic entropy of a number and its \textbf{Kolmogorov complexity} --- the length of the shortest program that outputs it
--- is bounded by a constant that only depends on the programming language.
So, up to some error bounded by a constant, \textit{algorithmic information is information gain}. The algorithmic entropy is the information gained upon learning a number, if our prior assumption was that this number is the output of a randomly chosen program ---
randomly chosen according to the measure $p$ where $\gamma = \ln 2$.
So, algorithmic entropy is not just \textit{analogous} to entropy as defined in statistical mechanics: it is a \textit{special case}, as long as we take seriously the Bayesian philosophy that entropy should be understood as relative entropy. This realization opens up the possibility of taking many familiar concepts from thermodynamics, expressed in the language of statistical mechanics, and finding their counterparts in the realm of algorithmic information theory.
But to proceed, we must also understand more precisely the role of the measure $p$. In the next section, we shall see that this type of measure is already familiar in statistical mechanics: it is a Gibbs ensemble.
\section{Algorithmic Thermodynamics}\label{thermodynamics}
Suppose we have a countable set $X$, finite or infinite, and suppose $C_1, \dots, C_n \maps X \to \R$ is some collection of functions. Then we may seek a probability measure $p$
that maximizes entropy subject to the constraints that the mean value of each observable $C_i$ is a given real number $\overline{C}_i$:
\[ \sum_{x \in X} p(x) \, C_i(x) = \overline{C}_i .\]
As nicely discussed by Jaynes \cite{Jaynes1957,Jaynes2003}, the solution, if it exists, is the so-called {\bf Gibbs ensemble}:
\[ p(x) = \frac{1}{Z} e^{-(s_1 C_1(x) + \cdots + s_n C_n(x))} \]
for some numbers $s_i \in \R$ depending on the desired mean values
$\overline{C}_i$. Here the normalizing factor $Z$ is called the
{\bf partition function}:
\[ Z = \sum_{x \in X} e^{-(s_1 C_1(x) + \cdots + s_n C_n(x))} \;. \]
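To get a feel for how the numbers $s_i$ are tied to the constrained mean values, here is a small numerical sketch (toy microstates and hypothetical helper names): fix a finite set and a single observable, pick a target mean, and solve for $s$ by bisection.
\begin{verbatim}
import math

def gibbs(X, C, s):
    # Gibbs ensemble p(x) proportional to exp(-s * C(x)) on a finite set X.
    weights = {x: math.exp(-s * C(x)) for x in X}
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}

def mean(p, C):
    return sum(px * C(x) for x, px in p.items())

def solve_conjugate(X, C, target, lo=-20.0, hi=20.0):
    # Bisect for the s whose Gibbs ensemble has mean value `target`.
    # The mean is a decreasing function of s, and the bounds are assumed
    # to bracket the solution.
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean(gibbs(X, C, mid), C) > target:
            lo = mid   # mean still too large: increase s
        else:
            hi = mid
    return (lo + hi) / 2

# Toy microstates 0..9 with observable C(x) = x; ask for mean value 3.
X = range(10)
def C(x): return x
s = solve_conjugate(X, C, target=3.0)
p = gibbs(X, C, s)
assert abs(mean(p, C) - 3.0) < 1e-6
\end{verbatim}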
In thermodynamics, $X$ represents the set of {\bf microstates} of some physical system. A probability measure on $X$ is also known as an
{\bf ensemble}. Each function $C_i \maps X \to \R$ is called an {\bf observable}, and the corresponding quantity $s_i$ is called the {\bf conjugate variable} of that observable. For example, the conjugate of the energy $E$ is the inverse of temperature $T$, in units where Boltzmann's constant equals 1. The conjugate of the volume $V$ --- of a piston full of gas, for example --- is the pressure $P$ divided by the temperature. And in a gas containing molecules of various types, the conjugate of the number $N_i$ of molecules of the $i$th type is minus the `chemical potential' $\mu_i$, again divided by temperature. For easy reference, we list these observables and their conjugate variables below.
\begin{center}
\renewcommand{\arraystretch}{2.3}
\begin{tabular}{c|c}
\multicolumn{2}{c}{\bf{THERMODYNAMICS}} \\
\hline Observable & Conjugate Variable\\
\hline energy: $E$ & $\displaystyle{\frac{1}{T}}$ \\
volume: $V$ & $\displaystyle{\frac{P}{T}}$ \\
number: $N_i$ & $\displaystyle{-\frac{\mu_i}{T}}$ \\
\end{tabular}
\end{center}
\renewcommand{\arraystretch}{1}
Now let us return to the case where $X=\dom(U)$. Recalling that programs are bit strings, one important observable for programs is the length:
\[ |\cdot | \maps X \to \N .\]
We have already seen the measure
\[ p(x) = \frac{1}{Z} e^{-\gamma |x|} .\]
Now its significance should be clear! This is the probability measure on programs that maximizes entropy subject to the constraint that the mean length is some constant $\ell$:
\[ \sum_{x \in X} p(x) \, |x| = \ell . \]
So, $\gamma$ is the conjugate variable to program length.
There are, however, other important observables that can be defined for programs, and each of these has a conjugate quantity. To make the analogy to thermodynamics as vivid as possible, let us arbitrarily choose two more observables and treat them as analogues of energy and the number of some type of molecule. Two of the most obvious observables are `output' and `runtime'. Since Levin's computable complexity measure \cite{Levin1973} uses the logarithm of runtime as a kind of `cutoff' reminiscent of an energy cutoff in renormalization, we shall arbitrarily choose the log of the runtime to be analogous to the energy, and denote it as
\[ E \maps X \to [0,\infty) . \]
Following the chart above, we use $1/T$ to stand for the variable conjugate to $E$. We arbitrarily treat the output of a program as analogous to the number of a certain kind of molecule, and denote it as
\[ N \maps X \to \N . \]
We use $-\mu/T$ to stand for the conjugate variable of $N$. Finally, as already hinted, we denote program length as
\[ V \maps X \to \N \]
so that in terms of our earlier notation, $V(x) = |x|$. We use
$P/T$ to stand for the variable conjugate to $V$.
\begin{center}
\renewcommand{\arraystretch}{2.3}
\begin{tabular}{c|c}
\multicolumn{2}{c}{\bf{ALGORITHMS}} \\
\hline Observable & Conjugate Variable\\
\hline log runtime: $E$ & $\displaystyle{\frac{1}{T}}$ \\
length: $V$ & $\displaystyle{\frac{P}{T}}$ \\
output: $N$ & $\displaystyle{-\frac{\mu}{T}}$ \\
\end{tabular}
\end{center}
\renewcommand{\arraystretch}{1}
Before proceeding, we wish to emphasize that the analogies here were chosen somewhat arbitrarily. They are merely meant to illustrate the application of thermodynamics to the study of algorithms. There may or may not be a specific `best' mapping between observables for programs and observables for a container of gas! Indeed, Tadaki \cite{Tadaki2008} has explored another analogy, where length rather than log runtime is treated as the analogue of energy. There is nothing wrong with this. However, he did not introduce enough other observables to see the whole structure of thermodynamics, as developed in Sections \ref{elementary}-\ref{cycles} below.
Having made our choice of observables, we define the partition function by
\[ Z = \sum_{x \in X} e^{-\frac{1}{T}(E(x) + P V(x) - \mu N(x))} \; .\]
When this sum converges, we can define a probability measure on $X$, the Gibbs ensemble, by
\[ p(x) = \frac{1}{Z} e^{-\frac{1}{T}(E(x) + P V(x) - \mu N(x))} \; .\]
Both the partition function and the probability measure are functions of $T, P$ and $\mu$. From these we can compute the mean values of the observables to which these variables are conjugate:
\[
\begin{array}{ccc}
\overline{E} &=& \displaystyle{\sum_{x \in X}} p(x) \, E(x) \\
\\
\overline{V} &=& \displaystyle{\sum_{x \in X}} p(x) \, V(x) \\
\\
\overline{N} &=& \displaystyle{\sum_{x \in X}} p(x) \, N(x)
\end{array}
\]
In certain ranges, the map $(T,P,\mu) \mapsto (\overline{E},
\overline{V}, \overline{N})$ will be invertible. This allows us to alternatively think of $Z$ and $p$ as functions of $\overline{E}, \overline{V},$ and $\overline{N}$. In this situation it is typical to abuse language by omitting the overlines which denote `mean value'.
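To see these definitions in action, here is a minimal Python sketch. It is purely illustrative: the five `programs' and their values of $E$, $V$ and $N$ are made up, as are the chosen values of $T$, $P$ and $\mu$; nothing here is computed from an actual universal machine.
\begin{verbatim}
# Toy sketch of the three-observable Gibbs ensemble.
import math

# fictitious programs, each with made-up values of
# (E, V, N) = (log runtime, length, output)
programs = [(0.5, 3, 1), (1.2, 4, 2), (2.0, 5, 2), (2.7, 6, 4), (3.1, 7, 5)]

T, P, mu = 1.5, 0.7, 0.2   # arbitrary temperature, pressure, potential

def weight(E, V, N):
    return math.exp(-(E + P * V - mu * N) / T)

Z = sum(weight(E, V, N) for (E, V, N) in programs)   # partition function
p = [weight(E, V, N) / Z for (E, V, N) in programs]  # Gibbs probabilities

E_mean = sum(q * E for q, (E, V, N) in zip(p, programs))
V_mean = sum(q * V for q, (E, V, N) in zip(p, programs))
N_mean = sum(q * N for q, (E, V, N) in zip(p, programs))
print(E_mean, V_mean, N_mean)
\end{verbatim}
Repeating the computation for nearby values of $(T,P,\mu)$ traces out the map $(T,P,\mu) \mapsto (\overline{E},\overline{V},\overline{N})$ discussed above.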
\subsection{Elementary Relations} \label{elementary}
The entropy $S$ of the Gibbs ensemble is given by
\[ S = - \sum_{x \in X} p(x)\, \ln p(x) .\]
We may think of this as a function of $T, P$ and $\mu$, or alternatively --- as explained above --- as a function of the mean values $E, V,$ and $N$. Then simple calculations, familiar from statistical mechanics \cite{Reif}, show that
%see Reif eqs.\ 8.7.4 to 8.7.5
\begin{equation}
\label{derivative1}
\displaystyle{\left.\frac{\partial S}{\partial E}\right|_{V,N}} =
\displaystyle{\frac{1}{T}}
\end{equation}
\begin{equation}
\label{derivative2}
\displaystyle{\left.\frac{\partial S}{\partial V}\right|_{E,N}} =
\displaystyle{\frac{P}{T}}
\end{equation}
\begin{equation}
\label{derivative3}
\displaystyle{\left.\frac{\partial S}{\partial N}\right|_{E,V}} =
-\displaystyle{\frac{\mu}{T}} .
\end{equation}
We may summarize all these by writing
%see Reif eqs.\ 8.7.6 and 8.7.8
\[ dS = \frac{1}{T} dE + \frac{P}{T} dV - \frac{\mu}{T} dN \]
or equivalently
\begin{equation}
\label{differentials}
dE = T dS - P dV + \mu dN .
\end{equation}
Starting from the latter equation we see:
\begin{equation}
\label{derivatives1}
\displaystyle{\left.\frac{\partial E}{\partial S}\right|_{V,N}} =
\displaystyle{T}
\end{equation}
\begin{equation}
\label{derivatives2}
\displaystyle{\left.\frac{\partial E}{\partial V}\right|_{S,N}} =
\displaystyle{-P}
\end{equation}
\begin{equation}
\label{derivatives3}
\displaystyle{\left.\frac{\partial E}{\partial N}\right|_{S,V}} =
\displaystyle{\mu} .
\end{equation}
With these definitions, we can start to get a feel for what the conjugate variables are measuring. To build intuition, it is useful to think of the entropy $S$ as roughly the logarithm of the number of programs whose log runtime, length and output lie in small ranges $E \pm \Delta E$, $V \pm \Delta V$ and $N \pm \Delta N$. This is at best approximately true, but in ordinary thermodynamics this approximation is commonly employed and yields spectacularly good results. That is why in thermodynamics people often say the entropy is the logarithm of the number of microstates for which the observables $E, V$ and $N$ lie within a small range of their specified values \cite{Reif}.
If you allow programs to run longer, more of them will halt and give an answer. The \textbf{algorithmic temperature}, $T$, is roughly the number of times you have to double the runtime in order to double the number of ways to satisfy the constraints on length and output.
The \textbf{algorithmic pressure}, $P$, measures the tradeoff between runtime and length \cite{CSHalting2006}: if you want to keep the number of ways to satisfy the constraints constant, then the freedom gained by having longer runtimes has to be counterbalanced by shortening the programs. This is analogous to the pressure of gas in a piston: if you want to keep the number of microstates of the gas constant, then the freedom gained by increasing its energy has to be counterbalanced by decreasing its volume.
Finally, the \textbf{algorithmic potential} describes the relation between log runtime and output: it is a quantitative measure of the principle that most large outputs must be produced by long programs.
\subsection{Thermodynamic Cycles} \label{cycles}
One of the first applications of thermodynamics was to the analysis of heat engines. The underlying mathematics applies equally well to algorithmic thermodynamics. Suppose $C$ is a loop in $(T,P,\mu)$
space. Assume we are in a region that can also be coordinatized by the variables $E,V,N$. Then the change in {\bf algorithmic heat}
around the loop $C$ is defined to be
\[ \Delta Q = \oint_C T dS .\]
Suppose the loop $C$ bounds a surface $\Sigma$. Then Stokes' theorem implies that
\[ \Delta Q = \oint_C T dS = \int_{\Sigma} dT dS . \]
However, Equation (\ref{differentials}) implies that
\[ dT dS = d(T dS) = d(dE + P dV - \mu dN) = dP dV - d\mu dN \]
since $d^2 = 0$. So, we have
\[ \Delta Q = \int_{\Sigma} (dP dV - d\mu dN) \]
or using Stokes' theorem again
\begin{equation}
\label{loop}
\Delta Q = \int_C (P dV - \mu dN).
\end{equation}
In ordinary thermodynamics, $N$ is constant for a heat engine using gas in a sealed piston. In this situation we have
\[
\Delta Q = \int_C P dV .
\]
This equation says that the change in heat of the gas equals the work done on the gas --- or equivalently, minus the work done \emph{by} the gas. So, in algorithmic thermodynamics, let us define $\int_C P dV$
to be the {\bf algorithmic work} done on our ensemble of programs as we carry it around the loop $C$. Beware: the algorithmic work has the same units as the observable we choose to play the role of internal energy. The algorithmic work is only the same as the `computational work', meaning the amount of computation done by a program as it runs, if we choose to use the runtime as the analogue of internal energy. We have chosen to use the log runtime instead, so the concepts are related, but not the same. If we had chosen to use the length of a program to represent the internal energy, then algorithmic work would be measured in bits and the two concepts would be completely distinct.
To see an example of a cycle in algorithmic thermodynamics, consider the analogue of the heat engine patented by Stoddard in 1919
\cite{Stoddard1919}. Here we fix $N$ to a constant value and consider the following loop in the $PV$ plane:
\begin{center}
\begin{tikzpicture}
\draw [->] (-.25,1)--(-.25,4);
\draw [->] (-.25,1)--(4,1);
\draw [->] (1,2)--(1,3.5);
\draw [->] (1,3.5) .. controls (1.5, 2.5) and (1.75, 2.5) .. (3,2);
\draw [->] (3,2)--(3,1.5);
\draw [->] (3,1.5).. controls (1.5,1.75) and (1.75, 1.75) .. (1,2);
\node at (-.25,2) [left] {$P$};
\node at (2,1) [below] {$V$};
\node at (1,2.75) [left] {1};
\node at (2,2.5) [above] {2};
\node at (3,1.75) [right] {3};
\node at (2,1.7) [below] {4};
\begin{scope}[font=\fontsize{5}{5}\selectfont]
\node at (1,2) [below left] {$(P_1, V_1)$};
\node at (1,3.5) [above left] {$(P_2, V_1)$};
\node at (3,2) [above right] {$(P_3, V_2)$};
\node at (3,1.5) [below right] {$(P_4, V_2)$};
\end{scope}
\end{tikzpicture}
\end{center}
We start with an ensemble with algorithmic pressure $P_1$ and mean length $V_1$. We then trace out a loop built from four parts:
\begin{enumerate}
\item
\emph{Isometric.} We increase the pressure from $P_1$ to $P_2$ while keeping
the mean length constant. No algorithmic work is done on the ensemble
of programs during this step.
\item
\emph{Isentropic.} We increase the length from $V_1$ to $V_2$ while keeping
the number of halting programs constant. High pressure means
that we are operating in a range of runtimes where if we increase
the length a little bit, many more programs halt. In order
to keep the number of halting programs constant,
we need to shorten the runtime significantly. As we gradually
increase the length and lower the runtime, the pressure drops to $P_3$.
The total difference in log runtime is the algorithmic
work done on the ensemble during this step.
\item
\emph{Isometric.} Now we decrease the pressure from $P_3$ to $P_4$ while
keeping the length constant. No algorithmic work is done during this step.
\item
\emph{Isentropic.} Finally, we decrease the length from $V_2$ back to $V_1$
while keeping the number of halting programs constant.
Since we are at low pressure, we need only increase the
runtime a little. As we gradually decrease the length
and increase the runtime, the pressure rises slightly
back to $P_1$. The total increase in log runtime is minus the algorithmic
work done on the ensemble of programs during this step.
\end{enumerate}
The total algorithmic work done on the ensemble per cycle is the difference in log runtimes between steps 2 and 4.
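To attach numbers to this cycle, here is a rough Python sketch. It is not computed from any actual ensemble of programs: purely for illustration we assume that along the two isentropic branches pressure and length satisfy $PV = \mathrm{const}$, pick arbitrary values of $P_1$, $P_2$, $V_1$ and $V_2$, and integrate $P \, dV$ around the loop numerically.
\begin{verbatim}
# Illustrative computation of the algorithmic work around the cycle,
# ASSUMING (only for the sake of having something to integrate) that
# the isentropic branches obey P*V = const.
import math

V1, V2 = 4.0, 6.0        # mean lengths on the two isometric branches
P1, P2 = 1.0, 2.0        # pressures before and after step 1
P3 = P2 * V1 / V2        # end of step 2, assuming P*V = const there
P4 = P1 * V1 / V2        # chosen so that step 4 returns exactly to (P1, V1)

def work(P_start, V_start, V_end, steps=100000):
    # Trapezoidal integral of P dV along a branch with P*V held constant.
    c = P_start * V_start
    dV = (V_end - V_start) / steps
    total = 0.0
    for i in range(steps):
        Va = V_start + i * dV
        Vb = Va + dV
        total += 0.5 * (c / Va + c / Vb) * dV
    return total

# Steps 1 and 3 are isometric (dV = 0), so they contribute no work.
W = work(P2, V1, V2) + work(P4, V2, V1)
print(W, (P2 * V1 - P4 * V2) * math.log(V2 / V1))   # numerical vs exact
\end{verbatim}
With these made-up numbers the net algorithmic work per cycle is positive, reflecting the fact that the loop is traversed so as to enclose positive area in the $(V,P)$ plane.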
\subsection{Further Relations} \label{maxwell}
From the elementary thermodynamic relations in Section
\ref{elementary}, we can derive various others. For example, the so-called `Maxwell relations' are obtained by computing the second derivatives of thermodynamic quantities in two different orders and then applying the basic derivative relations, Equations (\ref{derivatives1}-\ref{derivatives3}). While trivial to prove, these relations say some things about algorithmic thermodynamics which may not seem intuitively obvious.
We work out just one example here in detail. Since mixed partials commute, we have:
\[ \left.\frac{\partial^2 E}{\partial V \partial S}\right|_N =
\left.\frac{\partial^2 E}{\partial S \partial V}\right|_N . \]
Using Equation (\ref{derivatives1}), the left side can be computed as follows:
\[ \left.\frac{\partial^2 E}{\partial V \partial S}\right|_N =
\left.\frac{\partial}{\partial V}\right|_{S,N}
\left.\frac{\partial E}{\partial S}\right|_{V,N} =
\left.\frac{\partial T}{\partial V}\right|_{S,N} \]
Similarly, we can compute the right side with the help of Equation (\ref{derivatives2}):
\[ \left.\frac{\partial^2 E}{\partial S \partial V}\right|_N =
\left.\frac{\partial}{\partial S}\right|_{V,N}
\left.\frac{\partial E}{\partial V}\right|_{S,N} =
-\left.\frac{\partial P}{\partial S}\right|_{V,N} . \]
As a result, we obtain:
\[ \left.\frac{\partial T}{\partial V}\right|_{S,N} =
-\left.\frac{\partial P}{\partial S}\right|_{V,N} . \]
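An entirely analogous computation, taking mixed partials of $E$ with respect to $S$ and $N$ at fixed $V$ and using Equations (\ref{derivatives1}) and (\ref{derivatives3}), yields
\[ \left.\frac{\partial T}{\partial N}\right|_{S,V} =
\left.\frac{\partial \mu}{\partial S}\right|_{V,N} . \]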
We can also derive interesting relations involving derivatives of the partition function. These become more manageable if we rewrite the partition function in terms of the conjugate variables of the observables $E, V$, and $N$:
\begin{equation}
\label{new.variables}
\beta = \frac{1}{T}\raisebox{0.4ex}{,} \quad
\gamma = \frac{P}{T}\raisebox{0.4ex}{,} \quad
\delta = -\frac{\mu}{T}\raisebox{0.4ex}{.}
\end{equation}
Then we have
\[ Z = \sum_{x \in X} e^{-\beta E(x) -\gamma V(x) - \delta N(x)} \]
Simple calculations, standard in statistical mechanics
\cite{Reif}, then allow us to compute the mean values of observables as derivatives of the logarithm of $Z$
with respect to their conjugate variables. Here let us revert to using overlines to denote mean values:
\[
\begin{array}{ccc}
\overline{E} &=
\displaystyle{\sum_{x \in X} p(x) \, E(x)} &=
- \displaystyle{\frac{\partial}{\partial \beta} \ln Z} \\
\\
\overline{V} &=
\displaystyle{\sum_{x \in X} p(x) \, V(x)} &=
- \displaystyle{\frac{\partial}{\partial \gamma} \ln Z} \\
\\
\overline{N} &=
\displaystyle{\sum_{x \in X} p(x) \, N(x)} &=
- \displaystyle{\frac{\partial}{\partial \delta} \ln Z}
\end{array}
\]
We can go further and compute the variance of these observables using second derivatives:
\[
\begin{array}{ccc}
{(\Delta E)^2} &=
\displaystyle{\sum_{x \in X} p(x) (E(x)^2 - \overline{E}^2)} &=
\displaystyle{ \frac{\partial^2}{\partial \beta^2} \ln Z }
\end{array}
\]
and similarly for $V$ and $N$. Higher moments of $E, V$ and
$N$ can be computed by taking higher derivatives of $\ln Z$.
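These identities are easy to verify numerically. The following Python sketch uses the same kind of made-up toy ensemble as before (fictitious programs with arbitrary values of $E$, $V$, $N$) and compares the directly computed mean and variance of $E$ with finite-difference derivatives of $\ln Z$ with respect to $\beta$.
\begin{verbatim}
# Check E_mean = -d(ln Z)/d(beta) and Var(E) = d^2(ln Z)/d(beta)^2
# on a toy ensemble, using symmetric finite differences.
import math

programs = [(0.5, 3, 1), (1.2, 4, 2), (2.0, 5, 2), (2.7, 6, 4), (3.1, 7, 5)]
beta, gamma, delta = 0.8, 0.5, -0.1   # arbitrary conjugate variables

def log_Z(b, g, d):
    return math.log(sum(math.exp(-b*E - g*V - d*N) for (E, V, N) in programs))

def moments(b, g, d):
    ws = [math.exp(-b*E - g*V - d*N) for (E, V, N) in programs]
    Z = sum(ws)
    p = [w / Z for w in ws]
    mean = sum(q * E for q, (E, V, N) in zip(p, programs))
    var = sum(q * E * E for q, (E, V, N) in zip(p, programs)) - mean ** 2
    return mean, var

h = 1e-4
mean, var = moments(beta, gamma, delta)
d1 = -(log_Z(beta + h, gamma, delta) - log_Z(beta - h, gamma, delta)) / (2 * h)
d2 = (log_Z(beta + h, gamma, delta) - 2 * log_Z(beta, gamma, delta)
      + log_Z(beta - h, gamma, delta)) / h ** 2
print(mean, d1)   # mean of E      vs  -d(ln Z)/d(beta)
print(var, d2)    # variance of E  vs  d^2(ln Z)/d(beta)^2
\end{verbatim}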
\subsection{Convergence}
So far we have postponed the crucial question of convergence:
for which values of $T,P$ and $\mu$ does the partition function
$Z$ converge? For this it is most convenient to treat $Z$ as a function of the variables $\beta, \gamma$ and $\delta$
introduced in Equation (\ref{new.variables}). For which values of $\beta, \gamma$ and $\delta$ does the partition function converge?
First, when $\beta = \gamma = \delta = 0,$ the contribution of each program is 1. Since there are infinitely many halting programs, $Z(0,0,0)$ does not converge.
Second, when $\beta = 0, \gamma = \ln 2,$ and $\delta = 0,$ the partition function converges to Chaitin's number
\[ \Omega = \sum_{x \in X} 2^{-V(x)}. \]
To see that the partition function converges in this case, consider this mapping of strings to segments of the unit interval:
\begin{center}
\begin{tikzpicture}
\draw (0,2)--(0,0);
\draw (8,2)--(8,0);
\draw (4,1.5)--(4,0);
\draw (2,1)--(2,0);
\draw (6,1)--(6,0);
\draw (1,.5)--(1,0);
\draw (3,.5)--(3,0);
\draw (5,.5)--(5,0);
\draw (7,.5)--(7,0);
\node at (4,1.75) {empty};