--
-- pEmit2.e
-- ========
--
--constant TRAP = #1144
--constant TRAP = #1158
--constant TRAP = #118C
--constant TRAP = #1234 -- DEV investigate remaining uses thoroughly, then remove.
-- This is the long-awaited final stage: creating the executable. In the case of -c,
-- that means a PE format .exe/.dll, or ELF format file, whereas when interpreting,
-- that means some allocated memory that gets filled with code (ditto data).
--
-- To summarise, we have read and tokenised the source (ptok.e), parsed and generated
-- intermediate code (pmain.e and psym.e), scanned, analysed, and converted that to
-- a "flat" machine code representation (pilx86.e), and lots of other stuff I have
-- not even mentioned, all before we got to here. If you have not yet done so, then
-- I must strongly suggest running demo\arwendemo\filedump.exw on a few test files,
-- to get a feel for the scope and complexity of the task now at hand, and bear in
-- mind I have kept things as simple as possible! No, really, I have!! Honest!!!
--
--- GOALS (short term)
-- * make symtab[i][S_State] of 0 mean unused/deleted [done]
-- * make symtab[1..15] just 1..15 in the exe, and cope with that as needed. [done]
-- * get T_pathset/T_fileset emit working [done]
-- * get filedump to show mov edx,[] (hw) [done]
-- * try alternate output orders, to defeat false positives... [done, as constant datab4code]
-- * general cleanup... (get the full test set working first!) [compiled done]
-- * 64-bit hello world [done]
-- * plant h4s in gvar table [done]
constant pathslast = 0 -- (broken/never got working)
-- (the idea was that moving an app from one directory or machine to
-- another, should not suddenly make it stop (or start) working.)
--global -- now needed in pbinary.e (23/3/16)
constant mapsymtab = 01 -- (implies newEmit) [expected to be 1 in all releases][DEV] [DEV broken]
integer symtabmax
--global -- now needed in pbinary.e (23/3/16)
sequence symtabmap
--constant mapgvars = 0
--sequence gvarmap (we are assigning them here anyway...)
--DEV newEBP (I think this can go, what with filedump.exw 'n all)
--constant debug = 0
--DEV broken:
--constant show_full_symtab = 0 -- You should NOT want or need to set this to 1.
-- (It was used to improve the use of opCleanup.)
-- If anything, if you are new to symtab dumps you
-- probably want a cut down version - by using the
-- "-nodiag" command line option.
constant isShortJmp = #010000 -- isJmp that has been found to fit in a byte.
-- (fixup as isJmp, patch 5->2 or 6->2)
-- (not global: use only in scanforShortJmp/blurph,
-- and not before, but otherwise like isJmp etc,
-- and therefore must not clash with pglobal.e)
--
-- Fixup code: branch straighten, check for dword offsets that fit in a byte,
-- ========== calculate actual code size, convert opcode indexes to dword
-- offsets into VM, relative addresses and variable indexes into
-- absolute addresses, and adjust all other offsets as necessary
-- by the changing of any dword->byte form instructions. (whew)
--
-- if binding then create .exe file (using specified resource file) else
-- if interpreting just poke to memory and execute it immediately.
--
-- if binding, then we need to write symtab to section 6, whereas
-- if interpreting we can just use the existing symtab directly.
--
-- Note that if binding, there is no source at the point of execution, hence
-- there can be no trace, profile, or profile_time going on.
--
--DEV
-- Also note that critical parts of this process, specifically those where
-- performance matters and anything to do with licensing issues, have been
-- coded in low level assembly (not open source for obvious reasons).
--
-- Technical note/sug: it may be sensible to generate separate constant values by source file,
-- to mitigate/minimise the locking requirements on shared constants, which never felt right.
-- Obviously, if pglobals.e defines global constant phixversion = {0,6,3} then that (single)
-- value is used wherever "phixversion" is used, and the app must be thread-safe/lock aware
-- if it is indeed using "phixversion" for whatever reason in separate threads. However, if
-- barney.e also happens to use "{0,6,3}", that should not interfere in any way whatsoever
-- with phixversion, but as things stand, it /does/, /even/ for constant private = {0,6,3}... [DEV]
--dev/sug:
-- Note: #ilASM treats [0], [4], etc as offsets into the data section. If for some reason you
-- (still, given that DOS days are long since gone) want to reference a fixed literal address,
-- then you must code it using something more like "mov eax,#40000", "mov edx,[eax]".
-- Data Section
-- ============
-- The data section of a Phix-generated executable contains the following:
-- sigPhx, The signature and layout fields really only exist for communication between pemit2.e
-- layout and filedump.exw, and in fact this was (re-)written with the latter firmly in mind.
-- The value of layout has no specific meaning; it can be eg a version, or a bitfield.
-- Note these are 4+4 bytes on both 32 and 64-bit, whereas all following 4/h4 go 8.
-- Update: on file, layout is as above some small number for the use of filedump.exw,
-- but at runtime, that dword [ds+4] is used to hold a 32-bit random number seed. [[ DEV why? see VM\pRand.e (only file that uses it/use a local var in there instead) ]]
-- symptr When this is non-zero (see below), it should be maxgvar*4+40, on 32-bit,
-- or *8+72 on 64-bit, ie it locates symtab[1] rather than slack/maxlen/length etc.
--27/2/15:
-- *NB* For nested opInterp/thread safety, use [vsb_root+8/16] in preference to this.
--23/3/15: (maybe)
-- Note that maxgvar/gvarptr are correctly located via [symptr][T_EBP].
-- Obviously they are right here when compiled, but things get switched about when
-- interpreting...
--DEV: maxop/optable are deprecated; this is now done via symtab[T_optable]
-- (I am however far more inclined to leave a maxop of 0 unused than reclaim it) [DEV will be reclaimed for relocs:]
--< maxop The number of entries in the following table, set from pops2.e/maxNVop2?. [DEV]
--< optable When interpreting, there is already a copy of the heap manager, file i/o, printf(),
--< subscripting, append/prepend, peek/poke, and so on, already in memory, so we may as
--< well use them and save ourselves some time. Note that a maxop of zero is perfectly
--< valid, as is any optable[n], either of which trigger appropriate recompilation.
--< The lack of an opInterpret is also a fair reason to deliberately set maxop to 0.
--< (At the time of writing, there are no confirmed entries, just ideas/suggestions.)
--> relocs When this is non-zero (dll only) points to a reloc-ref table (0=done/none)
-- maxgvar The number of entries in the following table.
--DEV *NB* the correct way to locate maxgvar/gvars[N] is via symtab[T_ds4], since that
-- will match the symtab located via symtabptr @ [ds+8], whereas this ([ds+16] or so) --DEV :%pGetSymPtr
-- matches the one as compiled, and often "wrong" for code invoked via the optable.
-- gvars A gvar is a static global/file-level variable or constant. Each occupies precisely
-- one slot in memory, which is shared by all threads (requiring, of course, that the
-- application use appropriate locking) and is unaffected by recursion. In contrast,
-- tvars, which include all parameters and local variables, exist on the (virtual,
-- heap-based) stack, so they can hold private values for different nested/recursive
-- invocations, and likewise are owned and can only be accessed by a single thread.
-- Note that constants may be duplicated on a per-source basis, to reduce the locking
-- requirements/gotchas for multiple threads, especially on reference count updates.
-- (The precise details of such duplications are yet to be designed/confirmed. [DEV])
-- (So pmain.e/psym.e must create additional symtab entries to contain the different
-- S_FPno (plus an S_Slink for var idx, as set below), and this program creates the
-- duplicate copies for each different S_FPno it encounters on the S_Clink chain.
-- As always, the primary driving force here is that what works in standalone code
-- works just as well when it is incorporated into a larger program, and hence any
-- duplication is to keep multiple instances of literals such as "hello" separate,
-- as opposed to a global constant hello="hello" being used in more than one file.) [ DEV: still not happy with this...]
-- BIG PROBLEM: things like {}.......
-- TIP: use repeat(0,0) instead of {} and repeat(' ',0) instead of "" to ensure thread safety... [see test ideas in _TODO.TXT]
-- A reference to gvar[n] can be resolved as (max[NV]op+n)*4+16, or *8+24 on 64-bit,
-- obviously fixup()'d in pbinary.e relative to the start of the data section. Note
-- that when interpreting, a gvar[n] reference directly uses the symtab[n][S_value]. --DEV what about unassigned?
-- symtab The symbol table, required for diagnostics, dynamic routine-id, etc. If you have
-- not yet done so, I must recommend running "p -d -nodiag test.exw" on a few snippets
-- of code, and examining the resulting list.asm, as well as skimming pglobals&psym.e.
-- Note that, unlike optable and gvars, the symtab is a bona fide Phix-style sequence,
-- with a full header including type, length, etc. This allows it to be used directly
-- in hll code (pdiag, prtnid, etc) both when the program is interpreted and compiled.
-- The one exception is [S_value], which may contain the uninitialised value #40000000,
-- something which a genuine hll-built sequence could never possibly contain.
-- The compiler (psym.e/syminit()) preloads the symbol table with over 500 entries at
-- the start, so it ain't exactly small, no matter how hard I try to remove duplicates
-- and unused entries. If -nodiag is specified, and such things as routine_id are not
-- used, or resolved at compile-time, it may be possible to omit the symbol table, and
-- set the pointer at offset 8 to zero, but that has not yet been attempted or tested. **DEV (symptr of 0, for smaller executables) [low priority]
-- Update: note that pcmdln.e uses symtab[T_cmdlnflg] and symtab[T_fileset][1], and
-- abort([0]) also uses symtab[T_cmdlnflg] to decide whether to ExitProcess or ret. [BLUFF/DEV]
-- I will however say that an extra 100K+ per executable (or 0.00004% or 1/2,500,000th
-- of a these-days-quite-small-250Gb drive) is a small price to pay for human-readable
-- error messages and being able to jump directly to the offending source code line.
-- misc float/string/sequence element values for gvar/symtab[i]. Note that some of these
-- may have/need a reference count >1. These are deliberately emitted after everything
-- else, with appropriate entries "backpatched" as we do so, in order to keep the
-- "locate x" code as simple as possible, and make life easier for filedump.exw. Also
-- note these are emitted "inside out", for example if the program source contains say
-- 38 instances of a variable/parameter named "count", the executable/symbol table
-- only needs one "count" label/string, quite possibly with a reference count of 38.
-- This is achieved using tt_traverse() and (eg) ReconstructSequence(), that is instead
-- of "for i=1 to length(symtab/gvars) do recursive_dump(i)" or somesuch.
--
-- 32-bit:
-- 00000000 sigPhx x4 "Phix"
-- 00000004 layout h4 #00000001
-- 00000008 symptr h4 #00001024 (raw address of symtab[1], h8 on 64-bit)
-- - opcodes
--< 0000000C maxop 4 238 (0 /is/ valid, at #00000010 on 64-bit)
--< 00000010 opInit h4 #0000204C opcode[1] (entry point, in code section)
--< ...
--> 0000000C relocs 4 #00003047 raw ptr to ref relocs (dll only, 0=done/none)
-- - gvars:
--< 000003C4 maxgvar 4 785 (at maxop*4+16, or *8+24 on 64-bit)
--> 00000010 maxgvar 4 785 size of following table
-- gvar[1] h4 #40004016 (name replaced, except for temps)
-- gvar[2] h4 #4000406C (see #000101B0)
-- gvar[3] h4 #00000001 1
-- ...
-- - symtab (at maxgvar*4+20, or *8+32)
-- 00001010 slack 4 0 (should be 0)
-- 00001014 maxlen 4 6836 (should be length*4+20, or *8+40)
-- 00001018 length 4 1704
-- 0000101C refcount 4 1 (should be 1, some later on may be >1)
-- 00001020 type h4 #80000000 (should be #80000000)
-- 00001024 symtab[n] (name) h4 #40230023 (see #008C008C) [0 is valid too]
-- ...
-- - gvar[n]/symtab[m]
-- 00002010 slack 4 0 (should be 0)
-- 00002014 maxlen 4 68 (should be length*4+20, or *8+40)
-- 00002018 length 4 12
-- 0000201C refcount 4 1 (should be 1)
-- 00002020 type h4 #80000000 (should be #80000000)
-- 00002024 S_Name h4 #40027436 <desc> eg i (#00004024) [-1 /is/ valid]
-- 00002028 S_NTyp h4 #00000001 S_Const
-- 0000202C S_FPno 4 1
-- 00002030 S_State h4 #00000001 <copy some code from plist.e>
-- ...
-- 00003047 reloc offset table 1 0 bytes 4..255 are offsets,
-- (ie offset+=byte; dword[ds+offset]+=refdiff)
-- (3 unused/illegal),
-- 2=word offset follows
-- 1=dword offset follows
-- 0=table terminator(byte)
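-- Aside (illustrative only, not used by the compiler): a minimal sketch of how
--  the reloc offset table above might be walked. decode_relocs() is a made-up
--  name, and treating the word/dword forms as little-endian deltas that are
--  added to the running offset (just like the single-byte form) is an
--  assumption; the real consumer of this table may well differ.
--/*
function decode_relocs(sequence tbl)
-- returns the list of ds offsets at which a dword would be patched
sequence res = {}
integer offset = 0, i = 1, b
    while 1 do
        b = tbl[i]
        i += 1
        if b=0 then exit end if         -- table terminator
        if b=3 then ?9/0 end if         -- unused/illegal
        if b=1 then                     -- dword offset follows
            b = tbl[i]+tbl[i+1]*#100+tbl[i+2]*#10000+tbl[i+3]*#1000000
            i += 4
        elsif b=2 then                  -- word offset follows
            b = tbl[i]+tbl[i+1]*#100
            i += 2
        end if                          -- (else b is 4..255, used as-is)
        offset += b
        res &= offset
    end while
    return res
end function
?decode_relocs({#14,#04,2,#00,#10,0})   -- {20,24,4120}
--*/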
--/*
S_Nlink = 5, -- name chain (see below)
S_Slink = 6, -- scope/secondary chain (see below)
-- constants and variables [S_NTyp<=S_TVar]
S_vtype = 7, -- variable type [see notes below]
S_value = 8, -- value [see note below]
S_Clink = 9, -- constant chain (S_NTyp=S_Const only, see below)
S_Tidx = 9, -- thread idx (S_NTyp=S_Tvar only)
S_ErrV = 10, -- {'v', file, line, col}; see pmain.e[-35]
--DEV not newEmit:
S_ConstChain = 10, -- see notes below (constant ref/count optimisations)
S_Init = 11, -- Initialised chain (known init if non-0/see S_Const note below)
S_ltype = 12, -- local type (see pltype.e)
S_maxlv = 13, -- last entry for var (see pltype.e)
S_gInfo = 14, -- (see note below)
S_gNew = 15,
-- namespaces
S_nFno = 7, -- namespace fileno [see note below]
-- routines [S_NTyp>=S_Type]
S_sig = 7, -- routine signature, eg {'F',T_integer} (nb S_sig must be = S_vtype)
S_Parm1 = 8, -- first parameter. (idx to symtab, follow S_Slink)
S_ParmN = 9, -- minimum no of parameters (max is length(S_sig)-1)
S_Ltot = 10, -- total no of parameters, locals, and temporary vars
-- (needed to allocate the stack frame space)
S_il = 11, -- intermediate code (also backpatch list)
S_ltab = 12, -- line table
S_1stl = 13, -- first line
S_Efct = 14, -- side effects
S_ErrR = 15 -- {'R', file, line, col}; see pmain.e[-60]
--*/
-- 64-bit:
-- 00000000 sigPhx x4 "Phix"
-- 00000004 layout h4 #00000001
-- 00000008 symptr h8 #0000000000000068 (raw address of symtab[1])
-- - opcodes
--< 00000010 maxop 8 1 (0 /is/ valid)
--< 00000018 opInit h8 #000000000000204C opcode[1] (entry point, in code section)
--< ...
--> 00000010 relocs 8 #0000000000003047 raw ref relocs (dll only, 0=done/none)
-- - gvars
--< 00000020 maxgvar 8 3 (at maxop*8+24)
--< 00000028 gvar[1] h8 #4000000000004016 (name replaced, except for temps)
--< 00000030 gvar[2] h8 #400000000000406C (see #000101B0)
--< 00000038 gvar[3] h8 #0000000000000001 1
--> 00000018 maxgvar 8 3 (at maxop*8+24)
--> 00000020 gvar[1] h8 #4000000000004016 (name replaced, except for temps)
--> 00000028 gvar[2] h8 #400000000000406C (see #000101B0)
--> 00000030 gvar[3] h8 #0000000000000001 1
-- ...
-- - symtab (at (maxop+maxgvar)*8+32)
-- 00000040 slack 8 0 (should be 0)
-- 00000048 maxlen 8 248 (should be length*8+40)
-- 00000050 length 8 26
-- 00000058 refcount 8 1 (should be 1, others may be >1)
-- 00000060 type h8 #8000000000000000 (should be #8000000000000000[+1])
-- 00000068 symtab[n] (name) h8 #4000000000230023 (see #008C008C) [0 is valid too]
-- ...
-- - gvar[n]/symtab[m]
-- 00002010 slack 8 0 (should be 0)
-- 00002018 maxlen 8 136 (should be length*8+40)
-- 00002020 length 8 12
-- 00002028 refcount 8 1 (should be 1)
-- 00002030 type h8 #8000000000000000 (should be #8000000000000000[+1])
-- 00002038 S_Name h8 #4000000000027436 <desc> eg i (#00004024) [-1 /is/ valid]
-- 00002040 S_NTyp h8 #0000000000000001 S_Const
-- 00002048 S_FPno 8 1
-- 00002050 S_State h8 #0000000000000001 <copy some code from plist.e>
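-- Aside (illustrative only): the header offsets quoted above, worked through for
--  the current ("-->") layout, ie sigPhx, layout, symptr, relocs, maxgvar, the
--  gvar table, and then the five-word symtab header. The function names are made
--  up for this note; w is the word size (4 on 32-bit, 8 on 64-bit).
--/*
function gvar_offset(integer n, integer w)          -- ds offset of gvar[n]
    return n*w + iff(w=4,16,24)
end function
function symtab1_offset(integer ngvars, integer w)  -- ds offset of symtab[1], ie symptr
    return ngvars*w + iff(w=4,40,72)
end function
?gvar_offset(1,8)       -- 32 (#20, matching gvar[1] in the 64-bit listing)
?symtab1_offset(3,8)    -- 96 (#60: gvar[1..3] at #20..#30, header qwords at #38..#58)
?symtab1_offset(785,4)  -- 3180 (#C6C)
--*/
--  The symtab addresses in the listings above (eg symptr of #68 on 64-bit) appear
--  to predate the replacement of maxop/optable with relocs, hence the difference.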
--integer maxop
integer maxgvar
--
--string mzpe
--integer
-- ImageBase -- dword @ #B4, must be #400000
--integer SectionAlignment, -- dword @ #B8, must be #1000
-- FileAlignment, -- dword @ #BC, must be #200
-- SAless1, -- #00000FFF for rounding to section alignment
-- SAmask, -- #FFFFF000
-- FAless1, -- #000001FF for rounding to file alignment
-- FAmask, -- #FFFFFE00
-- SubsystemVersion, -- dword @ #C8, must be ssv310 or ssv400
-- SizeOfImage, -- dword @ #D0, rounded up to SectionAlignment, eg #8000
-- Subsystem, -- word @ #DC, must be CUI or GUI
-- ITaddr, -- RVA Import Table address, == ISvaddr -- dword @ #100, eg #2000
-- ITsize, -- RVA Import Table size, == ISvsize -- dword @ #104, eg #31D
-- RTaddr, -- RVA Resource Table address == RSvaddr -- dword @ #108, eg #5000
-- RTsize -- RVA Resource Table size == RSvsize -- dword @ #10C, eg #504
--integer
-- -- 6 sections: -- offset:examples vsize vaddr rsize raddr
-- DVvsize, DVvaddr, DVrsize, DVraddr, -- data for vm (fixed) -- #180:#54C #184:#1000 #188:#600 #18C:#400
-- ISvsize, ISvaddr, ISrsize, ISraddr, -- import section (fixed) -- #1A8:#31D #1AC:#2000 #1B0:#400 #1B4:#A00
-- VMvsize, VMvaddr, VMrsize, VMraddr, -- the virtual machine (fixed) -- #1D0:#1ED8 #1D4:#3000 #1D8:#2000 #1DC:#E00
-- RSvsize, RSvaddr, RSrsize, RSraddr, -- resource section (var len) -- #1F8:#504 #1FC:#5000 #200:#600 #204:#2E00
-- CSvsize, CSvaddr, CSrsize, CSraddr, -- user code (var start & len) -- #220:#10 #224:#6000 #228:#200 #22C:#3400
-- DSvsize, DSvaddr, DSrsize, DSraddr -- user data (var start & len) -- #248:#4 #24C:#7000 #250:#200 #254:#3600
--
--constant
-- ssv310 = #000A0003, -- SubsystemVersion: 3.10
-- ssv400 = #00000004, -- SubsystemVersion: 4.00
-- CUI = 3, -- Subsystem: console app
-- GUI = 2 -- Subsystem: gui app
--
constant --WORD = 2,
DWORD = 4,
QWORD = 8
-- bytemul = {0,#100,#10000,#1000000}
--DEV *2
--global
--string divm -- used by p2asm.e if dumpVM=1
-- verify the compiler is working properly:
--!/**/ #isginfo{divm,0b0100,MIN,MAX,integer,-2} -- 0b1000 better?! (see aside in readdivm())
--!/**/ #isginfo{divm,0b1000,MIN,MAX,integer,-2} -- (nb 0b1100 (ie 12) is /worse/ )
--!/**/ #isginfo{divm,0b1100,MIN,MAX,integer,-2} -- I can live with this...
--!/**/ #isginfo{divm,0b1000,MIN,MAX,integer,-2} -- Yay! (23/02/10)
--!/**/ #isginfo{divm,0b1000,MIN,MAX,integer,-1} -- OK? (24/06/10)
--!/**/ #isginfo{divm,0b1000,MIN,MAX,integer,-2} -- Yay! (18/01/12)
--function divmDword(integer i)
---- NB: i is 0-based
-- return divm[i+1]+divm[i+2]*#100+divm[i+3]*#10000+divm[i+4]*#1000000
--end function
--procedure setdivm(integer offset, atom v, integer dsize)
---- breakup v into dsize bytes in divm at offset.
---- used I think only to locate symtab and threadstack.
---- Note that offset as passed is 0-based, adjusted here(+1) to index divm.
---- dsize is WORD or DWORD
-- for i=1 to dsize do
-- divm[offset+i] = and_bits(v,#FF)
-- v = floor(v/#100)
-- end for
--end procedure
integer outfn,
-- fnr,
-- asmoptions,
-- vmaxpos
$
--sequence Names -- eg {"kernel32.dll",...}
--sequence HintNames -- eg {{{#40470,...},{"AllocConsole",...}},{..}}
--sequence thunkaddrs -- \ scratch vars, set from HintNames[i],
--sequence thunknames -- / where i is eg find("kernel32.dll",Names)
--constant dorsrc = 0
--sequence rsfilename
--integer rsrcRSraddr, rsrcRSrsize, rsrcRSvsize, rsrcCSvaddr, rsrcCSraddr
-- used in p2asm.e, plist.e
--DEV (USING THE ONE IN PEMIT.E!)
--global sequence code_section
-- code_section = repeat(0,rand(1000)) -- DEV Temp!
--sequence data_section
--type dst(sequence d)
-- return string(d)
--end type
--dst data_section
sequence data_section
--DEV (7/7/17) broken on lnx64...
--!/**/ #isginfo{data_section,0b1000,MIN,MAX,integer,-2} -- verify this is a string
function isString(object x)
-- avoid "probable logic errors" testing that data_section really is a string
-- (because p.exw contains "without type_check"....)
return string(x)
end function
if isString(0) then end if -- and prevent the compiler from optimising it away!
constant m4 = allocate(4),
m44 = {m4,4},
m42 = {m4,2}
procedure setcsDword(integer i, atom v)
-- set a dword in code_section
-- NB: offset passed here is 1-based index
poke4(m4, v) -- faster than doing divides etc. (idea from database.e)
code_section[i..i+3] = peek(m44)
end procedure
--DEV setcsQword?
--function gets5Dword(integer i)
---- used to get routine no for patching parameter info on forward calls
---- NB: i is 1-based
-- return s5[i]+s5[i+1]*#100+s5[i+2]*#10000+s5[i+3]*#1000000
--end function
--procedure sets5Dword(integer i, atom v)
---- used for patching parameter info on forward calls
---- NB: offset passed here is 1-based index
-- poke4(m4, v)
-- s5[i..i+3] = peek(m44)
--end procedure
--DEV sets5Qword?
function getdsDword(integer i)
-- used to get refcount for subsequence/substring patch
-- NB: i is 1-based (** unlike setdsDword **)
return data_section[i]+data_section[i+1]*#100+data_section[i+2]*#10000+data_section[i+3]*#1000000
end function
--NO: get two dwords independently
-- only properly useful for (31-bit, or at least <53-bit) integer results
function getdsQword(integer i) -- (integer result, see notes below [DEV])
-- NB: i is 1-based (** unlike setdsDword **)
--atom res = data_section[i+7]
integer res = data_section[i+7]
for j=i+6 to i by -1 do
res = res*#100+data_section[j]
end for
return res
end function
procedure setdsDword(integer i, atom v)
-- NB: i is 0-based index
poke4(m4, v)
data_section[i+1..i+4] = peek(m44)
end procedure
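-- Aside (illustrative only): getdsDword() takes a 1-based index but setdsDword()
--  takes a 0-based offset, so a round trip on the same dword needs a +1, eg
--  (assuming data_section is already at least 12 bytes long):
--/*
    setdsDword(8, #12345678)        -- sets data_section[9..12] to #78,#56,#34,#12
    if getdsDword(9)!=#12345678 then ?9/0 end if    -- nb 9, not 8
--*/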
-- CAUTION:
-- In a purely 32-bit world, 64-bit floats have 53 bits of precision, so eg poke4(addr,#8000_0001) is fine.
-- In a purely 64-bit world, 80-bit floats have 64 bits of precision, so the equivalent is also fine.
-- However, a 32-bit app trying to poke8(a,#8000_0000_0000_0001), as might happen when a 32-bit compiler
-- is asked to produce a 64-bit executable, is, quite simply, going to go badly wrong.
-- The far better idea is to poke two dwords independently, without ever adding/shifting things 'n wotnot.
--DEV make me a builtin/autoinclude:
--global
--procedure poke8(atom addr, object v)
--atom vi
-- if atom(v) then
-- poke4(addr,and_bits(v,#FFFFFFFF))
-- poke4(addr+4,floor(v/#100000000))
-- else
-- for i = 1 to length(v) do
-- vi = v[i]
-- poke4(addr,and_bits(vi,#FFFFFFFF))
-- addr += 4
-- poke4(addr,floor(vi/#100000000))
-- addr += 4
-- end for
-- end if
--end procedure
--7/4/16 values such as -#FFFFFFFF are now valid for 32-bit p.exe creating 64-bit exe's:
--procedure setdsQword(integer i, integer v)
procedure setdsQword(integer i, atom v)
-- NB: i is 0-based index
if isFLOAT(v) then ?9/0 end if
poke4(m4, and_bits(v,#FFFFFFFF))
data_section[i+1..i+4] = peek(m44)
poke4(m4, floor(v/#100000000))
data_section[i+5..i+8] = peek(m44)
end procedure
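-- Aside (illustrative only): the two poke4 steps above store the halves of v
--  independently, as per the CAUTION notes. For i=0 and v=#100000002 they give
--      data_section[1..4] = {2,0,0,0}      -- and_bits(v,#FFFFFFFF) = 2
--      data_section[5..8] = {1,0,0,0}      -- floor(v/#100000000) = 1
--  ie the little-endian qword #0000000100000002.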
integer dsidx -- 1-based index to data_section
integer dsize -- DWORD or QWORD
--procedure setds(atom v, integer sethigh=0, integer didx=dsidx) -- (DEV/SUG)
procedure setds(atom v, integer sethigh=0)
--DEV a bit of #ilASM might be even better...
-- set a dsize value at dsidx to v.
-- if the target should be a ref, we add #80000000 here, which is
-- "shr2+#20000000" in pbinary.e to create a "shr2+#40000000" ref.
if dsize=DWORD then
if sethigh then v += #80000000 end if
poke4(m4, v)
data_section[dsidx..dsidx+3] = peek(m44)
dsidx += 4
else
poke4(m4, and_bits(v,#FFFFFFFF))
data_section[dsidx..dsidx+3] = peek(m44)
dsidx += 4
v = floor(v/#100000000)
if sethigh then v += #80000000 end if
poke4(m4, and_bits(v,#FFFFFFFF))
data_section[dsidx..dsidx+3] = peek(m44)
dsidx += 4
end if
end procedure
procedure appenddsDword(atom v)
-- (aside: v is an atom mainly for 32-bit p.exe creating 64-bit exe's,
-- but otherwise should always be integer for 32->32 & 64->64)
string s
if isFLOAT(v) then ?9/0 end if
poke4(m4, v) -- faster than doing divides etc. (idea from database.e)
s = peek(m44)
data_section &= s
if X64 then
v = floor(v/#100000000)
poke4(m4, v)
s = peek(m44)
data_section &= s
end if
end procedure
procedure appenddsType(integer t)
--DEV 30/7/2013 plant a dummy (illegal) delete_routine:
-- data_section = append(data_section,0)
data_section = append(data_section,1)
data_section = append(data_section,0)
data_section = append(data_section,0)
if X64 then
data_section = append(data_section,0)
data_section = append(data_section,0)
data_section = append(data_section,0)
data_section = append(data_section,0)
end if
data_section = append(data_section,t)
end procedure
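-- Aside (illustrative only): appenddsType() emits one machine word, low byte
--  first, with the dummy delete_routine of 1 in the low byte and t in the top
--  byte. Taking t=#80 as an example value, it appends {1,0,0,#80} on 32-bit,
--  ie the dword #80000001, or {1,0,0,0,0,0,0,#80} on 64-bit, ie
--  #8000000000000001, which is the "[+1]" form of the type word shown in the
--  data section layout notes.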
procedure appenddsBytes(sequence s)
integer ch
for i=1 to length(s) do
ch = and_bits(s[i],#FF)
data_section = append(data_section,ch)
end for
--if not isString(data_section) then ?9/0 end if
end procedure
procedure APIerror(integer i, string msg)
sequence x = APIerritem[i]
fileno = x[1]
tokline = x[2]
tokcol = x[3]
Abort(msg)
end procedure
without trace
integer d_addr
sequence s5sizes, s5v
--, s5symn
integer thisCSsize
--integer vi -- index into symtab
integer vi_active -- index into symtab
integer opbyte, i3
constant call_rel32 = #E8 -- 0o350 imm32 -- call rel32
constant jump_rel32 = #E9 -- 0o351 imm32 -- jmp rel32
--with trace
procedure scanforShortJmp(integer vi)
--
-- At heart, this routine scans for/counts/flags any possible
-- jmp dword (#E9 xx xx xx xx) to jmp byte (#EB xx) (ie 5->2)
-- and jcc dword (#0F 8x xx xx xx xx) to jcc byte (#7x xx) (ie 6->2)
-- obviously, when the offset will fit in a byte, that is.
--
-- It also removes "isDead" blocks, including associated fixups to the LineTab,
-- links up routines to be processed/fixups opFrame parameter info, and
-- just to be flash about it, performs branch straightening...
--
-- Note that while all offsets are adjusted by isDead removed, we cannot be
-- certain of the final values as affected by isShortJmp until the loop quits
-- (because someDone=0), by which time it is too late, hence we recalculate
-- them in (the eloquently named) blurph(), using the linked lists setup here.
--
integer short52, -- master counts for entire routine
short62, -- ""
--dead5j,
--dead6j,
someLeft, -- more to do: phase 1: 1=some backward jumps remain
-- 2=some fwd but no backward jumps
-- phase 2: resets 2->0 as it rescans the fwd (only).
-- else 0 means we've done the lot (all isShortJmp'd)
someDone, -- outer loop control (phase 3)
jmpOpRetf, -- control flow flag for branch straightening to opRetf, phase 1
i, -- gp idx to s5 (+1 in phase 1, linked list in phase 2/3)
c, -- s5[i] etc. nb long active liverange, take special care
c2, -- scratch version of c when we don't want to damage it.
k, -- gp dogsbody
-- kfirst, -- copy of [S_Parm1]
klast, -- jmp tgt limit for chainwalks
s5len, -- length(s5)
deadCode, -- phase 1:flag, phase 2: LineTab fixup & final sanity check
first, -- start of linked list of isDead/isJmp
last, -- end of linked list of isDead/isJmp
next, -- scratch chainwalk var
prev, -- ""
chunkend, -- block end for code packing (phase 2)
oidx, -- output idx for code packing (phase 2)
ltj -- LineTab entry (phase 2)
integer offset -- offset of the jump currently being examined
--atom offset -- offset of the jump currently being examined
--object symk -- symtab entry for opFrame patch (phase 1)
--integer u -- "" used flag
--trace(1)
--
-- Phase 1 - dead code removal: linkup all isDead and isJmp entries,
-- ======= correct offsets on backward jumps by isDead jumped over,
-- fixup opFrame/add to chain of routines to process, (now done in pilx86.e)
-- and perform branch straightening.
--
-- (phase 2 corrects fwd jmp offsets and removes isDead blocks)
-- (phase 3 completes the isJmp --> isShortJmp flagging)
--
-- A small pseudo-example:
--
-- s5 input: s5 output (after all 3 phases):
-- 1: {1A,2B,3C,4D, 1: {1A,2B,3C,4D,
-- 5: isDead+3,0,0, 5: 8E,9F,10,
-- 8: 8E,9F,10, 8: isShortJmp,15,0,-8,
-- 11: isJmp,0,0,-11, 12: 15,16,17,
-- 15: 15,16,17, 15: isShortJmp,0,8,3,
-- 18: isJmp,0,0,6, 19: 22,23,24,
-- 22: 22,23,24, 22: 28,29,30,31}
-- 25: isDead+3,0,0,
-- 28: 28,29,30,31}
--
-- Where nn: is just a guide index to show where s5[nn] is,
-- 1A,2B,3C,4D, .. 31 represent as-is x86 binary,
-- the -11 and 6 after isJmp are relative offsets, which
-- as you can see are adjusted to -8 and 3 respectively
-- after removal of the isDead blocks they jump over[*1],
-- the 15 and 8 after isShortJmp [*2] are indexes to
-- the next/prev is[Short]Jmp.
-- Note [*1] jump offsets are /not/ adjusted for jumping
-- over isShortJmps here, but in blurph().
-- [*2] temporary, ie those two byte positions will
-- be ignored/clobbered in blurph().
-- The updates as shown above are performed "in situ" within s5.
--
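-- (Worked through, using target = i+4+offset, where i is the index of
--  the isJmp/isShortJmp entry itself:
--    input:  11+4-11 = 4  and  18+4+6 = 28,
--    output:  8+4-8  = 4  and  15+4+3 = 22, where the old s5[28] now lives.
--  Each jump leaps over exactly one isDead+3, hence -11+3 = -8 and 6-3 = 3.)
--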
short52 = 0
short62 = 0
--dead5j = 0
--dead6j = 0
if q86 then
--trace(1)
s5len = length(s5)-5
deadCode = 0
someLeft = 0
first = s5[s5len+1]
last = s5[s5len+2]
--DEV if q86 is 2 then +3/+4 are other (ie isIL etc) first/last
i = first
while i do
c = s5[i]
next = s5[i+1]
--DEV no longer used under oldil...
if c>=isDead then
c -= isDead
if DEBUG then
if c<3 then ?9/0 end if
end if
deadCode = 1 -- flag some found
-- 14/4/10 all isAddr now left as-is, otherwise it fouls up error handling...
else -- isAddr/isJmp
-- elsif c!=isAddr then
i3 = i+3
offset = s5[i3]
--
-- Branch Straighten:
-- (NB: fairly obvious but worth stating, branch straightening
-- does not occur if there is an opLnt/p/pt in the way.)
--
jmpOpRetf = 0
--DEV 21/11/10 I think this is here because it messes up error handling?
--if c!=isAddr then
if c>isAddr then
while 1 do
c2 = i+4+offset -- target addr
opbyte = s5[c2]
if opbyte!=jump_rel32 then exit end if
-- target is an unconditional jump:
c2 += 1
opbyte = s5[c2]
c2 += 3
--DEV try c2 = s5[c2] here...
if opbyte<isJmp or opbyte>isShortJmp then -- nb backwd jump may hit an isShortJmp
if opbyte=isOpCode and s5[c2]=opRetf then
-- we need to unlink this from our chain...
prev = s5[i+2]
if prev then
s5[prev+1] = next
else
first = next
end if
if next then
s5[next+2] = prev
end if
-- special case: jmp/jcc,isJmp,0,0,offset to #E9,isOpCode,0,0,opRetf:
-- --> jmp/jcc,isOpCode,0,0,opRetf (nb opRetf only)
s5[i] = isOpCode
puts(1,"pemit2.e line 1306 (opRetf)\n")
?9/0
-- (s5[8]!=isJmpG or s5[11]!=tt[aatidx[opRetf]+EQ])) then
s5[i3] = opRetf
-- end if
jmpOpRetf = 1 -- no linkup, resume in outer loop
end if
exit
end if
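c2 = s5[c2]+5 -- that jmp's own offset, plus its 5 bytes (#E9 + dword)
offset += c2 -- ie retarget straight to where that jmp was going
s5[i3] = offset
--31/1/21
if c2=0 then exit end if -- (target jmp jumps to itself, avoid looping forever)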
end while
end if
if c!=isBase then
if not jmpOpRetf then
if offset<=0 then
--
-- walk back down the chain, adjust offset by any isDead we leap over:
--
k = s5[i+2] -- k:=prev (nb not <next>, we are walking backwards)
klast = i+4+offset
while k and k > klast do
c2 = s5[k]
if c2>isDead then
c2 -= isDead
offset += c2
end if
k = s5[k+2] -- k:=prev
end while
s5[i3] = offset
--24/4/2013:
-- if c!=isAddr then
if c!=isAddr
and s5[i-1]!=call_rel32 then -- (there is no "call byte offset" instruction on the x86, only jmp)
-- test as if this becomes a short jump:
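-- (A sketch of the arithmetic, assuming the usual x86 encodings: jmp rel32
--  is 5 bytes and jcc rel32 is 6, while both short forms are 2 bytes, so
--  the end of the instruction moves back by 3 or 4 bytes respectively.
--  A rel8 can reach back as far as -128, hence
--      jmp: offset+3>=-128, ie offset>=-131
--      jcc: offset+4>=-128, ie offset>=-132
--  eg a jmp with offset -131 becomes a short jmp of -128, whereas -132
--  would need -129, which does not fit.)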
if s5[i-1]=jump_rel32 then
k = (offset>=-131) -- += 3
else
k = (offset>=-132) -- += 4
end if
if k then
--if i=13 then ?9/0 end if
s5[i] = isShortJmp
if s5[i-1]=jump_rel32 then
short52 += 1
else
short62 += 1
end if
else
someLeft = 1
end if
end if
elsif not someLeft then
someLeft = 2
end if
end if
end if
end if
-- i = s5[i+1]
i = next
end while
else -- (not q86)
?9/0 --isBase will not work here
s5len = length(s5)
deadCode = 0
someLeft = 0
first = 0
last = 0
i = 1
jmpOpRetf = 0
while i<=s5len do
c = s5[i]
if c<isOpCode then -- as-is byte
i += 1
-- elsif c=isOpCode then
-- i3 = i+3
-- c = s5[i3]
-- i += 4
-- elsif c<isAddr then -- c=isVar/isILa/isIL
-- elsif c<isAddr then -- c=isOpCode/isVar/isILa/isIL
elsif c<isAddr then -- c=isOpCode/isAPIFn/isVar/isConstRef/isConstRefCount/isILa/isIL
i += 4
else -- c=isAddr/isJmp/isDead (no isShortJmp yet, btw)
--DEV no longer used under oldil...
if c>=isDead then
c -= isDead
if DEBUG then
if c<3 then ?9/0 end if
end if
deadCode = 1 -- flag some found
else -- isAddr/isJmp
i3 = i+3
offset = s5[i3]
--
-- Branch Straighten:
-- (NB: fairly obvious but worth stating, branch straightening
-- does not occur if there is an opLnt/p/pt in the way.)
--
while 1 do
c2 = i+4+offset -- target addr
opbyte = s5[c2]
if opbyte!=jump_rel32 then exit end if
-- target is an unconditional jump:
c2 += 1
opbyte = s5[c2]
c2 += 3
if opbyte<isJmp or opbyte>isShortJmp then -- nb backwd jump may hit an isShortJmp
if opbyte=isOpCode and s5[c2]=opRetf then
-- special case: jmp/jcc,isJmp,0,0,offset to #E9,isOpCode,0,0,opRetf:
-- --> jmp/jcc,isOpCode,0,0,opRetf (nb opRetf only)
puts(1,"pemit2.e line 1421 (opRetf)\n")
?9/0
-- (s5[8]!=isJmpG or s5[11]!=tt[aatidx[opRetf]+EQ])) then
s5[i] = isOpCode
s5[i3] = opRetf
i += 5
jmpOpRetf = 1 -- no linkup, resume in outer loop
end if
exit
end if
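c2 = s5[c2]+5 -- that jmp's own offset, plus its 5 bytes (#E9 + dword)
offset += c2 -- ie retarget straight to where that jmp was going
s5[i3] = offset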
end while
if not jmpOpRetf then
if offset<=0 then
--
-- walk back down the chain, adjust offset by any isDead we leap over:
--
k = last
klast = i+4+offset
while k and k > klast do
c2 = s5[k]
if c2>isDead then
c2 -= isDead
offset += c2
end if
k = s5[k+2] -- k:=prev
end while
s5[i3] = offset
if c!=isAddr then
-- test as if this becomes a short jump:
if s5[i-1]=jump_rel32 then
k = (offset>=-131) -- += 3
else
k = (offset>=-132) -- += 4
end if
if k then
--if i=13 then ?9/0 end if
s5[i] = isShortJmp
if s5[i-1]=jump_rel32 then
short52 += 1
else
short62 += 1
end if
else
someLeft = 1
end if
end if
elsif not someLeft then
someLeft = 2
end if
c = 4 -- remainder of isJmp, as opposed to isDead bytes to be skipped
end if
end if
if jmpOpRetf then
jmpOpRetf = 0
else
--
-- linkup item:
--
s5[i+1] = 0 -- <next> = <end of chain>
s5[i+2] = last -- <prev> = last
if last then
s5[last+1] = i -- last's <next> := this
else
first = i
end if
last = i
i += c
end if
end if
end while
end if
--
-- Phase 2 - dead code removal: follow isDead/isJmp chain,
-- ======= fixup fwd isJmp offsets by isDead jumped over,
-- fixup LineTab entries by isDead blocks, and
-- pack code (remove isDead blocks, relink chain w/o them)
--
if deadCode then
if someLeft=2 then -- all backward jumps were in range, so reset,
someLeft = 0 -- as we are now going to rescan all fwd jumps.
end if
i = first
first = 0 -- create a brand new chain
last = 0
while 1 do
c = s5[i]
if c>isDead then
-- from now on, we have to pack the code.
-- (one of the most likely places for an isDead is a return statement;
-- avoid any needless s5[i]=s5[i] byte copying as much as possible)
oidx = i -- output idx
--
-- deal with that pesky LineTab thing first though...
--
deadCode = c-isDead
for j=1 to length(LineTab) do
ltj = LineTab[j]
if ltj>=i then -- skip -ve and <i entries at start
klast = s5[i+1]
k = j
while 1 do
while klast and ltj>=klast do
c2 = s5[klast]
if c2>isDead then
deadCode += c2-isDead
end if
klast = s5[klast+1]
end while
LineTab[k] = ltj-deadCode
k += 1
if k>length(LineTab) then exit end if
ltj = LineTab[k]
if ltj<0 then
k += 1
ltj = LineTab[k]
end if
end while
exit
end if
end for
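-- (For instance, taking the pseudo-example at the top of phase 1 and
--  assuming, purely for illustration, LineTab entries of 15 and 28:
--  15 is past the isDead+3 at s5[5] only, so becomes 15-3 = 12, while
--  28 is also past the isDead+3 at s5[25], so becomes 28-6 = 22.)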
--last = i
--
-- now onto copying the non-Dead bytes:
--
deadCode = 0
while i do -- process isDead blocks (outer loop)
c -= isDead
deadCode += c -- count size in bytes (for sanity check)
next = s5[i+1]
while 1 do -- process/relink isJmp entries (inner loop)
-- (c=4 here when processing consecutive isJmp entries, btw)
if next then
-- copy bytes i+c..next-1 to oidx++
chunkend = next-1
else
-- copy bytes i+c..s5len to oidx++
chunkend = s5len
-- (aside: next=0 which soon exits)
end if
for j=i+c to chunkend do
c2 = s5[j]
s5[oidx] = c2
oidx += 1
end for
i = next
if i=0 then exit end if -- *ALL DONE*
c = s5[i]
if c>=isDead then exit end if -- resume in outer loop
-- isJmp/isShortJmp/isAddr, copy and relink:
next = s5[i+1]
--
-- check for fwd jmp
--
i3 = i+3
offset = s5[i3]
if offset>0 then -- nb can/must not be isShortJmp yet
--
-- walk up the chain, adjust offset by any isDead we leap over:
--
k = next
-- klast = next+4+offset
klast = i+4+offset
while k and k < klast do
c2 = s5[k]
if c2>isDead then