-
Notifications
You must be signed in to change notification settings - Fork 0
/
notes.txt
617 lines (390 loc) · 22.1 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
LANGUAGE NOTES
Nice ideas in Kotlin
- class members public by default.
- @Override
Terrible ideas in Kotlin
- static class methods go to ugly unacceptable companion object thing.
DOCU for REENTRANT SCANNERS: http://sector7.xray.aps.anl.gov/~dohnarms/programming/flex/html/Reentrant-Functions.html
/////////////////
FLEX/BISON NOTES
////////////////
Locations: a location has 2 positions: begin and end.
Those indicate where the token starter, and where it ended.
The strategy that we follow is that we set the start and end to equal values before every yylex call.
When we match a rule, we progress the end value, leaving the begin intact.
- YY_USER_ACTION Executed each time a rule matches, after yytext and yyleng were set, but before the action is triggered.
- location.step() changes the begin to be equal to the end. It does not affect the end.
////////////////
LLVM NOTES
////////////////
Context:
It is EVERYTHING! This is the wrapper of all Modules.
Module:
Think of it as a source file.
Eventhow, this definition can become a bit mode flexible, if it has to. (For example, it could mean a class also)
IRBuilder:
http://llvm.org/doxygen/classllvm_1_1IRBuilder.html
http://llvm.org/doxygen/classllvm_1_1IRBuilderBase.html
A uniform API for creating instructions and inserting them into a basic block:
either at the end of a BasicBlock, or at a specific iterator location in a block.
The builder "remembers" in which block we are inserting commands into. dont forget to call builder.SetInsertPoint()
before you insert anything.
Function:
http://llvm.org/doxygen/classllvm_1_1Function.html
This represents a Function in all it's mighty!
BasicBlock:
http://llvm.org/doxygen/classllvm_1_1BasicBlock.html#details
This represents a single basic block in LLVM. A basic block is simply a container of instructions that execute sequentially.
THis inherits from Value. Thus, it can be referenced by instructions as such.
For example, control flow.
/////////
EUL NOTES
/////////
Every type can have its own:
definition of ==
definition of []
definition of toBoolean
definition of toString
constructor by literal assignment (array literal, number literal etc)
llvm will produce an ir asm file
then, use llc testOut to create a .s assembly file
/////////////////////////
COMPILING PROCCESS - llvm
/////////////////////////
We initialize llvm before parsing.
This means that we can fully access llvm calls inside bison file.
Bison can talk directly to EulCodeGenContext. This can be helpful some times, for example to use actual llvm types in the AST.
Still, we will have to parse and save the whole AST before the actual parsing will start.
During this parsing, we can have a simple strategy, that wil shape EUL language'a nature:
In the begining we have the global scope, and a main function. The whole program lives in this scope.
Every time a scope opens (including the global one) two things happen: the pre parsing and the regular parsing.
PRE-PARSING
- It happens within bison.
- It processes only about declaration statements that do not need to be forward declared.
- This includes function declarations, class and struct declarations, type definitions.
- Sometimes this can mean digging a little within the statement, like class definitions a bit to build the type information.
- We DO NOT execute statements, we DO NOT evaluate expressions, even if we encounter one. We do not process sub scopes.
USEFUL BECAUSE: Everything that we setup here can be used wherever in the scope, and its sub-scopes, even before it actually appears on the file!
GOAL ACHIEVED: We do not need to forward declare those symbols.
GOAL ACHIEVED: We can implement a scoped pre-processor. This can be useful for all sort of stuff.
REGULAR PARSING
- We evaluate expressions and parse statements.
- We dive into the sub scopes: every sub scope will perform its own pre-parsing and parsing.
The global scope consists of all the source files, one after the other in the order with they were imported.
We will pre-parse them all, and then regular parse them all, then we are done.
//////////////////////////
//////////LLVM///////////
/////////////////////////
===== MODULE =====
- functions
- global variables
- symbol table entries
In general, a module is made up of a list of global values.
NOTE: Both functions and global variables are global values.
Global values are represented by a pointer to a memory location.
Every global value has a linkage type and a Visibility styles:
==== Linkage Types ====
APPLIES TO: Global variables and functions.
DEFAULT: external
NOTE: function declaration can be ONLY external or extern_weak
- Possible values -
vvvvvvvvvvvvvvv
private:
Global values with “private” linkage are only directly accessible by objects in the current module.
This doesn’t show up in any symbol table in the object file.
internal
Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file.
This corresponds to the notion of the ‘static‘ keyword in C.
available_externally
Globals with “available_externally” linkage are never emitted into the object file corresponding to the LLVM module.
They exist to allow inlining and other optimizations to take place given knowledge of the definition of the global, which is known to be somewhere outside the module. Globals with available_externally linkage are allowed to be discarded at will, and are otherwise the same as linkonce_odr. This linkage type is only allowed on definitions, not declarations.
linkonce
Globals with “linkonce” linkage are merged with other globals of the same name when linkage occurs.
This can be used to implement some forms of inline functions, templates, or other code which must
be generated in each translation unit that uses it, but where the body may be overridden with a
more definitive definition later.
Unreferenced linkonce globals are allowed to be discarded!!!
Note that linkonce linkage does not actually allow the optimizer to inline the body of this
function into callers because it doesn’t know if this definition of the function is the
definitive definition within the program or whether it will be overridden by a stronger definition.
To enable inlining and other optimizations, use “linkonce_odr” linkage.
weak
“weak” linkage has the same merging semantics as linkonce linkage, except that unreferenced
globals with weak linkage may not be discarded.
This is used for globals that are declared “weak” in C source code.
common
Symbols with “common” linkage are merged in the same way as weak symbols,
and they may not be deleted if unreferenced.
common symbols may not have an explicit section, must have a zero initializer,
and may not be marked ‘constant‘. Functions and aliases may not have common linkage.
appending
“appending” linkage may only be applied to global variables of pointer to array type.
When two global variables with appending linkage are linked together, the two
global arrays are appended together.
This is the LLVM, typesafe, equivalent of having the system linker append together “sections”
with identical names when .o files are linked.
extern_weak
The semantics of this linkage follow the ELF object file model: the symbol is weak until linked,
if not linked, the symbol becomes null instead of being an undefined reference.
linkonce_odr, weak_odr
Some languages allow differing globals to be merged, such as two functions with different semantics.
Other languages, such as C++, ensure that only equivalent globals are ever merged
(the “one definition rule” — “ODR”). Such languages can use the linkonce_odr and weak_odr linkage types
to indicate that the global will only be merged with equivalent globals.
These linkage types are otherwise the same as their non-odr versions.
external
If none of the above identifiers are used, the global is externally visible, meaning that it
participates in linkage and can be used to resolve external symbol references.
==== Calling Conventions ====
//TODO read about Calling Conventions
==== Visibility styles ====
APPLIES TO: Global Variables and Functions
DEFAULT: default
NOTE: A symbol with internal or private linkage must have default visibility.
- “default” - Default style
On targets that use the ELF object file format, default visibility means that the
declaration is visible to other modules.
In shared libraries, it means that the declared entity may be overridden.
- “hidden” - Hidden style
Two declarations of an object with hidden visibility refer to the same object
if they are in the same shared object.
Usually, hidden visibility indicates that the symbol will not be placed into the dynamic
symbol table, so no other module (executable or shared library) can reference it directly.
- “protected” - Protected style
On ELF, protected visibility indicates that the symbol will be placed in the dynamic symbol table,
but that references within the defining module will bind to the local symbol.
That is, the symbol cannot be overridden by another module.
==== DLL Storage Classes ====
//TODO read about DLL Storage Classes
==== Thread Local Storage Models ====
APPLIES TO: Global Variables, Functions and Aliases
- localdynamic: For variables that are only used within the current shared library.
- initialexec: For variables in modules that will not be loaded dynamically.
- localexec: For variables defined in the executable and only used within it.
NOTE: A model can also be specified in a alias, but then it only governs how the alias is accessed. It will not have any effect in the aliasee.
==== Structure Types ====
GLOBAL VARIABLE DECLARATION
Global variables define regions of memory allocated at compilation time instead of run-time.
Global variable definitions must be initialized.
Global variables in other translation units can also be declared, in which case they don’t have an initializer.
A variable may be defined as a global constant, which indicates that the contents of the variable will never
be modified (enabling better optimization, allowing the global data to be placed in the read-only section
of an executable, etc).
Note that variables that need runtime initialization cannot be marked constant
as there is a store to the variable.
Global variables always define a pointer to their “content” type because they describe a regio n of memory.
Global variables can be marked with unnamed_addr which indicates that the address is not significant, only the content.
An explicit alignment may be specified for a global, which must be a power of 2.
If not present, or if the alignment is set to zero,
the alignment of the global is set by the target to whatever it feels convenient.
If an explicit alignment is specified, the global is forced to have exactly that alignment.
[@<GlobalVarName> =]
[Linkage] [Visibility] [DLLStorageClass] [ThreadLocal]
[unnamed_addr] [AddrSpace] [ExternallyInitialized]
<global | constant> <Type> [<InitializerConstant>]
[, section "name"] [, comdat [($name)]]
[, align <Alignment>]
==== FUNCTIONS ====
define
[linkage] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[unnamed_addr] [fn Attrs] [section "name"] [comdat [($name)]]
[align N] [gc] [prefix Constant] [prologue Constant]
[personality Constant] { ... }
==== Aliases ====
Aliases, unlike function or variables, don’t create any new data.
They are just a new symbol and metadata for an existing position.
Aliases have a name and an aliasee that is either a global value or a constant expression.
Aliases that are not unnamed_addr are guaranteed to have the same address as the aliasee expression.
unnamed_addr ones are only guaranteed to point to the same content.
@<Name> =
[Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy> @<Aliasee>
==== Comdats ====
Starting with $
//TODO
==== Named Metadata ====
Starting with !
==== Parameter Attributes ====
The return type and each parameter of a function type may have a set of parameter attributes associated with them.
EXAMPLES:
- declare i32 @printf(i8* noalias nocapture, ...)
- declare i32 @atoi(i8 zeroext)
- declare signext i8 @returns_signed_char()
SOME OF THE VALUES VALUES:
- zeroext
- signext
- returned
- nonnull (LLVM cannot force that. It is good for optimizing, but we must garatee this during calling)
//TODO
==== Garbage Collector Strategy Names ====
Each function may specify a garbage collector strategy name, which is simply a string.
The supported values of name includes those built in to LLVM and any provided by loaded plugins.
SYNTAX: define void @f() gc "name" { ... }
==== Attribute Groups ====
Just to keep readable our .ll files
Starting with #
EXAMPLE: attributes #0 = { alwaysinline alignstack=4 }
==== Function Attributes ====
Function attributes are set to communicate additional information about a function.
SOME VALUES:
- alignstack<n>
- alwaysinline
- builtin
- inlinehint
- noreturn
- nounwind (no exceptions are thrown)
- readnone (something like PURE)
==== Data Layout ====
//TODO read
small vs big endian, the size of a pointer and stuff like that.
==== Target Triple ====
ARCHITECTURE-VENDOR-OPERATING_SYSTEM or:
ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT
This information is passed along to the backend so that it generates code for the proper architecture.
============== TYPES ===============
- void: no size, no value, no fun
- integer type: i8, i55, i1 etc: from 1 - 2^23-1 (about 8 million) bits!!!!
- Floating Point Types:
# half
# float
# double
# fp128
# x86_fp80
# ppc_fp128
- X86_mmx Type (limited instructions, no arrays)
- pointer type
specifies memory locations.
Note that LLVM does not permit pointers to void (void*)
nor does it permit pointers to labels (label*).
Use i8* instead.
SYNTAX: <type> *
EXAMPLES:
- [4 x i32]* // A pointer to array of four i32 values.
- i32 (i32*) * A pointer to a function that takes an i32*, returning an i32.
- i32 addrspace(5)*
- vector type
Vector types are used when multiple primitive data
are operated in parallel using a single instruction (SIMD).
The number of elements is a constant integer value larger than 0
SYNTAX: < <# elements> x <elementtype> >
EXAMPLES:
# <4 x i32> // Vector of 4 32-bit integer values.
# <8 x float>
# <2 x i64>
# <4 x i64*> //Vector of 4 pointers to 64-bit integer values.
- Label type: represents code labels
- metadata type:
The metadata type represents embedded metadata.
No derived types may be created from metadata except for function arguments.
- array type
arranges elements sequentially in memory
SYNTAX: [<# elements> x <elementtype>]
EXAMPLES:
# [40 x i32]
# [41 x i32]
# [4 x i8]
# [3 x [4 x i32]]
# [12 x [10 x float]]
# [2 x [3 x [4 x i16]]]
There is no restriction on indexing beyond the end of the array implied by a static type
- Structure Type
The structure type is used to represent a collection of
data members together in memory.
The elements of a structure may be any type that has a size.
Structures may optionally be “packed” structures, which indicate that the alignment of the struct is one byte.
In non-packed structs, padding between field types is inserted as defined by the DataLayout string in the module.
Structures can either be “literal” or “identified”.
# A literal structure is defined inline with other types (e.g. {i32, i32}*)
# identified types are always defined at the top level with a name
SYNTAX:
# %T1 = type { <type list> } ; Identified normal struct type
# %T2 = type <{ <type list> }> ; Identified packed struct type
EXAMPLES:
# { i32, i32, i32 }
# { float, i32 (i32) * } //A pair, where the first element is a float and the second element is a pointer to a function that takes an i32, returning an i32.
# <{ i8, i32 }> //A packed struct known to be 5 bytes in size.
- function Type:
<returntype> (<parameter list>)
examples:
# i32 (i32)
# float (i16, i32 *) *
# i32 (i8*, ...)
# {i32, i32} (i32) // <------- returns a struct
- Opaque Structure Types
corresponds (for example) to the C notion of a forward declared structure.
represent named structure types that do not have a body specified.
SYNTAX
# %X = type opaque
# %52 = type opaque
========= Constants ==========
# true (i1)
# false (i1)
# Integer constants (4, 123, -56)
# Floating point constants:
Floating point constants use standard decimal notation (e.g. 123.421),
exponential notation (e.g. 1.23421e+2),
or a more precise hexadecimal notation
The assembler requires the exact decimal value of a floating-point constant
the assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating decimal in binary.
# null
# Structure constants
Structure constants must have structure type,
and the number and types of elements must match those specified by the type.
EXAMPLE:
{ i32 4, float 17.0, i32* @G } //where “@G” is declared as “@G = external global i32
# Array constants:
Array constants must have array type, and the number and types of
elements must match those specified by the type
SYNTAX: [ i32 42, i32 11, i32 74 ]
As a special case, character array constants may also be represented as a double-quoted
string using the c prefix. For example: c"Hello World\0A\00"
# Vector constants
SYNTAX: < i32 42, i32 11, i32 74, i32 100 >
# zeroinitializer: a value to zero of any type, including scalar and aggregate types
# Metadata node
SYNTAX: !{!0, !{!2, !0}, !"test"}
Metadata can reference constant values, for example: !{!0, i32 0, i8* @global, i64 (i64)* @function, !"str"}
metadata is a place to attach additional information such as debug info
# Global Variable and Function Addresses:
@X = global i32 17
@Y = global i32 42
@Z = global [2 x i32*] [ i32* @X, i32* @Y ]
# Undefined Values
The string ‘undef‘ can be used anywhere a constant is expected,
expept ‘label‘ or ‘void‘.
Undefined values are useful because they indicate to the compiler that the program is well defined no matter what value is used.
This gives the compiler more freedom to optimize
EXAMPLE:
%A = add %X, undef
%C = undef
%A = xor undef, undef //This example points out that two ‘undef‘ operands are not necessarily the same
======== Addresses of Basic Blocks =======
//TODO
======== Constant Expressions ========
- Constant expressions are used to allow expressions involving other constants to be used as constants
- Constant expressions may be of any first class type and may involve any LLVM operation that does not have side effects
======= METADATA =======
- DICompileUnit: information about a compile unit
!0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
isOptimized: true, flags: "-O2", runtimeVersion: 2,
splitDebugFilename: "abc.debug", emissionKind: 1,
enums: !2, retainedTypes: !3, subprograms: !4,
globals: !5, imports: !6)
- DIFile: DIFile nodes represent files
!0 = !DIFile(filename: "path/to/file", directory: "/path/to/dir")
- DIBasicType: DIBasicType nodes represent primitive types, such as int, bool and float. tag
!0 = !DIBasicType(name: "unsigned char", size: 8, align: 8, encoding: DW_ATE_unsigned_char)
- DISubroutineType: represent subroutine types
!0 = !BasicType(name: "int", size: 32, align: 32, DW_ATE_signed)
!1 = !BasicType(name: "char", size: 8, align: 8, DW_ATE_signed_char)
!2 = !DISubroutineType(types: !{null, !0, !1}) ; void (int, char)
- DIDerivedType: nodes represent types derived from other types, such as qualified types.
!0 = !DIBasicType(name: "unsigned char", size: 8, align: 8, encoding: DW_ATE_unsigned_char)
!1 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !0, size: 32, align: 32)
- DICompositeType: represent types composed of other types, like structures and unions
- DISubrange: elements for DW_TAG_array_type variants of DICompositeType. count: -1 indicates an empty array.
- DIEnumerator: elements for DW_TAG_enumeration_type variants of DICompositeType.
- DITemplateTypeParameter: type parameters to generic source language constructs
- DITemplateValueParameter: value parameters to generic source language constructs.
- ... and more ...