New compiler: Initializers for global classic arrays and structs #2662

fernewelten · 2025-01-18T18:42:35Z

I've implemented initializers for global structs and global classic arrays, using a syntax similar to the one we (now) have for parameters.

Background

Global variables can be initialized at compile time: The compiler essentially prepares, in a byte buffer, an image of all the global variables with their initial values already set, so that the variables can be used at run time without any further processing. (Managed variables can only be initialized to null this way.)
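To make the idea of such an image concrete, here is a minimal C++ sketch, assuming a flat, zero-filled byte buffer in which each global lives at a fixed offset. `GlobalImage`, `WriteAt` and `ReadAt` are hypothetical names for illustration, not the compiler's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical model of the global data image: one byte buffer that holds
// every global variable at a fixed offset. Untouched bytes stay binary zero.
class GlobalImage
{
public:
    explicit GlobalImage(std::size_t size) : bytes_(size, 0) {}

    // Copy the bytes of a compile-time value into the image at 'offset'.
    template <typename T>
    void WriteAt(std::size_t offset, T value)
    {
        if (offset + sizeof(T) > bytes_.size())
            throw std::out_of_range("initializer exceeds the global data area");
        std::memcpy(&bytes_[offset], &value, sizeof(T));
    }

    // Read a value back out of the image (used here only for checking).
    template <typename T>
    T ReadAt(std::size_t offset) const
    {
        if (offset + sizeof(T) > bytes_.size())
            throw std::out_of_range("read outside the global data area");
        T value;
        std::memcpy(&value, &bytes_[offset], sizeof(T));
        return value;
    }

    std::size_t Size() const { return bytes_.size(); }

private:
    std::vector<char> bytes_;
};
```

At run time, such a buffer can be used as-is: nothing has to execute to give the globals their starting values.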

Of course, this can only work when the compiler knows what initial values those global variables should have. Previously, there was no way of specifying the initial values for struct variables and for classic arrays. Now there is a way.

I've implemented the following initializations:

Initializer for non-managed `struct` variables

struct Car
{
    bool HasAirbags;
    float MaxSpeed;
    float MaxAcceleration;
};
Car OldBugatti = { MaxAcceleration: 50.0, MaxSpeed: 199.8, }; // ← Comma in front of '}' is allowed but optional
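The named-field rule can be modelled in a few lines of C++. This is a sketch under stated assumptions: the field offsets are hypothetical (4 bytes per field), every initializer value is assumed to be already evaluated to a 32-bit pattern, and `FieldLayout`, `FloatBits` and `ApplyNamedInitializer` are illustrative names rather than the compiler's own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes 32-bit float");

// Hypothetical layout: field name -> byte offset inside the struct.
// Every field is assumed to occupy 4 bytes in this sketch.
using FieldLayout = std::map<std::string, std::size_t>;

// Reinterpret a float as its 32-bit pattern so that all values share one type.
std::int32_t FloatBits(float f)
{
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Build the initial bytes of one struct variable from 'fieldname: value'
// pairs. The order of the pairs doesn't matter; unnamed fields stay zero.
std::vector<char> ApplyNamedInitializer(
    const FieldLayout &layout,
    std::size_t struct_size,
    const std::vector<std::pair<std::string, std::int32_t>> &inits)
{
    std::vector<char> bytes(struct_size, 0);
    for (const auto &init : inits)
    {
        auto it = layout.find(init.first);
        if (it == layout.end())
            throw std::invalid_argument("'" + init.first + "' is not a field of this struct");
        if (it->second + sizeof init.second > bytes.size())
            throw std::out_of_range("field lies outside the struct");
        std::memcpy(&bytes[it->second], &init.second, sizeof init.second);
    }
    return bytes;
}
```

With this model, `HasAirbags` in the example above simply keeps the zero bytes it started with.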

Notes:

The ordering of the fields is immaterial. What matters is how the fields are named in the list.
All the fields are set to binary zeros where the init value isn't specified. In the example above, for instance, HasAirbags is set to binary zeros, i.e., to false.
The struct may have managed fields such as Strings, but the only value that you can initialize them to is null.
If a struct A extends a struct B and you initialize a variable of type A, then you can initialize all the fields from A and B by just naming them:

struct Vehicle 
{
    float MaxSpeed, MaxAcceleration;
};
struct Car extends Vehicle
{
    bool HasAirbags;
};
Car OldBugatti = { MaxAcceleration: 50.0, HasAirbags: false, };

You must use the notation fieldname : value. You cannot simply list the values without the fieldnames in some sequential order (I haven't implemented that for now).
A struct variable can have a field that is another struct. In this case, the initializers are nested:

struct Location 
{ 
    float Longitude, Latitude; 
};
struct PointOfInterest
{
    int AdmissionFee;
    Location WhereItIs;
};

PointOfInterest EiffelTower = {
        WhereItIs: { Latitude: 48.8584, Longitude: 2.2945 }, 
        AdmissionFee: 20 + 3,    // ← Simple 'int' or 'float' expressions are possible
};
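Each level of nesting in such an initializer adds the offset of the inner field to the offset of the enclosing field. The following C++ sketch resolves a path of field names to a byte offset; the `StructType`/`Field` types, the offsets and the 4-byte field size are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct StructType;

// Hypothetical field description: byte offset within the enclosing struct,
// plus the struct type of the field if the field is itself a struct.
struct Field
{
    std::size_t offset = 0;
    const StructType *nested = nullptr; // nullptr for 'int', 'float', etc.
};

struct StructType
{
    std::map<std::string, Field> fields;
};

// Resolve one field name per nesting level of the initializer, e.g.
// { WhereItIs: { Latitude: ... } } -> {"WhereItIs", "Latitude"},
// to the byte offset where the value goes inside the outer variable.
std::size_t ResolveOffset(const StructType &type, const std::vector<std::string> &path)
{
    const StructType *current = &type;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < path.size(); ++i)
    {
        if (current == nullptr)
            throw std::invalid_argument("'" + path[i - 1] + "' is not a struct");
        auto it = current->fields.find(path[i]);
        if (it == current->fields.end())
            throw std::invalid_argument("unknown field '" + path[i] + "'");
        offset += it->second.offset; // inner offsets are relative to the inner struct
        current = it->second.nested;
    }
    return offset;
}
```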

Initializers for classic (i.e., non-dynamic) array variables

I provide several ways:

Sequence initialization

int Primes[10] = { 2, 3, 5, 7 };

Notes:

If your list contains fewer values than the dimension of the array, then the rest will be filled up with binary zeros. If your list contains more values, then the compiler will balk.
You may use simple int and float expressions that can be evaluated at compile time.
You may not mix Sequence initialization and Named initialization (cf. below, including examples) within the same list.
You may not initialize multi-dimensional arrays (cf. 'Multi-dimensional classic arrays' below) by just listing all the values without inner braces (cf. below for an example).
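The sequence rules can be sketched in C++ as follows. `SequenceInit` is a hypothetical helper, and the values are assumed to be already evaluated at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sequence initialization of a classic array: values[i] goes into element i,
// the remaining elements stay zero, and too many values is an error.
template <typename T>
std::vector<T> SequenceInit(std::size_t dimension, const std::vector<T> &values)
{
    if (values.size() > dimension)
        throw std::length_error("more initializers than array elements");
    std::vector<T> elements(dimension, T{}); // T{} is zero for int and float
    for (std::size_t i = 0; i < values.size(); ++i)
        elements[i] = values[i];
    return elements;
}
```

For `int Primes[10] = { 2, 3, 5, 7 };`, this yields `2, 3, 5, 7` followed by six zeros.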

Named initialization

bool IsPrime[10] = {
    [2]: true,     // ← means, IsPrime[2] is 'true'
    [3]: true,     // ← means, IsPrime[3] is 'true' etc.
    [5]: true,
    [7]: true,
};

Notes:

All the indices that aren't mentioned are set to binary zeros.
The compiler will balk if you attempt to set an index more than once.
The order of the entries in the list is immaterial.
This variant is useful if most of the values in your array are zero and only a few values differ from zero (so-called sparse arrays).
You may not mix Named initialization and Sequence initialization within the same list. For instance, the compiler will balk at { 1, 2, [5]: 99 } or { [0]: 2, [1]: 3, 5, 7, }
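A C++ sketch of the Named rules follows. `NamedInit` is a hypothetical helper; rejecting an index outside the array is an assumption of this sketch, since the notes above don't spell it out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

// Named initialization of a classic array: each '[index]: value' entry sets
// one element, in any order; all other elements stay zero. Setting an index
// twice is an error. The out-of-range check is an assumption of this sketch.
template <typename T>
std::vector<T> NamedInit(std::size_t dimension,
                         const std::vector<std::pair<std::size_t, T>> &entries)
{
    std::vector<T> elements(dimension, T{});
    std::set<std::size_t> seen;
    for (const auto &entry : entries)
    {
        if (entry.first >= dimension)
            throw std::out_of_range("index outside the array");
        if (!seen.insert(entry.first).second)
            throw std::invalid_argument("index initialized more than once");
        elements[entry.first] = entry.second;
    }
    return elements;
}
```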

Multi-dimensional classic arrays

An int Foo[2][3] is treated as 2 arrays, each of which has 3 values. So this is how the array is initialized:

int Spreadsheet[2][3] = {
    { 1, 2, 3, },
    { 4, 5, 6  }
};

Notes:

C++ offers the possibility to omit all the inner braces and simply list all the values. I have not implemented that. For instance, you cannot define int Spreadsheet[2][3] = { 1, 2, 3, 4, 5, 6 };
You can use both Named and Sequence initialization for multi-dimensional arrays, but you may not mix them within the same list. So you can do, e.g.,

// The outer list is Named, both the inner lists are Sequence
int Spreadsheet[2][3] = {
    [0]: { 1, 2, 3, },
    [1]: { 4, 5, 6  },
};
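Since `int Foo[2][3]` is treated as 2 arrays of 3 values each, the element `[i][j]` can be located at flat position `i * 3 + j`, assuming the usual C-style row-major layout. This C++ sketch (hypothetical helper names) applies each inner list as a sequence initializer for its row.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// An 'int A[rows][cols]' is assumed to be stored as 'rows' consecutive arrays
// of 'cols' elements each, so A[i][j] lives at flat position i * cols + j.
std::size_t FlatIndex(std::size_t i, std::size_t j, std::size_t cols)
{
    return i * cols + j;
}

// Initialize a 2-D classic array from a list of inner lists (one per row).
// Each inner list behaves like a sequence initializer for its row: missing
// values stay zero, too many values is an error.
std::vector<int> Init2D(std::size_t rows, std::size_t cols,
                        const std::vector<std::vector<int>> &lists)
{
    if (lists.size() > rows)
        throw std::length_error("more inner lists than rows");
    std::vector<int> flat(rows * cols, 0);
    for (std::size_t i = 0; i < lists.size(); ++i)
    {
        if (lists[i].size() > cols)
            throw std::length_error("more values than columns in a row");
        for (std::size_t j = 0; j < lists[i].size(); ++j)
            flat[FlatIndex(i, j, cols)] = lists[i][j];
    }
    return flat;
}
```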

Special case: Classic one-dimensional `char` arrays

These can also be initialized by a string literal.

char SafeCode[10] = "AABAACAAD";

Notes:

The string literal, including the terminating \0, must fit into the array; otherwise the compiler will balk. This is meant as a safeguard for when the array is passed as an argument to a function (e.g., Display(SafeCode);), so that the function can rely on finding the terminating \0.
If the string literal is shorter than the array, then the rest of the array is filled up with binary zeros.
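Both rules fit into a short C++ sketch; `InitCharArray` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Initialize a classic 'char' array of size 'dimension' from a string literal.
// The literal plus its terminating '\0' must fit; the rest is zero-filled.
std::vector<char> InitCharArray(std::size_t dimension, const std::string &literal)
{
    if (literal.size() + 1 > dimension) // + 1 for the terminating '\0'
        throw std::length_error("string literal too long for the array");
    std::vector<char> chars(dimension, '\0');
    std::copy(literal.begin(), literal.end(), chars.begin());
    return chars;
}
```

`"AABAACAAD"` has 9 characters, so together with its `\0` it exactly fills `char SafeCode[10]`.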

Local `struct`s and local classic arrays

Unfortunately, not supported for now. If you define an array within a function, you'll have to initialize its values the old-fashioned way. These initializations need to happen at runtime, anyway.

A `char[]` is considered an empty string whenever a '\0' is in its first byte, but a `std::string` cannot be handled that way; use `clear()` and `empty()` instead. Some strings in the interface of the compiler have been converted from `char[]` into `std::string`. Update the handling within the parser to account for that.
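The difference can be shown in a few lines of standard C++; `IsEmptyCString` and `ResetName` are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// For a C-style buffer, writing '\0' into the first byte makes it "empty".
bool IsEmptyCString(const char *buf) { return buf[0] == '\0'; }

// For std::string that trick doesn't work: the string keeps its length and
// merely contains an embedded '\0'. Use clear() to empty it, empty() to test.
void ResetName(std::string &name) { name.clear(); }
```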

These tests are often run in parallel, and then 'ccSetOption()' calls that happen concurrently clobber each other's effect. Convert to the 4-parameter function 'cc_compile()' of the new compiler, which has a dedicated parameter for the compiler options.
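The design point is that options passed per call cannot be clobbered by another test thread, whereas a process-wide option setting can. The following C++ sketch illustrates only that idea: `OptionFlags` and `CompileSketch` are hypothetical and do not reproduce the real `cc_compile()` signature or flag names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical option flags; the real compiler's flag names and values differ.
enum OptionFlags : std::uint32_t
{
    kOptNone = 0,
    kOptLineNumbers = 1u << 0,
};

// The options travel with the call instead of living in a global that every
// test sets beforehand, so each compilation sees exactly what its caller
// passed in, even when several tests run concurrently.
std::string CompileSketch(const std::string &source, std::uint32_t options)
{
    std::string out = "compiled " + std::to_string(source.size()) + " chars";
    if (options & kOptLineNumbers)
        out += " with line numbers";
    return out;
}
```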

Recently, a lot of `char[]` in the interface to the compiler have been converted to `std::string`. Rewrite the functions that generate bytecode tests or that compare interface components to match. Also provide code to compare and check the `scrip.globaldata` that the compiler generates. Make some library functions `static`, as suggested by the MSVS compiler.
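A comparison of the generated global data against expected bytes could look like the following C++ sketch. `CompareGlobalData` is a hypothetical helper; in a Googletest, its result could be checked for emptiness so that a failure reports where the images first diverge.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical test helper: compare the global data image a compiler produced
// against the expected bytes and describe the first difference (empty = equal).
std::string CompareGlobalData(const std::vector<char> &expected,
                              const std::vector<char> &actual)
{
    if (expected.size() != actual.size())
        return "size differs: expected " + std::to_string(expected.size()) +
               ", got " + std::to_string(actual.size());
    for (std::size_t i = 0; i < expected.size(); ++i)
        if (expected[i] != actual[i])
            return "first difference at byte " + std::to_string(i);
    return "";
}
```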


ivan-mogilko · 2025-01-18T20:56:56Z

This sounds great! I shall test this soon.
I have a question regarding two unimplemented things though:

On struct initializer:

You must use the notation fieldname : value. You cannot simply list the values without the fieldnames in some sequential order (I haven't implemented that).

On local structs and arrays:

Unfortunately, not supported so far. If you define an array within a function, you'll have to initialize its values the old-fashioned way. These initializations need to happen at runtime, anyway.

Are these technically possible and may they be planned for the future, or not at all?

Personally, I believe that having an ordered struct initializer will be very convenient for simple structs and also consistent with the function argument list syntax (where you can pass arguments either ordered or named).

Local initialization: in theory, these can be done by preallocating a struct and then filling it with the results of the expressions inside the initializer. But also, since they are executed at runtime, these may contain non-constexpr expressions, like function calls and new operators.

fernewelten · 2025-01-19T01:18:15Z

You must use the notation fieldname : value. You cannot simply list the values without the fieldnames in some sequential order (I haven't implemented that).

Yes, it's doable: the fields are ordered in the symbol table in the order in which they were defined. If the struct extends another struct, then the fields of the ancestor (and perhaps those of the ancestor's ancestor, etc.) must be considered, too. We can define a sequence from that, e.g., ‘first the fields of the ancestor's ancestor, then the fields added by the ancestor, then the fields added by the struct proper’, and then match an initializer list { field1, field2, field3, … } to this sequence.
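The ancestor-first sequence described here can be sketched in C++. `StructDef`, `FieldSequence` and `MatchPositional` are hypothetical; allowing fewer values than fields (the rest staying zero) is an assumption that mirrors the array rules, not something decided in this discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical struct description: fields in declaration order plus an
// optional parent (the struct it 'extends').
struct StructDef
{
    const StructDef *parent = nullptr;
    std::vector<std::string> own_fields;
};

// Oldest ancestor's fields first, then each descendant's added fields,
// ending with the fields declared in the struct itself.
std::vector<std::string> FieldSequence(const StructDef &def)
{
    std::vector<std::string> seq;
    if (def.parent != nullptr)
        seq = FieldSequence(*def.parent);
    seq.insert(seq.end(), def.own_fields.begin(), def.own_fields.end());
    return seq;
}

// Match a positional list { v1, v2, ... } with 'value_count' values to field
// names. Assumption: fewer values than fields is allowed; more is an error.
std::vector<std::string> MatchPositional(const StructDef &def, std::size_t value_count)
{
    std::vector<std::string> seq = FieldSequence(def);
    if (value_count > seq.size())
        throw std::length_error("more values than fields");
    seq.resize(value_count);
    return seq;
}
```

With `struct Car extends Vehicle` from the PR description, a positional `{ 199.8, 50.0, true }` would map to `MaxSpeed`, `MaxAcceleration`, `HasAirbags`, which also shows why reordering fields later silently changes the meaning of existing lists.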

(From the practical perspective of the user-programmer, a sequence list saves some typing in comparison to a named list, but when the fields in the struct get moved around afterwards for some reason, it's very error-prone to track down and adjust the corresponding initializations. I've been there. 😀 But that's an aside.)

Local initialization: in theory, these can be done by preallocating a struct and then filling it with the results of the expressions inside the initializer. But also, since they are executed at runtime, these may contain non-constexpr expressions, like function calls and new operators.

Yes, that's also doable. It would be a different piece of code that would have to be added to another place in the compiler than where the globals are initialized, but it can definitely be done.

I'm not against doing those two in principle, it's just that I haven't implemented them for now.


fernewelten added 9 commits January 18, 2025 17:38

Fix bug: Only one-dimensional arrays can convert to string

ec073c9

Fix Revamp bytecode library functions

a809209

Convert symboltable googletests to 'TEST_F()'

07cdeb3

Symbol table: ArrayVartypeWithoutFirstDim()

0774e38

Add Googletest for the new function

String literals as initializers for global classic char arrays

cb218e1

Initialization of global structs and global classic arrays

913bfd4

ivan-mogilko added the ags 4 and script compiler labels Jan 18, 2025

fernewelten mentioned this pull request Jan 18, 2025

New compiler: support initializer lists #2152

Open

Fix small bugs found after upgrading MSVS

95287f0

- `struct` components uninitialized after construction
- Refer to data of a vector v as `&v[0]` instead of `v.data()`
- LINUM directives left out or generated erroneously
