New compiler: Initializers for global classic arrays and structs #2662

Open
wants to merge 10 commits into
base: ags4

fernewelten
Contributor

@fernewelten fernewelten commented Jan 18, 2025

Fixes #2152

I've made it possible to initialize global structs and global classic arrays, using a syntax similar to the one we (now) have for parameters.

Background

Global variables can be initialized at compile time: The compiler essentially prepares in a byte buffer an image of all the global variables with the initial values already set so that the variables can be used at run time without any further processing. (Managed variables can only be initialized to null this way.)

Of course, this can only work when the compiler knows what initial values those global variables should have. Previously, there was no way of specifying the initial values for struct variables and for classic arrays. Now there is a way.
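As a rough illustration of the "image in a byte buffer" idea (a sketch only, not the actual compiler code): the hypothetical class `GlobalImage` below reserves zeroed bytes for each global and patches initial values in at compile time. The class name, its methods and the 4-byte value size are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper, NOT the AGS compiler's API: a byte image of all
// global variables that is filled in at compile time.
class GlobalImage {
public:
    // Reserve 'size' zeroed bytes for a new global and return its offset.
    std::size_t Allocate(std::size_t size) {
        std::size_t const offset = bytes_.size();
        bytes_.resize(offset + size, 0u);
        return offset;
    }

    // Copy the raw bytes of an initial value into the image.
    template <typename T>
    void Write(std::size_t offset, T value) {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy only");
        if (offset + sizeof(T) > bytes_.size())
            throw std::out_of_range("write outside of the image");
        std::memcpy(bytes_.data() + offset, &value, sizeof(T));
    }

    // Read a value back, the way the engine could at run time.
    template <typename T>
    T Read(std::size_t offset) const {
        static_assert(std::is_trivially_copyable<T>::value, "raw copy only");
        if (offset + sizeof(T) > bytes_.size())
            throw std::out_of_range("read outside of the image");
        T value;
        std::memcpy(&value, bytes_.data() + offset, sizeof(T));
        return value;
    }

    std::size_t Size() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
};
```

A global that never gets a `Write()` simply keeps its binary zeros, which is the default the notes below keep referring to.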

I've implemented the following initializations:

Initializer for non-managed struct variables

struct Car
{
    bool HasAirbags;
    float MaxSpeed;
    float MaxAcceleration;
};
Car OldBugatti = { MaxAcceleration: 50.0, MaxSpeed: 199.8, }; // ← Comma in front of '}' is allowed but optional

Notes:

  • The ordering of the fields is immaterial. What matters is how the fields are named in the list.
  • All fields for which no init value is specified are set to binary zeros. In the above example, for instance, HasAirbags is set to binary zeros, i.e., to false.
  • The struct may have managed fields such as Strings, but the only value that you can initialize them to is null.
  • If a struct A extends a struct B and you initialize a variable of type A, then you can initialize all the fields from A and B by just naming them:
struct Vehicle 
{
    float MaxSpeed, MaxAcceleration;
};
struct Car extends Vehicle
{
    bool HasAirbags;
};
Car OldBugatti = { MaxAcceleration: 50.0, HasAirbags: false, }; 
  • You must use the notation fieldname : value. You cannot simply list the values without the fieldnames in some sequential order (I haven't implemented that for now).
  • A struct variable can have a field that is another struct. In this case, the initializers are nested:
struct Location 
{ 
    float Longitude, Latitude; 
};
struct PointOfInterest
{
    int AdmissionFee;
    Location WhereItIs;
};

PointOfInterest EiffelTower = {
        WhereItIs: { Longitude: 2.2945, Latitude: 48.8584 }, 
        AdmissionFee: 20 + 3,    // ← Simple 'int' or 'float' expressions are possible
};
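To make the "named fields, everything else zero" rule concrete, here is a minimal C++ sketch of how such an initializer could be applied to a zeroed image. It is an illustration under stated assumptions, not the code from this PR: `Layout`, `InitStruct`, the field offsets and the uniform 4-byte field size are all invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout table: field name -> byte offset within the struct.
// Every field is assumed to be 4 bytes wide in this sketch.
using Layout = std::map<std::string, std::size_t>;

// Build the image of one struct variable from (fieldname, value) pairs.
// The order of the pairs does not matter; unnamed fields stay binary zero.
std::vector<std::uint8_t> InitStruct(
    Layout const &layout, std::size_t struct_size,
    std::vector<std::pair<std::string, float>> const &named_values)
{
    std::vector<std::uint8_t> image(struct_size, 0u);
    for (auto const &entry : named_values) {
        auto const field = layout.find(entry.first);
        if (field == layout.end())
            throw std::invalid_argument("unknown field: " + entry.first);
        std::memcpy(image.data() + field->second, &entry.second, sizeof(float));
    }
    return image;
}
```

Because the lookup goes by name, a layout table that also contains the fields of an extended parent struct would cover the `Car extends Vehicle` case without further changes.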

Initializers for classic (i.e., non-dynamic) array variables

I provide several ways:

Sequence initialization

int Primes[10] = { 2, 3, 5, 7 };

Notes:

  • If your list contains fewer values than the dimension of the array, then the rest will be filled up with binary zeros. If your list contains more values, then the compiler will balk.
  • You may use simple int and float expressions that can be evaluated at compile time
  • You may not mix Sequence initialization and Named initialization (cf. below, including examples) within the same list
  • You may not initialize multi-dimensional arrays (cf. 'Multi-dimensional classic arrays' below, including an example) by just listing all the values without inner braces
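The fill-up and overflow rules for sequence initialization can be sketched in a few lines of C++. `InitSequence` is a hypothetical helper written for this illustration; the real compiler works on its byte image rather than on a `std::vector<int>`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: expand a sequence initializer to the full array.
std::vector<int> InitSequence(std::size_t dimension, std::vector<int> const &values)
{
    // More values than array elements is an error.
    if (values.size() > dimension)
        throw std::length_error("initializer list is longer than the array");

    std::vector<int> cells(dimension, 0);   // everything starts out as zero
    for (std::size_t i = 0; i < values.size(); ++i)
        cells[i] = values[i];               // listed values fill the front
    return cells;
}
```

With `InitSequence(10, {2, 3, 5, 7})`, elements 0 to 3 hold the listed primes and elements 4 to 9 stay zero.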

Named initialization

bool IsPrime[10] = {
    [2]: true,     // ← means, IsPrime[2] is 'true'
    [3]: true,     // ← means, IsPrime[3] is 'true' etc.
    [5]: true,
    [7]: true,
};

Notes:

  • All the non-mentioned indices are set to binary zeros
  • The compiler will balk if you attempt to set an index more than once
  • The sequence in the list is immaterial.
  • This variant is useful if most of the values in your array are zero and only a few values are different from zero (so-called sparse arrays)
  • You may not mix Named initialization and Sequence initialization within the same list. For instance, the compiler will balk at { 1, 2, [5]: 99 } or { [0]: 2, [1]: 3, 5, 7, }
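The checks listed above (no index set twice, no mixing of the two styles within one list) could look roughly like this. This is a sketch with invented names (`Entry`, `InitArray`); the out-of-range check is an extra assumption that the notes do not spell out.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

// One entry of an initializer list: either "[index]: value" or just "value".
struct Entry {
    bool has_index;
    std::size_t index;   // ignored when has_index is false
    int value;
};

// Hypothetical helper: apply a Named or a Sequence list to a zeroed array.
std::vector<int> InitArray(std::size_t dimension, std::vector<Entry> const &entries)
{
    std::vector<int> cells(dimension, 0);
    std::set<std::size_t> seen;   // indices that have been set so far
    std::size_t next = 0;         // next slot for sequence entries
    for (auto const &entry : entries) {
        // The first entry decides the style of the whole list.
        if (entry.has_index != entries.front().has_index)
            throw std::invalid_argument("cannot mix named and sequence entries");
        std::size_t const index = entry.has_index ? entry.index : next++;
        if (index >= dimension)
            throw std::out_of_range("index outside of the array");
        if (!seen.insert(index).second)
            throw std::invalid_argument("index set more than once");
        cells[index] = entry.value;
    }
    return cells;
}
```

The order of named entries has no effect on the result, since each entry carries its own target index.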

Multi-dimensional classic arrays

An int Foo[2][3] is treated as 2 arrays, each of which has 3 values. So this is how the array is initialized:

int Spreadsheet[2][3] = {
    { 1, 2, 3, },
    { 4, 5, 6  }
};

Notes:

  • C++ lets you omit all the inner braces and simply list all the values. I have not implemented that. For instance, you cannot define int Spreadsheet[2][3] = { 1, 2, 3, 4, 5, 6 };
  • You can use both Named and Sequence initialization for multi-dimensional arrays, but you may not mix them within the same list. So you can do, e.g.,
// The outer list is Named, both the inner lists are Sequence
int Spreadsheet[2][3] = {
    [0]: { 1, 2, 3, },
    [1]: { 4, 5, 6  },
};
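Treating `int Foo[2][3]` as 2 arrays of 3 values each suggests a flattening like the one below. This is a sketch under the assumption that the rows are laid out one after the other in memory; `InitGrid` is a hypothetical name and only the Sequence style is shown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: flatten an initializer for 'int a[rows][cols]'
// that supplies one inner list per row.
std::vector<int> InitGrid(std::size_t rows, std::size_t cols,
                          std::vector<std::vector<int>> const &lists)
{
    if (lists.size() > rows)
        throw std::length_error("too many inner lists");

    std::vector<int> cells(rows * cols, 0);   // unlisted cells stay zero
    for (std::size_t r = 0; r < lists.size(); ++r) {
        if (lists[r].size() > cols)
            throw std::length_error("inner list is longer than a row");
        for (std::size_t c = 0; c < lists[r].size(); ++c)
            cells[r * cols + c] = lists[r][c];   // row r starts at r * cols
    }
    return cells;
}
```

The inner braces are what tells the helper where a row ends, which is why a flat list of six values is not accepted in its place.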

Special case: Classic one-dimensional char arrays

These can also be initialized by a string literal.

char SafeCode[10] = "AABAACAAD";

Notes:

  • The string literal including the terminating \0 must be at most as long as the array, otherwise the compiler will balk. This is meant as a safeguard for when the array is passed as an argument to a function (e.g., Display(SafeCode);)
  • If the string literal is shorter than the array, then the rest of the array is filled up with binary zeros.
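The length rule for char arrays can be illustrated as follows. `InitCharArray` is a hypothetical helper for this sketch and says nothing about how the compiler stores string literals internally.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: initialize a 'char[N]' from a string literal.
template <std::size_t N>
std::array<char, N> InitCharArray(std::string const &literal)
{
    // The literal plus its terminating '\0' must fit into the array.
    if (literal.size() + 1 > N)
        throw std::length_error("string literal does not fit into the array");

    std::array<char, N> cells{};   // value-initialized: all bytes are '\0'
    for (std::size_t i = 0; i < literal.size(); ++i)
        cells[i] = literal[i];
    return cells;
}
```

For `char SafeCode[10]`, the nine characters of "AABAACAAD" plus the terminator use the array up exactly; a literal of ten characters would be rejected.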

Local structs and local classic arrays

Unfortunately, not supported for now. If you define an array within a function, you'll have to initialize its values the old-fashioned way. These initializations need to happen at runtime, anyway.

A `char[]` is considered an empty string whenever a '\0' is in its first byte, but a `std::string` cannot be handled that way: → use `clear()` and `empty()`.
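A small C++ sketch of the difference described in this commit message; the function names `IsEmpty` and `Reset` are made up for the illustration and do not occur in the compiler.

```cpp
#include <string>

// With a char buffer, "empty" means that the first byte is '\0'.
bool IsEmpty(char const *buffer) { return buffer[0] == '\0'; }

// With std::string, ask the object instead of inspecting its bytes.
bool IsEmpty(std::string const &text) { return text.empty(); }

// Emptying a char buffer: put the terminator at the front.
void Reset(char *buffer) { buffer[0] = '\0'; }

// Emptying a std::string: use clear().
void Reset(std::string &text) { text.clear(); }
```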

Some strings in the interface of the compiler have been converted from `char[]` into `std::string`. Update the handling within the parser to account for that.
These tests are often run in parallel, and then 'ccSetOption()' calls that happen concurrently clobber each other's effect.
Convert to the 4-parameter function 'cc_compile()' of the new compiler, which has a dedicated parameter for the compiler options.
Recently, a lot of `char[]` in the interface to the compiler have been converted to `std::string`. Rewrite the functions that generate bytecode tests or that compare interface components to match. Also provide code to compare and check the `scrip.globaldata` that the compiler generates.

Make some library functions `static`, as suggested by the MSVS compiler.
Add Googletest for the new function
@ivan-mogilko
Contributor

This sounds great! I shall test this soon.
I have a question regarding two unimplemented things though:

  1. On struct initializer:

You must use the notation fieldname : value. You cannot simply list the values without the fieldnames in some sequential order (I haven't implemented that).

  2. On local structs and arrays:

Unfortunately, not supported so far. If you define an array within a function, you'll have to initialize its values the old-fashioned way. These initializations need to happen at runtime, anyway.

Are these technically possible and may they be planned for the future, or not at all?

Personally, I believe that having an ordered struct initializer will be very convenient for simple structs and also consistent with the function argument list syntax (where you can pass arguments either ordered or named).

Local initialization: in theory this can be done by preallocating a struct and then filling it with the results of the expressions inside the initializer. But also, since they are executed at runtime, these may contain non-constexpr expressions, like function calls and new operators.

@fernewelten
Contributor Author

fernewelten commented Jan 19, 2025

You must use the notation fieldname : value. You cannot simply list the values without the fieldnames in some sequential order (I haven't implemented that).

Yes, it's doable: The fields are ordered in the symbol table: They come in the order in which they were defined. If the struct has been extended, then the ancestor's fields and perhaps the fields of their ancestors etc. must be considered, too. We can define a sequence from that, e.g., ‘first the fields of the ancestor's ancestor, then the added fields of the ancestor, then the added fields of the struct proper’ – and then an initializer list { field1, field2, field3, … } can be matched to this sequence.

(From a practical perspective of the user-programmer, a sequence list saves some typing in comparison to a named list, but when the fields in the struct get moved around afterwards for some reason then it's very error-prone to find and change around the respective initializations. I've been there. 😀 But that's an aside.)
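The field sequence described above (oldest ancestor first, the struct's own fields last) could be derived along these lines. This is only a sketch of the idea: `StructInfo` and `FieldSequence` are invented stand-ins and do not reflect the layout of the real symbol table.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol table entry of a struct.
struct StructInfo {
    std::string parent;                // empty when the struct extends nothing
    std::vector<std::string> fields;   // own fields, in definition order
};

// Fields of the most distant ancestor first, own added fields last.
std::vector<std::string> FieldSequence(
    std::map<std::string, StructInfo> const &structs, std::string const &name)
{
    StructInfo const &info = structs.at(name);   // throws for unknown structs
    std::vector<std::string> sequence;
    if (!info.parent.empty())
        sequence = FieldSequence(structs, info.parent);   // ancestors first
    sequence.insert(sequence.end(), info.fields.begin(), info.fields.end());
    return sequence;
}
```

For `Car extends Vehicle` from the PR description this yields MaxSpeed, MaxAcceleration, HasAirbags, so a list such as { 199.8, 50.0, true } could then be matched position by position.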

Local initialization: in theory this can be done by preallocating a struct and then filling it with the results of the expressions inside the initializer. But also, since they are executed at runtime, these may contain non-constexpr expressions, like function calls and new operators.

Yes, that's also doable. It would be a different piece of code that would have to be added to another place in the compiler than where the globals are initialized, but it can definitely be done.

I'm not against doing those two in principle, it's just that I haven't implemented them for now.

- `struct` components uninitialized after construction
- Refer to data of a vector v as `&v[0]` instead of `v.data()`
- LINUM directives left out or generated erroneously
Labels
ags 4, context: script compiler
2 participants