-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multidimensional arrays #1839
Multidimensional arrays #1839
Conversation
239368c
to
03d48f0
Compare
|
||
## Proposal | ||
|
||
We should add support of multidimensional arrays in Carbon via syntax extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with #1787 (see this comment), we've put this proposal in a procedurally awkward position, because we haven't yet adopted a proposal for one-dimensional arrays yet. That being the case, I'm not sure what the best way forward is, but it might make sense to defer this proposal until we've adopted a one-dimensional array proposal that this can build on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late reply and thank you for your review! As I can see, #1787 is just about array initialization syntax. Should I create a new proposal for 1D arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I create a new proposal for 1D arrays?
I think either you or @asoffer should; see the discussion in the #process channel on Discord.
```carbon | ||
var x: [i32; :, :]; | ||
``` | ||
For avoiding Undefined Behavior, `x` has shape `(0, 0)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does :
mean "deduce this dimension from the initializer", or does it mean "this dimension can vary at run-time"? I'd strongly prefer the first meaning, but if that's the case, this code should be a compile error, just like var x: auto;
is a compile error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that dimensions may vary at run-time. The most simple example is string (C# example):
string s = "Hello";
s += ", world!";
In Fortran, it oftenly used for allocating arrays when sizes are becoming known:
type(atom) :: atoms(:)
integer :: Natoms
Natoms = get_atoms_len()
allocate(atoms(Natoms))
In carbon, for simplifying (Sorry for Fortran-style):
var n_atoms: i64;
var atoms: [atom; :];
n_atoms = get_atoms_len();
allocate(atoms, shape = ( n_atoms ))
It can be rewritten as:
var n_atoms: i64 = get_atoms_len();
var atoms: [atom; n_atoms];
But I would prefer to have such constructions since they may be actively used for class fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like it would make the type of var x: [i32; 3, 2] = ((0, 1, 2), (3, 4, 5));
very different from the type of var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5));
.
shape(x)
would presumably be a compile-time constant, butshape(y)
could not be.x
could store all its elements directly in the object (like astd::array
does), buty
has to be implemented with a pointer to heap-allocated memory.- The generated code for
x[2, 1]
can consist of a single memory access, but I believe the generated code fory[2, 1]
requires at least two memory accesses, even if you don't count the indirection through the pointer mentioned above. x
provides pointer stability, whereasy
does not:&x[0]
will never change, but&y[0]
could have different values over the course ofy
's lifetime.
Those differences are so basic that I think it would be misleading to use such similar syntax to represent them both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my point of view, var x: [i32; 3, 2] = ((0, 1, 2), (3, 4, 5));
and var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5));
are the same. In y
, :
is used for skipping direct mention of dimensions' lengths. So,
shape(x)
andshape(y)
would presumably be a compile-time constant.x
andy
could store all its elements directly in the object (like astd::array
does)x
andy
provides pointer stability
I did not get why accessing for y
is as y[2][1]
, not y[2,1]
. It is not a nested array, so accessing will be via single memory access. For nested array, you are right about two memory accesses.
Let me separate var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5));
and var z: [i32; :, :];
. In the case of y
, :
is a syntax sugar for skipping direct mention of dimensions' lengths. For z
, :
means that this array can be allocated/reallocated. And for z
array, it is true what you wrote about y
, except (3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably move this discussion to your 1D array proposal, but to answer your questions:
I did not get why accessing for
y
is asy[2][1]
, noty[2,1]
.
Sorry, that was a typo. I've fixed it now.
It is not a nested array, so accessing will be via single memory access. For nested array, you are right about two memory accesses.
Even if it's not nested I believe it needs two memory accesses. Let's assume the 2D array is implemented as a 1D array in row-major order. That means the element at row i and column j is located at index i + N * j in the underlying array, where N is the number of columns. So in order to access that element, I need to know the number of columns. If the number of columns can vary at run-time, that means I need to load the number of columns from memory before I can access the element. On the other hand, if the number of columns is fixed at compile time, I can avoid doing that load.
x[:, 0] = x[:, 1]; // x = ((3, 4, 5), (3, 4, 5)); | ||
``` | ||
|
||
### Iterators (?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use cases for this seem likely to be rare enough that named functions would be clearer, and would avoid the need for a new core-language syntax. In particular, I'm concerned about the ...
syntax conflicting with variadics.
@@ -0,0 +1,331 @@ | |||
# Multidimensional arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a piece of high-level feedback, I'd encourage you to look for ways to make this proposal smaller. A lot of the features you're proposing here seem like they could be follow-up proposals once the main multidimensional array feature is in place. Some of these features seem like they could be controversial, or at least require substantial discussion, and I don't want the main proposal to get bogged down.
} | ||
``` | ||
|
||
#### Elemental functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand the background and rationale for this feature. For example, what problems does it solve? Can those problems be solved with libraries instead of language features? Is there precedent for this feature in other languages?
This might be a good piece to split out into a separate proposal.
} | ||
``` | ||
|
||
### Standard library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like another good piece to postpone to a future proposal. I'm not sure if we even have the right people on the project to properly review a full library design for multidimensional arrays, so it might be better to focus for now on the core-language functionality.
I relabeled this as a proposal because its only content is a proposal file. After a closer look, I see it's been a few weeks and geoffromer's requests for more explanation haven't been addressed. I've converted this back to draft so that you have time to address comments and add more detail to the proposal. |
Note, if you have any questions, #proposal-prs-and-process on Discord may be a good place to discuss proposal writing. |
We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please comment or remove the |
We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it. |
I opened #1661 and many folks thumbs it. I tried to implement some quantum chemistry (which is performance-critical) code and look which features are required for it.
Here, it is my point of view how Carbon may became popular in HPC community in comparison with C++, which has a lot application, but new Fortran applications are born and growing nowadays.