Multidimensional arrays #1839
# Multidimensional arrays

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/1839)

<!-- toc -->

## Table of contents

- [Problem](#problem)
- [Background](#background)
- [Proposal](#proposal)
- [Details](#details)
- [Rationale](#rationale)
- [Alternatives considered](#alternatives-considered)

<!-- tocstop -->

## Problem

Multidimensional arrays are widely used in numerical methods, machine
intelligence, and data science. They are one feature that makes modern Fortran
more attractive than C++ as a compiled language for this work. C++ currently
lacks first-class support for multidimensional arrays whose extents are only
known at run time. Implementing such arrays in Carbon would give it a major boost
in the eyes of the scientific community.

Nested arrays may look like a good alternative to multidimensional arrays. They
can perform poorly, however, when their rows are split into separate allocations
scattered across memory.

## Background

A multidimensional array is an array with more than one dimension whose elements
are stored contiguously in memory.

A multidimensional array may be stored in memory in
[row- or column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order).
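
The proposed Carbon syntax does not exist yet, so the following is a small C++ sketch, not part of the proposal. It shows how the two storage orders map an index `(i, j)` of a `rows` x `cols` array onto one contiguous buffer. The function names `row_major_offset` and `col_major_offset` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (i, j) in a rows x cols array stored in row-major order:
// consecutive elements of a row are adjacent in memory.
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t rows,
                             std::size_t cols) {
  assert(i < rows && j < cols);
  return i * cols + j;
}

// Offset of element (i, j) in the same array stored in column-major order
// (as in Fortran): consecutive elements of a column are adjacent in memory.
std::size_t col_major_offset(std::size_t i, std::size_t j, std::size_t rows,
                             std::size_t cols) {
  assert(i < rows && j < cols);
  return i + j * rows;
}
```

Each formula needs one extent: the number of columns for row-major order, and the number of rows for column-major order. If that extent can change at run time, the program must read it from memory before it can address the element.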

## Proposal

We should add support for multidimensional arrays to Carbon through a syntax
extension, to make code cleaner and simpler to read and write.

> **Review comment:** As with #1787 (see this comment), we've put this proposal
> in a procedurally awkward position, because we haven't yet adopted a proposal
> for one-dimensional arrays. That being the case, I'm not sure what the best way
> forward is, but it might make sense to defer this proposal until we've adopted
> a one-dimensional array proposal that this can build on.
>
> **Author:** Sorry for the late reply, and thank you for your review! As far as
> I can see, #1787 is just about array initialization syntax. Should I create a
> new proposal for 1D arrays?
>
> **Reviewer:** I think either you or @asoffer should; see the discussion in the
> #process channel on Discord.

```carbon
var a: [f64; 3, 4];
var values: i64 = 0;
// Iterate over the rows of `a`, then over the elements of each row.
for (a_i: auto in a[:, ...]) {
  for (a_ij: auto in a_i[:, ...]) {
    a_ij = values++;
  }
}
```

or

```carbon
var a: [f64; 3, 4];
var values: i64 = 0;
// `(0:2)` is the inclusive index range 0, 1, 2.
for (i: auto in (0:2)) {
  for (j: auto in (0:3)) {
    a[i, j] = values++;
  }
}
```
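
For comparison, the second loop nest above could be written in C++ today. It keeps a single row-major buffer and does the index arithmetic by hand. This is a sketch under assumptions: `Array2D` and `fill_consecutive` are hypothetical names, and `a(i, j)` stands in for the proposed `a[i, j]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C++ analogue of `var a: [f64; 3, 4];`: one contiguous,
// row-major buffer plus its extents.
struct Array2D {
  std::size_t rows, cols;
  std::vector<double> data;

  Array2D(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

  // Plays the role of `a[i, j]` in the proposed syntax.
  double& operator()(std::size_t i, std::size_t j) {
    assert(i < rows && j < cols);
    return data[i * cols + j];
  }
};

// Equivalent of the loop nest `a[i, j] = values++`.
Array2D fill_consecutive(std::size_t rows, std::size_t cols) {
  Array2D a(rows, cols);
  long values = 0;
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      a(i, j) = static_cast<double>(values++);
    }
  }
  return a;
}
```

After `fill_consecutive(3, 4)`, element `(i, j)` holds `4 * i + j`, and all 12 values sit in a single allocation.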

## Details

### Definition

#### Automatic allocation

Arrays can be automatically allocated. A `:` in place of an extent marks a
dimension whose size may vary at run time:

```carbon
var x: [i32; :, :];
```

To avoid undefined behavior, `x` initially has shape `(0, 0)`.

> **Review comment:** Does
>
> **Author:** It means that dimensions may vary at run time. The simplest
> example is a string (C# example):
>
>     string s = "Hello";
>     s += ", world!";
>
> In Fortran, it is often used for allocating arrays once their sizes become
> known:
>
>     type(atom), allocatable :: atoms(:)
>     integer :: Natoms
>     Natoms = get_atoms_len()
>     allocate(atoms(Natoms))
>
> In Carbon, for simplicity (sorry for the Fortran style):
>
>     var n_atoms: i64;
>     var atoms: [atom; :];
>     n_atoms = get_atoms_len();
>     allocate(atoms, shape = (n_atoms));
>
> It can be rewritten as:
>
>     var n_atoms: i64 = get_atoms_len();
>     var atoms: [atom; n_atoms];
>
> But I would prefer to have such constructions, since they may be used heavily
> for class fields.
>
> **Reviewer:** That seems like it would make the type of
>
> Those differences are so basic that I think it would be misleading to use such
> similar syntax to represent them both.
>
> **Author:** In my point of view,
>
> I did not get why accessing for Let me separate
>
> **Reviewer:** We should probably move this discussion to your 1D array
> proposal, but to answer your questions:
>
> Sorry, that was a typo. I've fixed it now.
>
> Even if it's not nested, I believe it needs two memory accesses. Let's assume
> the 2D array is implemented as a 1D array in row-major order. That means the
> element at row i and column j is located at index i * N + j in the underlying
> array, where N is the number of columns. So in order to access that element, I
> need to know the number of columns. If the number of columns can vary at run
> time, that means I need to load the number of columns from memory before I can
> access the element. On the other hand, if the number of columns is fixed at
> compile time, I can avoid doing that load.

Arrays may be defined:

1. via **assignment**:

```carbon
var x: [i32; :, :];
var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5));
x = y;
```

2. via **memory allocation**:

> **Review comment:** I'd definitely recommend leaving this part out of the
> current proposal. We don't have a design for dynamic allocation of
> one-dimensional arrays, or even single objects, and it's impossible to
> evaluate this part of the proposal in isolation from that design.

```carbon
var x: [i32; :, :];
allocate(x, /*shape=*/(3, 2));
```

Calling `allocate` on an array that is already allocated causes a runtime
error.

#### Automatic deallocation

> **Review comment:** Is there any reason this should work differently for
> arrays than for single objects? If not, I think we can leave this section out.

Automatically allocated arrays are destroyed at the end of their scope. For
example, if such an array is a field of a class object, it is destroyed together
with that object.

Deallocation can also be requested manually:

```carbon
var x: [i32; :, :];
allocate(x, /*shape=*/(3, 2));
deallocate(x);
```

Calling `deallocate` on an array that is not allocated is a runtime error.
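
The allocation rules above can be modelled in C++ as a class that tracks whether it currently owns storage. This is a sketch under several assumptions. The class name `DeferredArray2D` is hypothetical, the element type is fixed to `int` to match `i32`, and the proposal's "runtime error" is modelled as a `std::logic_error` exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical C++ model of a deferred-shape array `[i32; :, :]`.
// A default-constructed array has shape (0, 0) and owns no storage.
class DeferredArray2D {
 public:
  bool allocated() const { return allocated_; }
  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }
  std::size_t size() const { return data_.size(); }

  // Models `allocate(x, (rows, cols))`: allocating twice without an
  // intervening deallocate is a runtime error.
  void allocate(std::size_t rows, std::size_t cols) {
    if (allocated_) throw std::logic_error("array is already allocated");
    data_.assign(rows * cols, 0);
    rows_ = rows;
    cols_ = cols;
    allocated_ = true;
  }

  // Models `deallocate(x)`: deallocating an unallocated array is an error.
  void deallocate() {
    if (!allocated_) throw std::logic_error("array is not allocated");
    data_.clear();
    data_.shrink_to_fit();
    rows_ = cols_ = 0;
    allocated_ = false;
  }

 private:
  // Storage is also released automatically at end of scope, by the
  // std::vector destructor.
  std::vector<int> data_;
  std::size_t rows_ = 0, cols_ = 0;
  bool allocated_ = false;
};
```

Automatic deallocation at the end of scope comes for free from the destructor of the owned `std::vector`. The explicit checks exist only to reproduce the proposed error cases.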

### Operators

Arrays may be modified in scalar and vector ways.

1. Scalar way, where an operator is applied element by element:

```carbon
var x: [i32; 2, 3] = ((0, 1, 2), (3, 4, 5));
var y: auto = -x;
// Element-wise sum of x and y.
var z: auto = x + y; // z = ((0, 0, 0), (0, 0, 0))
```

If the shapes of the operands do not match, a runtime error occurs.

2. Vector way, where a scalar operand is combined with every element:

```carbon
var x: [i32; 2, 3] = ((0, 1, 2), (3, 4, 5));
// Multiply each element by 2.
x *= 2; // x = ((0, 2, 4), (6, 8, 10))
```
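
Here is a minimal C++ sketch of the two kinds of operators. It assumes a row-major `IntArray2D` type (a hypothetical name) and models the shape-mismatch runtime error as a `std::invalid_argument` exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical row-major 2D array of i32 values, used only to sketch the
// operator semantics described above.
struct IntArray2D {
  std::size_t rows = 0, cols = 0;
  std::vector<int> data;  // row-major, size rows * cols
};

// "Scalar way": unary minus applied to every element.
IntArray2D operator-(const IntArray2D& x) {
  IntArray2D r = x;
  for (int& v : r.data) v = -v;
  return r;
}

// "Scalar way": element-wise sum; mismatched shapes are a runtime error.
IntArray2D operator+(const IntArray2D& x, const IntArray2D& y) {
  if (x.rows != y.rows || x.cols != y.cols)
    throw std::invalid_argument("shape mismatch");
  IntArray2D r = x;
  for (std::size_t k = 0; k < r.data.size(); ++k) r.data[k] += y.data[k];
  return r;
}

// "Vector way": combine a scalar with every element.
IntArray2D& operator*=(IntArray2D& x, int s) {
  for (int& v : x.data) v *= s;
  return x;
}
```

With `x = ((0, 1, 2), (3, 4, 5))`, `x + (-x)` yields all zeros and `x *= 2` yields `(0, 2, 4, 6, 8, 10)` in row-major order. Adding arrays of shapes `(2, 3)` and `(3, 2)` throws.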

### Iterators (?)

> **Review comment:** The use cases for this seem likely to be rare enough that
> named functions would be clearer, and would avoid the need for a new
> core-language syntax. In particular, I'm concerned about the

In multidimensional arrays, it may be useful to have _iterators_ (row-major
order):

```carbon
var a: [f64; 2, 3, 4];
var it: auto = a[:, ...];
for (i: auto in it) { ... }
```

In this example, `i` takes the values `a[0,:,:]` and `a[1,:,:]` in turn.
An iterator may also run over the last dimension (column-major order):

```carbon
var a: [f64; 4, 3, 2];
var it: auto = a[..., :];
for (i: auto in it) { ... }
```

In this example, `i` takes the values `a[:,:,0]` and `a[:,:,1]` in turn.
In both cases, `i` is a two-dimensional array.

`...` stands for all of the remaining dimensions. So `a[:, ...]` iterates over
the first dimension, and `a[..., :]` iterates over the last.
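
The following C++ sketch shows what the two iterators would yield at each step, using a flat row-major buffer. It is an illustration under assumptions: `slice_first` and `slice_last` are hypothetical helpers, and each slice is returned as a copy rather than a view to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for a row-major 3D array with extents (n0, n1, n2),
// stored in one buffer. Element (i, j, k) is at (i * n1 + j) * n2 + k.

// Models step k of `a[:, ...]`, i.e. a[k, :, :]. In row-major order this
// slice is one contiguous block of n1 * n2 elements.
std::vector<double> slice_first(const std::vector<double>& a, std::size_t n0,
                                std::size_t n1, std::size_t n2, std::size_t k) {
  assert(a.size() == n0 * n1 * n2 && k < n0);
  auto first = a.begin() + static_cast<std::ptrdiff_t>(k * n1 * n2);
  return std::vector<double>(first,
                             first + static_cast<std::ptrdiff_t>(n1 * n2));
}

// Models step k of `a[..., :]`, i.e. a[:, :, k]. In row-major order this
// slice is strided: its elements are n2 positions apart.
std::vector<double> slice_last(const std::vector<double>& a, std::size_t n0,
                               std::size_t n1, std::size_t n2, std::size_t k) {
  assert(a.size() == n0 * n1 * n2 && k < n2);
  std::vector<double> out;
  out.reserve(n0 * n1);
  for (std::size_t p = 0; p < n0 * n1; ++p) out.push_back(a[p * n2 + k]);
  return out;
}
```

In a row-major layout, the first iterator walks over contiguous blocks while the second gathers strided elements. The two can therefore differ considerably in cost, even though the syntax looks symmetric.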

### Functions

Usually, arrays can be used as-is. For example, here is a generic function that
sums two arrays:

```carbon
fn sum[T:! Type](x: T, y: T) -> T {
  return x + y;
}
```

A function returning a 1D array:

```carbon
fn arr1D[T:! Type](x: T, y: T) -> [T; :] {
  return (x, y);
}
```

or a 2D array:

```carbon
fn arr2D[T:! Type](x: T, y: T) -> [T; :, :] {
  return ((x, x), (y, y));
}
```

Dimensions may be specified explicitly:

```carbon
fn arr1D[T:! Type](x: T, y: T) -> [T; 2] {
  return (x, y);
}
```

A function may also lower the number of dimensions, here reducing two 1D arrays
to a scalar:

```carbon
fn unarr[T:! Type](x: [T; :], y: [T; :]) -> T {
  // One-argument `sum` from the standard library: sums all elements.
  return sum(x + y);
}
```

#### Elemental functions

> **Review comment:** I don't really understand the background and rationale for
> this feature. For example, what problems does it solve? Can those problems be
> solved with libraries instead of language features? Is there precedent for
> this feature in other languages? This might be a good piece to split out into
> a separate proposal.

Elemental functions are applied to each element of an array in turn.

```carbon
el fn inc[T:! Type](x: T) -> T {
  return x + 1;
}
fn Main() -> i32 {
  var x: [i32; 3] = (0:2);
  var y: i32 = 3;
  x = inc(x); // similar to x = x + 1;
  y = inc(y);
  return 0;
}
```

This is useful when the function is more complicated than an increment.
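
In C++, an elemental function can be approximated by writing an ordinary scalar function and mapping it over the array with `std::transform`. This sketch assumes a 1D `std::vector` stands in for `[i32; 3]`. `inc` mirrors the Carbon example, and `apply_elemental` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical C++ analogue of `el fn inc`: an ordinary scalar function...
template <typename T>
T inc(T x) {
  return x + 1;
}

// ...plus a helper that applies any scalar function to every element, which
// is what calling an elemental function on an array would do.
template <typename T, typename F>
std::vector<T> apply_elemental(const std::vector<T>& xs, F f) {
  std::vector<T> out(xs.size());
  std::transform(xs.begin(), xs.end(), out.begin(), f);
  return out;
}
```

The proposal's `el` keyword would let the same call, `inc(x)`, work for both scalars and arrays. The C++ version achieves that only through an explicit helper.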

Using _iterators_, an elemental function may also be applied to sub-arrays:

```carbon
el fn conv[T:! Type](x: T) -> T {
  return sum(x);
}
fn Main() -> i32 {
  var x: [i32; 3, 4] = reshape((0:11), /*shape=*/(3, 4));
  var y: auto = conv(x[:, ...]); // sum of each row: y = (6, 22, 38)
  var z: auto = conv(x[..., :]); // sum of each column: z = (12, 15, 18, 21)
  return 0;
}
```
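
The expected values in the comments above can be checked with a C++ sketch. It sums each row and each column of `reshape((0:11), (3, 4))`, taken here as the values 0 to 11 laid out in row-major order. The helper names `sum_rows` and `sum_cols` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers reproducing `conv(x[:, ...])` and `conv(x[..., :])`
// for a row-major rows x cols array when the elemental function is a sum.

// One result per row: sums x[i, :] for each i.
std::vector<int> sum_rows(const std::vector<int>& x, std::size_t rows,
                          std::size_t cols) {
  assert(x.size() == rows * cols);
  std::vector<int> out(rows, 0);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j) out[i] += x[i * cols + j];
  return out;
}

// One result per column: sums x[:, j] for each j.
std::vector<int> sum_cols(const std::vector<int>& x, std::size_t rows,
                          std::size_t cols) {
  assert(x.size() == rows * cols);
  std::vector<int> out(cols, 0);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j) out[j] += x[i * cols + j];
  return out;
}
```

Both results match the comments: `(6, 22, 38)` and `(12, 15, 18, 21)`. This indicates that the example assumes `reshape` fills the array in row-major order.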

### Standard library

> **Review comment:** This seems like another good piece to postpone to a future
> proposal. I'm not sure if we even have the right people on the project to
> properly review a full library design for multidimensional arrays, so it might
> be better to focus for now on the core-language functionality.

#### allocate

Allocates an array:

```carbon
var x: [i32; :, :];
allocate(x, /*shape=*/(3, 2));
```

#### deallocate

Deallocates an array:

```carbon
var x: [i32; :, :];
allocate(x, /*shape=*/(3, 2));
deallocate(x);
```

#### allocated

Returns whether an array is allocated:

```carbon
var x: [i32; :, :];
allocated(x); // false
allocate(x, /*shape=*/(3, 2));
allocated(x); // true
deallocate(x);
allocated(x); // false
```

#### shape

Returns the shape of an array, here for an array `x` of shape `(3, 2)`:

```carbon
var s: [i32; 2] = shape(x); // s = (3, 2)
```

#### size

Returns the total number of elements in an array:

```carbon
var l: i32 = size(x); // l = 6
```

With the optional argument `dim`, returns the extent of the given dimension
(dimensions are numbered from 1):

```carbon
var l1: i32 = size(x, /*dim=*/1); // l1 = 3
var l2: i32 = size(x, /*dim=*/2); // l2 = 2
```

#### reshape

Reshapes an array:

```carbon
var x: [i32; 3, 2] = reshape(/*array=*/(0, 1, 2, 3, 4, 5), /*shape=*/(3, 2));
```
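
Here is a C++ sketch of `shape`, `size`, and `reshape` over a row-major, N-dimensional array. The `NdArray` struct is hypothetical; its `shape` member plays the role of `shape(x)`. The `array_size` and `reshape` functions are also hypothetical. Throwing on a mismatched element count is an assumption, since the proposal does not say what happens in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical N-dimensional, row-major array used to sketch the proposed
// `shape`, `size`, and `reshape` library functions.
struct NdArray {
  std::vector<std::size_t> shape;  // e.g. {3, 2}
  std::vector<int> data;           // row-major, size = product of shape
};

std::size_t element_count(const std::vector<std::size_t>& dims) {
  return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                         std::multiplies<std::size_t>());
}

// `size(x)`: total number of elements.
std::size_t array_size(const NdArray& x) { return x.data.size(); }

// `size(x, dim)`: extent of dimension `dim`, numbered from 1 as proposed.
std::size_t array_size(const NdArray& x, std::size_t dim) {
  if (dim < 1 || dim > x.shape.size()) throw std::out_of_range("bad dim");
  return x.shape[dim - 1];
}

// `reshape(values, shape)`: reinterpret the same row-major data with a new
// shape; the element count must not change.
NdArray reshape(std::vector<int> values, std::vector<std::size_t> new_shape) {
  if (element_count(new_shape) != values.size())
    throw std::invalid_argument("element count does not match shape");
  return NdArray{std::move(new_shape), std::move(values)};
}
```

Because the data stays in row-major order, `reshape` only replaces the shape and never moves elements. This is one reason contiguous storage makes such operations cheap.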

#### transpose

Transposes an array. Without the additional argument, this works only for 2D
arrays:

```carbon
var x: [i32; 2, 3] = ((0, 1, 2), (3, 4, 5));
var y: auto = transpose(x); // y = ((0, 3), (1, 4), (2, 5))
var z: auto = shape(y); // z = (3, 2)
```

The additional argument `dims` selects the two dimensions to swap (numbered from
1):

```carbon
var x: [i32; 2, 2, 2] = ( ((0, 1), (2, 3)), ((4, 5), (6, 7)) );
var y: auto = transpose(x, /*dims=*/(1, 3));
// y = ( ((0, 4), (2, 6)), ((1, 5), (3, 7)) )
```
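
The index arithmetic behind `transpose` with `dims` can be sketched in C++ for a row-major array of any rank. This assumes row-major storage, which is consistent with the `y` values in the examples above, and dimensions numbered from 1. The function name `transpose_dims` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch of `transpose(x, dims=(a, b))` for a row-major array of
// any rank: swaps dimensions a and b (numbered from 1, as in the proposal).
// Returns the new data; the new shape is `shape` with entries a and b swapped.
std::vector<int> transpose_dims(const std::vector<int>& data,
                                const std::vector<std::size_t>& shape,
                                std::size_t a, std::size_t b) {
  const std::size_t rank = shape.size();
  if (a < 1 || b < 1 || a > rank || b > rank)
    throw std::out_of_range("bad dimension");
  --a;
  --b;
  // Row-major strides of the input.
  std::vector<std::size_t> stride(rank, 1);
  for (std::size_t d = rank; d-- > 1;) stride[d - 1] = stride[d] * shape[d];

  std::vector<std::size_t> out_shape = shape;
  std::swap(out_shape[a], out_shape[b]);

  std::vector<int> out(data.size());
  std::vector<std::size_t> idx(rank, 0);  // multi-index into the output
  for (std::size_t flat = 0; flat < out.size(); ++flat) {
    // Output element idx comes from the input element with a and b swapped.
    std::vector<std::size_t> src = idx;
    std::swap(src[a], src[b]);
    std::size_t offset = 0;
    for (std::size_t d = 0; d < rank; ++d) offset += src[d] * stride[d];
    out[flat] = data[offset];
    // Advance idx in row-major order over out_shape.
    for (std::size_t d = rank; d-- > 0;) {
      if (++idx[d] < out_shape[d]) break;
      idx[d] = 0;
    }
  }
  return out;
}
```

For the 3D example, `transpose_dims({0, 1, 2, 3, 4, 5, 6, 7}, {2, 2, 2}, 1, 3)` produces `{0, 4, 2, 6, 1, 5, 3, 7}`, which is the row-major flattening of the `y` shown in the comment. The 2D `transpose(x)` is the special case of dimensions 1 and 2.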

#### sum

Sums all elements of an array:

```carbon
var x: [i32; 2, 2, 2] = ( ((0, 1), (2, 3)), ((4, 5), (6, 7)) );
var y: auto = sum(x); // y = 28
```

## Rationale

This proposal should make it simpler to write high-performance computing (HPC)
code, most of which is performance-critical software. Unfortunately, existing
C++ code would not benefit from it.

## Alternatives considered

This proposal is heavily influenced by Fortran.

> **Review comment:** As a piece of high-level feedback, I'd encourage you to
> look for ways to make this proposal smaller. A lot of the features you're
> proposing here seem like they could be follow-up proposals once the main
> multidimensional array feature is in place. Some of these features seem like
> they could be controversial, or at least require substantial discussion, and I
> don't want the main proposal to get bogged down.