-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-define the type system #8352
Conversation
paddle/framework/framework.proto
Outdated
} | ||
optional LoDTensorDesc lod_tensor = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoDTensorDesc => LoDTensor?
paddle/framework/framework.proto
Outdated
required Type type = 1; | ||
|
||
message Tensor { | ||
required Type data_type = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the data_type here for Tensor only POD types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Our C++ implementation of Tensor requires that elements have to be C++ POD types:
It looks to me that we can check in C++ code that data_type
must be of POD types. But I am open to better solutions. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested by @wangkuiyi , one way could be to leave it as it is and check in the C++ code that data_type
must be a POD. Another way could be to define this inside protobuf. We can split Type
into 2 different enums :
enum PodType {
// POD types
BOOL = 0;
INT16 = 1;
INT32 = 2;
INT64 = 3;
FP16 = 4;
FP32 = 5;
FP64 = 6;
}
enum CompositeType {
LOD_TENSOR = 1; // LoDTensor
SELECTED_ROWS = 2; // Tensor
FEED_MINIBATCH = 3;
FETCH_LIST = 4;
STEP_SCOPES = 5;
LOD_RANK_TABLE = 6;
// LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType
// PLACE_LIST = 8; // replaced by array-of-place. c.f. ArrayType
READER = 7; // ReaderType
}
Once we have done this, we can define a field type
in VarType
as follows:
message VarType {
oneof type {
PodType ptype = 1;
CompositeType ctype = 2;
}
}
```protobuf
This way we can capture the requirement of our Tensor and LODTensor inside protobuf message. The tensor can be defined as follows:
```protobuf
message Tensor {
required PodType data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}
What do you guys think @wangkuiyi and @jacquesqiao ?
paddle/framework/framework.proto
Outdated
} | ||
optional Array array = 4; | ||
|
||
message Reader { repeated LoDTensorDesc lod_tensor = 1; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoDTensorDesc => LoDTensor
paddle/framework/framework.proto
Outdated
FP64 = 6; | ||
|
||
// other types, that might need additonal descriptions | ||
LOD_TENSOR = 1; // LoDTensorType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because other types
are in the same enum Type
with POD types
, the index should not be duplicated. So the index should not start from 1.
|
||
message ReaderDesc { repeated LoDTensorDesc lod_tensor = 1; } | ||
message Array { VarType elem_type = 1; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the VarType be repeated?
And message Array
can be used as array of VarType
in which the VarType should be the same and n-tuple
in which the VarType of the array can be different, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the VarType
in the array should be repeated. This is because the array will have all elements of the same VarType
. In an n-tuple, the VarType
will be repeated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Abhinav's comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abhinavarora Thanks, you are right, I finally know where I was wrong.
We should add message Tuple {repeated VarType tuple_type = 1;}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this has also been discussed here: #8336 Instead of VarDesc
we will have VarType
as has been done in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, channel of channels is a commonly used pattern used with CSP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a channel can hold so many types, we cannot make shape inference easily at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Tuple
and Array
should be two different types.
-
The
Array
can hold elements with the same type and shape at compile type. The shape ofarray_read
can be inferenced at compile time only if thearray
holds the same time.- The length of Array can only be decided at runtime since there are
array_write
operators for RNN. If we make array can hold different types, we cannot inference type at compile time.
- The length of Array can only be decided at runtime since there are
-
The
Tuple
is an immutable type. The length ofTuple
and its elements will not be changed at compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reyoung You are right, Tuple
and Array
are 2 different types. The type of the array elements will be known at compile time. Tuple
should be immutable and its length and elements will not change. I am just a little curious how we could implement thin our cpp code. Today, we map the LODTensorArray
directly to a Vector<LODTensor>
. For an Array
would we have to create mappings for all types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would we have to create mappings for all types?
I am curious about this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! This was exactly as we discussed on Friday. I have few more questions that I have asked in the comments.
Since @wangkuiyi was unable to see my comment, I am pasting it here again: There are 2 ways to make sure that Tensors and LODTensors only contain POD data types. One way, as suggested by @wangkuiyi, is to leave it as it is and check in the C++ code that enum PODType {
// POD types
BOOL = 0;
INT16 = 1;
INT32 = 2;
INT64 = 3;
FP16 = 4;
FP32 = 5;
FP64 = 6;
}
enum CompositeType {
LOD_TENSOR = 1; // LoDTensor
SELECTED_ROWS = 2; // Tensor
FEED_MINIBATCH = 3;
FETCH_LIST = 4;
STEP_SCOPES = 5;
LOD_RANK_TABLE = 6;
// LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType
// PLACE_LIST = 8; // replaced by array-of-place. c.f. ArrayType
READER = 7; // ReaderType
} Once we have done this, we can define a field message VarType {
oneof type {
PODType ptype = 1;
CompositeType ctype = 2;
}
...
} This way we can capture the requirement of our Tensor and LODTensor inside protobuf message. The tensor can be defined as follows: message Tensor {
required PodType data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
} More information about |
Isn't this similar to what @wangkuiyi has proposed in the PR , except for splitting the VarType into two separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for starting off the PR. The refinement done here should suffice for now and is the same idea as we had discussed on Friday.
@kavyasrinet The above comment was in response to the following thread -> #8352 (comment) |
I see. Thanks for the clarification, I agree with both the approaches. |
LOD_TENSOR_ARRAY = 7; | ||
PLACE_LIST = 8; | ||
READER = 9; | ||
} | ||
required string name = 1; | ||
required VarType type = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can VarDesc::type
be POD type like BOOL
, INT16
? Or it can only be LOD_TENSOR
, SELECTED_ROWS
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VarType::Type contains POD type.
FETCH_LIST = 10; | ||
STEP_SCOPES = 11; | ||
LOD_RANK_TABLE = 12; | ||
// LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tensors in the LOD_TENSOR_ARRAY will hold the same type and the same shape in compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. So LoDTensorArray is not Arary. We got that. Thanks for the reminder!
@abhinavarora @wangkuiyi I personally prefer the first way to separate the two kinds of Type: PODType and CompositeType. There are several reasons:
|
@jacquesqiao @abhinavarora @reyoung and everyone: Let us use a single enum and do NOT separate POD types from composite types, as I posted in the PR. I understand your point is due to the fact that the composite type Tensor requires that its elements are POD types. Let us follow this logic and imagine that in the future we might have an additional composite type, whose elements must be, say, tensors. How are we going to handle this? Do we need to separate LOD_TENSOR and SELECTED_ROW out from CompositeType and make it a new enum named TensorType? If we do so, we will have enum PODType {...}
enum TensorType {...}
enum CompositeType {...}
message VarType {
oneof {
PODType
TensorType
CompositeType
}
...
} There are two critical problems with the above strategy:
|
Since you haven't replied for a long time, we have closed this issue/pr. |
Fix #8336
Related with #8339
This PR is not supposed to work. It is just a reference to how we need to re-design Fluid's type system.