Features/880 binop ben bou #902
Conversation
@@ -582,15 +582,15 @@ def create_lshape_map(self, force_check: bool = False) -> torch.Tensor:
             result. Otherwise, create the lshape_map
             """
         if not force_check and self.__lshape_map is not None:
-            return self.__lshape_map
+            return self.__lshape_map.clone()
I can see why you do it, but the `lshape_map` can get quite big with many nodes. Why not clone or copy only as the need arises?
Well, somehow it must be ensured that the attribute is immutable, and as far as I know there are no read-only tensors.
Most of the time this shouldn't be a problem; the `lshape_map` is generally much smaller than the other tensors.
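For illustration only, a minimal sketch of the trade-off discussed here (the `_MapCache` class and the placeholder tensor are made up for this example and are not the actual heat implementation): the cached map is returned as a defensive clone so the attribute stays effectively immutable, at the cost of one extra copy per call.

```python
import torch


class _MapCache:
    """Hypothetical stand-in for the lshape_map caching discussed above."""

    def __init__(self):
        self._lshape_map = None

    def create_lshape_map(self, force_check: bool = False) -> torch.Tensor:
        if not force_check and self._lshape_map is not None:
            # Returning a clone keeps the cached attribute immutable;
            # the price is an extra copy of an (n_processes x ndim) tensor,
            # which is usually small compared to the data itself.
            return self._lshape_map.clone()
        # Placeholder for the real computation (an MPI Allreduce in heat).
        lshape_map = torch.zeros((4, 3), dtype=torch.int64)
        self._lshape_map = lshape_map
        return lshape_map.clone()
```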
@@ -601,7 +601,7 @@ def create_lshape_map(self, force_check: bool = False) -> torch.Tensor:
         self.comm.Allreduce(MPI.IN_PLACE, lshape_map, MPI.SUM)

         self.__lshape_map = lshape_map
-        return lshape_map
+        return lshape_map.clone()
see above
heat/core/dndarray.py
Outdated
        # for dim in expand:
        #     self = self.expand_dims(dim)
        # if len(expand):
        #     return self[tuple(key)]
dead code?
heat/core/dndarray.py
Outdated
        key = tuple(key)
        self_proxy = self.__torch_proxy__()
        # None and newaxis indexing
        # expand = []
dead code?
heat/core/manipulations.py
Outdated
    array_dtype = types.canonical_heat_type(t_array_dtype)
    target = arrays[0]
    try:
        # arrays[1:] = sanitation.sanitize_distribution(*arrays[1:], target=target)  # error in unpacking
dead code?
Thanks. Another thing about this part: the `try ... except` is only needed to transform a `NotImplementedError` into a `ValueError`. At some point it would be good to have unified errors within HeAT, such that e.g. a wrong split axis always yields the same `Exception`.
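A minimal, self-contained sketch of the error conversion being described (the function names below are hypothetical stand-ins, not the actual heat code): the `try ... except` exists only to re-raise a `NotImplementedError` from the distribution sanitation as a `ValueError`.

```python
def _sanitize_distribution_stub(*arrays, target):
    # Stand-in for sanitation.sanitize_distribution: signals an unsupported
    # combination of split axes with a NotImplementedError.
    raise NotImplementedError("split axes of the operands do not match")


def _match_distribution(*arrays, target):
    try:
        return _sanitize_distribution_stub(*arrays, target=target)
    except NotImplementedError as e:
        # Unify the exception type for callers until heat has consistent
        # error classes for invalid split axes.
        raise ValueError(str(e)) from e
```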
heat/core/_operations.py
Outdated

def __get_out_params(target, other=None, map=None):
    """
    Getter for the output parameters of a binop with target.
"binop with target" -> "binary operation with target distribution"
heat/core/_operations.py
Outdated
def __get_out_params(target, other=None, map=None):
    """
    Getter for the output parameters of a binop with target.
    If other is provided, it's distribution will be matched to target or, if provided,
other -> `other`
it's -> its
target -> `target`
heat/core/_operations.py
Outdated
""" | ||
Getter for the output parameters of a binop with target. | ||
If other is provided, it's distribution will be matched to target or, if provided, | ||
redistributed according to map. |
map -> `map`
heat/core/_operations.py
Outdated
    other : DNDarray
        DNDarray to be adapted
    map : Tensor
        Lshape-Map other should be matched to. Defaults to target's lshape_map
Lshape-Map -> `lshape_map`
other -> `other`
target's lshape_map -> `target.lshape_map`
heat/core/relational.py
Outdated
    # result_tensor = _operations.__binary_op(torch.equal, x, y)
    #
    # if result_tensor.larray.numel() == 1:
    #     result_value = result_tensor.larray.item()
    # else:
    #     result_value = True
    #
    # return result_tensor.comm.allreduce(result_value, MPI.LAND)
dead code?
) -> Union[DNDarray, Tuple(DNDarray)]:
    """
    Distribute every arg according to target.lshape_map or, if provided, diff_map.
    After this sanitation, the lshapes are compatible along the split dimension.
`arg`, `target.lshape_map`, `diff_map`
@ben-bou thanks again for so much work. The default cloning of the `lshape_map` is something we should rethink. Otherwise I've got mostly editorial changes.
Brilliant @ben-bou, thank you so much!
Description
Issue/s resolved: #880
Changes proposed:
- `None`/`newaxis` indexing added to `getitem`
- `ht.equal` method. Previously it used the `binop` interface, which was wrong because it isn't a binop (see the sketch after this list)
- `stack` function made compatible with non-balanced arrays
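A rough usage sketch of the `ht.equal` distinction mentioned above, assuming heat's public API (the shapes and values are arbitrary): `ht.equal` reduces to a single boolean for the whole distributed array, unlike the elementwise result a binary operation would produce.

```python
import heat as ht

a = ht.ones((3, 3), split=0)
b = ht.ones((3, 3), split=0)

# ht.equal answers "are these arrays entirely equal?" with one Python bool ...
print(ht.equal(a, b))  # True

# ... whereas the == binop yields an elementwise DNDarray of booleans.
print(a == b)
```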
Type of change
Performance
- Checks the `lshape_map`s inside every binop between distributed DNDarrays -> more communication
- Redistributes out-of-place -> more memory
If the DNDarrays are balanced (i.e. the features of this PR are not used), the introduced overhead is small: binops between two balanced DNDarrays of shape (30, 100, 100) with 24 MPI processes spend >90% of the runtime in the respective torch functions.
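A hedged sketch of the case this overhead applies to, assuming heat's public API (the exact chunk sizes depend on the number of MPI processes): a binop between two distributed DNDarrays whose local chunks do not line up.

```python
import heat as ht

# Balanced: every process holds a contiguous, evenly sized chunk along split=0.
x = ht.arange(12, split=0)

# Slicing can leave the result unbalanced across processes.
y = ht.arange(13, split=0)[1:]

# With this PR, the binop compares the lshape_maps of x and y and, if they
# differ, redistributes one operand out of place before calling into torch.
z = x + y
```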
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
Yes, every function using binops.