This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

[WIP] Even more registers #217

Open
wants to merge 18 commits into
base: master

ftynse
Contributor

@ftynse ftynse commented Mar 26, 2018

UNCLEAN AND NON-REBASED WORK IN PROGRESS

Do not merge or even review.

ftynse added 15 commits March 22, 2018 19:55
Extract as an overload of Scop::activePromotions taking a set of active
statement instances and a tensor id.  Overload was chosen because both
functions return the same data structure and are semantically close.
Internally, we may need to modify the stored active promotions but the
public functions return either a copy of or a const reference to that
storage.  Extract logic to find active promotions into a separate
function that returns indexes into the storage, and use it to create a
copy inside a public call.
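
The split described above (a private lookup that returns indexes into the storage, and public functions that only hand out copies or a const reference) could look roughly like the sketch below. It is only an illustration: `Promotion`, `PromotionStore`, `add` and `activePromotionsIndexes` are hypothetical names, plain integers and strings stand in for isl sets and ids, and "active" is assumed to mean "active for at least one of the given instances".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one stored promotion. The real code works with
// isl sets and ids; plain integers and strings keep the sketch short.
struct Promotion {
  std::set<int> activeInstances; // statement instances where it is active
  std::string tensorId;          // tensor being promoted
};

class PromotionStore {
 public:
  void add(Promotion p) {
    promotions_.push_back(std::move(p));
  }

  // Read-only view of the whole storage.
  const std::vector<Promotion>& activePromotions() const {
    return promotions_;
  }

  // Overload: copies of the promotions of `tensorId` that are active for at
  // least one of the given statement instances (assumed semantics).
  std::vector<Promotion> activePromotions(
      const std::set<int>& instances,
      const std::string& tensorId) const {
    std::vector<Promotion> result;
    for (std::size_t idx : activePromotionsIndexes(instances, tensorId)) {
      result.push_back(promotions_[idx]);
    }
    return result;
  }

 private:
  // Returns positions in the storage rather than copies, so that internal
  // callers could modify the stored entries in place.
  std::vector<std::size_t> activePromotionsIndexes(
      const std::set<int>& instances,
      const std::string& tensorId) const {
    std::vector<std::size_t> indexes;
    for (std::size_t i = 0; i < promotions_.size(); ++i) {
      const Promotion& p = promotions_[i];
      if (p.tensorId != tensorId) {
        continue;
      }
      for (int instance : instances) {
        if (p.activeInstances.count(instance) > 0) {
          indexes.push_back(i);
          break;
        }
      }
    }
    return indexes;
  }

  std::vector<Promotion> promotions_;
};
```

Both public functions share a name because they return the same kind of data; only the private helper exposes positions in the storage.
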

If a group of references was promoted into shared memory, but it could
also be promoted to registers while covering exactly the same statement
instances accessing it, demote it from shared memory before promoting to
registers.

These option combinations were failing with previous implementations of
double promotion.  Make sure they never fail again.

All other ScheduleTree node types are printed in such a way that each
set (map) of the union_set (union_map) present in the node is printed on
a new line.  Do the same for extension nodes.
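
As a rough illustration of the one-set-per-line layout, the sketch below formats the pieces of a union individually. `formatUnionPieces` is a hypothetical helper, and the pieces are assumed to be already available as strings; it does not show how the actual printer obtains them from isl.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: `pieces` holds the textual form of each set (or map)
// of a union; a real implementation would obtain them from isl.
std::string formatUnionPieces(
    const std::vector<std::string>& pieces,
    std::size_t indent) {
  std::string out;
  for (const std::string& piece : pieces) {
    // One piece per line, indented to the depth of the enclosing node.
    out += std::string(indent, ' ') + piece + "\n";
  }
  return out;
}
```
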

This creates a private convenience function to obtain a copy of active
promotions specified by a list of their indexes in the storage.

Use this function in Scop::promoteGroup to avoid traversing the list
of all promotions twice in a row.

In cases when the approximate footprint of the reference group being
promoted to registers is not a subset of any of the approximate
footprints of the reference groups promoted to shared, it is still
possible to promote by copying directly from global memory as long as
all overlapping reference groups have only read the data.  It will just
create multiple copies of the data in different storages without
compromising correctness.
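
The decision described above can be sketched as follows. This is a simplified model, not the code from this PR: `RefGroup`, `CopySource` and `registerCopySource` are hypothetical names, footprints are modelled as exact sets of element indices rather than approximate footprints, and only the groups promoted to shared memory are checked for writes.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical simplification: footprints are sets of element indices,
// whereas the real code uses approximate footprints.
struct RefGroup {
  std::set<int> footprint;
  bool readOnly; // true if the group only reads the data
};

enum class CopySource { Shared, Global, None };

// True if every element of `inner` is also in `outer`.
bool isSubset(const std::set<int>& inner, const std::set<int>& outer) {
  return std::includes(
      outer.begin(), outer.end(), inner.begin(), inner.end());
}

// True if the two footprints share at least one element.
bool overlaps(const std::set<int>& a, const std::set<int>& b) {
  return std::any_of(
      a.begin(), a.end(), [&b](int x) { return b.count(x) > 0; });
}

// Decide where a register promotion candidate may copy its data from.
CopySource registerCopySource(
    const RefGroup& candidate,
    const std::vector<RefGroup>& sharedGroups) {
  // Covered by a group already in shared memory: copy from there.
  for (const RefGroup& g : sharedGroups) {
    if (isSubset(candidate.footprint, g.footprint)) {
      return CopySource::Shared;
    }
  }
  // Otherwise global memory is only safe if no overlapping group writes.
  for (const RefGroup& g : sharedGroups) {
    if (overlaps(candidate.footprint, g.footprint) && !g.readOnly) {
      return CopySource::None;
    }
  }
  return CopySource::Global;
}
```

In the `Global` case the same elements may live in both shared memory and registers, which is harmless as long as nobody writes to them.
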

In cases where a reference group promoted to registers covered exactly
the same accesses as another group promoted to shared memory, the second
group was demoted to save shared memory space.  However, this had
adverse effects: the global->shared copy is performed at the beginning
of the block while the global->register copy happens deeper in the tree,
which leaves no room to hide the latency of the loads.  Keep the group
promoted to shared memory and perform a shared->register copy instead.

An alternative solution would be to decrease the promotion scope depth
for register promotion.  This would require ensuring that loops whose
indices appear in subscripts of "register" arrays are fully unrolled
so that the elements of those arrays are effectively mapped to registers.
Since unrolling is expensive in compilation time and is exposed to the
autotuner, we would prefer to also expose the register promotion depth
in the future.

2 participants