This repository has been archived by the owner on Apr 28, 2023. It is now read-only.
[WIP] Even more registers #217
Open
ftynse wants to merge 18 commits into master from even-more-registers
Extract as an overload of Scop::activePromotions that takes a set of active statement instances and a tensor id. An overload was chosen because both functions return the same data structure and are semantically close.
Internally, we may need to modify the stored active promotions, but the public functions return either a copy of, or a const reference to, that storage. Extract the logic that finds active promotions into a separate function that returns indexes into the storage, and use it to build the copy returned by the public function.
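A minimal C++ sketch of the split described above: a private helper returns indexes into the storage, and the public overload filtered by tensor id uses them to build a copy. Everything here is hypothetical: ScopSketch, PromotionInfo, addPromotion and activePromotionIndexes are illustrative names, and statement instances are modeled as std::set<int> rather than isl sets, so this is not the actual Scop code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: statement instances are integer ids, not isl sets.
using Instances = std::set<int>;

// Hypothetical record of one promotion kept in the storage.
struct PromotionInfo {
  std::string tensorId;    // tensor whose references were promoted
  Instances activePoints;  // statement instances for which it is active
  int groupId;             // placeholder for the promoted reference group
};

// True if the two instance sets share at least one element.
bool intersects(const Instances& a, const Instances& b) {
  for (int x : a) {
    if (b.count(x) != 0) {
      return true;
    }
  }
  return false;
}

class ScopSketch {
 public:
  void addPromotion(PromotionInfo p) { storage_.push_back(std::move(p)); }

  // Public overload filtered by tensor id. It returns a copy, so callers
  // cannot modify the stored promotions.
  std::vector<PromotionInfo> activePromotions(
      const Instances& activePoints,
      const std::string& tensorId) const {
    std::vector<PromotionInfo> result;
    for (std::size_t i : activePromotionIndexes(activePoints, tensorId)) {
      result.push_back(storage_[i]);
    }
    return result;
  }

 private:
  // Internal helper returning indexes into storage_, which internal code
  // could use to modify the stored promotions in place.
  std::vector<std::size_t> activePromotionIndexes(
      const Instances& activePoints,
      const std::string& tensorId) const {
    std::vector<std::size_t> indexes;
    for (std::size_t i = 0; i < storage_.size(); ++i) {
      if (storage_[i].tensorId == tensorId &&
          intersects(storage_[i].activePoints, activePoints)) {
        indexes.push_back(i);
      }
    }
    return indexes;
  }

  std::vector<PromotionInfo> storage_;
};
```

Keeping the index-returning function private means only the class itself can turn indexes into mutable access, while external callers always receive an independent copy.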
If a group of references was promoted to shared memory but could also be promoted to registers while covering exactly the same statement instances that access it, demote it from shared memory before promoting it to registers.
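A minimal sketch of the demotion check, assuming a simplified model: SharedPromotion and demoteBeforeRegisterPromotion are hypothetical names, and instance sets are std::set<int> instead of isl sets. A shared-memory promotion is demoted only if it concerns the same tensor and its active instances are exactly equal to those the register promotion would cover. Note that the last commit message in this PR replaces this demotion with a shared->register copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: statement instances are integer ids, not isl sets.
using Instances = std::set<int>;

// Hypothetical record of a shared-memory promotion.
struct SharedPromotion {
  std::string tensorId;
  Instances activePoints;  // statement instances accessing the promoted group
};

// Removes the shared-memory promotions of tensorId whose active instances are
// exactly those the new register promotion will cover.
// Returns the number of demoted promotions.
std::size_t demoteBeforeRegisterPromotion(
    std::vector<SharedPromotion>& shared,
    const std::string& tensorId,
    const Instances& registerPoints) {
  auto sameCoverage = [&](const SharedPromotion& p) {
    return p.tensorId == tensorId && p.activePoints == registerPoints;
  };
  // remove_if keeps the relative order of the promotions that stay.
  auto newEnd = std::remove_if(shared.begin(), shared.end(), sameCoverage);
  std::size_t demoted =
      static_cast<std::size_t>(std::distance(newEnd, shared.end()));
  shared.erase(newEnd, shared.end());
  return demoted;
}
```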
These option combinations were failing with previous implementations of double promotion. Make sure they never fail again.
All other ScheduleTree node types are printed so that each set (map) of the union_set (union_map) stored in the node appears on its own line. Do the same for extension nodes.
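A small sketch of the one-map-per-line layout. It assumes the maps of the extension's union_map have already been converted to strings; printExtension and the exact "extension(" layout are hypothetical, and the actual printer would iterate over the isl union_map itself.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Prints a hypothetical extension node with each map of its union_map on its
// own line, indented two spaces deeper than the node label.
std::string printExtension(const std::vector<std::string>& maps, int indent) {
  std::ostringstream os;
  const std::string pad(static_cast<std::size_t>(indent), ' ');
  os << pad << "extension(\n";
  for (const std::string& m : maps) {
    os << pad << "  " << m << "\n";
  }
  os << pad << ")\n";
  return os.str();
}
```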
Create a private convenience function that returns a copy of the active promotions specified by a list of their indexes in the storage. Use this function in Scop::promoteGroup to avoid traversing the list of all promotions twice in a row.
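A minimal sketch of the single-traversal pattern: indexes are collected once and then reused to copy the matching promotions. ScopSketch, PromotionInfo, indexesOf and promotionsAtIndexes are hypothetical names, and "active" is simplified here to "same tensor id", so this only illustrates the idea and is not the real Scop::promoteGroup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical promotion record with only the fields this sketch needs.
struct PromotionInfo {
  std::string tensorId;
  int groupId;
};

class ScopSketch {
 public:
  void addPromotion(PromotionInfo p) { storage_.push_back(std::move(p)); }

  // Hypothetical promoteGroup-like step: the storage is traversed once to
  // collect indexes, which are then reused to copy the matching promotions.
  // Returns the promotions of tensorId that existed before this call.
  std::vector<PromotionInfo> promoteGroup(const std::string& tensorId) {
    std::vector<std::size_t> indexes = indexesOf(tensorId);
    std::vector<PromotionInfo> previous = promotionsAtIndexes(indexes);
    // A real implementation would inspect `previous` to decide how to
    // promote; this sketch simply records a new promotion.
    int newId = static_cast<int>(storage_.size());
    storage_.push_back(PromotionInfo{tensorId, newId});
    return previous;
  }

 private:
  std::vector<std::size_t> indexesOf(const std::string& tensorId) const {
    std::vector<std::size_t> indexes;
    for (std::size_t i = 0; i < storage_.size(); ++i) {
      if (storage_[i].tensorId == tensorId) {
        indexes.push_back(i);
      }
    }
    return indexes;
  }

  // Private convenience function: copies of the promotions at the given indexes.
  std::vector<PromotionInfo> promotionsAtIndexes(
      const std::vector<std::size_t>& indexes) const {
    std::vector<PromotionInfo> result;
    result.reserve(indexes.size());
    for (std::size_t i : indexes) {
      result.push_back(storage_[i]);
    }
    return result;
  }

  std::vector<PromotionInfo> storage_;
};
```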
When the approximate footprint of the reference group being promoted to registers is not a subset of any of the approximate footprints of the reference groups promoted to shared memory, it is still possible to promote it by copying directly from global memory, as long as all overlapping reference groups only read the data. This merely creates multiple copies of the data in different storages without compromising correctness.
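A sketch of the resulting decision, under simplifying assumptions: footprints are exact sets of linearized element indices (the real ones are approximate isl sets), and Group, CopySource and registerCopySource are hypothetical names. Reading "all overlapping reference groups" as including the group being promoted, the sketch requires that group to be read-only as well whenever any overlap exists.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: a footprint is the set of tensor elements a reference
// group touches.
using Footprint = std::set<long>;

struct Group {
  Footprint footprint;
  bool readOnly;  // true if the group only reads the data
};

enum class CopySource { Shared, Global, None };

// a is a subset of b (both std::set, hence sorted, as std::includes requires).
bool isSubset(const Footprint& a, const Footprint& b) {
  return std::includes(b.begin(), b.end(), a.begin(), a.end());
}

bool overlaps(const Footprint& a, const Footprint& b) {
  for (long x : a) {
    if (b.count(x) != 0) {
      return true;
    }
  }
  return false;
}

// Where a register-promoted group could copy its data from.
CopySource registerCopySource(const Group& reg, const std::vector<Group>& shared) {
  // Covered by a shared-memory footprint: copy shared -> registers.
  for (const Group& s : shared) {
    if (isSubset(reg.footprint, s.footprint)) {
      return CopySource::Shared;
    }
  }
  // Otherwise, copying from global memory is safe if every overlapping
  // group (including this one) only reads the data.
  bool anyOverlap = false;
  bool allReadOnly = reg.readOnly;
  for (const Group& s : shared) {
    if (overlaps(reg.footprint, s.footprint)) {
      anyOverlap = true;
      allReadOnly = allReadOnly && s.readOnly;
    }
  }
  if (!anyOverlap || allReadOnly) {
    return CopySource::Global;
  }
  return CopySource::None;
}
```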
In cases where a reference group promoted to registers covered exactly the same accesses as another group promoted to shared memory, the latter group was demoted to save shared memory space. However, this had adverse effects: the global->shared copy is performed at the beginning of the block, while the global->register copy is placed deeper in the tree, which prevents hiding the latency of the loads. Keep the group promoted to shared memory and perform a shared->register copy instead. An alternative solution would be to decrease the promotion scope depth for register promotion. This would require ensuring that the loops whose indices appear in the subscripts of "register" arrays are fully unrolled, so that the elements of those arrays are effectively mapped to registers. Since unrolling is expensive in compilation time and is exposed to the autotuner, we would prefer to also expose the register promotion depth in the future.
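To make the two copy levels concrete, here is a sequential plain-C++ simulation; it is not generated GPU code. One "block" stages a tile of global memory into a shared buffer at its start, and a deeper loop copies from that buffer into a small local array standing in for registers. blockSum, kTile and kRegs are hypothetical names and sizes chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 8;  // elements staged in "shared memory" (assumed)
constexpr std::size_t kRegs = 4;  // elements held in "registers" per step (assumed)
static_assert(kTile % kRegs == 0, "tile must split evenly into register chunks");

// Sums the kTile elements of global starting at blockStart.
long blockSum(const std::vector<int>& global, std::size_t blockStart) {
  // global -> shared, once at the beginning of the block; on a GPU, issuing
  // these loads early is what allows their latency to be hidden.
  std::array<int, kTile> shared{};
  for (std::size_t i = 0; i < kTile; ++i) {
    shared[i] = global[blockStart + i];
  }

  long sum = 0;
  for (std::size_t j = 0; j < kTile; j += kRegs) {
    // shared -> "registers", deeper in the loop nest. The constant trip count
    // lets the inner loops be fully unrolled, which is what keeps such an
    // array in registers in generated code.
    std::array<int, kRegs> regs{};
    for (std::size_t r = 0; r < kRegs; ++r) {
      regs[r] = shared[j + r];
    }
    for (std::size_t r = 0; r < kRegs; ++r) {
      sum += regs[r];
    }
  }
  return sum;
}
```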
UNCLEAN AND NON-REBASED WORK IN PROGRESS
Do not merge or even review.