transformations: implement stencil-tensorize-z-dimension #2516

Merged
merged 18 commits into from
May 13, 2024


Collaborator

@n-io n-io commented Apr 30, 2024

The stencil-tensorize-z-dimension pass transforms scalar stencils into stencils operating on tensors of z-dim values, which will prove helpful when lowering.

To summarise, the pass does the following:

  • Types on which the stencil operates, such as memref<1024x512x512xf32>, are replaced by a corresponding tensorised version, such as memref<1024x512xtensor<512xf32>>.
  • Stencil accesses become 2-dimensional: a stencil.access in the x or y dimension produces a z-dim tensor, while a stencil.access in the z-dimension produces a z-dim tensor with an offset (diagonal accesses are currently not supported, see below).
  • If an arith dialect operation has two tensors as inputs, this is valid in the dialect and requires no change.
  • If an arith dialect operation has a tensor and a scalar as inputs, the scalar is expanded into a tensor: tensor.empty creates an empty tensor and linalg.fill fills it with the scalar. This is a temporary solution, as the dialect we're lowering to does support mixing tensor and scalar operands in most cases.
  • The pass operates in a forward manner for building the correct ops, and then back-propagates the correct shapes in a second pass.
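To make the type rewrite and the 2-dimensional accesses concrete, here is a minimal Python sketch. It is not the xDSL implementation: `TensorizedField`, `tensorize_shape`, `type_string`, `split_offset` and `fill_like` are hypothetical names used only for illustration, and the model works on plain shapes and lists rather than on IR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorizedField:
    """Hypothetical model of memref<X x Y x tensor<Z x elem>>."""
    outer_shape: tuple  # (x, y) extents that remain memref dimensions
    z_len: int          # length of the z-dim tensor stored per (x, y) point


def tensorize_shape(shape):
    """Fold the innermost (z) dimension of a 3-D shape into the element type."""
    x, y, z = shape
    return TensorizedField(outer_shape=(x, y), z_len=z)


def type_string(field, elem="f32"):
    """Render the tensorised type in MLIR-like syntax."""
    x, y = field.outer_shape
    return f"memref<{x}x{y}xtensor<{field.z_len}x{elem}>>"


def split_offset(offset):
    """Split a 3-D access offset into a 2-D (x, y) access and a z-offset."""
    x, y, z = offset
    return (x, y), z


def fill_like(scalar, length):
    """Model tensor.empty followed by linalg.fill: broadcast a scalar."""
    return [scalar] * length


field = tensorize_shape((1024, 512, 512))
print(type_string(field))  # memref<1024x512xtensor<512xf32>>
```

The z extent moves from the memref shape into the element type, so every remaining access only needs an (x, y) offset plus a separate z-offset into the tensor.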

This should be lowered as follows:

  • (host) stencil.load triggers a host-to-device transfer of z-value tensors to each of the targeted devices
  • (host) stencil.store likewise triggers a device-to-host transfer
  • (host) stencil.apply invokes the computation
  • (device) stencil.access at non-zero x/y coordinates triggers device-to-device communication
  • (device) stencil.access at zero x/y coordinates triggers device-internal computation
  • (device) arith.<anything> triggers device-internal computation
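The device-side rule for accesses can be sketched as a small classifier. This is an illustration of the list above, not code from the pass: `classify_access` is a hypothetical helper, and it assumes offsets are given as (x, y, z) triples.

```python
def classify_access(offset):
    """Return how a stencil.access at the given (x, y, z) offset is lowered.

    Only the x/y components decide between communication and local work;
    the z component selects an offset inside the z-dim tensor.
    """
    x, y, _z = offset
    if (x, y) != (0, 0):
        return "device-to-device communication"
    return "device-internal computation"


for offset in [(1, 0, 0), (0, -1, 0), (0, 0, 1), (0, 0, 0)]:
    print(offset, "->", classify_access(offset))
```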

Todo:

  • we will need to implement linalg.map for future updates

Limitations:

  • Diagonal accesses are not yet supported

@superlopuh superlopuh added the transformations Changes or adds a transformatio label May 1, 2024

codecov bot commented May 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.66%. Comparing base (0c0ac43) to head (581c22a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2516      +/-   ##
==========================================
- Coverage   89.74%   89.66%   -0.08%     
==========================================
  Files         356      357       +1     
  Lines       44796    45024     +228     
  Branches     6721     6766      +45     
==========================================
+ Hits        40201    40373     +172     
- Misses       3609     3636      +27     
- Partials      986     1015      +29     


@n-io
Collaborator Author

n-io commented May 1, 2024

This transformation actually deals with two different flavours of z-value tensors, which can be distinguished as follows:

  1. padded z-value tensors (including the border regions, e.g. tensor<512xf32>) and
  2. non-padded z-value tensors (without the border regions, e.g. tensor<510xf32>).

Merging the main branch into this PR brought in additional verification, which, because of the above distinction, failed in two places:

  • stencil.access naturally seeks to return a non-padded version. However, this is now always handled in a separate tensor.extract_slice op that comes with a z-offset.
  • Similarly, stencil.store attempts to store a non-padded z-value tensor into a padded z-value tensor. It is not entirely clear how to handle this, and this verification is currently disabled. (update: verification has been enabled again with a relaxed constraint)
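The relation between the two flavours can be modelled with plain list slicing. This sketch rests on an assumption that the comment does not state explicitly: a border of one element on each side, which is consistent with 512 padded versus 510 non-padded values. `extract_slice` and its `halo` parameter are hypothetical stand-ins for what tensor.extract_slice does with a z-offset, not the pass's actual code.

```python
def extract_slice(padded, z_offset, size, halo=1):
    """Take `size` values from a padded z-value tensor.

    Assumption: the interior starts `halo` elements into the padded tensor,
    so an access with z-offset 0 starts at index `halo`.
    """
    start = halo + z_offset
    if start < 0 or start + size > len(padded):
        raise ValueError("slice reaches outside the padded tensor")
    return padded[start:start + size]


padded = list(range(512))                 # stands in for tensor<512xf32>
interior = extract_slice(padded, 0, 510)  # stands in for tensor<510xf32>
shifted = extract_slice(padded, 1, 510)   # same size, shifted by z-offset +1

print(len(interior), interior[0], interior[-1])
print(len(shifted), shifted[0], shifted[-1])
```

Under this assumption, every z-offset in the range -1 to +1 yields a 510-element slice that stays inside the 512-element padded tensor.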

Collaborator

@PapyChacal PapyChacal left a comment

Nice work!
I see a lot of little things that could be simplified, with a bit of know-how, so I'm in favour of having a call and going over those so I can guide you through them, what do you think?
It should make the pass more readable and more powerful in one go, and is a nice opportunity to teach you more xDSL tricks 🙂

I also am strongly in favor of having one reverse-walking pass, rather than a forward tensorizing pass and a second one to adjust shapes. But we can do the above mentioned simplification orthogonally if you'd rather stick to that for now!

@AntonLydike
Collaborator

This is very nice stuff! I'm not 100% sure in my head how we go from here to CSL, but this is definitely a good start!

@PapyChacal PapyChacal merged commit be08855 into main May 13, 2024
9 checks passed
@PapyChacal PapyChacal deleted the nicolai/stencil-tensorize-dimension branch May 13, 2024 15:13
5 participants