
Relu merge optimizer pass #586

Open
oliviaweng wants to merge 8 commits into main
Conversation

@oliviaweng commented on Jun 29, 2022

Description

We introduce an hls4ml optimizer pass that merges a ReLU layer into the preceding Dense/Conv2D layer when the ReLU immediately follows it, a frequently encountered pattern in neural networks (NNs). NNs in hls4ml are laid out spatially, with each layer implemented as a dataflow stage and the stages linked together by FIFOs. These FIFOs can cost BRAMs, LUTs, and/or FFs. By default, hls4ml implements each ReLU as its own dataflow stage. Because every additional dataflow stage costs extra logic and FIFOs, merging the ReLU activation function into the layer that precedes it reduces overall resource utilization: although the layers with the newly merged ReLU functionality use more logic than before, there is still a net decrease in resources. This optimization was introduced in hls4ml's MLPerf TinyML Benchmark 2022 submission and is written up in this paper, which also reports the resulting resource reductions.
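
For readers unfamiliar with hls4ml optimizer passes, the sketch below shows how such a merge pass could plug into the framework. It is only illustrative: it assumes the OptimizerPass match/transform interface and the register_pass helper, the module paths vary between hls4ml versions, and merge_relu is a hypothetical flag that the HLS templates would have to consume. The actual implementation in this PR differs in its details.

# Illustrative sketch only: module paths and the 'merge_relu' attribute are assumptions.
from hls4ml.model.optimizer import OptimizerPass, register_pass
from hls4ml.model.layers import Activation, Conv2D, Conv2DBatchnorm, Dense

class ReluMerge(OptimizerPass):
    def match(self, node):
        # Fire only when a ReLU activation directly follows a supported layer
        is_relu = isinstance(node, Activation) and node.get_attr('activation') == 'relu'
        return is_relu and isinstance(node.get_input_node(), (Dense, Conv2D, Conv2DBatchnorm))

    def transform(self, model, node):
        # Tell the preceding layer to apply ReLU inside its own kernel
        # (a flag the HLS templates would read), then drop the standalone ReLU
        # node so that its dataflow stage and FIFOs disappear.
        prev = node.get_input_node()
        prev.set_attr('merge_relu', True)
        model.remove_node(node, rewire=True)  # reconnect the graph around the removed node
        return True  # the model graph was modified

register_pass('relu_merge', ReluMerge)

The corresponding HLS kernels then apply ReLU before writing their output stream, which is why the merged layers use somewhat more logic while still yielding a net saving once the extra dataflow stage and FIFOs are gone.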

Type of change

This optimization pass was first mentioned in the MLPerf TinyML PR #503.

  • New feature (non-breaking change which adds functionality)
  • A new research paper code implementation

Tests

This repo contains two test models (a fully-connected NN and a CNN) that can be trained on MNIST, converted into Vivado HLS projects with hls4ml, and synthesized using Vivado HLS 2020.1.
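
As a rough illustration of how those test models would be pushed through the flow, a hedged sketch using the standard hls4ml Python API is shown below; the model path, FPGA part, and output directory are placeholders rather than values taken from the PR or the test repo.

import hls4ml
from tensorflow import keras

# Load one of the trained test models (placeholder path)
model = keras.models.load_model('mnist_cnn.h5')

# Generate a per-layer hls4ml configuration and convert to a Vivado HLS project
config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='hls4ml_prj',          # placeholder output directory
    io_type='io_stream',              # the FIFO-linked dataflow layout discussed in this PR
    part='xcu250-figd2104-2L-e',      # placeholder FPGA part
)

# Run C synthesis (Vivado HLS 2020.1 in the tests above)
hls_model.build(csim=False, synth=True)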

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.

@jmduarte (Member)

Hi @oliviaweng, thanks a lot for the contribution, it looks great!

It seems some tests are failing: https://gitlab.cern.ch/fastmachinelearning/hls4ml/-/pipelines/4158655

I think most of the failures are relatively easy to fix, e.g. CONFIG_T::out_t is missing. Could you take a look and see if you can make them pass? We can also discuss if anything about the errors is unclear.

Comment on lines 5 to 11
def match(self, node):
    supported_layers = (Dense, Conv2D, Conv2DBatchnorm)

    is_match = issubclass(node.get_input_node().__class__, supported_layers)
    # ReLU layers are of class Activation
    is_match = is_match and issubclass(node.__class__, Activation)
    return is_match
@oliviaweng (Author)

I see. For the missing CONFIG_T::out_t error, it looks like this match() function is too permissive. Since all activation layers are subclasses of Activation, it returns True whenever any activation function follows a Dense/Conv2D layer, not just ReLU, which is wrong. I'll look into tightening this up.
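
For concreteness, a possible tightened check is sketched below. It assumes that ReLU nodes are Activation instances whose 'activation' attribute is the string 'relu' (as for Keras-converted models); this should be verified against the failing tests before adopting it.

def match(self, node):
    supported_layers = (Dense, Conv2D, Conv2DBatchnorm)

    is_match = issubclass(node.get_input_node().__class__, supported_layers)
    # Only ReLU should trigger the merge, not every Activation subclass
    is_match = is_match and issubclass(node.__class__, Activation) \
        and node.get_attr('activation') == 'relu'
    return is_match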

@vloncar (Contributor) commented on Jul 5, 2022

A question on the approach itself: this will only work for ReLU, and it will require duplicate overrides for every possible combination of activation, layer, and HLS implementation if we want to expand it. Instead of extending the kernel of the matrix-vector multiplication to tack on the ReLU computation and then creating duplicate function calls for that, why don't we introduce a new operation that is a no-op by default and sits at the end of the dense function, basically where the cast function sits now? Then the config can include a proper implementation of the activation if we choose to merge it in. For example, in the config class we could add template<class data_T, class CONFIG_T> using activation = nnet::some_activation_or_nop<data_T, CONFIG_T>;

This probably sounds unclear, so if you have 15 minutes we can chat over zoom.

@jmitrevs (Contributor)

Should we discuss @vloncar's suggestion? It seems like there is quite a bit of interest in this PR, so it would be good to get it in.

@vloncar (Contributor) commented on Jul 29, 2022

It was discussed offline and we converged on the proper approach to this.

@jmitrevs (Contributor)

Any more news on this?

@oliviaweng (Author)

@abijithYayavaram is currently building a more generic version. We aim to push some updates within the next several weeks.

@jmitrevs (Contributor)

What is the plan for this in general? Is this where we'll attack with the new code generation framework?

@vloncar (Contributor) commented on Oct 20, 2023

I believe we should revisit this once we have a better way of generating functions in which activations would be merged. There's little gain in general if FIFO depth optimization is applied (and none for io_parallel).
