
[BYOC][ACL] Fix list is not supported as an input node #10801

Merged - 14 commits merged into apache:main on Apr 14, 2022

Conversation

DzAvril (Contributor) commented Mar 28, 2022

The input of the concatenate op is a list of tensors. When we tried to offload it to ACL, we found that its JSON node looks like the one below.

{
  "nodes": [
    {
      "op": "input",
      "name": "arm_compute_lib_0_i0",
      "attrs": {
        "dtype": [
          [
            "int32",
            "int32",
            "int32"
          ]
        ],
        "shape": [
          [
            [1, 234, 234, 256],
            [1, 234, 234, 256],
            [1, 234, 234, 256]
          ]
        ]
      }
    },
    {
      "op": "kernel",
      "name": "concatenate",
      "inputs": [[0, 0, 0], [0, 1, 0], [0, 2, 0]],
      "attrs": {
        "num_outputs": "1",
        "num_inputs": "3",
        "dtype": [
          [
            "int32"
          ]
        ],
        "axis": [
          [
            "0"
          ]
        ],
        "shape": [
          [
            [3, 234, 234, 256]
          ]
        ]
      }
    }
  ],
  "arg_nodes": [0],
  "heads": [[1, 0, 0]],
  "node_row_ptr": [0, 3, 4]
}

The inputs field of the kernel node is:

"inputs": [[0, 0, 0], [0, 1, 0], [0, 2, 0]]

Each element in inputs is an index of an input entry. Taking [0, 1, 0] as an example, the first 0 is the node id, 1 is the index id, and the last 0 is the version.
But in the function Run, the statement uint32_t eid = EntryID(nid, 0); always uses 0 as the index id, so the other two inputs are ignored.

  void Run() override {
    for (size_t i = 0; i < input_nodes_.size(); ++i) {
      auto nid = input_nodes_[i];
      uint32_t eid = EntryID(nid, 0);
      if (nodes_[nid].GetOpType() == "input") {
        void* data = data_entry_[eid]->data;
        CheckACLError(layer_.inputs[i].allocator()->import_memory(data));
      }
    }

    for (size_t i = 0; i < outputs_.size(); ++i) {
      uint32_t eid = EntryID(outputs_[i]);
      void* data = data_entry_[eid]->data;
      CheckACLError(layer_.outputs[i].allocator()->import_memory(data));
    }

    this->layer_.function->run();
  }
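To see why only the first tuple field is ever read, here is a small, purely illustrative Python sketch of the entry-id computation (the helper name mirrors EntryID, but this is not the actual C++ API):

```python
def entry_id(node_row_ptr, nid, index):
    # Each node owns a contiguous range of entry ids, one per output,
    # starting at node_row_ptr[nid].
    return node_row_ptr[nid] + index

node_row_ptr = [0, 3, 4]  # taken from the JSON graph above

# The three tuple fields of input node 0 occupy three distinct eids,
# but Run() above only ever asks for index 0 (eid 0):
eids = [entry_id(node_row_ptr, 0, i) for i in range(3)]
print(eids)  # [0, 1, 2]
```

With node_row_ptr = [0, 3, 4], node 0 covers eids 0..2 and node 1 covers eid 3, so fixing the index to 0 silently drops the second and third tensors of the tuple.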

This PR introduces a variable json_inputid_to_layer_inputid, which maps the input index of the JSON node to the index of the ACL layer's inputs.
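A rough Python sketch of what such a mapping could look like (hypothetical helper; the actual C++ implementation in acl_runtime.cc may differ):

```python
def build_input_map(input_nodes, node_row_ptr):
    # For each JSON input node, assign one ACL layer-input slot per
    # entry id (i.e. per tuple field), instead of one per node.
    json_inputid_to_layer_inputid = {}
    layer_idx = 0
    for nid in input_nodes:
        num_entries = node_row_ptr[nid + 1] - node_row_ptr[nid]
        for index in range(num_entries):
            eid = node_row_ptr[nid] + index
            json_inputid_to_layer_inputid[eid] = layer_idx
            layer_idx += 1
    return json_inputid_to_layer_inputid

# For the graph above (one input node with a 3-tuple):
print(build_input_map([0], [0, 3, 4]))  # {0: 0, 1: 1, 2: 2}
```

Run() can then import memory for every eid of an input node, not just eid 0, by looking up the corresponding layer input slot.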

u99127 (Contributor) commented Mar 28, 2022

@lhutton1 - could you take a look?

lhutton1 (Contributor) left a comment

Thanks @DzAvril, it's great to see more support being added to ACL again :) Apologies for the delayed review; I was on holiday the past couple of days. Overall LGTM! The comments are mostly about cleaning things up, testing, and tightening the conditions under which concatenate gets offloaded.

(Resolved review comments on python/tvm/relay/op/contrib/arm_compute_lib.py and src/runtime/contrib/arm_compute_lib/acl_runtime.cc.)
DzAvril (Contributor, Author) commented Mar 31, 2022

Thanks @DzAvril, it's great to see more support being added to ACL again :) Apologies for the delayed review; I was on holiday the past couple of days. Overall LGTM! The comments are mostly about cleaning things up, testing, and tightening the conditions under which concatenate gets offloaded.

@lhutton1 Thanks for the comments. Please help to review again.

lhutton1 (Contributor) commented

Thanks @DzAvril LGTM!

cc @manupa-arm @comaniac @Mousius who may be able to help get this merged

DzAvril (Contributor, Author) commented Apr 7, 2022

Thanks @DzAvril LGTM!

cc @manupa-arm @comaniac @Mousius who may be able to help get this merged

Hi @lhutton1, thanks for your review. As you have approved these changes, the status of this PR should be Approved, but in fact it is still Review required. And to quote:

Merging can be performed automatically with 1 approving review.

Will it be merged automatically after the status changes to Approved?

lhutton1 (Contributor) commented Apr 7, 2022

Unfortunately, since I'm just a reviewer, my approval doesn't unblock merging; we need to wait for someone who is a committer to approve this change and merge it.

Friendly ping @manupa-arm @comaniac @Mousius (I'm not sure who else might be interested in ACL support)

lhutton1 (Contributor) commented

Hi @DzAvril, I was having a think about this over the weekend and remembered that for the ACL integration we don't run the MergeCompilerRegions pass, meaning each operation is offloaded to ACL separately. Because of this, I'm not sure offloading concatenate (as opposed to running it on TVM) would bring any performance improvement, since the operation doesn't really do any computation. So I'm curious: does this PR fix a separate issue you were seeing, or did you observe some kind of performance improvement while offloading concatenate?

My apologies for missing this previously - it's been a while since working on this.

DzAvril (Contributor, Author) commented Apr 11, 2022

Our team is working on a project that needs hot-upgrade for models. Our approach is to offload as many operations as possible to ACL, and performance is not a concern. So this PR fixes a separate issue: offloading operations that have a list as inputs, of which concatenate is a typical case.

cee1 commented Apr 11, 2022

which have a list as inputs and offloading concatenate is a typical case of this issue.

Regarding src/runtime/contrib/json/json_runtime.h: JSONRuntimeBase is missing a line to handle multiple EntryIDs (eids), as the patch points out.

(This issue hit us while hooking ACL's concat for a "hot upgradable model" feature:

  • "Hot upgradable models" are pure data, with no executable code inside.
  • This requires a common op library that can be used by ALL models.

We leverage ACL as the base of that common op library.)

Hi @lhutton1, which would be the more appropriate way:
a) Disable hooking to ACL's concat within relay/op/contrib/arm_compute_lib.py (while keeping the concat entry in runtime/contrib/arm_compute_lib/acl_runtime.cc), or
b) Submit only the fix to json_runtime.h?

Any ideas?

lhutton1 (Contributor) commented

Thanks for the insight @DzAvril, @cee1. I think in that case it would be fine to keep the implementation. It might be a good idea to disable offloading concatenate by default, but then add an optional parameter, say offload_concat=False, to partition_for_arm_compute_lib to manually turn on partitioning for concatenate. WDYT?

cee1 commented Apr 12, 2022

offload_concat=False to partition_for_arm_compute_lib to manually turn on partitioning for concatenate

Hi @lhutton1, would you please review this revision of the patch?

lhutton1 (Contributor) left a comment

Thanks for the revision @DzAvril, @cee1. LGTM.. again!

cee1 commented Apr 12, 2022

Thanks for the revision @DzAvril, @cee1. LGTM.. again!

Hi @u99127, could you review this patch and give an approving review if it looks fine?

lhutton1 (Contributor) commented

@masahi @areusch @tmoreau89 - we're struggling to find a committer to take a look at this PR, do you have any suggestions?


Review comment on this code context:

    Returns
    -------
    ret : annotated and partitioned module.
    """
    global offload_concat_
masahi (Member) commented Apr 13, 2022

Rather than using a global var, can we make use of arm_compute_lib_pattern_table?

def arm_compute_lib_pattern_table():

i.e. register concat pattern if offload_concat == True.
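A minimal sketch of this suggestion, with placeholder strings standing in for the real dataflow patterns (names illustrative, not the actual TVM API):

```python
def make_pattern_table(offload_concat=False):
    # (pattern_name, pattern) pairs; the strings here are stand-ins
    # for real relay dataflow patterns.
    table = [("arm_compute_lib.conv2d", "<conv2d pattern>")]
    if offload_concat:
        # Only register the concatenate pattern when explicitly enabled.
        table.append(("arm_compute_lib.concatenate", "<concat pattern>"))
    return table

print(len(make_pattern_table()))                      # 1
print(len(make_pattern_table(offload_concat=True)))   # 2
```

Gating the pattern list on a parameter avoids any module-level global, and the partitioner simply never sees a concatenate pattern unless the caller opts in.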

Contributor commented:

Thanks for taking a look @masahi. Currently ACL has a mixture of registering operations using composite functions and _register_external_op_helper (which adds the target.arm_compute_lib attribute), which is not ideal. I think if we wanted to do this for concat it would need to be registered using the pattern table rather than the other mechanism.

In the future we should probably use the pattern table for all operations.

masahi (Member) commented Apr 13, 2022

Yeah, if we are adding new op support, why not add it via a pattern? Then it is easy to enable / disable a pattern based on some flags. @DzAvril

DzAvril (Contributor, Author) commented:

Updated, please review. I'm not sure it's what you want; it seems a little weird, since arm_compute_lib_pattern_table aims to register fusing patterns for ops, and registering an op attribute falls outside of that.
BTW, this commit triggered a Sanity Check error:

python/tvm/relay/op/contrib/arm_compute_lib.py:274: [W0612(unused-variable), arm_compute_lib_pattern_table.concatenate] Unused variable 'concatenate'

masahi (Member) commented Apr 13, 2022

Actually I don't know what @tvm.ir.register_op_attr("nn.conv2d", "target.arm_compute_lib") thing is for... Why do you need both the pattern + check function at


def check_conv(extract):

and register_op_attr thing at

@tvm.ir.register_op_attr("nn.conv2d", "target.arm_compute_lib")
def conv2d(expr):

?

I never needed the latter in the BYOCs I worked on.

DzAvril (Contributor, Author) commented:

Actually I don't know what @tvm.ir.register_op_attr("nn.conv2d", "target.arm_compute_lib") thing is for...

Gently pinging @lhutton1 - can you help explain this to @masahi?

masahi (Member) commented Apr 13, 2022

@DzAvril, @lhutton1 said "I think if we wanted to do this for concat it would need to be registered using pattern table rather than the other mechanism". This is also what I'm suggesting above. Can you do that? You shouldn't need the @tvm.ir.register_op_attr thing for concat. You just need to add is_op("concatenate") etc. plus the check function, which you already have under @tvm.ir.register_op_attr(concatenate).

Contributor commented:

Actually I don't know what @tvm.ir.register_op_attr("nn.conv2d", "target.arm_compute_lib")

Yeah, I don't think we need to register conv2d like this either, as it should already be picked up by the pattern table. This way of registering operators predates the pattern table, so I guess this was just overlooked. We can follow up in a separate PR to move all operators so that they are registered using the pattern table, which I believe is the way other BYOCs are going. Apologies for the confusion @DzAvril; like @masahi said, it would be best to register concatenate (and other operators added in the future) using the pattern table since this is the preferred mechanism.

DzAvril (Contributor, Author) commented:

Sorry for missing some of the comments above. I have a question about the concatenate_pattern; see the code below.

    def concatenate_pattern():
        """Create a concatenate pattern from equivalent relay operators.

        Returns
        -------
        pattern : dataflow_pattern.AltPattern
            Denotes the concatenate pattern.
        """
        pattern = is_op("concatenate")(wildcard())
        return pattern

As the input of concatenate is a tuple of tensors, what should the pattern for the parameters look like? wildcard() in the code block above does not work.

masahi (Member) commented:

Can you try is_tuple(None)? See #7754.

DzAvril (Contributor, Author) commented Apr 13, 2022

@masahi @lhutton1 Thanks for your comments. I re-implemented concatenate support with the pattern table mechanism, which brings two changes.

  1. Add a parameter disabled_ops to partition_for_arm_compute_lib, indicating the ops we do not offload to ACL; concatenate is included by default.
  2. Add a function AddCommonSingleJSONNode that adds a single op like concatenate to the graph:
  std::vector<JSONGraphNodeEntry> AddCommonSingleJSONNode(const CallNode* cn, std::string name) {
    std::vector<JSONGraphNodeEntry> inputs;
    for (const auto& arg : cn->args) {
      auto res = VisitExpr(arg);
      inputs.insert(inputs.end(), res.begin(), res.end());
    }
    auto node = std::make_shared<JSONGraphNode>(name,     /* name_ */
                                                "kernel", /* op_type_ */
                                                inputs, 1 /* num_outputs_ */);

    const auto* fn = cn->op.as<FunctionNode>();
    ICHECK(fn);
    const auto* callNode = fn->body.as<CallNode>();
    ICHECK(callNode);
    SetCallNodeAttribute(node, callNode);
    return AddNode(node, GetRef<Expr>(cn));
  }

This function calls AddNode, which returns a list of graph entry nodes and flattens tuple-typed exprs. With this, the core problem this PR aims to fix goes away, because after flattening the tuple-typed expr, the JSON graph looks like the one below:

  {
    "attrs": {
      "dtype": [
        [
          "float32"
        ]
      ],
      "shape": [
        [
          [
            1,
            234,
            234,
            256
          ]
        ]
      ]
    },
    "name": "",
    "op": "input"
  },
  {
    "attrs": {
      "dtype": [
        "float32"
      ],
      "shape": [
        [
          2,
          234,
          234,
          256
        ]
      ]
    },
    "name": "",
    "op": "input"
  },
  {
    "attrs": {
      "dtype": [
        "float32"
      ],
      "shape": [
        [
          3,
          234,
          234,
          256
        ]
      ]
    },
    "name": "",
    "op": "input"
  },
  {
    "attrs": {
      "axis": [
        [
          "0"
        ]
      ],
      "dtype": [
        [
          "float32"
        ]
      ],
      "num_inputs": "3",
      "num_outputs": "1",
      "shape": [
        [
          [
            6,
            234,
            234,
            256
          ]
        ]
      ]
    },
    "inputs": [
      [
        0,
        0,
        0
      ],
      [
        1,
        0,
        0
      ],
      [
        2,
        0,
        0
      ]
    ],
    "name": "concatenate",
    "op": "kernel"
  }

It works without json_inputid_to_layer_inputid. Even so, I think we should keep the changes I made in acl_runtime.cc and other related files, in case other corner cases occur.
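The effect of the flattening can be summarised in a small sketch:

```python
# Before flattening, a single input node carries the 3-tuple, so the
# kernel's inputs reference nid=0 with index ids 0..2:
before = [[0, 0, 0], [0, 1, 0], [0, 2, 0]]

# After flattening, each tuple field becomes its own input node, so the
# index id is always 0 and EntryID(nid, 0) is sufficient:
after = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]

assert all(index == 0 for _, index, _ in after)
```

That is, the fix moves the multiplicity from the index-id position into distinct node ids, which the existing Run() loop already handles correctly.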

DzAvril (Contributor, Author) commented Apr 13, 2022

def arm_compute_lib_pattern_table(disabled_ops=["concatenate"]):

The default value of disabled_ops triggers an error in the Sanity Check:

Dangerous default value [] as argument

I have no idea how to fix this; can you give a hint, please?

masahi (Member) commented Apr 13, 2022

You can add dangerous-default-value to the existing # pylint: disable=invalid-name, unused-argument line to suppress this warning.
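As an aside, another common way to avoid this pylint warning is an immutable tuple default instead of a list (a sketch with placeholder op names, not the actual ACL pattern table):

```python
def arm_compute_lib_pattern_table(disabled_ops=("concatenate",)):
    # A tuple default is immutable, so pylint's dangerous-default-value
    # check does not fire and no suppression comment is needed.
    supported = ["conv2d", "dense", "concatenate"]  # placeholder names
    return [op for op in supported if op not in disabled_ops]

print(arm_compute_lib_pattern_table())  # ['conv2d', 'dense']
```

Callers can still pass a list; only the default needs to be immutable.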

@masahi masahi merged commit 52f52c8 into apache:main Apr 14, 2022
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
* [BYOC][ACL] Fix list is not supported as an input node

* fix clang lint error

* fix compile warnning

* fix python module import error

* rename concatenate test file

* fix always MakeACLTensor with same eid 0

* do not offload concat default

* fix concattnate test failure

* fix test failure

* fix lint error

* fix lint

* remove global var offload_concat

* support concatenate with pattern table mechanism

* disable pylint dangerous-default-value warning

Co-authored-by: XuZhi <xuzhi.xu@alibaba-inc.com>
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022