-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VTA] [Hardware] Chisel implementation #3258
Conversation
} | ||
|
||
object ISA { | ||
def LUOP = BitPat("b_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_???????0_0????000") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to describe these instead of hand modifying the bit patterns? i.e a Scala abstraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this can be quite overwhelming / error prone to modify and extend. Finding a cleaner way to do this would be nice (e.g. what bits are relevant in the field, and what their values should be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can discuss this later to come up with a better way to do it. I have an idea but I will like to see your take on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we find a better way to decode? This could get difficult to parse if the instruction width gets wider potentially.
Okay did two passes over the design, overall looks good to me and mostly makes sense from a mid-level of understanding. My main comments are to provide more high-level explanation about what you are doing, common design patterns, and how things fit together so new people can get up to speed by reading some of the code without needing to ask as many questions. It would be good to bake in some of your design insights, comment on tricky pieces, and explain optimizations if you can. |
Let us talk about plans about tests/python/tsim/ As we finish the migration. I hope we have a unified VTA test-suite that test both (sim, tsim, and remote if it is available). The original VTA test infrastructure https://github.com/dmlc/tvm/blob/master/vta/tests/python/unittest/test_vta_insn.py#L74 is built in a way such that testing.run will run the test case for each available environment. See the implementation of run https://github.com/dmlc/tvm/blob/master/vta/python/vta/testing/util.py#L34 I hope we can reuse that instead of creating separate test cases for tsim only |
This modification makes Does this change seems good? @tqchen |
@vegaluisjose @jroesch A better way would be to add vta_sim_init to a with statement. Example code. Given this is a context scope.
See related code https://github.com/dmlc/tvm/blob/master/vta/python/vta/environment.py#L182 Once you do that, you no longer need to put sim_init calls, instead, you can just modify vta.testing.run |
} | ||
|
||
class UopDecode extends Bundle { | ||
val u2 = UInt(10.W) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we parameterize these bits from the uop decode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real question here is, are we going to support micro-ops larger than 32-bit? cause otherwise it will be hardcoding the field on another variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider supporting micro-ops larger than 32 bits
val M_STRIDE_BITS = 16 | ||
val M_PAD_BITS = 4 | ||
|
||
val C_UOP_BGN_BITS = 13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now where are these parameters derived from? (from the code block below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design pattern here is the following
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the BitPat "design pattern" was taken from here perhaps for 32-bit and 64-bit machines looks less overwhelming than for 128-bit machines. We definitely can think more about this later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My follow up question is: these parameters are just default values, and will be derived from a top-level config file right? Or if I change the size of wgt memory, I'll need to change these bit positions?
} | ||
|
||
object ISA { | ||
def LUOP = BitPat("b_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_????????_???????0_0????000") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this can be quite overwhelming / error prone to modify and extend. Finding a cleaner way to do this would be nice (e.g. what bits are relevant in the field, and what their values should be).
Btw, I don't know why CI is breaking now at the GPU frontend side? @tqchen @jroesch @tmoreau89 |
Please ignore the GPU error. see some final comments |
vta/python/vta/testing/simulator.py
Outdated
"""Init hardware library for TSIM""" | ||
cur_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__))) | ||
vta_build_path = os.path.join(cur_path, "..", "..", "..", "build") | ||
ext = ".dylib" if sys.platform == "darwin" else ".so" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if extension already exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
vta/python/vta/testing/simulator.py
Outdated
@@ -55,5 +57,15 @@ def stats(): | |||
x = tvm.get_global_func("vta.simulator.profiler_status")() | |||
return json.loads(x) | |||
|
|||
def tsim_init(hw_lib): | |||
"""Init hardware library for TSIM""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a user-facing function, always document all arguments.
def tsim_init(hw_lib):
"""Description
Parameters
------------
hw_lib : str
Path to hardware library
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here.
Thanks, @vegaluisjose @tmoreau89 @jroesch @liangfu @huajsj , this PR is now merged |
This PR provides a Chisel implementation for VTA and it runs on top of TSIM. It runs successfully all the Conv2D layers of ResNet-18 and other unit tests. The testing directory currently is located at
tvm/vta/tests/python/tsim
. We can modify/merge that later once we figure out how we going to structure everything out.Let me know any feedback,