benchmark fix #229
It's a nice rewrite and enhancement of the current benchmarking tools!
benchmark/attri_util.py
Outdated
class ReadOnly:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value
Why do we need this abstraction?
This structure is intended to protect the data from unexpected changes. Python passes objects by reference rather than by value, so shared module-level data is susceptible to unintended modification.
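To make the trade-off in this reply concrete, here is a minimal sketch of what the wrapper guards against and what it does not. The `ReadOnly` class is copied from the diff above; the sample values and the tuple-based variant are assumptions added only for comparison, not something the PR proposes.

```python
class ReadOnly:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value


ops = ReadOnly(["mm", "mv"])

# Rebinding the attribute is blocked: the property defines no setter.
try:
    ops.value = []
    rebinding_blocked = False
except AttributeError:
    rebinding_blocked = True

# In-place mutation still goes through, because .value returns the same list object.
ops.value.append("outer")
mutated = ops.value

# Hypothetical variant: wrapping a tuple also rules out in-place edits.
frozen_ops = ReadOnly(("mm", "mv"))
```

The wrapper therefore prevents accidental reassignment, while the mutability of the wrapped object decides whether its contents can still change.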
benchmark/attri_util.py
Outdated
BLAS_OPS = ReadOnly(["addmm", "mv", "addmm", "mm", "outer"])

DEFAULT_WARMUP_COUNT = 100
In my experience, we need a much longer warmup to get a stable perf result. I suggest flipping the warmup and repeat values.
ok.
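The warmup/repeat split discussed in this thread can be sketched as follows. This is a simplified, CPU-only illustration: `measure_latency_ms` is a hypothetical helper, not the PR's implementation, and its defaults only mirror the flipped values (1000 warmup runs, 100 timed runs). GPU timing would additionally need device synchronization, which this sketch leaves out.

```python
import time


def measure_latency_ms(fn, warmup=1000, iters=100):
    """Run fn `warmup` times untimed, then return the mean latency in ms over `iters` timed runs."""
    for _ in range(warmup):
        fn()  # untimed: lets caches, JIT compilation, and clocks settle
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1e3 / iters


# Usage: a trivial workload that also records how often it was called.
calls = []
latency = measure_latency_ms(lambda: calls.append(1), warmup=10, iters=5)
```

Only the timed loop contributes to the reported latency, so a longer warmup costs wall-clock time but does not skew the result.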
benchmark/attri_util.py
Outdated
# BLAS situation
# BLAS shapes is defined by (B,M,N,K), it is different from the non blas Shapes
DEFAULT_BLAS_BENCH_SHAPES = [(1, 1, 1, 32), (4, 15, 160, 1024), (16, 495, 5333, 71)]
DEFAULT_BLAS_WITHOUT_BATCH_BENCH_SHAPES = [(1, 1, 32), (15, 160, 1024), (495, 5333, 71)]
What about BLAS_DEFAULT_BMNK, BLAS_DEFAULT_MNK?
Also, the larger sizes are rather small.
benchmark/attri_util.py
Outdated
@dataclass
class BenckmarkMatrics:
    # the simple version shape info, this shape setted here just to with the last version.
Sorry, I'm being a little fussy here: the past tense of "set" is still "set".
…r user-specified dtype and metrics, and abstract input generator.
…AULT_SHAPES_2D_ONLY shapes
Great job! How about adding some docs to CONTRIBUTING.md?
fn = lambda: op(*args, **kwargs)
if self.is_backward:
    out = fn()
    dout = torch.randn_like(out)
    fn = lambda: out.backward(dout, retain_graph=True)
if CPU_MODE:
    for i in range(WARMUP):
if Config.cpu_mode:
Could we design a new mode that outputs both CPU latency and GPU latency? @tianxiao-baai
* benchmark fix
* add seven new testing parameters
* move shapes info to yaml file
* Added the BenchmarkMetrics & BenchmarkResult abstraction
PR Category: Benchmark
Type of Change: New Feature
PR Description
1. New Testing Parameters
This PR introduces new benchmark testing parameters, including:
level (str)
    comprehensive (default): Comprehensive testing.
    core: Core testing.
warmup (int)
    DEFAULT_WARMUP_COUNT = 1000
iter (int)
    DEFAULT_ITER_COUNT = 100
query
record (str)
    none (default)
    log: Logs output in JSON format.
dtype (list[str])
    dtypes can be listed using pytest --help.
    torch.float16, torch.float32, torch.bfloat16, torch.int16, torch.int32, torch.bool, torch.complex64
metric (list[str])
    latency, speedup, tflops, latency_base, accuracy, utilization
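The PR description does not say how the derived metrics are computed. As a rough sketch, assuming `speedup` is the baseline latency divided by the measured latency and `tflops` counts two floating-point operations per multiply-accumulate of a batched matmul of shape (B, M, N, K), they could look like this. All function names here are hypothetical.

```python
def matmul_flops(b, m, n, k):
    # Assumption: one multiply and one add per (b, m, n, k) index.
    return 2 * b * m * n * k


def speedup(latency_base_ms, latency_ms):
    # Assumption: values above 1.0 mean the measured op beats the baseline.
    return latency_base_ms / latency_ms


def tflops(flops, latency_ms):
    # Convert milliseconds to seconds, then scale to tera-operations per second.
    return flops / (latency_ms * 1e-3) / 1e12


# Usage with one of the BLAS shapes quoted earlier in the review.
shape = (4, 15, 160, 1024)
flops = matmul_flops(*shape)
```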
2. Structural Design Adjustments
This section outlines several structural design adjustments:
* Added the BenchmarkMetrics abstraction
* Added the BenchmarkResult abstraction
* Adjusted the design of the Benchmark structure
3. Improvements to Test Data
input_generator: provides a default input generator. Special input scenarios can directly override the corresponding generator.
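The override pattern described here can be sketched as below. This is an illustration only: the class names, the `get_inputs` method, and the `rand_tensor` stand-in (used in place of real tensors so the example runs without PyTorch) are all assumptions, not the PR's actual code.

```python
import math
import random


def rand_tensor(shape):
    """Stand-in for a random tensor: a flat list with prod(shape) floats."""
    return [random.random() for _ in range(math.prod(shape))]


class Benchmark:
    """Hypothetical base class with a default input generator."""

    def __init__(self, shapes):
        self.shapes = shapes

    def input_generator(self, shape):
        # Default: a single operand of the requested shape.
        yield (rand_tensor(shape),)

    def get_inputs(self):
        for shape in self.shapes:
            yield from self.input_generator(shape)


class MatmulBenchmark(Benchmark):
    """Special case: an (M, N, K) shape expands into two operands."""

    def input_generator(self, shape):
        m, n, k = shape
        yield rand_tensor((m, k)), rand_tensor((k, n))


unary_inputs = list(Benchmark([(2, 3)]).get_inputs())
matmul_inputs = list(MatmulBenchmark([(2, 4, 3)]).get_inputs())
```

Keeping the shape loop in the base class means a special case only has to describe how one shape maps to operands.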