
Conversation

@crazydemo

This PR

  1. Use a while loop rather than a for loop in the benchmark, so that early stopping can actually take effect. Otherwise, the benchmark ends as soon as min_batches iterations are done, which may produce unstable performance results when min_batches is small (see the sketch after this list).
  2. Add min_batches and min_seconds to benchmark_params.
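
For reference, a minimal, hypothetical sketch of the while-based loop described in point 1; `run_benchmark`, `net`, `backend`, and the parameter defaults are illustrative stand-ins, not the project's actual API:

```python
import time

def run_benchmark(net, backend, sample, min_batches=100, min_seconds=5.0):
    # A plain for loop over range(min_batches) ends as soon as min_batches
    # iterations finish; this while loop keeps running until BOTH floors
    # (batch count and wall time) are satisfied, so a small min_batches no
    # longer cuts the measurement short.
    fw_times = []
    start = time.perf_counter()
    n = 0
    while n < min_batches or time.perf_counter() - start < min_seconds:
        s = time.perf_counter()
        net(backend.to_device(sample))  # the timed forward pass
        fw_times.append(time.perf_counter() - s)
        n += 1
    return fw_times
```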

@crazydemo crazydemo requested a review from ZhennanQin January 26, 2024 10:40
while True:
    s = get_time()
    x = backend.to_device(x)       # previous code: re-upload the loop variable each iteration
    x = backend.to_device(sample)  # this PR: upload the same pre-generated sample instead
Contributor

I think it's wrong that we pass a constant sample instead of part of our dataset; it might be cached in some way.


The main purpose of this change is to avoid the data_loader generating data during the benchmark. Otherwise this is a data_loader benchmark instead of an MLP benchmark. The data_loader won't be used in a production inference environment; instead, it's possible to use the same buffer holding different samples passed to the network. That means in a real production environment the input can be cached in some way, so it's not a problem.
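
A minimal, hypothetical illustration of the buffer-reuse pattern described here; `backend.copy_into` is an assumed in-place copy helper, not a real API in this project:

```python
def run_with_shared_buffer(net, backend, samples):
    # Allocate one device buffer up front; each iteration refills it with a
    # different pre-generated host sample, so no data loader work happens
    # inside the benchmarked loop.
    device_buf = backend.to_device(samples[0])
    for host_sample in samples:
        backend.copy_into(device_buf, host_sample)  # assumed helper
        net(device_buf)
```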

Contributor


There is no data generation during benchmarking. Right now we benchmark moving pre-generated data to the compute device plus the forward pass. The main measurement happens with fw_times, which is limited to net(backend.to_device(x)).

Author


Maybe we could have an alternative change in PR#75.
We could still pass the pre-generated test loader data to the forward pass, but then we would need to drop the first 3 steps' performance data from the benchmarking period, because we noticed the previous warm-up does not work: according to the oneDNN verbose output, the first 3 steps of the benchmarking period show quite poor performance (see the sketch below).
I noticed you have an open issue about moving the warm-up directly into the main pass. If that issue is resolved, maybe we can close this PR.
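
A hedged sketch of the drop-the-first-steps alternative described above; the names are illustrative, and the step count of 3 just matches the oneDNN observation:

```python
WARMUP_STEPS = 3  # matches the "first 3 steps are slow" oneDNN observation

def trimmed_times(fw_times, warmup_steps=WARMUP_STEPS):
    # Drop the initial measurements so the implicit warm-up does not skew
    # the reported statistics.
    return fw_times[warmup_steps:]
```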

@Egor-Krivov
Contributor

This PR

  1. use a while loop rather than a for loop in the benchmark, so that early stopping can actually take effect. Otherwise, the benchmark ends as soon as min_batches iterations are done, which may produce unstable performance results when min_batches is small.
  2. add min_batches and min_seconds to benchmark_params.

I don't mind point 2, but for point 1, can't you just pass a larger min_batches in this case?

@Egor-Krivov
Contributor

Merged an alternative version of this PR.

@Egor-Krivov Egor-Krivov closed this Feb 8, 2024
@Egor-Krivov Egor-Krivov deleted the zhangyan/fix_perf branch February 8, 2024 13:23