WIP: Some Benchmarks for get_json_object #10729

Open
wants to merge 2 commits into
base: branch-24.12

@revans2 revans2 commented Apr 22, 2024

This depends on #10728. I don't know whether this is where, or how, we want to deal with benchmarks, and I am happy to move this to another repo if we want to.

revans2 added 2 commits April 22, 2024 08:15
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
* limitations under the License.
*/

val input = "/data/tmp/SCALE_FROM_JSON"

Do we need this file?

val numRows = 3000000
//val nullProbability = 0.1
val nullProbability = 0.0001
val output = "/data/tmp/SCALE_FROM_JSON"

Both input and output assume that a /data/ folder exists. That may be fine and can be improved upon later, but perhaps a note would be good?
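One possible way to address the hard-coded `/data/` paths, sketched here only as an illustration: resolve the base directory from a system property or environment variable and fall back to the current location. The object name `BenchPaths`, the property `bench.data.dir`, and the variable `BENCH_DATA_DIR` are all hypothetical and do not appear in the PR.

```scala
// Hypothetical helper, not part of this PR: resolve the benchmark data
// directory from configuration instead of hard-coding it in the script.
object BenchPaths {
  // The location the script currently assumes.
  val DefaultBaseDir = "/data/tmp"

  // Look up the base directory. The property and variable names are
  // assumptions made for this sketch. The maps are parameters so the
  // lookup can be exercised without touching real process state.
  def baseDir(
      props: Map[String, String] = sys.props.toMap,
      env: Map[String, String] = sys.env): String =
    props.get("bench.data.dir")
      .orElse(env.get("BENCH_DATA_DIR"))
      .map(_.stripSuffix("/")) // tolerate a trailing slash
      .getOrElse(DefaultBaseDir)

  // Path used for both `input` and `output` in the script.
  def scaleFromJson(base: String = baseDir()): String =
    s"$base/SCALE_FROM_JSON"
}
```

With something like this in place, the script could use `val input = BenchPaths.scaleFromJson()` and behave exactly as before when nothing is configured.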

@abellina

I think it's probably simple to have a set of micro benchmarks tied to data gen that live in spark-rapids, especially given the Spark dependencies. We should add a runner in spark-rapids-benchmarks that triggers the correct code path using a jar and a given Spark version.
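As a rough illustration of the runner idea, the sketch below looks up a benchmark by name and times it. Everything in it is hypothetical (`MicroBenchRunner`, `Benchmark`, `Result`); it is plain Scala with no Spark dependency, and a real runner would presumably register bodies that execute Spark queries from the benchmark jar.

```scala
// Hypothetical sketch of a name-based micro benchmark runner.
// None of these names exist in spark-rapids or spark-rapids-benchmarks.
object MicroBenchRunner {
  // A benchmark is a block of work; a real one would run a Spark query.
  type Benchmark = () => Unit

  final case class Result(name: String, iterations: Int, bestNanos: Long)

  // Run the named benchmark `iterations` times and report the fastest run.
  // Returns Left with a message when the name is not registered.
  def run(
      name: String,
      registry: Map[String, Benchmark],
      iterations: Int = 5): Either[String, Result] =
    registry.get(name) match {
      case None => Left(s"unknown benchmark: $name")
      case Some(body) =>
        require(iterations > 0, "iterations must be positive")
        val times = (1 to iterations).map { _ =>
          val start = System.nanoTime()
          body()
          System.nanoTime() - start
        }
        Right(Result(name, iterations, times.min))
    }
}
```

Keeping the registry as a plain map means the same entry point could be called from a `main` method in a jar or interactively from a shell.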

ttnghia commented Apr 22, 2024

FYI: we have a related benchmark implemented in spark-rapids-jni: NVIDIA/spark-rapids-jni#1952

@sameerz sameerz added the performance A performance related task/issue label Apr 23, 2024
@revans2 revans2 changed the base branch from branch-24.06 to branch-24.10 July 30, 2024 18:17
@revans2 revans2 changed the base branch from branch-24.10 to branch-24.12 October 9, 2024 13:28
Labels
performance A performance related task/issue
4 participants