[SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDataSourceWriteBenchmark, DataSourceWriteBenchmark and AvroWriteBenchmark to use main method #22861
@gengliangwang @wangyum @dongjoon-hyun please help review.
Test build #98133 has finished for PR 22861 at commit
dongjoon-hyun
left a comment
@yucai . We (@yucai, @gengliangwang and I) know what this is for, but please don't piggy-back (hide) something like this. This kind of change generally requires a separate, explicit PR.
@dongjoon-hyun Originally, I wanted to do two things in this PR.
But, refactor
Any suggestions on how to split the PR?
At least, the following is worth a separate PR because it's orthogonal. One PR had better have one theme. Putting different themes into one PR is not a good practice. Please start with the minimal one. If the committer asks for an example, add it later. That's better.
Personally, I am against accessing the main args in such a way. It looks a bit ugly.
The current implementation misses the main args, but some suites would need them anyway.
@dongjoon-hyun I used #22872 to make main args accessible for
 * To run this benchmark:
 * {{{
 *   1. without sbt: bin/spark-submit --class <this class>
 *   2. with sbt: build/sbt "avro/test:runMain <this class>"
Please add avro:
bin/spark-submit --class <this class> --jars <spark core test jar>,<spark catalyst test jar>,<spark sql test jar> <spark avro test jar>
I hit an exception when running:
bin/spark-submit --class org.apache.spark.sql.execution.benchmark.AvroWriteBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar ./external/avro/target/spark-avro_2.11-3.0.0-SNAPSHOT-tests.jar
Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: Avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:647)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:94)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:93)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:313)
at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
......
@wangyum Good catch! I think it needs <spark avro jar>, added.
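The fix described here, adding the main Avro jar to `--jars`, might look like the sketch below. The three `-tests.jar` paths and the class name are copied from the failing command above. The path of the main Avro jar is an assumption (the test jar name with the `-tests` suffix dropped), and `RUN` is a hypothetical switch added only so that the script prints the command by default instead of launching Spark.

```shell
#!/bin/sh
# Sketch only: assemble the spark-submit command for AvroWriteBenchmark.
# Assumes it is run from the Spark source root after the jars have been built.

# Test jars, as quoted in the failing command.
CORE_TEST_JAR=./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar
CATALYST_TEST_JAR=./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar
SQL_TEST_JAR=./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
AVRO_TEST_JAR=./external/avro/target/spark-avro_2.11-3.0.0-SNAPSHOT-tests.jar

# ASSUMPTION: the main Avro jar sits next to the test jar, without "-tests".
AVRO_JAR=./external/avro/target/spark-avro_2.11-3.0.0-SNAPSHOT.jar

# The main Avro jar joins --jars so the Avro data source can be found.
JARS="$CORE_TEST_JAR,$CATALYST_TEST_JAR,$SQL_TEST_JAR,$AVRO_JAR"
CMD="bin/spark-submit --class org.apache.spark.sql.execution.benchmark.AvroWriteBenchmark --jars $JARS $AVRO_TEST_JAR"

# RUN is a hypothetical switch: print the command unless RUN=1 is set.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
else
  echo "$CMD"
fi
```

Running the script as-is only prints the assembled command, which makes it easy to compare against the failing invocation before executing it with `RUN=1`.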
Test build #98207 has finished for PR 22861 at commit
Test build #98215 has finished for PR 22861 at commit
Test build #98247 has finished for PR 22861 at commit
Test build #98245 has finished for PR 22861 at commit
@dongjoon-hyun Tests have passed.
dongjoon-hyun
left a comment
@yucai . Almost looks good. Please update the PR description according to the PR content and merge yucai#6 .
- Fix
- SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.AvroWriteBenchmark"
+ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain org.apache.spark.sql.execution.benchmark.AvroWriteBenchmark"
- Remove
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.BuiltInDataSourceWriteBenchmark Parquet ORC"
Thanks @dongjoon-hyun , merged.
Thank you, @yucai , @gengliangwang and @wangyum .
Test build #98310 has finished for PR 22861 at commit
…Benchmark, DataSourceWriteBenchmark and AvroWriteBenchmark to use main method

## What changes were proposed in this pull request?

Refactor BuiltInDataSourceWriteBenchmark, DataSourceWriteBenchmark and AvroWriteBenchmark to use main method.

```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.BuiltInDataSourceWriteBenchmark"
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "avro/test:runMain org.apache.spark.sql.execution.benchmark.AvroWriteBenchmark"
```

## How was this patch tested?

manual tests

Closes apache#22861 from yucai/BuiltInDataSourceWriteBenchmark.

Lead-authored-by: yucai <yyu1@ebay.com>
Co-authored-by: Yucai Yu <yucai.yu@foxmail.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Refactor BuiltInDataSourceWriteBenchmark, DataSourceWriteBenchmark and AvroWriteBenchmark to use main method.
How was this patch tested?
manual tests