Add logging to Integration test runs in local and local-cluster mode #10644
Conversation
- Set explain all
- Working logs
- Added logger.FileHandler only for xdist workers
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Please add a better description stating exactly what this adds, how a user would use it, and why it's being added. What are "Working logs"?
Thanks for reviewing, I have updated the PR description. PTAL
# Set up Logging
# Create a named logger
global logger
logger.setLevel(logging.INFO)
Is there a way to change the log level without modifying this file?
We could use a file-based config per the documentation, but I am not sure adding another config would add any value, as this is a Python file and can be changed without having to build the project.
So it's not clear to me what this setLevel is doing compared to the file_handler, compared to the log4j file. You are passing the log4j.properties as the driver options, so why/when are these needed?
Here are some points to clarify (a rough sketch of the setup follows the list):
- The pytest process and the plugin are both writing to the same log file, e.g. gw0_worker_logs.log.
- This logLevel is independent of the log level set in log4j.properties and is only used for setting the level of the file_handler.
- The file_handler defined here, and thus the level, is only used for logging the test name that is needed in the worker logs.
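A minimal sketch of the setup described above, with assumed names (the logger name, file name, and log format are illustrative, not the PR's exact code):

import logging

# Named logger whose level is set independently of the log4j.properties used by Spark
logger = logging.getLogger("integration_tests")
logger.setLevel(logging.INFO)

# Per-worker file handler; the plugin's log4j appender is assumed to point at the
# same file so pytest and plugin output end up interleaved in e.g. gw0_worker_logs.log
file_handler = logging.FileHandler("gw0_worker_logs.log")
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(file_handler)

# The handler is only used to record which test is currently running
logger.info("Starting test: %s", "test_example")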
So this one is just for the pytest driver/framework to write to the logs? Above we modify the Spark driver_opts so that when the Spark session is created it uses the log4j.properties file specified there, correct? If that is the case, adding a comment saying so to this section would be nice.
What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?
Any response to my questions?
I answered some here. #10644 (comment)
What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?
I have tested this in standalone mode and the changes have no effect on the logs if a user sets --master.
I have not tested on YARN, but I assume it should be the same.
Yeah, I think it's fairly safe; let the nightly integration tests catch any issues.
@tgravescs @gerashegalov PTAL
LGTM, pending other reviews
build
# Set up Logging
# Create a named logger
global logger
logger.setLevel(logging.INFO)
So this one is just for the pytest driver/framework to write to the logs? Above we modify the Spark driver_opts so that when the Spark session is created it uses the log4j.properties file specified there, correct? If that is the case, adding a comment saying so to this section would be nice.
What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?
Yes, regardless of parallelism this logger is used for logging from pytest.
Only when TEST_PARALLEL > 1; otherwise we use the log4j properties defined elsewhere. I will add more comments on this line to make this clear.
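For reference, a sketch of that gating under the assumption that TEST_PARALLEL > 1 means the tests run under pytest-xdist, whose workers can be detected via the PYTEST_XDIST_WORKER environment variable (the names here are illustrative, not the PR's exact code):

import logging
import os

logger = logging.getLogger("integration_tests")
logger.setLevel(logging.INFO)

# Only an xdist worker (e.g. "gw0") gets the extra FileHandler; a plain
# single-process run adds no handler and leaves the normal log4j output untouched
worker_id = os.environ.get("PYTEST_XDIST_WORKER")
if worker_id is not None:
    logger.addHandler(logging.FileHandler(f"{worker_id}_worker_logs.log"))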
@tgravescs I have addressed your concern about only enforcing this in local mode. PTAL
Mostly looks good; I had one question I'm waiting on.
build
@@ -241,7 +242,8 @@ else
 # Set the Delta log cache size to prevent the driver from caching every Delta log indefinitely
 export PYSP_TEST_spark_databricks_delta_delta_log_cacheSize=${PYSP_TEST_spark_databricks_delta_delta_log_cacheSize:-10}
 deltaCacheSize=$PYSP_TEST_spark_databricks_delta_delta_log_cacheSize
-export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize $COVERAGE_SUBMIT_FLAGS"
+DRIVER_EXTRA_JAVA_OPTION="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
[optional fix]: should be plural
-DRIVER_EXTRA_JAVA_OPTION="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
+DRIVER_EXTRA_JAVA_OPTIONS="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
# The only case where we want worker logs is in local mode so we set the value here explicitly
# We can't use the PYSP_TEST_spark_master as it's not always set e.g. when using --master
export USE_WORKER_LOGS=1 |
Why does it not apply to L306, which deals with the multi-executor Spark that can also be executed in non-xdist mode?
I wanted to restrict the focus of this PR to the nightly test runs only, which run in local mode. Do you feel strongly about adding it for this flow as well?
Yes, the logic should be as straightforward as possible. The only distinction I see required is whether we have a Spark app run by an xdist worker or not.
I have added logs to local-cluster mode as well. PTAL
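A rough sketch of how the USE_WORKER_LOGS flag exported by run_pyspark_from_build.sh for local and local-cluster runs might be consumed on the Python side (the reading logic and names are assumptions, not the PR's exact code):

import logging
import os

# Attach the per-worker file handler only when the shell script opted in
if os.environ.get("USE_WORKER_LOGS") == "1":
    handler = logging.FileHandler("gw0_worker_logs.log")  # hypothetical file name
    logging.getLogger("integration_tests").addHandler(handler)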
@gerashegalov PTAL
build
LGTM
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
build
This PR adds logs to integration test runs in local and local-cluster modes.
Today, the integration tests only write output to the console, making it difficult to go back and check test failures without re-running the failing test suite. Persisted logs are useful when debugging integration test failures in a dev environment. They can also be used for auditing and identifying all the execs/expressions that are currently being exercised by our integration tests, if one sets
PYSP_TEST_spark_rapids_sql_explain=ALL
before running the suite.
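As an illustration of that usage, here is a sketch of the PYSP_TEST_* convention, under the assumption that the test framework maps a PYSP_TEST_<conf_with_underscores> environment variable to the corresponding Spark conf with dots (the helper below is hypothetical):

import os

os.environ["PYSP_TEST_spark_rapids_sql_explain"] = "ALL"

def pysp_test_confs(environ=os.environ):
    # Collect PYSP_TEST_* variables as Spark conf key/value pairs
    prefix = "PYSP_TEST_"
    return {key[len(prefix):].replace("_", "."): value
            for key, value in environ.items() if key.startswith(prefix)}

print(pysp_test_confs())  # e.g. {'spark.rapids.sql.explain': 'ALL'}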