
Add logging to Integration test runs in local and local-cluster mode #10644

Merged
merged 17 commits into branch-24.06 from SP-10626-add-logs
May 3, 2024

Conversation

@razajafri (Collaborator) commented Mar 28, 2024:

This PR adds logging to integration test runs in local and local-cluster modes.

Today, the integration tests only write output to the console, making it difficult to go back and check test failures without re-running the failing test suite. Persisting the logs is useful when debugging integration test failures in a dev environment. It can also be used to audit and identify all the execs/expressions currently exercised by our integration tests if one sets PYSP_TEST_spark_rapids_sql_explain=ALL before running the suite.

  • After running the integration tests, the logs will be generated under the PROJECT_ROOT/integration_tests/target/run_dir-xxx directory (a sketch of the mechanism follows below)
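A minimal sketch of the mechanism, assuming illustrative names (setup_worker_logging, the integration_tests logger name, and the file-name pattern are not necessarily the exact code in this PR): each pytest-xdist worker attaches its own logging.FileHandler so its output lands in a per-worker file under the run directory.

import logging
import os

def setup_worker_logging(run_dir):
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker process
    worker_id = os.environ.get('PYTEST_XDIST_WORKER', 'master')
    logger = logging.getLogger('integration_tests')
    logger.setLevel(logging.INFO)
    # one file per worker, e.g. gw0_worker_logs.log
    handler = logging.FileHandler(os.path.join(run_dir, worker_id + '_worker_logs.log'))
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger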

- Set explain all
- Working logs
- Added logger.FileHandler only for xdist workers
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@tgravescs (Collaborator):

Please add a better description stating exactly what this adds, how a user would use it, and why it's being added.

What are Working logs?

@razajafri (Collaborator Author):

> Please add a better description stating exactly what this adds, how a user would use it, and why it's being added.
>
> What are Working logs?

Thanks for reviewing; I have updated the PR description. PTAL.

# Set up Logging
# Create a named logger
global logger
logger.setLevel(logging.INFO)
Collaborator:

Is there a way to change the log level without modifying this file?

Collaborator Author:

We could use a file config based on the documentation, but I am not sure adding another config adds any value, as this is a Python file and can be changed without having to rebuild the project.
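For context, a hedged sketch of the file-based configuration being discussed (the file name logging.conf is hypothetical): the standard library's logging.config.fileConfig can load levels and handlers from an INI-style file, so the level could change without editing the Python source.

import logging.config

logging.config.fileConfig('logging.conf')  # hypothetical INI-style config file
logger = logging.getLogger('integration_tests')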

Collaborator:

So it's not clear to me what this setLevel is doing compared to the file_handler and compared to the log4j file. You are passing the log4j file as a driver option, so why/when are these needed?

Collaborator Author:

Here are some points to clarify (see the sketch after this list):

  • pytest and the plugin both write to the same log file, e.g. gw0_worker_logs.log.
  • This log level is independent of the level set in log4j.properties and is only used to set the level for the file_handler.
  • The file_handler defined here, and thus its level, is only used for logging the test name that is needed in the worker logs.
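A minimal sketch of the separation described above (the logger name and file name are illustrative): the Python logger and its handler filter by level on their own, independently of whatever level log4j applies on the Spark side.

import logging

logger = logging.getLogger('integration_tests')
logger.setLevel(logging.INFO)        # the logger drops records below INFO
file_handler = logging.FileHandler('gw0_worker_logs.log')
file_handler.setLevel(logging.INFO)  # the handler filters again before writing
logger.addHandler(file_handler)
logger.info('test name goes here')   # reaches the worker log file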

Collaborator:

So this one is just for the pytest driver/framework to write to the logs? Above we modify the Spark driver opts so that when the Spark session is created it uses the log4j.properties file specified there, correct? If that is the case, adding a comment to that effect in this section would be nice.

What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?

Collaborator:

Any response to my questions?

Collaborator Author:

I answered some of them here: #10644 (comment)

> What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?

I have tested this in standalone mode, and the changes have no effect on the logs if a user sets --master. I have not tested on YARN, but I assume it should be the same.

Collaborator:

Yeah, I think it's fairly safe to let the nightly integration tests catch any issues.

integration_tests/README.md (review thread resolved; outdated)
@razajafri razajafri requested a review from tgravescs April 1, 2024 15:40
@sameerz sameerz added the test Only impacts tests label Apr 1, 2024
@razajafri razajafri changed the base branch from branch-24.04 to branch-24.06 April 3, 2024 15:22
@razajafri (Collaborator Author):

@tgravescs @gerashegalov PTAL

gerashegalov previously approved these changes Apr 25, 2024
@gerashegalov (Collaborator) left a comment:

LGTM, pending other reviews

@razajafri (Collaborator Author):

build

integration_tests/README.md (review thread resolved)

@razajafri (Collaborator Author) commented Apr 26, 2024:

> So this one is just for the pytest driver/framework to write to the logs?

Yes, regardless of parallelism. This logger is used for logging the pytest request.node.nodeid.

> Above we modify the Spark driver opts so that when the Spark session is created it uses the log4j.properties file specified there, correct?

Only when TEST_PARALLEL > 1; otherwise we use the log4j properties defined in run_pyspark_from_build.sh.

I will add more comments on this line to make this clear.
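For illustration, a hedged sketch of logging the test name from a fixture (an autouse fixture is one way to do it; the names here are illustrative rather than the PR's exact code):

import logging
import pytest

logger = logging.getLogger('integration_tests')

@pytest.fixture(autouse=True)
def log_test_name(request):
    # request.node.nodeid looks like "src/main/python/foo_test.py::test_bar"
    logger.info('Running test: %s', request.node.nodeid)
    yield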

@razajafri (Collaborator Author):

@tgravescs I have addressed your concern about only enforcing this in local mode. PTAL

@tgravescs (Collaborator):

Mostly looks good; I have one question I'm waiting on.

tgravescs previously approved these changes May 2, 2024
@razajafri (Collaborator Author):

build

@@ -241,7 +242,8 @@ else
# Set the Delta log cache size to prevent the driver from caching every Delta log indefinitely
export PYSP_TEST_spark_databricks_delta_delta_log_cacheSize=${PYSP_TEST_spark_databricks_delta_delta_log_cacheSize:-10}
deltaCacheSize=$PYSP_TEST_spark_databricks_delta_delta_log_cacheSize
export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize $COVERAGE_SUBMIT_FLAGS"
DRIVER_EXTRA_JAVA_OPTION="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
Collaborator:

[optional fix]: should be plural

Suggested change
DRIVER_EXTRA_JAVA_OPTION="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
DRIVER_EXTRA_JAVA_OPTIONS="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"

Comment on lines +315 to +317
# The only case where we want worker logs is in local mode so we set the value here explicitly
# We can't use the PYSP_TEST_spark_master as it's not always set e.g. when using --master
export USE_WORKER_LOGS=1
Collaborator:

Why does it not apply to L306, which deals with the multi-executor Spark that can also be executed in non-xdist mode?

Collaborator Author:

I wanted to restrict the focus of this PR to nightly test runs only, which run in local mode. Do you feel strongly about adding it for this flow as well?

Collaborator:

Yes, the logic should be as straightforward as possible. The only distinction I see as required is whether the Spark app is run by an xdist worker or not.

Collaborator Author:

I have added logs to local-cluster mode as well. PTAL
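A minimal sketch, assuming illustrative names, of how the Python side could gate the extra file handler on the USE_WORKER_LOGS flag exported by run_pyspark_from_build.sh for local and local-cluster runs:

import logging
import os

if os.environ.get('USE_WORKER_LOGS') == '1':
    logger = logging.getLogger('integration_tests')
    logger.addHandler(logging.FileHandler('worker_logs.log'))  # hypothetical file name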

@razajafri razajafri changed the title Add logging to Integration test runs Add logging to Integration test runs in local and local-cluster mode May 3, 2024
@razajafri (Collaborator Author):

@gerashegalov PTAL

@razajafri (Collaborator Author):

build

gerashegalov previously approved these changes May 3, 2024
@gerashegalov (Collaborator) left a comment:

LGTM

integration_tests/run_pyspark_from_build.sh (two review threads resolved; outdated)
razajafri and others added 2 commits May 2, 2024 19:34
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
@razajafri (Collaborator Author):

build

@razajafri razajafri merged commit 50822d5 into NVIDIA:branch-24.06 May 3, 2024
44 checks passed
@razajafri razajafri deleted the SP-10626-add-logs branch May 3, 2024 15:39
Labels: test (Only impacts tests)