Add logging to Integration test runs in local and local-cluster mode #10644
Conversation
- Set explain all
- Working logs
- Added logger.FileHandler only for xdist workers
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
Please add a better description stating exactly what this adds, how a user would use it, and why it's being added. What are "Working logs"?
Thanks for reviewing, I have updated the PR description. PTAL
# Set up Logging
# Create a named logger
global logger
logger.setLevel(logging.INFO)
Is there a way to change the log level without modifying this file?
We could use a file-based config per the documentation, but I am not sure adding another config would add any value, as this is a Python file and can be changed without having to build the project.
So it's not clear to me what this setLevel is doing compared to the file_handler, compared to the log4j file. You are passing the log4j.properties as the driver options, so why/when are these needed?
Here are some points to clarify (a rough sketch of the setup follows the list):
- The pytest process and the plugin are both writing to the same log file, e.g. gw0_worker_logs.log.
- This logLevel is independent of the log level set in log4j.properties and is only used for setting the level of the file_handler.
- The file_handler defined here, and thus the level, is only used for logging the test name that is needed in the worker logs.
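A minimal sketch of the setup described above, with assumed names (the logger name, file name, and log format are illustrative, not the PR's exact code):

import logging

# Named logger whose level is set independently of the log4j.properties used by Spark
logger = logging.getLogger("integration_tests")
logger.setLevel(logging.INFO)

# Per-worker file handler; the plugin's log4j appender is assumed to point at the
# same file so pytest and plugin output end up interleaved in e.g. gw0_worker_logs.log
file_handler = logging.FileHandler("gw0_worker_logs.log")
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(file_handler)

# The handler is only used to record which test is currently running
logger.info("Starting test: %s", "test_example")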
So this one is just for the pytest driver/framework to write to the logs? Above we modify the Spark driver_opts so that when the Spark session is created it uses the log4j.properties file specified there, correct? If that is the case, adding a comment saying so to this section would be nice.
What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?
Any response to my questions?
I answered some here. #10644 (comment)
What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?
I have tested this in standalone mode and the changes have no effect on the logs if a user sets --master.
I have not tested on YARN, but I assume it should be the same.
Yeah, I think it's fairly safe; let the nightly integration tests catch any issues.
@tgravescs @gerashegalov PTAL
LGTM, pending other reviews
build
# Set up Logging
# Create a named logger
global logger
logger.setLevel(logging.INFO)
So this one is just for the pytest driver/framework to write to the logs? Above we modify the Spark driver_opts so that when the Spark session is created it uses the log4j.properties file specified there, correct? If that is the case, adding a comment saying so to this section would be nice.
What happens to the logs when running on a distributed Spark cluster, like YARN or standalone mode? I assume this will only write pytest-related logs to this new file; do these configs mess up the normal Spark logging?
Yes, regardless of parallelism this logger is used for logging from pytest.
Only when TEST_PARALLEL > 1; otherwise we use the log4j properties defined elsewhere. I will add more comments on this line to make this clear.
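For reference, a sketch of that gating under the assumption that TEST_PARALLEL > 1 means the tests run under pytest-xdist, whose workers can be detected via the PYTEST_XDIST_WORKER environment variable (the names here are illustrative, not the PR's exact code):

import logging
import os

logger = logging.getLogger("integration_tests")
logger.setLevel(logging.INFO)

# Only an xdist worker (e.g. "gw0") gets the extra FileHandler; a plain
# single-process run adds no handler and leaves the normal log4j output untouched
worker_id = os.environ.get("PYTEST_XDIST_WORKER")
if worker_id is not None:
    logger.addHandler(logging.FileHandler(f"{worker_id}_worker_logs.log"))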
@tgravescs I have addressed your concern about only enforcing this in local mode. PTAL
Mostly looks good; I had one question I'm waiting on.
build
@@ -241,7 +242,8 @@ else
 # Set the Delta log cache size to prevent the driver from caching every Delta log indefinitely
 export PYSP_TEST_spark_databricks_delta_delta_log_cacheSize=${PYSP_TEST_spark_databricks_delta_delta_log_cacheSize:-10}
 deltaCacheSize=$PYSP_TEST_spark_databricks_delta_delta_log_cacheSize
-export PYSP_TEST_spark_driver_extraJavaOptions="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize $COVERAGE_SUBMIT_FLAGS"
+DRIVER_EXTRA_JAVA_OPTION="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
[optional fix]: should be plural
-DRIVER_EXTRA_JAVA_OPTION="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
+DRIVER_EXTRA_JAVA_OPTIONS="-ea -Duser.timezone=$TZ -Ddelta.log.cacheSize=$deltaCacheSize"
# The only case where we want worker logs is in local mode so we set the value here explicitly
# We can't use the PYSP_TEST_spark_master as it's not always set e.g. when using --master
export USE_WORKER_LOGS=1 |
Why does it not apply to L306, which deals with the multi-executor Spark that can also be executed in non-xdist mode?
I wanted to restrict the focus of this PR to the nightly test runs only, which run in local mode. Do you feel strongly about adding it for this flow as well?
Yes, the logic should be as straightforward as possible. The only distinction I see required is whether we have a Spark app run by an xdist worker or not.
I have added logs to local-cluster mode as well. PTAL
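A rough sketch of how the USE_WORKER_LOGS flag exported by run_pyspark_from_build.sh for local and local-cluster runs might be consumed on the Python side (the reading logic and names are assumptions, not the PR's exact code):

import logging
import os

# Attach the per-worker file handler only when the shell script opted in
if os.environ.get("USE_WORKER_LOGS") == "1":
    handler = logging.FileHandler("gw0_worker_logs.log")  # hypothetical file name
    logging.getLogger("integration_tests").addHandler(handler)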
@gerashegalov PTAL
build
LGTM
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>
build
This PR adds logs to integration test runs in local and local-cluster modes.
Today, the integration tests only write output to the console, making it difficult to go back and check test failures without re-running the failing test suite. Persisted logs are useful when debugging integration test failures in a dev environment. They can also be used for auditing and identifying all the execs/expressions that are currently being exercised by our integration tests, if one sets
PYSP_TEST_spark_rapids_sql_explain=ALL
before running the suite.
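As an illustration of that usage, here is a sketch of the PYSP_TEST_* convention, under the assumption that the test framework maps a PYSP_TEST_<conf_with_underscores> environment variable to the corresponding Spark conf with dots (the helper below is hypothetical):

import os

os.environ["PYSP_TEST_spark_rapids_sql_explain"] = "ALL"

def pysp_test_confs(environ=os.environ):
    # Collect PYSP_TEST_* variables as Spark conf key/value pairs
    prefix = "PYSP_TEST_"
    return {key[len(prefix):].replace("_", "."): value
            for key, value in environ.items() if key.startswith(prefix)}

print(pysp_test_confs())  # e.g. {'spark.rapids.sql.explain': 'ALL'}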