TFX components in GCP do not display component logs in GCP Vertex AI #6539
Comments
This issue has been raised about viewing component-level logs in the Logs Explorer while running TFX pipelines in Vertex AI. I was unable to find any setting that enables the container logs. Please let me know if I am missing anything. Thank you!
+1. Logs are also not displayed when using PyTorch + Kubeflow pipelines. Please fix this; it seems to be a general issue. Not only does it make debugging tricky, but I also can't tell whether the specified GPU and memory are actually utilized during training.
@ImmanuelXIV, this repo is for issues you face while implementing TFX pipelines. I would request that you open an issue with the Cloud support team. You can follow Get Support to raise an issue. Thank you!
Strange that this is a general issue.
We are experiencing the same issue in VAI while trying to migrate our training pipelines to 1.14. I have raised a Google Support Case. Has anyone else experiencing this issue raised a case? It would be good to compare notes.
Hello, we also ran several VAI pipelines by hand, and we were able to see the component logs regardless of whether a component run failed or not. This is very strange, and I want to check whether "all" component logs are missing regardless of failure, @crbl1122. In the meantime, I can give you a general way to debug.
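As a general illustration (not an official recommendation): one way to rule out log-capture problems is to write logs through the Cloud Logging client library directly from inside the component code, instead of relying on container stdout/stderr capture. A minimal sketch, assuming google-cloud-logging is available in the component image and the job's service account can write logs:

```python
import logging

import google.cloud.logging


def setup_cloud_logging() -> None:
    """Attach a Cloud Logging handler to the root logger.

    After this call, standard logging.info()/warning() calls are sent
    through the Cloud Logging API, independent of stdout capture.
    """
    client = google.cloud.logging.Client()
    client.setup_logging(log_level=logging.INFO)


# Hypothetical usage, e.g. at the top of a Trainer module's run_fn:
setup_cloud_logging()
logging.info("run_fn started; if this line shows up in Logs Explorer, "
             "the service account and logging setup are working.")
```

If lines written this way do appear in the Logs Explorer while the normal container logs do not, the problem is more likely in log routing or stdout capture than in the service account's permissions.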
@lego0901 I confirm that no logs or errors are seen, neither for components that run successfully nor for those that crash during execution.
I would like to express my gratitude for your confirmation. May I request further information from you so that we can conduct a more thorough investigation into this matter? Since we are unable to reproduce the issue on our end (despite the fact that numerous users are encountering the same problem), we require additional input regarding your specific situation. Could you kindly provide responses to the following questions:
Thank you very much for your assistance.
We also do not see anything in Error Reporting. We did not see this before TFX 1.14. We manage dependencies using Poetry; GitHub does not support uploading lock files, but here is the output of
Hi, TFX==1.12.0.
Thanks for providing your environments! However, I was not able to reproduce the phenomenon with either configuration, using the VAI example run locally.
I think some configuration other than TFX is outdated, so the logs are not displayed. Let me contact a Vertex AI team engineer internally to figure out the problem. Thank you.
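For reference, a minimal sketch of how such an example pipeline is typically compiled and submitted to Vertex AI. The project, region, bucket, service-account names and the trivial `create_pipeline` factory below are placeholders, not values taken from this thread:

```python
from google.cloud import aiplatform
from tfx import v1 as tfx

# Placeholders; replace with your own values.
PROJECT = "my-gcp-project"
REGION = "us-central1"
PIPELINE_ROOT = "gs://my-bucket/pipeline_root"
SERVICE_ACCOUNT = "pipeline-sa@my-gcp-project.iam.gserviceaccount.com"


def create_pipeline(pipeline_root: str) -> tfx.dsl.Pipeline:
    # Placeholder pipeline: build your real components here
    # (ExampleGen, Transform, Trainer, ...).
    example_gen = tfx.components.CsvExampleGen(input_base="gs://my-bucket/data")
    return tfx.dsl.Pipeline(
        pipeline_name="penguin-vertex-pipeline",
        pipeline_root=pipeline_root,
        components=[example_gen],
    )


# Compile the TFX pipeline into a Vertex AI (Kubeflow V2) pipeline spec.
runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename="pipeline.json",
)
runner.run(create_pipeline(pipeline_root=PIPELINE_ROOT))

# Submit the compiled spec to Vertex AI Pipelines under the service account
# whose permissions govern what the component containers can log.
aiplatform.init(project=PROJECT, location=REGION)
job = aiplatform.PipelineJob(
    display_name="penguin-vertex-pipeline",
    template_path="pipeline.json",
    enable_caching=False,
)
job.submit(service_account=SERVICE_ACCOUNT)
```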
@lego0901 Hi, thank you for pursuing this. I see you cannot reproduce with the Penguin Example. I will try to reproduce with a simple pipeline to further aid problem determination...
@lego0901 Hi, I want to add that the same problem occurs for Apache Beam jobs in Dataflow. No logs are displayed. So far, except for Kubeflow, none of the other pipeline types I tested (TFX, Dataflow/Beam) produces any logs.
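As a side note on the Dataflow case: worker logs can normally also be pulled straight from Cloud Logging, which helps distinguish "no logs are produced" from "logs are produced but not shown in the UI". A rough sketch, with the job ID as a placeholder:

```
# List recent log entries for a specific Dataflow job.
# "dataflow_step" is the monitored resource type Dataflow logs use.
gcloud logging read \
  'resource.type="dataflow_step" AND resource.labels.job_id="JOB_ID"' \
  --limit=50 --freshness=7d
```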
We had the same problem when we initially migrated to Vertex from Kubeflow, but only in our production GCP project. Logs worked fine in the testing project. It took a long investigation to discover that the cause was that the default logging bucket for the production GCP project was disabled. Enabling the logging bucket fixed the problem.
Hi @IzakMaraisTAL, where is the default logging bucket defined, and how did you enable it? In a Vertex AI pipeline, if I do not use TFX, no special setting has to be made in order to view the logs.
@crbl1122, it might be that our issue is different from what you are seeing. We have not tested any non-TFX Vertex AI pipelines. The following instructions are LLM-generated, but I double-checked them in our testing project and they seem correct:
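(The step-by-step instructions themselves are not reproduced here. As a rough sketch of the same idea, checking and re-enabling a project's default log routing with gcloud could look like the following; the project ID is a placeholder, and the flags should be verified against the gcloud reference for your version.)

```
# Check whether the _Default sink (which routes logs into the _Default
# logging bucket) is disabled for the project.
gcloud logging sinks describe _Default --project=my-gcp-project

# Re-enable it if it reports disabled: true.
gcloud logging sinks update _Default --no-disabled --project=my-gcp-project

# Inspect the _Default logging bucket itself (state, retention).
gcloud logging buckets describe _Default --location=global --project=my-gcp-project
```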
@IzakMaraisTAL Many thanks for the info.
If the bug is related to a specific library below, please raise an issue in the respective repo directly:
- TFX
- TensorFlow Data Validation Repo
- TensorFlow Model Analysis Repo
- TensorFlow Transform Repo
- TensorFlow Serving Repo
System information
Environment in which the code is executed (e.g., Local, Interactive Notebook, Google Cloud, etc.): Google Cloud (Vertex AI Pipelines)
Python dependencies (from pip freeze output):
google-api-python-client 1.12.11
google-apitools 0.5.31
google-auth 2.25.2
google-auth-httplib2 0.1.1
google-auth-oauthlib 1.0.0
google-cloud-aiplatform 1.37.0
google-cloud-appengine-logging 1.4.0
google-cloud-audit-log 0.2.5
google-cloud-bigquery 2.34.4
google-cloud-bigquery-storage 2.23.0
google-cloud-bigtable 2.21.0
google-cloud-core 2.4.1
google-cloud-datastore 2.18.0
google-cloud-dlp 3.14.0
google-cloud-language 2.12.0
google-cloud-logging 3.9.0
google-cloud-pubsub 2.19.0
google-cloud-pubsublite 1.8.3
google-cloud-recommendations-ai 0.10.6
google-cloud-resource-manager 1.11.0
google-cloud-spanner 3.40.1
google-cloud-storage 2.13.0
google-cloud-videointelligence 2.12.0
google-cloud-vision 3.5.0
google-crc32c 1.5.0
google-pasta 0.2.0
google-resumable-media 2.6.0
googleapis-common-protos 1.62.0
grpc-google-iam-v1 0.13.0
Describe the current behavior
I am running Kubeflow pipelines with TFX components in GCP Vertex AI. The problem is that no component logs are displayed in the Vertex interface (neither for the main job nor the pipeline job), while in the Logs Explorer only framework messages are displayed. This is irrespective of the component type (ExampleGen, Trainer, Transform, etc.) and makes debugging TFX components very difficult, since it has to be done blind. I submit the pipelines using a service account which has Logs Writer/Reader privileges.
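For completeness, granting those logging privileges to the pipeline's service account normally looks like this (project and service-account names below are placeholders):

```
# Allow the service account to write log entries from the pipeline containers.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:pipeline-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/logging.logWriter"

# Allow it to read logs back, e.g. when inspecting runs programmatically.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:pipeline-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/logging.viewer"
```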
Describe the expected behavior
Be able to view the component logs for code debugging.
Standalone code to reproduce the issue
Providing a bare minimum test case or step(s) to reproduce the problem will
greatly help us to debug the issue. If possible, please share a link to
Colab/Jupyter/any notebook.
Name of your Organization (Optional)
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem.
If including tracebacks, please include the full traceback. Large logs and files
should be attached.