Setting `remote_target_path` and `remote_target_path_conn_id`, but no files land in S3 after running project DAG (#1259)
Hey there, @johnmcochran! I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together and get it resolved! To troubleshoot the issue of files not being transferred to S3 after setting `remote_target_path` and `remote_target_path_conn_id`:
If these steps do not resolve the issue, it may help to look for specific error messages in the logs that indicate what is going wrong during the file transfer.
Hey @dosu, do I need to add `remote_target_path` to my DbtTaskGroup function? I was under the impression that simply adding the environment variables would let Cosmos detect the desire for a remote path and automatically send the target files to S3. Could you help me understand why creating the environment variable for `remote_target_path` is necessary if `remote_target_path` needs to be specified in the DAG as well? I am also curious whether the S3 `remote_target_path` gets overwritten each time the DAG runs, or whether it automatically adds subdirectories? Thank you,
Regarding the behavior of the `remote_target_path`:
Updated code:

transform_data = DbtTaskGroup(

I get this error when trying to add `remote_target_path` to my DbtTaskGroup:

Broken DAG: [/usr/local/airflow/dags/sagitta_dags/sagitta_staff_dbt_dag.py]

I tried digging through the DbtTaskGroup function and couldn't find any references to `remote_target_path` either.
Hello @tatiana, Does the Bug tag that was assigned to this issue mean that it is a confirmed bug and there isn't something I need to fix on my end? Is there an ETA on how long issues normally take to fix? I'm moving to Astronomer soon and would like to know whether I can rely on the Cosmos stored-test-failures functionality for some upcoming data quality tests I am implementing at my company. Thank you,
Hi @johnmcochran! I had temporarily marked it as a bug until we could reproduce and troubleshoot it further. The variables
This is an example of how I set them up locally using environment variables (this is a deployment-wide setting) when using GCS:
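The environment-variable example referenced above did not survive this capture; a minimal sketch of what such a setup typically looks like, with a placeholder bucket path and connection ID (not taken from the thread):

```shell
# Deployment-wide Cosmos settings, e.g. in a Dockerfile or .env file.
# The bucket path and connection ID below are placeholders.
export AIRFLOW__COSMOS__REMOTE_TARGET_PATH="gs://my-bucket/cosmos_target"
export AIRFLOW__COSMOS__REMOTE_TARGET_PATH_CONN_ID="my_gcp_conn"
```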
When doing this, Cosmos creates specific paths for each
In this case, you can see three Cosmos dbt "entities" were cached:
For this to work, please confirm the following conditions are met:
Could you please confirm that this works for you? By the way: would you be up for helping us improve our documentation so that, in the future, users don't face the same issues you faced when trying out this feature?
Hey @tatiana, TL;DR: the cache config doesn't seem to automatically generate AIRFLOW__COSMOS__ENABLE_CACHE=True and
I'm using cosmos 1.7.0, set in my requirements.txt file
Neither of these was listed in the Admin > Configurations page in the Airflow UI. I added them to my Dockerfile, and I'm still not seeing test failures stored in the S3 directory defined by my `remote_target_path`.

I tried just setting these before looking into `load_method`, since I saw that `dbt_ls` should be the default parsing method. If `dbt_ls` isn't the default, I think the documentation isn't clear on that point (I see that `automatic` looks for a manifest file, and then falls back to `dbt_ls`). I see for `dbt_ls`: "this requires the dbt executable to be installed on your machine". My assumption is that the code below satisfies this requirement, but I'm not sure.

execution_config = ExecutionConfig(

This execution config is passed into my DbtTaskGroup:

transform_data = DbtTaskGroup(
The example included in the documentation mentions setting the `load_method` in RenderConfig, which I did not have. I updated my DbtTaskGroup:

transform_data = DbtTaskGroup(

Unfortunately, none of these changes resulted in test failures being stored in my designated S3 directory. I still have the failing test within my dbt project.
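For reference, a configuration sketch of the pieces being discussed in the thread. The project and profile paths, profile names, and group ID are placeholders; `DbtTaskGroup`, `RenderConfig`, `ExecutionConfig`, and `LoadMode.DBT_LS` are part of the Cosmos public API, but this is a hedged example, not the poster's actual DAG:

```python
# Configuration sketch only: placeholder paths and names; assumes Cosmos
# (astronomer-cosmos) is installed in the Airflow image.
from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

transform_data = DbtTaskGroup(
    group_id="transform_data",
    project_config=ProjectConfig("/usr/local/airflow/dbt/my_project"),  # placeholder path
    profile_config=ProfileConfig(
        profile_name="my_profile",  # placeholder
        target_name="dev",          # placeholder
        profiles_yml_filepath="/usr/local/airflow/dbt/profiles.yml",  # placeholder
    ),
    execution_config=ExecutionConfig(
        # Points at a dbt binary installed in a virtualenv inside the image.
        dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt",
    ),
    # Explicitly select dbt_ls parsing, which requires the dbt executable above.
    render_config=RenderConfig(load_method=LoadMode.DBT_LS),
)
```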
@johnmcochran thanks for your prompt reply
Cosmos does not materialize configuration on behalf of the user (e.g. environment variables). Still, the default behaviour is to treat those as true if the user does not specify them, as shown in the code below:

astronomer-cosmos/cosmos/settings.py, line 20 at commit ae94975
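The pattern being described (a setting defaults to true unless the user explicitly sets the environment variable) can be sketched as follows; `get_bool_setting` is a hypothetical helper illustrating the pattern, not Cosmos's actual code:

```python
import os

def get_bool_setting(name: str, default: bool = True) -> bool:
    """Read a boolean setting from the environment, falling back to a
    default when the variable is not set (the pattern described for
    cosmos/settings.py; this helper itself is illustrative)."""
    raw = os.getenv(name)
    if raw is None:
        # No env var materialized by the user -> use the default (True).
        return default
    return raw.strip().lower() in ("1", "true", "yes")
```

So an unset `AIRFLOW__COSMOS__ENABLE_CACHE` behaves as enabled, which is why it does not appear in the Airflow UI's Configurations page unless the user sets it.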
Next steps
It would be great to see if your scheduler logs include:
Hey @tatiana,
I see in the airflow UI the following: Astronomer Runtime 12.1.1 based on Airflow 2.10.2+astro.1
Tried checking the logs in the Airflow UI for the specific failed task and didn't see any of those phrases ('Trying to parse' or 'cache miss'); no hits. I checked the scheduler Docker container and inspected the log files to see if there's different info there, and found the following. Hopefully there's an easier way to check scheduler logs than what I just did, ha. Could the warning about the conversion function be the issue? This chunk of logs repeated itself in a similar fashion many times within the scheduler logs.

[2024-10-30T00:06:32.820+0000] {logging_mixin.py:190} INFO - [2024-10-30T00:06:32.819+0000] {dagbag.py:588} INFO - Filling up the DagBag from /usr/local/airflow/dags/sagitta_dags/sagitta_staff_dbt_dag.py
Hi @tatiana, Have you had a chance to take a look at the logs I posted above? I bolded the portions that may be relevant, but unfortunately didn't see any hits for the things you wanted me to look for. Sincerely,
Hi @tatiana, Checking in again to see if you could take a look at the response I posted to your questions :) Sincerely, |
Hi @johnmcochran, I'm sorry for the delay. I've been sidetracked with other priorities at Astronomer. I'm back to this issue. At the beginning of the issue, you stated:
There was an issue in our documentation, which was fixed in #1305. By setting
And also:
As part of Cosmos 1.8, we're aiming to improve Cosmos callback support and implement the feature you're asking for. This feature request seems to be a duplicate of this ticket: #801, which is in our current sprint. It is part of a bigger goal: #1349. Therefore, I suggest you track one of those tickets, and I'll close this one as a duplicate. Sorry again for the delay. In regards to the scheduler logs, you mentioned:
These logs will be in the Airflow scheduler, not in the Airflow UI, which currently only displays Task logs. Usually, the easiest way to see scheduler logs in Astro CLI is:
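The tip above is truncated in this capture; it most likely refers to the Astro CLI's logs command. Assuming a local Astro project started with `astro dev start`, scheduler logs can typically be viewed with:

```shell
# Print scheduler logs for the local Astro environment (Astro CLI).
astro dev logs --scheduler

# Follow them live while the DAG file is being parsed.
astro dev logs --scheduler --follow
```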
I'm trying to push data test failures to S3. To do this, I'm using the documentation here (https://astronomer.github.io/astronomer-cosmos/configuration/cosmos-conf.html#remote-target-path) to set up the target directory pointing to my S3 bucket.
There is limited info in the Cosmos documentation examples on how to get this up and running beyond creating the environment variables. What am I supposed to do after adding them? When I run my dbt project DAG, no files are dropped into S3.
I successfully used the S3 connection for deploying docs, so I know that part works, and I am using Cosmos 1.7.0. I verified that I have a failing data test in my project.
bug?
Thank you,
John