Skip to content

Conversation

@stiak
Copy link
Contributor

@stiak stiak commented Jun 14, 2023

With the move from delegate_to to impersonation_chain I'm no longer able to impersonate a user, only a service account. Google drive files created by a service account aren't visible to anyone else by default so having them placed under an already shared folder makes it easier for people to see them.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Jun 14, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 14, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Copy link
Contributor

@phanikumv phanikumv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an example for the destination folder movement(similar to howto_operator_gcs_to_gdrive_move_files) , and explain this new functionality in gcs_to_gdrive.rst. You may also want to refer to that example from the rst file

@stiak
Copy link
Contributor Author

stiak commented Jun 14, 2023

Please add an example for the destination folder movement(similar to howto_operator_gcs_to_gdrive_move_files) , and explain this new functionality in gcs_to_gdrive.rst. You may also want to refer to that example from the rst file

I'm not sure what value I should provide for the folder ID in that example. It has to be a pre-existing folder that the google drive connection has access to. Can I use a dummy value, or is this meant to run as an integration test?

@phanikumv
Copy link
Contributor

Please add an example for the destination folder movement(similar to howto_operator_gcs_to_gdrive_move_files) , and explain this new functionality in gcs_to_gdrive.rst. You may also want to refer to that example from the rst file

I'm not sure what value I should provide for the folder ID in that example. It has to be a pre-existing folder that the google drive connection has access to. Can I use a dummy value, or is this meant to run as an integration test?

You can use a dummy value

@stiak
Copy link
Contributor Author

stiak commented Jun 14, 2023

I've added an example for "Copy into an existing folder". I ended up using this pattern for the folder id:

FOLDER_ID = os.environ.get("GCP_GDRIVE_FOLDER_ID", "abcd1234")

I found this example in tests/providers/google/cloud/transfers/test_gdrive_to_gcs.py

@eladkal
Copy link
Contributor

eladkal commented Jun 14, 2023

cc @shahar1 WDYT?

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please consider a small change to your example

@potiuk
Copy link
Member

potiuk commented Jul 5, 2023

Static checks failing - consider installing and running precommit - this is helpful as it will fix things for you automatically.

@stiak
Copy link
Contributor Author

stiak commented Jul 11, 2023

Static checks failing - consider installing and running precommit - this is helpful as it will fix things for you automatically.

Thanks, think I've fixed the static checks now. I've run pre-commit run --all-files locally and committed the fixups.

stiak added 7 commits July 11, 2023 09:08
With the move from `delegate_to` to `impersonation_chain` I'm no longer able to impersonate a user, only a service account.  Google drive files created by a service account aren't visible to anyone else by default so having them placed under an already shared folder makes it easier for people to see them.
In this example the name of the source object is irrelevant
@potiuk potiuk merged commit 3a8da4b into apache:main Jul 11, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 11, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@apallerlamudi
Copy link

apallerlamudi commented Jul 19, 2023

@stiak @potiuk I'm trying to move a folder(with multiple files) from a GCS bucket into a Google Drive (Which was created by our sales team) and I'm using this operator GCSToGoogleDriveOperator.

However, The task is running fine but the files are not moving to this google drive.

below is my task:
move_files_to_gdrive = GCSToGoogleDriveOperator(
task_id='move_files_to_gdrive',
source_bucket=GCS_BUCKET,
source_object= "incoming/DHR_*",
destination_object = "XXXXXXXXUUUUUUUIIIIIII"
)

XXXXXXXXUUUUUUUIIIIIII --> it is the Google Drive Folder id

Composer Version : 2.3.4
Airflow Version : 2.5.3

@stiak
Copy link
Contributor Author

stiak commented Jul 19, 2023

This change added a destination_folder_id parameter. It's available in version 10.4.0 of the apache-airflow-providers-google package.

Cloud composer use their own custom version of that package, which is based on 10.1.1 from the release notes.

@apallerlamudi
Copy link

@stiak Does this mean I need to wait for composer to upgrade their version of package to 10.4.0 for the apache-airflow-providers-google package?

@potiuk
Copy link
Member

potiuk commented Jul 21, 2023

@stiak Does this mean I need to wait for composer to upgrade their version of package to 10.4.0 for the apache-airflow-providers-google package?

You can likely upgrade it yourself. I believe you can upgrade packages independently in Composer and google provider is the same package as any other (unless of course there is a good reason you need to use the composer's variant of the package there - but that's more of a question to composer team.

But.

image

You always have option of copying the code into your custom operator and using it instead. Airflow is flexible, customizable, fully open-source and "Writing code" is precisely that you are doing when authoring DAGs. Copying the code you can see in this PR and creating and using your own operator while waiting for new provider being released is - I think - even easier than authoring a DAG. You can even add it in a common, .airflowignored package in your DAG folder following https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#typical-structure-of-packages and then the only change to do is instead of "from airflow.providers.google....." do "from my-package" when importing it in DAG. You can even keep the same class name so that when you get a new provider you can swap import back

It's all Python and extending it's capabilities by writing code is first-class citizen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants