Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingestion/scheduler): add extraArgs support for Ingestion Scheduler (e.g. for extra pip packages) #10195

Merged

Conversation

Nelvin73
Copy link
Contributor

@Nelvin73 Nelvin73 commented Apr 3, 2024

The scheduled ingestion does not support extraArgs, like extra PIP packages.
Thats why Ingestions with PIP EXTRA PACKAGES or PIP EXTRA ARGS work with manual trigger, but do not work when run by scheduler.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

The scheduled ingestion does not support extraArgs, like extra PIP packages
@github-actions github-actions bot added the community-contribution PR or Issue raised by member(s) of DataHub Community label Apr 3, 2024
@pedro93
Copy link
Collaborator

pedro93 commented Apr 3, 2024

Hello @Nelvin73,

Thank you for raising this PR. I don't see anything inherently wrong with it and I'm inclined to accept the change. Before I do however, could you elaborate on what you trying to achieve with this property?

As per our docs we allow specifying env vars, datahub plugins and extra pip packages all using this field already (it's an implementation detail) in the advanced config of ingestions.

Screenshot 2024-04-03 at 19 27 58

@Nelvin73
Copy link
Contributor Author

Nelvin73 commented Apr 4, 2024

Hello @pedro93

the PR is actually exactly for those mentioned features (extra pip packages, extra pip args, ...) to work in scheduled mode.

When triggered manually (which triggers a GraphQL mutation query) this works perfectly and the venv is taking account of these extra args.
But unfortunately, the same ingestion source, when running in scheduled mode, it will ignore those settings, as the extraArgs are not written into the IngestionRequestInput aspect.

Looks like adding the extraArgs to Scheduler was forgotten somehow.
There are also some comments in Slack about that issue
e.g. https://datahubspace.slack.com/archives/C029A3M079U/p1703169843319679
but I didnt find a ISSUE ticket for it.

I compiled and tested locally and with the fix, the scheduled ingestion was taken care about the extraArgs.

If you have further questinos, let me know.

@Nelvin73
Copy link
Contributor Author

@hsheth2 @pedro93
Is there something I can clarify or can be done, to get approval?
BR
Christian

@hsheth2 hsheth2 changed the title fix[ingestion/scheduler]: add extraArgs support for Ingestion Scheduler (e.g. for extra pip packages) fix(ingestion/scheduler): add extraArgs support for Ingestion Scheduler (e.g. for extra pip packages) Apr 16, 2024
@hsheth2 hsheth2 merged commit 3ac8778 into datahub-project:master Apr 16, 2024
32 checks passed
@Nelvin73 Nelvin73 deleted the fix-ingestion-scheduler-extraargs branch April 16, 2024 20:56
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
…er (e.g. for extra pip packages) (datahub-project#10195)

Co-authored-by: Christian Groll <christian.groll@payback.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants