ForEachItem task #2359
Note that test cannot be added to the MemoryExecutor as it fails with the known error:
I tried to fix this error without success, so I decided to test the task only with the JDBC and Kafka runners.
Some minor comments on examples, otherwise awesome work! 🎸
I took it for a spin just now, and it seems that the internal storage file for a given batch cannot be accessed by the subflow: https://share.descript.com/view/dLuk3uN2F0L

Reproducer:

subflow:

id: subflow
namespace: qa
tasks:
  - id: for_each_item
    type: io.kestra.plugin.scripts.shell.Commands
    runner: PROCESS
    commands:
      - cat "{{ trigger.items }}"

parent:

id: each_parent
namespace: dev
tasks:
  - id: extract
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |
      INSTALL httpfs;
      LOAD httpfs;
      SELECT *
      FROM read_csv_auto('https://raw.githubusercontent.com/kestra-io/datasets/main/csv/orders.csv', header=True);
    store: true
  - id: each
    type: io.kestra.core.tasks.flows.ForEachItem
    items: "{{ outputs.extract.uri }}"
    maxItemsPerBatch: 10
    subflow:
      namespace: qa
      flowId: subflow
      wait: true
      transmitFailed: true

In the same way, the preview on the Overview page doesn't work, and the topology view cannot be expanded for that subflow (but this last one is a UI feature that Yann can help with):
see other comment, QA didn't fully pass
@anna-geller are you sure your branch was up to date while doing the QA? I fixed this issue on Tuesday.
@NotEmpty
@PluginProperty(dynamic = true)
@Schema(title = "The items to be split into batches and processed. Make sure to set it to Kestra's internal storage URI, e.g. output from a previous task in the format `{{ outputs.task_id.uri }}` or an input parameter of FILE type e.g. `{{ inputs.myfile }}`.")
Or from the namespace files storage?
This is the standard comment, but yes, it should be possible. Note, however, that we currently have no easy way to get the storage URI of a namespace file (we only have a Pebble function to read its content).
For now, I prefer to keep the standard comment we use everywhere.
Yes @loicmathieu, I deleted my local branch and pulled again just to be sure, and the behavior is still exactly as described above. Please try the reproducer. This is the server log I'm getting when trying to preview a batch file:
@anna-geller this has been fixed.
…ch item of a file
Co-authored-by: Anna Geller <anna.m.geller@gmail.com>
Kudos, SonarCloud Quality Gate passed!
Fixes #2131
What changes are being made and why?
Add a `ForEachItem` task that splits a file into batches and triggers a subflow execution for each batch. The current implementation generates as many subflow executions as there are batches, with unlimited concurrency.
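To illustrate the batching idea only, here is a minimal Java sketch. It is not the code from this PR: `BatchSplitSketch` and `splitIntoBatches` are hypothetical names, and it assumes the items are lines already loaded into memory, whereas the actual task works on a file in Kestra's internal storage.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitSketch {
    // Hypothetical helper, not part of Kestra: groups items into consecutive
    // batches of at most maxItemsPerBatch elements; the last batch may be smaller.
    public static List<List<String>> splitIntoBatches(List<String> items, int maxItemsPerBatch) {
        if (maxItemsPerBatch <= 0) {
            throw new IllegalArgumentException("maxItemsPerBatch must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxItemsPerBatch) {
            int end = Math.min(start + maxItemsPerBatch, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        // In the task described above, one subflow execution would be triggered per batch.
        for (List<String> batch : splitIntoBatches(rows, 2)) {
            System.out.println(batch);
        }
    }
}
```

With five rows and a batch size of 2, this yields three batches, which matches the description above: the number of subflow executions equals the number of batches.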
How the changes have been QAed?
Parent flow
Sub flow: