Implemented filter operation on XCom's #48868

dabla · 2025-04-07T10:52:58Z

As I already explained to @potiuk and @ashb, I'm experimenting with streamable/iterable XComs. The idea is that an XCom could be an iterable (this is not supported yet) that is evaluated at runtime, when the consumer operator iterates over the results of the producing operator. The advantage is that the producer operator does not need to fetch all pages (and thus create all task instances) before passing the result to the next operator, which reduces memory usage, avoids unnecessary waiting times, and thus leads to much higher throughput.

While testing this, I discovered that XComs are missing one feature that matters for this use case: filtering. We already have multiple operations such as map, zip and concat, which are very handy, but no filter. Without a filter operation on the XCom, I would again have to process the full XCom iterable by applying the filtering through a PythonOperator before being able to pass it to the consuming operator, which removes the advantage of the streaming/iterable functionality. Hence this PR.
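To illustrate why a lazy filter keeps the streaming advantage, here is a minimal plain-Python sketch. It does not use any Airflow API; `produce_pages`, `lazy_filter` and `fetched_pages` are hypothetical names made up for this illustration, and the page counts are arbitrary.

```python
from collections.abc import Callable, Iterable, Iterator

fetched_pages: list[int] = []  # records which pages the producer actually fetched


def produce_pages(page_count: int, page_size: int) -> Iterator[int]:
    """Hypothetical producer: yields items page by page instead of building a full list."""
    for page in range(page_count):
        fetched_pages.append(page)
        for offset in range(page_size):
            yield page * page_size + offset


def lazy_filter(predicate: Callable[[int], bool], values: Iterable[int]) -> Iterator[int]:
    """Yield only the matching items, pulling from the producer on demand."""
    for value in values:
        if predicate(value):
            yield value


# Keep even values only; the consumer asks for just the first three matches.
stream = lazy_filter(lambda v: v % 2 == 0, produce_pages(page_count=100, page_size=3))
first_three = [next(stream) for _ in range(3)]
```

In this sketch only the first two of the 100 pages are fetched before the consumer has its three values. Filtering in a separate eager step would have required the whole iterable first.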

Of course, this PR doesn't add the streaming functionality, as that is still a WIP/POC, but it would at least provide the filter operation on an XCom, which is missing today.


uranusjr · 2025-04-16T06:26:48Z

Also I think this is not a refactor PR, but a feature addition.

uranusjr · 2025-04-16T07:59:10Z

Question about the get_task_map_length implementation: It seems that filter() does not change the length (since the call is simply passed onto the wrapped XComArg). This means for

@task
def g():
    return [0] * 4

@task
def p(v):
    print(v)

p.expand(v=g().filter(lambda x: x / 2))

The scheduler would still create 4 task instances for p. What would happen in each task instance? Do tis[1] and tis[3] get skipped, or tis[2] and tis[3]? Or something else?

Also, I think it would be a good idea to split this into two PRs: one that solely does the refactoring, and another that adds filter and the related mechanism. This would make reviewing a lot easier.

dabla · 2025-04-16T08:55:36Z

> Question about the get_task_map_length implementation: It seems that filter() does not change the length (since the call is simply passed onto the wrapped XComArg). This means for

Wouldn't get_task_map_length derive the length from the _FilterResult, since there the len would be lazily calculated when needed in conjunction with expanded tasks?

Never mind, found it. You're right, I will fix it:

@get_task_map_length.register
def _(xcom_arg: SchedulerFilterXComArg, run_id: str, *, session: Session):
    return get_task_map_length(xcom_arg.arg, run_id, session=session)

dabla · 2025-04-16T09:12:49Z

Hmm, I don't know if it's possible to determine the length, as you need the context to resolve the arg and do the actual filtering. Of course, when the streaming functionality is used in combination with the filtering, we don't have that issue, as we don't need to know the task map length in advance. Maybe we could raise an AttributeError in that case. This is probably also a reason why the filter operation was never implemented before.
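The same limitation shows up with Python's built-in `filter`, which may help to picture the problem. This is a plain-Python sketch, not Airflow code, and the values are arbitrary:

```python
upstream = [3, 8, 1, 10]                      # the upstream length is known up front
filtered = filter(lambda v: v > 2, upstream)  # lazy: nothing has been evaluated yet

try:
    len(filtered)
    length_known = True
except TypeError:
    # A lazy filter object has no length until it is consumed.
    length_known = False

# Only after evaluating every element do we learn how many survived.
materialized = list(filtered)
```

The filtered length only becomes available once every upstream value has been resolved and tested, which is the step that needs the runtime context.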

dabla · 2025-04-16T09:15:26Z

Maybe do something like:

@get_task_map_length.register
def _(xcom_arg: SchedulerFilterXComArg, run_id: str, *, session: Session):
    raise NotImplementedError(
        "Cannot determine map length for FilterXComArg until filtered values are fully evaluated"
    )
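For readers less familiar with the dispatch mechanism, a self-contained sketch of the idea using `functools.singledispatch` could look as follows. `PlainArg`, `FilterArg` and `map_length` are hypothetical stand-ins, not the real Airflow classes or the real `get_task_map_length` signature (which also takes `run_id` and `session`).

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Callable


@dataclass
class PlainArg:
    """Hypothetical stand-in for an XComArg whose values are already stored."""
    values: list


@dataclass
class FilterArg:
    """Hypothetical stand-in for a filter wrapper around another arg."""
    arg: PlainArg
    predicate: Callable[[object], bool]


@singledispatch
def map_length(xcom_arg) -> int:
    raise TypeError(f"unsupported argument type: {type(xcom_arg).__name__}")


@map_length.register
def _(xcom_arg: PlainArg) -> int:
    return len(xcom_arg.values)


@map_length.register
def _(xcom_arg: FilterArg) -> int:
    # Delegating to xcom_arg.arg would report the unfiltered length,
    # so this variant refuses to answer instead.
    raise NotImplementedError("length is unknown until the filter is evaluated")


plain = PlainArg([0, 1, 2, 3])
plain_length = map_length(plain)
try:
    map_length(FilterArg(plain, bool))
    filter_error = None
except NotImplementedError as exc:
    filter_error = str(exc)
```

Raising makes the limitation explicit, but it also means a caller that needs a length up front has no value to work with.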

uranusjr · 2025-04-16T17:16:10Z

We need a length value somehow because the scheduler needs to know how many tis to run. Since it is not possible to actually know the real length, I think one possibility would be to just use the original length and create potentially too many tis, and just mark the ones not needed as skipped afterwards.
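One possible reading of this suggestion, sketched in plain Python: keep one slot per upstream value so the map index stays aligned with the upstream position, and decide per slot whether it runs or is skipped. The function name `plan_task_instances`, the state strings and the sample values are all hypothetical and not part of Airflow.

```python
from collections.abc import Callable, Sequence
from typing import Any


def plan_task_instances(
    values: Sequence[Any], predicate: Callable[[Any], bool]
) -> list[tuple[int, str]]:
    """Return one (map_index, state) pair per upstream value."""
    return [
        (index, "run" if predicate(value) else "skipped")
        for index, value in enumerate(values)
    ]


# Four upstream values, so four slots are planned; only the odd values run.
plan = plan_task_instances([0, 1, 2, 3], lambda v: v % 2 == 1)
```

With these sample values, slots 0 and 2 are marked skipped and slots 1 and 3 run. Whether skipped slots should instead be compacted to the end is a separate design choice that this sketch does not settle.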


dabla · 2025-06-04T09:10:12Z

> We need a length value somehow because the scheduler needs to know how many tis to run. Since it is not possible to actually know the real length, I think one possibility would be to just use the original length and create potentially too many tis, and just mark the ones not needed as skipped afterwards.

The length issue for filter would no longer be a problem if the following PR is accepted, so it is best to wait and see what comes out of it.

dabla · 2025-07-02T18:19:42Z

Filtering with XComs will only be possible once AIP-88 is implemented in the following PR. Until then this PR makes no sense: in the current implementation, XComs have to know their length in advance, which is impossible with filtering, as filtering may alter the length of the XCom.

refactor: Implemented filter operation on XCom

a0cc92f

dabla requested review from amoghrajesh, ashb, kaxil and uranusjr as code owners April 7, 2025 10:52

boring-cyborg bot added the area:task-sdk label Apr 7, 2025

dabla and others added 6 commits April 7, 2025 12:53

Merge branch 'main' into feature/added-filter-operation-to-xcom

5c56f4e

Merge branch 'main' into feature/added-filter-operation-to-xcom

031c28e

refactor: Fixed some static checks

c858c9c

refactor: Fixed some mypy issues

cc5e7be

refactor: Added filter to PlainXComArg

48310f3

refactor: Reverted SchedulerZipXComArg back to orginal

567c84b

dabla marked this pull request as draft April 7, 2025 11:55

dabla and others added 7 commits April 7, 2025 13:55

Merge branch 'main' into feature/added-filter-operation-to-xcom

22ede6a

refactor: Fixed signature of filter method in PlainXComArg

49f2479

refactor: Fixed method signature filter

cfeb4a1

refactor: Fixed callables type in _FilterResult

4c4ee72

refactor: raise TypeError if getitem is called on iterable

e5358aa

Merge branch 'main' into feature/added-filter-operation-to-xcom

25640d8

Merge branch 'main' into feature/added-filter-operation-to-xcom

deb60a7

dabla marked this pull request as ready for review April 7, 2025 14:11

dabla marked this pull request as draft April 7, 2025 14:14

dabla and others added 8 commits April 7, 2025 16:45

Merge branch 'main' into feature/added-filter-operation-to-xcom

b54b4ae

Merge branch 'main' into feature/added-filter-operation-to-xcom

0b60a0f

refactor: Refactored _MapResult to support iterables

faccd4b

refactor: Register task_map_length on SchedulerFilterXComArg

bd204e0

Merge branch 'main' into feature/added-filter-operation-to-xcom

9e9b50a

refactor: Fixed signature of __getitem__ in _MapResult

bd88a31

refactor: Print filter callable result

b171639

Merge branch 'main' into feature/added-filter-operation-to-xcom

fd8a9f8

davidblain-infrabel and others added 2 commits April 16, 2025 08:16

refactor: Changed elif to if in resolved method of MapXComArg

04fb807

Merge branch 'main' into feature/added-filter-operation-to-xcom

3d23e11

dabla changed the title ~~refactor: Implemented filter operation on XCom~~ Implemented filter operation on XCom's Apr 16, 2025

davidblain-infrabel and others added 4 commits April 16, 2025 08:34

refactor: Explicitly convert value to Sequence in CallableResultMixin

ec273bf

refactor: Simplified _LazyMapResult and _FilterResult

800e8ea

refactor: Re-used comon values variable where possible in test xcom args

eb28ffe

Merge branch 'main' into feature/added-filter-operation-to-xcom

0c415db

davidblain-infrabel and others added 3 commits April 16, 2025 11:26

refactor: Changed types of results classes

844b785

refactor: Raise an ValueError if value isn't list, set or dict

4e2bda9

Merge branch 'main' into feature/added-filter-operation-to-xcom

7bd7b93

davidblain-infrabel and others added 11 commits April 17, 2025 21:18

refactor: Sets musts be converted to lists also otherwise it's not indexable

68a8329

refactor: Try except the StopIteration when yielding instead of suppress

d0835a6

refactor: Fixed __getitem__ magic method of _FilterResult

91adef4

refactor: Renamed CallableResultMixin to _MappableResult and refactored _LazyMapResult and _FilterResult

40191f8

refactor: Check if result in PlainXComArg needs runtime resolution

dec8eb8

refactor: Refactored _LazyMapResult and _FilterResult with __next__ magic method

e586ed0

Merge branch 'main' into feature/added-filter-operation-to-xcom

3c37423

refactor: Changed non_filter method of FilterXComArg to staticmethod

7a9c487

Merge branch 'main' into feature/added-filter-operation-to-xcom

cedd8ec

Merge branch 'main' into feature/added-filter-operation-to-xcom

2e93ec4

Merge branch 'main' into feature/added-filter-operation-to-xcom

ff703ac

dabla closed this Jul 2, 2025
