Skip to content

Comments

Add max_mails parameter to attachment methods in IMAP hook#60963

Merged
shahar1 merged 13 commits intoapache:mainfrom
Bucky789:imap-max-mails
Feb 10, 2026
Merged

Add max_mails parameter to attachment methods in IMAP hook#60963
shahar1 merged 13 commits intoapache:mainfrom
Bucky789:imap-max-mails

Conversation

@Bucky789
Copy link

Adds an optional max_mails parameter to IMAP hook attachment methods to allow limiting the number of latest emails processed while still retrieving all attachments from those emails.

This preserves existing behavior when max_mails=None.


Motivation

Currently, users must either:

  • Process all matching emails, or
  • Use latest_only=True and receive only a single attachment

There is no way to retrieve all attachments from the last N matching emails.


Changes

  • Added max_mails parameter to:
    • retrieve_mail_attachments
    • download_mail_attachments
  • Updated internal mail iteration logic via _list_mail_ids_desc
  • Preserved backward compatibility with existing behavior
  • All IMAP provider tests pass

Fixes: #60836


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: ChatGPT 5.2 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@vaefremov95
Copy link

vaefremov95 commented Jan 23, 2026

@Bucky789
Hello!
What if someone will pass 0 or negative number?
Maybe specify in docstring that this parameter should be strictly greater than 0?

@eladkal eladkal requested a review from shahar1 January 23, 2026 14:30
Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests and ensure you refer to the edge cases

@Bucky789
Copy link
Author

Hello,
I’ve added validation to ensure max_mails must be a positive integer and updated the docstrings accordingly.
I also added unit tests covering zero, negative, and positive values, as well as limiting the number of processed emails.
All provider IMAP tests are now passing locally.

Please let me know if anything else should be adjusted.
Thanks!!

Copy link
Contributor

@henry3260 henry3260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Bucky789
Copy link
Author

Thanks for the review and approval! Let me know if any final adjustments are needed. This was my first ever open source contribution!!

Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! Looks like you're on the way, but there are some fixes to do.
Please let me know once you're done.

@Bucky789
Copy link
Author

@shahar1 Thanks for pointing this out.
I’ve removed the with self: usage so the hook no longer closes the IMAP connection on each method call.
The lifecycle is now fully managed by the caller as intended (e.g. in ImapAttachmentSensor).
Unit tests are updated and passing locally.

@Bucky789
Copy link
Author

I’ve updated the implementation to align with the suggested approach: the logic now limits the number of mail IDs before reversing, so we avoid unnecessary work while still preserving the “latest first” behavior. This removes the extra overhead and keeps the intent of max_mails clear.

The changes have been pushed, and all provider IMAP tests are passing locally.
Please let me know if you’d like this adjusted further or handled differently.

@Bucky789
Copy link
Author

Bucky789 commented Feb 9, 2026

Hi! @shahar1 Just checking in on this PR, are there any additional changes or feedback needed from my side? Thanks!

@shahar1
Copy link
Contributor

shahar1 commented Feb 10, 2026

Hi! @shahar1 Just checking in on this PR, are there any additional changes or feedback needed from my side? Thanks!

Hey, I'll rerun the CI and if everything is green - I'll approve & merge before the cutting the release today.
Please let me know if you need to make any further changes, because I might have to rerun the CI manually.
Thank you!

@shahar1 shahar1 changed the title [IMAP] Add max_mails parameter to attachment methods Add max_mails parameter to attachment methods in IMAP hook Feb 10, 2026
@shahar1 shahar1 merged commit 5a4a8be into apache:main Feb 10, 2026
86 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 10, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@shahar1
Copy link
Contributor

shahar1 commented Feb 10, 2026

Well done @Bucky789 !
I'll be happy if you could test it with actual Airflow deployment and IMAP instance before the upcoming release.

@vaefremov95
Copy link

@shahar1
HI!
I have tested new feature max_mails with my test Airflow Deployment (Airflow==2.11. Python 3.12.3).
I used my company's production IMAP server and real-case scenario. I have 3 emails returned by filter, each with 2 attachments, but I needed to retrieve only last mail.
I tested methods has_mail_attachment, retrieve_mail_attachments, and download_mail_attachments with different settings for max_mails.
Here are some logs:

[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - attachemnts found with max_mails: True
[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - count attachments without max_mails: 6
[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - count attachments with max_mails=1: 2
[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - count attachments with max_mails=1 and latest_only: 1
[2026-02-13, 20:50:38 UTC] {logging_mixin.py:190} INFO - downloaded attachments without max_mails: 4
[2026-02-13, 20:50:39 UTC] {logging_mixin.py:190} INFO - downloaded attachments with max_mails=1: 2
[2026-02-13, 20:50:40 UTC] {logging_mixin.py:190} INFO - downloaded attachments with max_mails=1 and latest_only: 1

Everything works as expected, exept one detail. Since some files have same names, download_mail_attachments method overwrites them. So instead of 6 files I get 4. But I guess this is topic for another issue).

Please let me know if this is sufficient testing. Also, should I write the results in your testing providers issue?
Thank you!

@shahar1
Copy link
Contributor

shahar1 commented Feb 13, 2026

@shahar1
HI!
I have tested new feature max_mails with my test Airflow Deployment (Airflow==2.11. Python 3.12.3).
I used my company's production IMAP server and real-case scenario. I have 3 emails returned by filter, each with 2 attachments, but I needed to retrieve only last mail.
I tested methods has_mail_attachment, retrieve_mail_attachments, and download_mail_attachments with different settings for max_mails.
Here are some logs:

[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - attachemnts found with max_mails: True
[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - count attachments without max_mails: 6
[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - count attachments with max_mails=1: 2
[2026-02-13, 20:50:34 UTC] {logging_mixin.py:190} INFO - count attachments with max_mails=1 and latest_only: 1
[2026-02-13, 20:50:38 UTC] {logging_mixin.py:190} INFO - downloaded attachments without max_mails: 4
[2026-02-13, 20:50:39 UTC] {logging_mixin.py:190} INFO - downloaded attachments with max_mails=1: 2
[2026-02-13, 20:50:40 UTC] {logging_mixin.py:190} INFO - downloaded attachments with max_mails=1 and latest_only: 1

Everything works as expected, exept one detail. Since some files have same names, download_mail_attachments method overwrites them. So instead of 6 files I get 4. But I guess this is topic for another issue).

Please let me know if this is sufficient testing. Also, should I write the results in your testing providers issue?
Thank you!

Please also comment in the providers status issue. If it overwrites files, I tend to potspone the release of this provider to RC2 and hopefully we could solve it by then (if you could help with it, it would be great).

Ratasa143 pushed a commit to Ratasa143/airflow that referenced this pull request Feb 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ImapHook: allow limiting number of processed emails

5 participants