Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3MultipleKeysSensor operator #15001

Closed
erparas opened this issue Mar 25, 2021 · 5 comments · Fixed by #18807
Closed

S3MultipleKeysSensor operator #15001

erparas opened this issue Mar 25, 2021 · 5 comments · Fixed by #18807
Labels
good first issue kind:feature Feature Requests provider:amazon-aws AWS/Amazon - related issues

Comments

@erparas
Copy link

erparas commented Mar 25, 2021

Description

Currently we have an operator, S3KeySensor which polls for the given prefix in the bucket. At times, there is need to poll for multiple prefixes in given bucket in one go. To have that - I propose to have a S3MultipleKeysSensor, which would poll for multiple prefixes in the given bucket in one go.

Use case / motivation

To make it easier for users to poll multiple S3 prefixes in a given bucket.

Are you willing to submit a PR?

Yes, I have an implementation ready for that.

Related Issues

NA

@erparas erparas added the kind:feature Feature Requests label Mar 25, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 25, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@eladkal
Copy link
Contributor

eladkal commented Mar 25, 2021

Must we have a new sensor for it?
Couldn't the current S3PrefixSensor be enhanced by changing prefix to accept str or List[str] that way it also backward compatible.

WDYT?

@eladkal eladkal added the provider:amazon-aws AWS/Amazon - related issues label Mar 25, 2021
@AmarEL
Copy link
Contributor

AmarEL commented Mar 26, 2021

We created the same sensor here to look for specific keys in the s3.
In our case, since the prefix was different, we did it on top of the check_for_key from the s3 hook class (The same used by the S3KeySensor).

        key_list = context['task_instance'].xcom_pull(task_ids='get_s3_keys_of_extracted_tables', key='s3_parquet_keys')
        hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
        for key in key_list:
            self.log.info('Poking for key: s3://%s/%s', self.bucket_name, key)
            if not hook.check_for_key(key, self.bucket_name):
                return False
        return True

Make the S3KeySensor accept str or List[str] would work too.
Btw, apply this behavior for the three current sensors for s3 would be nice.

@SevakAvet
Copy link
Contributor

I'd more towards modifying existing operator and making it accept list of prefixes.
Also, the expected behaviour is that poke() will return True only if all keys exist, right?
Would that be useful to have sort of any(..) (when at least one key exist) instead of all(..)?

Also, this can be implemented as such:

return all(hook.check_for_key(key, self.bucket_name) for key in key_list)

@anaynayak
Copy link
Contributor

I can pick this up. Was thinking of the same approach as suggested by @SevakAvet

For the choice between any and all I believe it will depend on the use case. I can either get started with a simple all based PR or accept a Callable[[Dict[str, bool]],bool] to let the consumer decide based on the use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue kind:feature Feature Requests provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
6 participants