Add archive_files to scrape args #129
Conversation
LGTM
You know, my first read was just looking at the code as is, and taking a moment to think I realize you took a different way to implement this than I was imagining. I was imagining that the configuration flag would be in the os-realtime Lambda function config, so that if you wanted files to start archiving, you'd change the Lambda config in the AWS admin console. This way should work fine, and it has the advantage of allowing a dev to run an ad hoc scrape with this flag set to test things (without changing the behavior of other data flowing through). One disadvantage I can think of is that we'd have to redeploy DAGs (or, in the short run, update about a dozen task definitions in the OS task defs repo?) if we wanted to switch the behavior for all realtime scraper runs. Were there other advantages/disadvantages you were thinking of with this approach?
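For illustration only, here is a minimal sketch of the Lambda-config approach described above, assuming a hypothetical ARCHIVE_FILES environment variable on the os-realtime Lambda function (the variable name and handler shape are assumptions, not the actual implementation):

```python
import json
import os


def handler(event, context):
    # Hypothetical toggle: read the archiving switch from the Lambda
    # function's environment variables, so flipping it only requires a
    # config change in the AWS console rather than a change to scrape args.
    archive_files = os.environ.get("ARCHIVE_FILES", "false").lower() == "true"

    for record in event.get("Records", []):
        payload = json.loads(record["body"])  # SQS message sent by the scraper
        if archive_files:
            # archive the scraped files referenced in the payload
            ...
        # continue with normal realtime processing of the payload
        ...
```

The trade-off discussed above still applies: an environment variable flips behavior for all traffic through the function, while a per-run scrape arg lets a dev test without affecting other data.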
Thanks for the additional insight. I approached the problem this way based on the context (or lack of it) I have on how our system works. I did think of it having the advantage of easy local testing, but I think I like your approach better. I am assuming that the Lambda function config will be reflected in …
LGTM, thank you! A couple of follow-up actions are needed here (just in case you might not be aware):
- We need to merge and release this first (all the way to pypi)
- Then update openstates-scraper with the latest release of openstates-core via poetry (a sketch of the poetry step is included after this list)
- Optionally: update openstates-realtime with the latest release of openstates-core via poetry. This is not really important this time, because the change in openstates-core is not used directly in the openstates-realtime project, i.e. we are not calling the openstates-core library or any of the changes directly; the two only communicate via SQS and Lambda, and that's handled in the openstates-realtime PR
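A minimal sketch of the dependency bump step, assuming the dependency is declared in openstates-scraper's pyproject.toml under the name openstates-core (use whatever name the package is actually published under on PyPI):

```shell
# Run inside the openstates-scraper checkout.
# If the existing version constraint in pyproject.toml already allows the
# new release, updating the lock file is enough:
poetry update openstates-core

# If the new release falls outside the existing constraint, bump it
# explicitly (replace X.Y.Z with the version released to PyPI):
poetry add openstates-core@^X.Y.Z

# Then commit the updated pyproject.toml and poetry.lock.
```

The same two commands would apply to openstates-realtime if the optional bump there is ever needed.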
Thank you Sogo, I probably need a lecture on how all of this works on a call.
DATA-4855: is_archive_files argument