Add archive_files to scrape args #129
Conversation
LGTM
You know, my first read was just looking at the code as is, and taking a moment to think I realize you took a different way to implement this than I was imagining. I was imagining that the configuration flag would be in the os-realtime Lambda function config, so that if you wanted files to start archiving, you'd change the Lambda config in the AWS admin console. This way should work fine, and it has the advantage of allowing a dev to run an ad hoc scrape with this flag set to test things (without changing the behavior of other data flowing through). One disadvantage I can think of is that we'd have to redeploy DAGs (or, in the short run, update about a dozen task definitions in the OS task defs repo?) if we wanted to switch the behavior for all realtime scraper runs. Were there other advantages/disadvantages you were thinking of with this approach?
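For illustration only, here is a minimal sketch of the Lambda-config approach described above, assuming a hypothetical ARCHIVE_FILES environment variable on the os-realtime Lambda function (the variable name and handler shape are assumptions, not the actual implementation):

```python
import json
import os


def handler(event, context):
    # Hypothetical toggle: read the archiving switch from the Lambda
    # function's environment variables, so flipping it only requires a
    # config change in the AWS console rather than a change to scrape args.
    archive_files = os.environ.get("ARCHIVE_FILES", "false").lower() == "true"

    for record in event.get("Records", []):
        payload = json.loads(record["body"])  # SQS message sent by the scraper
        if archive_files:
            # archive the scraped files referenced in the payload
            ...
        # continue with normal realtime processing of the payload
        ...
```

The trade-off discussed above still applies: an environment variable flips behavior for all traffic through the function, while a per-run scrape arg lets a dev test without affecting other data.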
Thanks for the additional insight. I approached the problem this way based on the context (or lack of it) I have on how our system works. I did think of it having the advantage of easy local testing, but I think I like your approach better. I am assuming that the Lambda function config will be reflected in …
LGTM, thank you! A couple of follow-up actions are needed here (just in case you might not be aware):
- We need to merge and release this first (all the way to pypi)
- Then update openstates-scraper with the latest release of openstates-core via poetry (a sketch of the poetry step is included after this list)
- Optionally: update openstates-realtime with the latest release of openstates-core via poetry. This is not really important this time, because the change in openstates-core is not used directly in the openstates-realtime project, i.e. we are not calling the openstates-core library or any of the changes directly; the two only communicate via SQS and Lambda, and that's handled in the openstates-realtime PR
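A minimal sketch of the dependency bump step, assuming the dependency is declared in openstates-scraper's pyproject.toml under the name openstates-core (use whatever name the package is actually published under on PyPI):

```shell
# Run inside the openstates-scraper checkout.
# If the existing version constraint in pyproject.toml already allows the
# new release, updating the lock file is enough:
poetry update openstates-core

# If the new release falls outside the existing constraint, bump it
# explicitly (replace X.Y.Z with the version released to PyPI):
poetry add openstates-core@^X.Y.Z

# Then commit the updated pyproject.toml and poetry.lock.
```

The same two commands would apply to openstates-realtime if the optional bump there is ever needed.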
Thank you Sogo, I probably need a lecture on how all of this works on a call.
DATA-4855: is_archive_files argument