Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker: limit payload size #740

Merged
merged 15 commits into from
Jul 31, 2024
Merged

Worker: limit payload size #740

merged 15 commits into from
Jul 31, 2024

Conversation

josephjclark
Copy link
Collaborator

@josephjclark josephjclark commented Jul 30, 2024

  • Don't send a dataclip on step:complete if the payload is too big
  • Don't send a log message if the payload is too big
  • Send output_dataclip_id to lightning if the dataclip is removed

Closes #739 #571

Log output with limit printed, log redacted and dataclip not sent:

RTE Memory limit: 500mb
RTE Timeout: 300s
RTE Payload limit: 10mb
VER Versions:
    ▸ node.js                    18.12.1
    ▸ worker                     1.4.1
    ▸ @openfn/language-common    1.15.1
R/T Executing 4b9cf63e-82c0-46c3-beca-ad84d8d2daab
R/T Starting step New job
R/T [linker] loading module @openfn/language-common
R/T [linker] Loading module @openfn/language-common from /tmp/openfn/worker/repo/node_modules/@openfn/language-common_1.15.1/dist/index.cjs
R/T Resolved adaptor @openfn/language-common to version 1.15.1
R/T Executing expression (1 operations)
R/T Starting operation 1
JOB (Log message redacted: exceeds 10mb memory limit)
R/T Operation 1 complete in 59ms
R/T Expression complete!
R/T Completed step New job in 346ms
R/T Final memory usage: [step 81mb] [system 234mb]
R/T Dataclip too large. This dataclip will not be sent back to lighting.
R/T Run complete with status: success

Copy link
Member

@taylordowns2000 taylordowns2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. thanks!

@stuartc , happy with this if you are. from an ENV perspective, we currently use MAX_DATACLIP_SIZE_MB to control the webhook payload size limit. In an ideal world, it would be the same as this WORKER_MAX_PAYLOAD_MEMORY_MB or, even if it's technically not the same, we'd provide some way to make it the same... i.e., to not have to managed two separate values.

If an administrator wants to limit the max dataclip size to 10mb they should be able to control that with a single environment variable, instead of keeping two in sync.

100% happy to merge this as is and add a new ENV to the mix, but want to avoid duplication if/when possible.

@stuartc
Copy link
Member

stuartc commented Jul 31, 2024

looks great. thanks!

@stuartc , happy with this if you are. from an ENV perspective, we currently use MAX_DATACLIP_SIZE_MB to control the webhook payload size limit. In an ideal world, it would be the same as this WORKER_MAX_PAYLOAD_MEMORY_MB or, even if it's technically not the same, we'd provide some way to make it the same... i.e., to not have to managed two separate values.

If an administrator wants to limit the max dataclip size to 10mb they should be able to control that with a single environment variable, instead of keeping two in sync.

I hear you, I'm hesitant to break the pattern we have - but what we can do on Lightning - is have the env WORKER_MAX_PAYLOAD_MEMORY_MB be set to the same as MAX_DATACLIP_SIZE_MB if the former isn't provided. This only applies if the RTM is running via Lightning. So an administrator would need to set WORKER_MAX_PAYLOAD_MEMORY_MB on the worker container.

But the worker has the a payloadLimitMb option for the run exposed now, so once we start sending that to the worker, the worker arguably doesn't need an env variable since Lightning will set it per-run. And in that case I'm happy to default to MAX_DATACLIP_SIZE_MB for the run options until we have per-run options.

@josephjclark what is the scale of the env variable? Does ...=10 mean 10 megabytes. Also doesn't matter too much, but is it Megabit or MegaByte?

@josephjclark
Copy link
Collaborator Author

The env var has been renamed to WORKER_MAX_PAYLOAD_MB

That's measured in bytes @stuartc

@stuartc stuartc merged commit 7f14570 into main Jul 31, 2024
7 checks passed
@stuartc stuartc deleted the max-message-size branch July 31, 2024 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

worker: restrict size of emitted dataclips
3 participants