[collector][emitter] Split metric payloads bigger than 2MB #3454
Conversation
This is a great solution! It's a bit unclear what's going on in parts of it, though, so some comments would really help.
Otherwise I think it's great, and once there's a bit of added clarity I'm very happy to give it a 👍
emitter.py (Outdated)
for i in range(nb_chunks):
    compressed_payloads.extend(
        serialize_and_compress_metrics_payload({"series": series[i*series_per_chunk:(i+1)*series_per_chunk]}, max_compressed_size, depth+1, log)
Can you add some comments here? There's a lot going on in this line.
Good comment, this definitely needs more explanation.
emitter.py (Outdated)
log.error("Maximum depth of payload splitting reached, dropping the %d metrics in this chunk", len(series))
return compressed_payloads

nb_chunks = len(zipped)/max_compressed_size + 1 + int(compression_ratio/2)  # try to account for the compression
What does nb_chunks mean? I am honestly uncertain, even after thinking about it for a few minutes. We might want to rename it.
nb_chunks is the number of "chunks" (i.e. smaller payloads) we'll split the current metrics_payload into. I can definitely document this more, let me know if you have a better idea for the name of the variable.
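To illustrate the idea for readers of the thread, here is a small hedged sketch (not the PR's exact code) of how nb_chunks drives the slicing of the series list into smaller payloads:

```python
# Illustrative sketch only: series_per_chunk and nb_chunks are computed by the
# surrounding function in the PR; here we use stand-in values.
series = list(range(10))  # stand-in for the list of metric series

nb_chunks = 3
# ceil-style division so every series lands in some chunk
series_per_chunk = len(series) // nb_chunks + 1

chunks = [series[i * series_per_chunk:(i + 1) * series_per_chunk]
          for i in range(nb_chunks)]

# every element ends up in exactly one chunk, in order
assert sum(chunks, []) == series
```

The last chunk is simply shorter when the series count doesn't divide evenly.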
nb is number! Ha, that makes sense. I kept thinking n and b were different words. No, I think this is fine; n_chunks is the only other one I can think of.
I understood what it did, the naming just threw me off
return compressed_payloads


def serialize_and_compress_metrics_payload(metrics_payload, max_compressed_size, depth, log):
This is a really cool, elegant solution.
However, it took me a bit to understand everything that was going on in it, so I think it could use a few comments. It would be very easy to make a mistake when editing this function in the future without some added clarity; I pointed out some of the places where it could be clearer in my other comments.
very valid comment, I'm going to document this more
emitter.py (Outdated)
@@ -29,6 +29,11 @@
control_char_re = re.compile('[%s]' % re.escape(control_chars))

# Only enforced for the metrics API on our end, for now
MAX_COMPRESSED_SIZE = 2 << 20  # 2MB, the backend should accept up to 3MB but let's be conservative here
MAX_SPLIT_DEPTH = 3  # maximum depth of recursive calls to payload splitting function
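As an aside for readers of the diff, the bit-shift in MAX_COMPRESSED_SIZE is just a compact way of writing 2 MiB:

```python
# 1 << 20 is 1 MiB (1,048,576 bytes), so shifting 2 gives 2 MiB
MAX_COMPRESSED_SIZE = 2 << 20
assert MAX_COMPRESSED_SIZE == 2 * 1024 * 1024 == 2097152
```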
How did you arrive at this number? It's cool if it's arbitrary, clearly from the test it can split a giant object.
Arbitrary number, yes.
Thanks @gmmeyer for the review! Just to explain the approach a bit (especially the arbitrary numbers that I chose), in terms of why we need to have a recursive function:
Yea, I think the solution is good. 3 is a fine depth, there's no reason to use a smaller one, which would also likely result in more dropped payloads. The approach makes sense and is really nice, once I understood what it was doing the rationale seemed pretty clear. It just took a little bit of parsing to get there.
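For anyone skimming the thread, the recursive approach can be sketched roughly like this. This is an illustrative sketch only: the function name, the halving strategy, and the zlib/json usage here are assumptions, not the PR's actual code (the PR estimates the number of chunks from the compression ratio rather than simply halving):

```python
import json
import zlib

def split_and_compress(series, max_size, depth=0, max_depth=3):
    """Recursively split a list of metric series until each serialized,
    compressed chunk fits under max_size bytes (illustrative sketch)."""
    payload = zlib.compress(json.dumps({"series": series}).encode("utf-8"))
    if len(payload) <= max_size:
        # Small enough: no split needed, return a single compressed payload
        return [payload]
    if depth >= max_depth:
        # Give up rather than recurse forever: drop this chunk
        return []
    # Too big: split the series in half and recurse on each half
    mid = len(series) // 2
    return (split_and_compress(series[:mid], max_size, depth + 1, max_depth)
            + split_and_compress(series[mid:], max_size, depth + 1, max_depth))
```

The recursion bottoms out either when a chunk fits, or when the depth cap is hit, which bounds the worst case at 2**max_depth chunks.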
Addressed your review @gmmeyer, thanks! I also lowered the max call depth to
@olivielpeau won't it not descend if it's already small enough? The change is fine, though!
for i in range(n_chunks):
    # Create each chunk and make them go through this function recursively; increment the `depth` of the recursive call
    compressed_payloads.extend(
🍰
What does this PR do?
Split metric payloads from the collector that are bigger than 2MB
Motivation
Customer case where a custom check is collecting ~100,000 metrics, which causes the collector payloads to go above the threshold of 3MB (after compression) that's enforced by the metrics API endpoint.
Testing Guidelines
Additional Notes
This is rather limited in scope: it can only split the metrics payload from the collector. That said, I think the serialize_and_compress_metrics_payload function could be extracted out and used in dogstatsd too, if needed.

With some additional work, we could also make this more modular and allow the forwarder to call a "split" function when it receives a 413 response from the API endpoint. That would be more complex to implement though.
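That forwarder-side idea could be sketched roughly as follows. This is purely hypothetical: the function names, the PayloadTooLarge exception, and the halving strategy are all illustrative assumptions, not anything implemented in this PR:

```python
class PayloadTooLarge(Exception):
    """Stand-in for an HTTP 413 response from the metrics endpoint."""

def send_with_split(series, post, depth=0, max_depth=3):
    """Hypothetical forwarder-side retry: `post` is any callable that raises
    PayloadTooLarge when the backend rejects the payload as too big (413)."""
    try:
        post(series)
    except PayloadTooLarge:
        if depth >= max_depth:
            raise  # give up instead of recursing forever
        # Split the series in half and retry each half
        mid = len(series) // 2
        send_with_split(series[:mid], post, depth + 1, max_depth)
        send_with_split(series[mid:], post, depth + 1, max_depth)
```

The appeal is that the split only happens when the backend actually rejects a payload, at the cost of an extra round trip per rejection.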