
[collector][emitter] Split metric payloads bigger than 2MB #3454

Merged 3 commits into master on Aug 21, 2017

Conversation

@olivielpeau (Member)

What does this PR do?

Split metric payloads from the collector that are bigger than 2MB

Motivation

Customer case where a custom check collects ~100,000 metrics, causing the collector payloads to exceed the 3MB threshold (after compression) that's enforced by the metrics API endpoint.

Testing Guidelines

  • Added a test
  • The fix has worked on the customer's setup

Additional Notes

This is rather limited in scope: it can only split the metrics payload from the collector. That said, I think the serialize_and_compress_metrics_payload function could be extracted and reused in dogstatsd too, if needed.

With some additional work, we could also make this more modular and allow the forwarder to call a "split" function when it receives a 413 response from the api endpoint. That would be more complex to implement though.
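The recursive split-and-compress approach described above can be sketched roughly as follows. This is a minimal self-contained sketch reconstructed from the snippets quoted in this review, not the exact merged code; the depth limit and error handling here are assumptions:

```python
import json
import zlib

MAX_COMPRESSED_SIZE = 2 << 20  # 2MB
MAX_SPLIT_DEPTH = 2            # maximum depth of recursive splitting


def serialize_and_compress_metrics_payload(metrics_payload, max_compressed_size, depth, log=None):
    """Serialize and zlib-compress a metrics payload; if the result is
    too big, split the 'series' list into chunks and recurse on each."""
    serialized = json.dumps(metrics_payload).encode("utf-8")
    zipped = zlib.compress(serialized)

    if len(zipped) <= max_compressed_size:
        return [zipped]

    series = metrics_payload["series"]
    if depth >= MAX_SPLIT_DEPTH:
        # Safety net: give up instead of recursing forever
        # (e.g. when a single metric is itself too big)
        if log:
            log.error("Maximum depth of payload splitting reached, "
                      "dropping the %d metrics in this chunk", len(series))
        return []

    # Over-split to account for smaller payloads compressing worse
    compression_ratio = len(serialized) // len(zipped)
    nb_chunks = len(zipped) // max_compressed_size + 1 + int(compression_ratio / 2)
    series_per_chunk = len(series) // nb_chunks + 1

    compressed_payloads = []
    for i in range(nb_chunks):
        chunk = series[i * series_per_chunk:(i + 1) * series_per_chunk]
        if chunk:
            compressed_payloads.extend(
                serialize_and_compress_metrics_payload(
                    {"series": chunk}, max_compressed_size, depth + 1, log))
    return compressed_payloads
```

Each returned payload is guaranteed to fit under the limit, since a chunk is only emitted once its compressed size passes the check; anything still too big at maximum depth is dropped.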

@gmmeyer (Contributor) left a comment:

This is a great solution! I think it's a bit unclear what's going on in parts of it, so I think some comments and such would really help.

Otherwise, I think it's great and once there's a bit of added clarity I'm very happy to give it a 👍

emitter.py (Outdated)

```python
for i in range(nb_chunks):
    compressed_payloads.extend(
        serialize_and_compress_metrics_payload({"series": series[i*series_per_chunk:(i+1)*series_per_chunk]}, max_compressed_size, depth+1, log)
```
@gmmeyer (Contributor):
Can you add some comments here? There's a lot going on in this line.

@olivielpeau (Member, Author) · Jul 27, 2017:
good comment, this definitely needs more explanations

emitter.py (Outdated)

```python
log.error("Maximum depth of payload splitting reached, dropping the %d metrics in this chunk", len(series))
return compressed_payloads

nb_chunks = len(zipped)/max_compressed_size + 1 + int(compression_ratio/2)  # try to account for the compression
```
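To make the formula concrete, here is a worked example with hypothetical sizes (a 10MB serialized payload that compresses to 5MB, against the 2MB limit; the numbers are illustrative, not from the PR):

```python
# Hypothetical sizes for illustration
serialized_size = 10 * 2**20        # 10MB of serialized JSON
zipped_size = 5 * 2**20             # compresses to 5MB
max_compressed_size = 2 << 20       # 2MB limit

compression_ratio = serialized_size // zipped_size  # 2

# A naive split would use zipped_size // max_compressed_size + 1 = 3 chunks;
# the extra int(compression_ratio / 2) term hedges against smaller
# payloads compressing worse than the original did.
nb_chunks = zipped_size // max_compressed_size + 1 + int(compression_ratio / 2)
print(nb_chunks)  # 4
```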
@gmmeyer (Contributor):

What does nb_chunks mean? I am honestly uncertain, even after thinking about it for a few minutes. We might want to rename it.

@olivielpeau (Member, Author):

nb_chunks is the number of "chunks" (i.e. smaller payloads) we'll split the current metrics_payload into. I can definitely document this more, let me know if you have a better idea for the name of the variable.

@gmmeyer (Contributor):

nb is number! Ha, that makes sense. I kept thinking n and b were different words. No, I think this is fine; n_chunks is the only other name I can think of.

@gmmeyer (Contributor):

I understood what it did, the naming just threw me off

emitter.py (Outdated)

```python
    return compressed_payloads


def serialize_and_compress_metrics_payload(metrics_payload, max_compressed_size, depth, log):
```
@gmmeyer (Contributor):

This is a really cool, elegant solution.

However, it took me a bit to understand everything that was going on in it. I think it could stand to use a few comments. I think it would be very easy to make a mistake in editing this function in the future without some added clarity, and I specified some of the places where I think it could stand to be clearer in my other comments.

@olivielpeau (Member, Author):

very valid comment, I'm going to document this more

emitter.py (Outdated)

```python
@@ -29,6 +29,11 @@
control_char_re = re.compile('[%s]' % re.escape(control_chars))


# Only enforced for the metrics API on our end, for now
MAX_COMPRESSED_SIZE = 2 << 20  # 2MB, the backend should accept up to 3MB but let's be conservative here
MAX_SPLIT_DEPTH = 3  # maximum depth of recursive calls to payload splitting function
```
@gmmeyer (Contributor):

How did you arrive at this number? It's cool if it's arbitrary, clearly from the test it can split a giant object.

@olivielpeau (Member, Author):

arbitrary number yes

@olivielpeau (Member, Author):

Thanks @gmmeyer for the review!

Just to explain the approach a bit (especially the arbitrary numbers that I chose)

In terms of why we need to have a recursive function:

  • if there were no compression involved, we could simply split the metrics_payload into `serialized_payload_size/max_payload_size + 1` smaller payloads, and we'd be sure the resulting payloads were smaller than the maximum size.
  • since there is compression involved, there's no way to know how splitting the payload will affect the compression ratio. So I use this somewhat arbitrary split count: `compressed_payload_size/max_compressed_size + 1 + int(compression_ratio/2)`, to try to account for how the compression ratio could change in the smaller payloads. The idea is that smaller payloads might have a worse compression ratio, so we split into more chunks than would be needed if there were no compression involved.
    Also, since I'm not sure this logic is enough to create payloads that are small enough, I've added recursive calls. The maximum depth of 3 should be more than enough even if the compression ratio changes wildly after each split; it's really just a safety net in case we can't create a small-enough payload (for example because one metric is huge). A value of 2 would probably be enough and less risky.
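The intuition that smaller payloads can compress worse is easy to check with zlib: each independent stream pays its own header/checksum overhead and has to rediscover the input's repetitions on its own, so the two halves of a payload compressed separately typically add up to more than the whole compressed at once. An illustrative check (synthetic data, not from the PR):

```python
import json
import zlib

# Synthetic, highly repetitive series, similar in shape to a metrics payload
series = [{"metric": "my.metric.%d" % i, "points": [[1500000000, float(i)]]}
          for i in range(5000)]

whole = zlib.compress(json.dumps({"series": series}).encode("utf-8"))
halves = [
    zlib.compress(json.dumps({"series": series[:2500]}).encode("utf-8")),
    zlib.compress(json.dumps({"series": series[2500:]}).encode("utf-8")),
]

# The halves together are at least as large as the whole compressed at once
print(len(whole), sum(len(h) for h in halves))
```

This is why the chunk-count formula over-splits rather than assuming the original compression ratio carries over to each chunk.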

@gmmeyer (Contributor) commented Jul 27, 2017:

Yea, I think the solution is good. 3 is a fine depth, there's no reason to use a smaller one, which would also likely result in more dropped payloads.

The approach makes sense and is really nice, once I understood what it was doing the rationale seemed pretty clear. It just took a little bit of parsing to get there.

@olivielpeau (Member, Author):

Addressed your review @gmmeyer, thanks!

I also lowered the max call depth to 2, I think that if we go deeper in the recursive calls we may spend way too much time serializing and compressing all the smaller payloads.

@gmmeyer (Contributor) commented Jul 28, 2017:

@olivielpeau won't it not descend if it's already small enough? The change is fine, though!


```python
for i in range(n_chunks):
    # Create each chunk and make them go through this function recursively; increment the `depth` of the recursive call
    compressed_payloads.extend(
```
@gmmeyer (Contributor):

🍰

@olivielpeau olivielpeau added this to the 5.17 milestone Aug 18, 2017
@olivielpeau olivielpeau merged commit 7913e0a into master Aug 21, 2017
@olivielpeau olivielpeau deleted the olivielpeau/split-collector-metrics-payloads branch August 21, 2017 18:31