Configurable max event size for webhook updates #94

alexanderdean · 2016-12-27T23:17:46Z

Waiting for webhook to finish sending events... done! Warning: some events failed to send

I think it's related to overly-large payloads (too much stdout/err). We should be truncating this farely aggressively...

The text was updated successfully, but these errors were encountered:

alexanderdean · 2017-01-04T18:34:07Z

@ihortom has some examples of pipelines that always successfully emit the "job started", but never the "job failed". Excessive stdout/stderr in-between is the most likely culprit methinks.

Are we successfully truncating the stdio to a safe limit?

ninjabear · 2017-01-09T13:11:44Z

We're truncating it to 10kb for both (#87). Factotum will stop sending updates if it retrieves three non-200 responses (with a random backoff inbetween). I will reduce the limit further, 1kb is probably sufficient (~2kb total) unless there's an official maximum I can use (obviously preferable). Information on the responses should be available in the log file (factotum.log)

alexanderdean · 2017-01-09T13:30:55Z

10kb should be totally fine - I think there must be another bug at work. How sure are you that the 10kb is being enforced?

ninjabear · 2017-01-09T13:43:15Z

Sorry, bit misleading wording there it's 10kb for each stdout/stderr + the rest of the payload. So a message can end up at around ~20kb if both are full. I must have tested this with a collector at some point, but I will double check to be sure.

I'm fairly confident that it's being enforced as it's covered by these tests but I'll spend some time looking into it to make sure this is the case.

alexanderdean · 2017-01-09T14:07:32Z

Hey @ninjabear -

Factotum will stop sending updates if it retrieves three non-200 responses (with a random backoff inbetween).

Aha - this is consistent with what I'm seeing, where Factotum takes a couple of minutes to exit even after all child processes are complete, and then fails to send the webhook. That will be the retries.

ninjabear · 2017-01-27T12:48:05Z

That makes sense. We can be sure that this is the case by checking the log file - this should contain the string Failed to send webhook update to followed by a HTTP response code. The log file is in ./.factotum/factotum.log

alexanderdean · 2017-01-27T13:47:04Z

Bringing forwards as this impacts on the front-end's utility

ninjabear · 2017-03-16T12:14:26Z

We should probably give a better summary of which updates were failed to be transmitted and their reason too.

Something like this:

Failed to send XX update event to 255.255.255.255:8080 at <time>. The server responded with X.

ninjabear · 2017-03-27T15:42:21Z

I've been thinking about this a bit more, and I'm not sure our current fix will resolve this problem completely.

The thing is, the payload will always be arbitrarily large because we attach things like the factfile base64 encoded. If I had a max 5k payload then I can only send 3k (minus the constant bits) of stdout/stderr. This number changes as sizes of other things change.

I've also worked out why later events fail, and that is because the stdout/stderr for every task is included[1] with each event. This means as a job is processed, the event size steadily increases until we hit the max event size.

It's designed like this because each update contains all information about a job in a stateless way. From any single update you can work out the full state of the job. This hugely simplifies grokking and storage later (and is the reason the UI can work as it does).

I think we need to know what exactly the limit of an event size is, and if we can be 100% sure that's what's going on. This limit could be enforced in a number of places, starting at the load balancer and running through the http framework each collector uses right into the Snowplow code that handles requests. This means testing with Snowplow Mini is out unfortunately.

@ungn as a side note we need to update iglu central on the max-sizes of these events. Don't forget there's also heartbeats and a couple of other tickets in this milestone.

/cc @jbeemster / @alexanderdean if you have any ideas about the absolute max event size (perhaps this is defined I just don't know about it)

[1] : https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.factotum/task_update/jsonschema/1-0-0#L126-L133 / https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.factotum/job_update/jsonschema/1-0-0#L116-L123

alexanderdean · 2017-03-27T17:29:25Z

@ninjabear - this all makes total sense. A few thoughts:

The abs max event size is unfortunately undefined in Snowplow. I agree this ought to change
We ought to make the abs max event size configurable in Factotum as we can't 100% assume that the events will only be sent to Snowplow
Can we explore de facto max event sizes using Avalanche @jbeemster ?
If we can define the abs max event size, can we use this to degrade the webhook payload gracefully? I.e. we come up with two or three tiers of information by priority, and start truncating the lowest priority till we come under the limit?

ninjabear · 2017-07-28T13:07:57Z

We have confirmed the problem here was downstream; we believe Factotum sends all webhooks correctly.

As such I'm going to remove the bug label. We should however revisit the idea of the maximum size events Factotum should send on a lower priority - I'll leave this ticket open for that.

jbeemster · 2018-03-27T15:32:28Z

Updating this issue with our latest thinking on this issue - the maximum size of an individual event should really only be limited by the maximum record size that can be handled downstream. Stream Collectors can handle up-to 1MB payloads into Kinesis which is vastly more than the current limit.

Further to this the actual truncation of the output can be quite detrimental to the usage of these webhooks events in processing downstream.

In short I think we really just need to make this fully configurable so that we can start to figure out what the maximum webhook sizes really are that can be handled and to start recommending and using that size - we shouldn't be truncating without cause!

In reality the abs limits should be:

Kinesis: 1MB per event/record
Elasticsearch: Handles records up to hundreds of megabytes in size - no issue here
Redshift: Records will be truncated on insert if they exceeded the row definition

Not an error as such but could reduce the usefulness of stored records...

alexanderdean added the bug label Dec 27, 2016

alexanderdean added this to the Version 0.5.0 milestone Dec 27, 2016

alexanderdean assigned ninjabear Dec 27, 2016

ninjabear mentioned this issue Jan 27, 2017

Add job heartbeat as webhook event #99

Open

alexanderdean modified the milestones: Version 0.5.0, Version 0.6.0 Jan 27, 2017

ninjabear changed the title ~~Factotum webhooks not always firing?~~ Factotum webhook payload is too large Mar 16, 2017

ungn added a commit that referenced this issue Mar 16, 2017

Factotum webhook payload is too large (closes #94)

af443fa

ungn mentioned this issue Jun 5, 2017

Release/0.5.0 #102

Closed

16 tasks

ninjabear removed the bug label Jul 28, 2017

ninjabear changed the title ~~Factotum webhook payload is too large~~ Configurable max event size for webhook updates Jul 28, 2017

jbeemster closed this as completed in d84da81 Jun 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configurable max event size for webhook updates #94

Configurable max event size for webhook updates #94

alexanderdean commented Dec 27, 2016

alexanderdean commented Jan 4, 2017

ninjabear commented Jan 9, 2017

alexanderdean commented Jan 9, 2017

ninjabear commented Jan 9, 2017

alexanderdean commented Jan 9, 2017

ninjabear commented Jan 27, 2017 •

edited

Loading

alexanderdean commented Jan 27, 2017

ninjabear commented Mar 16, 2017 •

edited

Loading

ninjabear commented Mar 27, 2017

alexanderdean commented Mar 27, 2017

ninjabear commented Jul 28, 2017

jbeemster commented Mar 27, 2018

Configurable max event size for webhook updates #94

Configurable max event size for webhook updates #94

Comments

alexanderdean commented Dec 27, 2016

alexanderdean commented Jan 4, 2017

ninjabear commented Jan 9, 2017

alexanderdean commented Jan 9, 2017

ninjabear commented Jan 9, 2017

alexanderdean commented Jan 9, 2017

ninjabear commented Jan 27, 2017 • edited Loading

alexanderdean commented Jan 27, 2017

ninjabear commented Mar 16, 2017 • edited Loading

ninjabear commented Mar 27, 2017

alexanderdean commented Mar 27, 2017

ninjabear commented Jul 28, 2017

jbeemster commented Mar 27, 2018

ninjabear commented Jan 27, 2017 •

edited

Loading

ninjabear commented Mar 16, 2017 •

edited

Loading