influxdb: Concurrent PeriodicFlusher #2190
Conversation
On a side-note, can you also change the names of the file from …
Regarding the file name, in general I agree with @inancgumus, but not in this specific case. If they look at https://pkg.go.dev/go.k6.io/k6/output, the file name doesn't matter, they mostly see the types; the filenames are at the very bottom (or if you click to see the implementation of something). But if they open https://github.com/grafana/k6/tree/master/output, they see 3 very simple files that should be almost self-explanatory:
Contrast this to something like …
Fair enough. What about:
(or without an -er suffix, so as not to confuse it with an interface). IMHO it's practical, makes more sense, and explains what it provides.
Yes, but that explanation only makes sense if someone is familiar enough with the outputs to know that these are actually helpers 😅 Otherwise, …
OK, then 😅 You know, when you call something a helper, you can put anything in it. There are no limits.
There are - code reviews 😄 As I said, in general I completely agree with you, just not about this specific case 😅 And k6 definitely suffers from the problem you're worried about in a few places, most notably with …
(force-pushed from 4615d9b to e2ecd79)
Nice work!
(force-pushed from fac5d8b to a173d72)
@na-- I also added the log warning about long execution of the flush metrics function. It could be very noisy in some situations. Do you think that's acceptable from a UX perspective? An experienced user can detect the same thing by inspecting the debug logs with the …
(force-pushed from b3a579e to 6df9b57)
(force-pushed from 6df9b57 to 46f68c8)
output/helpers.go (outdated)

```go
case <-ticker.C:
	limiter <- struct{}{}
```
Hmm, this may not be ideal 😞

It will work fine when the InfluxDB instance is able to keep up with the load, but if it ever starts slowing down, the `AsyncPeriodicFlusher` will be stuck here, since all of the allowed goroutines will be started and `limiter` will be full. But the metrics data will keep coming to the output, so the next `flushCallback()` will have more data to flush and thus have to do more work and send a bigger chunk to InfluxDB. And so on and so forth, potentially taking more and more time to free up a slot in `limiter` and sending ever bigger chunks of data.
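For context, here is a minimal, self-contained sketch of the loop shape being discussed, based only on the two quoted lines above; the function name and the surrounding structure are assumptions, not the exact code from this PR:

```go
package output

import (
	"context"
	"time"
)

// Illustrative sketch, not the PR's actual implementation: limiter is a
// buffered channel used as a semaphore, flushCallback drains and writes the
// currently buffered samples.
func runConcurrentFlusher(ctx context.Context, period time.Duration, limiter chan struct{}, flushCallback func()) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// This send blocks while all allowed goroutines are still busy,
			// so buffered samples keep piling up and the next flushCallback()
			// has an ever bigger chunk to send.
			limiter <- struct{}{}
			go func() {
				defer func() { <-limiter }()
				flushCallback()
			}()
		case <-ctx.Done():
			return
		}
	}
}
```

The key detail is that the semaphore slot is acquired before the buffered samples are taken, which is what lets the backlog grow.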
The way it was implemented before, we always started a new goroutine:
k6/stats/influxdb/collector.go, lines 102 to 113 in 2193de0:

```go
for {
	select {
	case <-ticker.C:
		c.wg.Add(1)
		go c.commit()
	case <-ctx.Done():
		c.wg.Add(1)
		go c.commit()
		c.wg.Wait()
		return
	}
}
```
And we always got the currently buffered data in a local buffer in that goroutine before we potentially waited for the semaphore to have a free slot:
k6/stats/influxdb/collector.go, lines 128 to 138 in 2193de0:

```go
func (c *Collector) commit() {
	defer c.wg.Done()
	c.bufferLock.Lock()
	samples := c.buffer
	c.buffer = nil
	c.bufferLock.Unlock()
	// let first get the data and then wait our turn
	c.semaphoreCh <- struct{}{}
	defer func() {
		<-c.semaphoreCh
	}()
```
We just waited to send the actual network requests until we had a free slot. So the chunk sizes were smaller, but we could potentially have a whole bunch of started goroutines, waiting, with big chunks of data in memory... But at least these chunks would have been roughly the same size, so InfluxDB likely wouldn't have choked on one.
Not sure which is better, and to be honest, given that InfluxDB v1 is dead, it probably doesn't really matter. But if we want to bring the old way back, we can't do it with a new helper, it needs to be done in the InfluxDB output itself. It should be easy to do: we can use the old `PeriodicFlusher` and just spin up the goroutine immediately at the start of the output's `flushMetrics()` method 🤷‍♂️
Yeah, it was done this way to keep the same logic as the current sync implementation (before we decided to split). We have the same problem with the `Stop` procedure.
With the current code, I have these suggestions:

- Add another `select` for skipping requests, so we can avoid getting stuck (not optimal for performance):

```go
select {
case limiter <- struct{}{}:
	go func() {
		// ...
	}()
default:
	continue LOOP
}
```

- Add a timeout for the requests; this way we could also guarantee that `Stop` doesn't get stuck forever (a rough sketch follows below).
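A rough sketch of that second suggestion, under the assumption that the flush could be given a context; the context-aware callback signature here is hypothetical, not the current output API:

```go
package output

import (
	"context"
	"time"
)

// Hypothetical sketch: run each flush with a per-request timeout so a slow
// backend can't block a goroutine (and eventually Stop) forever.
// The context-aware flushCallback signature is an assumption for illustration.
func flushWithTimeout(parent context.Context, timeout time.Duration, flushCallback func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return flushCallback(ctx)
}
```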
> and to be honest, given that InfluxDB v1 is dead, it probably doesn't really matter.

This should also be used by the InfluxDB v2 output.
This was done this way in influxdbv1 for a couple of reasons, IIRC:

- influxdbv1 has an upper limit on the size of the POST body, which means that just continuously increasing it is... probably not great ;)
- influxdbv1 does at least a non-zero amount of the ingest while it reads the request, which means big requests take quite a while.
- from my experiments, pushing multiple requests in parallel was faster/better; I tried that after noticing there was unused CPU, but InfluxDB used just 1 core (from a quick look at htop, so grains of salt and all that jazz).

All of this was done by me probably 2 years ago, as far as I remember within a day, as a quick attempt to make the influxdbv1 output more... stable, as it previously was getting to writing 50MB+ requests which were taking upwards of 60s regardless of how you configured InfluxDB.

Whether the above is applicable to another output is a question I can't answer. The cloud output, for example, has parallel pushing as well, but it happens after things have been aggregated and as such can't be in the `PeriodicFlusher` (which it also doesn't use).

Now, given that I broke the concurrent writes in influxdbv1 a few months ago when I moved it to an output, it seems to me that it might not have worked all that much better, as nobody has come to complain. Also, if I have to add it again, it will literally be to add `go func() {` and `}()` around the part that pushes, after the `PeriodicFlusher` has flushed.

Arguably the additional change (that I decided I didn't want to spend the time on, IIRC) is to split the samples so there aren't more than a certain (configurable) amount per request and push those concurrently if necessary. But this will likely be terrible for some other output that ingests concurrently while we now push to it in multiple requests.
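For illustration, a standalone sketch of that chunking idea (the generic signature and all names here are mine, not anything in k6):

```go
package flushutil

import "sync"

// Hypothetical sketch: split the buffered samples into chunks of at most
// maxBatchSize and push every chunk in its own goroutine, waiting for all of
// them to finish before returning.
func pushInChunks[T any](samples []T, maxBatchSize int, push func([]T)) {
	var wg sync.WaitGroup
	for len(samples) > 0 {
		n := maxBatchSize
		if n > len(samples) {
			n = len(samples)
		}
		chunk := samples[:n]
		samples = samples[n:]
		wg.Add(1)
		go func() {
			defer wg.Done()
			push(chunk)
		}()
	}
	wg.Wait()
}
```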
To be honest, given the (IMO) ease of implementing whatever concurrent push logic on top of the current `PeriodicFlusher`, and that (IMO) it will need to be different depending on the output, I am for scrapping this additional utility type and just writing the 4 lines in each output that needs it, if any do, and I still don't know if influxdbv2 will actually be helped by this 🤷.

Additionally, arguably a lot of the problem is that if you have 1k VUs doing 5 iters/s, you need to write 5k iteration samples a second... which should probably all be combined into 1 sample; doing that will likely fix all of the problems ;)

Also, `AsyncPeriodicFlusher` seems like the wrong name... maybe `ConcurrentPeriodicFlusher`?
> Also, if I have to add it again, it will literally be to add `go func() {` and `}()` around the part that pushes, after the `PeriodicFlusher` has flushed.

> To be honest, given the (IMO) ease of implementing whatever concurrent push logic on top of the current `PeriodicFlusher`, and that (IMO) it will need to be different depending on the output, I am for scrapping this additional utility type and just writing the 4 lines in each output that needs it, if any do, and I still don't know if influxdbv2 will actually be helped by this
Yeah, you are probably right, this new helper seems to be more harm than help after all 😞 Restoring the original async logic, with smaller (even if not constant) chunk sizes, back in the influxdb code and using the old `PeriodicFlusher` seems to be the best way to go here...
> using the old PeriodicFlusher seems to be the best way to go here...

@na-- just to be sure, are we saying that we want to remove the `PeriodicFlusher` and implement the ticker with the concurrent flush directly in the `influxdb` output?
I think that we should:

- remove (i.e. not add) `NewAsyncPeriodicFlusher`
- leave `PeriodicFlusher` how it used to be
- also don't change how the old `PeriodicFlusher` was used in the `influxdb` output before this PR
- leave the semaphore code in there as well, but do everything in the `flushMetrics()` method after the `samples := o.GetBufferedSamples()` in a new goroutine (with an added waitgroup to ensure we flush everything before `Stop()` ends); a rough sketch follows below:

Lines 203 to 208 in aab12d5:

```go
samples := o.GetBufferedSamples()
o.semaphoreCh <- struct{}{}
defer func() {
	<-o.semaphoreCh
}()
```

I think this would mimic the old InfluxDB behavior, right?
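To make the last point concrete, a rough sketch of how the `flushMetrics()` method in `output/influxdb/output.go` could look; the `wg` field is the added waitgroup mentioned above, and everything else (names, the write step) is illustrative rather than the final diff:

```go
// Rough sketch, not the final diff. It assumes an added `wg sync.WaitGroup`
// field on Output, next to the existing semaphoreCh.
func (o *Output) flushMetrics() {
	// Grab the currently buffered samples synchronously, so chunks stay small...
	samples := o.GetBufferedSamples()
	if len(samples) < 1 {
		return
	}
	o.wg.Add(1)
	go func() {
		defer o.wg.Done()
		// ...and only then wait for a free slot before doing the actual write.
		o.semaphoreCh <- struct{}{}
		defer func() { <-o.semaphoreCh }()
		// ... convert samples to points and write the batch to InfluxDB ...
	}()
}
```

`Stop()` would then call `o.wg.Wait()` after stopping the periodic flusher, so in-flight writes finish before the output shuts down.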
I agree with @na-- that this should happen for the old influxdb output; whether it should be a priority is a different question, but it should be fairly straightforward.

For the new one (influxdbv2), I am still interested in a real-world test with and without the upstream influxdb library's async writer, instead of the bad benchmark I have written. I would expect that influxdbv2 will handle ingestion better than the old one, and maybe even use multiple cores/goroutines for the ingestion without needing to send multiple requests 🤞
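For reference, the upstream async writer in question is the non-blocking `WriteAPI` of `github.com/influxdata/influxdb-client-go/v2`; a minimal usage example follows (the URL, token, org, bucket, and the sample point are placeholders):

```go
package main

import (
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// Placeholder connection settings.
	client := influxdb2.NewClient("http://localhost:8086", "my-token")
	defer client.Close()

	// WriteAPI is the asynchronous, non-blocking writer: points are buffered
	// and written to InfluxDB in background batches.
	writeAPI := client.WriteAPI("my-org", "my-bucket")

	p := influxdb2.NewPoint(
		"http_req_duration",                    // measurement
		map[string]string{"status": "200"},     // tags
		map[string]interface{}{"value": 123.4}, // fields
		time.Now(),
	)
	writeAPI.WritePoint(p)

	// Flush whatever is still buffered before exiting.
	writeAPI.Flush()
}
```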
> I am still interested in a real-world test with and without the upstream influxdb library's async writer instead of the bad benchmark I have written

An attempt at that can be found in the influxdbv2 PR grafana/xk6-output-influxdb#2.

Just a suggestion: once the logic of spawning in …
Turning the above ^^ into a "request changes".
(force-pushed from 46f68c8 to b4afa19)
LGTM, and sorry for the whole confusion with me suggesting the async helper 😞
LGTM, I think we might be (in general) writing too many debug messages in outputs, but this is fine and unrelated to the problem at hand so you can ignore it :)
output/influxdb/output.go (outdated)

```go
	return nil
}

func (o *Output) flushMetrics() {
	samples := o.GetBufferedSamples()
	if len(samples) < 1 {
		o.logger.Debug("Any buffered samples, skipping the flush operation")
```
No need for a message here IMO, this will just catch cases where someone is running a very light script that has really big response times.
(force-pushed from b4afa19 to dd6cef2)
@yorugac added the test for asserting concurrency and rate limiting.
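For reference, a standalone sketch of the kind of assertion such a test makes: every flush bumps a counter, the maximum observed value is recorded, and it must never exceed the semaphore size (all names and numbers here are illustrative, not the actual test added in this PR):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const limit = 4
	limiter := make(chan struct{}, limit)

	var current, observedMax int64
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		limiter <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-limiter }()
			n := atomic.AddInt64(&current, 1)
			// Record the highest concurrency level seen so far.
			for {
				m := atomic.LoadInt64(&observedMax)
				if n <= m || atomic.CompareAndSwapInt64(&observedMax, m, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // stand-in for the actual flush work
			atomic.AddInt64(&current, -1)
		}()
	}
	wg.Wait()

	// A real test would assert this instead of printing it.
	fmt.Printf("max concurrent flushes: %d (limit %d)\n", observedMax, limit)
}
```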
Actually, I meant that test for the new periodic flusher type; now it's specific to the InfluxDB output... 🤔 But that's a good check to have anyway 🙂 LGTM 👍
Closes #2185