This is applicable when `BufferInstanceWrites == true`.
I recently added some counters to monitor the number of times `InstancePollSeconds` gets exceeded during discovery. The number seen should normally be quite low, but I've seen that on a busy orchestrator server, especially when talking to an orchestrator backend in a different datacentre, the number of times this happens can jump significantly.
Consequently, better management and monitoring of this is needed.
Thoughts involve:
ensuring that the configuration parameters used are dynamically configurable via SIGHUP and thus do not require orchestrator to be restarted. This affects two variables: `InstanceFlushIntervalMilliseconds` and `InstanceWriteBufferSize` (see the reload sketch after this list).
adding extra monitoring of the time taken for `flushInstanceWriteBuffer` to run. A single metric every minute is useless, so I need to collect samples and then be able to provide aggregate data and percentile timings, in a similar way to how the discovery timings are handled (see the timing sketch below).
parallelising this function so that it runs against the backend orchestrator server over several connections. Completely serialising it, even though the writes are batched, is not fully efficient, but we should ensure that writes for the same instance are never done through different connections at the same time (see the fan-out sketch below).
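A minimal sketch of what the SIGHUP-driven reload could look like; the `Settings` struct and `reload()` helper here are hypothetical placeholders, not orchestrator's actual config API:

```go
// Sketch of SIGHUP-driven reloading of the two buffer-related settings.
package main

import (
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// Settings holds the two values we want to be able to change at runtime.
type Settings struct {
	InstanceFlushIntervalMilliseconds int
	InstanceWriteBufferSize           int
}

// current is swapped atomically so readers never see a half-updated struct.
var current atomic.Value

func reload() {
	// In reality this would re-read the orchestrator JSON config file;
	// hard-coded values are used here just to keep the sketch short.
	current.Store(&Settings{
		InstanceFlushIntervalMilliseconds: 100,
		InstanceWriteBufferSize:           100,
	})
}

func main() {
	reload()
	sighup := make(chan os.Signal, 1)
	signal.Notify(sighup, syscall.SIGHUP)
	go func() {
		for range sighup {
			reload()
			log.Printf("config reloaded: %+v", current.Load().(*Settings))
		}
	}()
	// ... rest of the server; the flush loop reads current.Load() each pass.
	select {}
}
```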
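A rough sketch of the kind of timing collection meant here, assuming a hypothetical `FlushTimings` type rather than orchestrator's existing metrics code: each `flushInstanceWriteBuffer` call is timed and recorded with `Observe`, and `Report` is called once per reporting period to emit percentiles.

```go
// Sketch of collecting flushInstanceWriteBuffer timings and reporting
// aggregates/percentiles, in the spirit of the discovery-timing metrics.
package metrics

import (
	"sort"
	"sync"
	"time"
)

type FlushTimings struct {
	mu        sync.Mutex
	durations []time.Duration // samples since the last Report()
}

// Observe records the time taken by one flushInstanceWriteBuffer call.
func (f *FlushTimings) Observe(d time.Duration) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.durations = append(f.durations, d)
}

// Report returns selected percentiles and the max over the collected samples,
// then resets the collection, so it can be called once per reporting period.
func (f *FlushTimings) Report() (p50, p95, max time.Duration) {
	f.mu.Lock()
	samples := f.durations
	f.durations = nil
	f.mu.Unlock()

	if len(samples) == 0 {
		return 0, 0, 0
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
	p50 = samples[len(samples)*50/100]
	p95 = samples[len(samples)*95/100]
	max = samples[len(samples)-1]
	return p50, p95, max
}
```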
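One possible way to parallelise the flush while keeping per-instance writes serialised is to hash the instance key to a fixed worker, so writes for the same instance always go through the same worker and connection. The `instanceWrite` type and `writeInstance` callback below are placeholders, not orchestrator code:

```go
// Sketch of fanning batched writes out over several backend connections
// while guaranteeing per-instance serialisation via key hashing.
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

type instanceWrite struct {
	instanceKey string // e.g. "host:port"
	// ... the buffered row to write
}

// flushParallel routes writes to `workers` goroutines (workers must be > 0);
// identical instance keys always land on the same worker and stay serialized.
func flushParallel(writes []instanceWrite, workers int, writeInstance func(instanceWrite)) {
	queues := make([]chan instanceWrite, workers)
	var wg sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan instanceWrite, len(writes))
		wg.Add(1)
		go func(q chan instanceWrite) {
			defer wg.Done()
			for w := range q {
				writeInstance(w) // one backend connection per worker
			}
		}(queues[i])
	}
	for _, w := range writes {
		h := fnv.New32a()
		h.Write([]byte(w.instanceKey))
		queues[h.Sum32()%uint32(workers)] <- w
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}

func main() {
	writes := []instanceWrite{{instanceKey: "db1:3306"}, {instanceKey: "db2:3306"}, {instanceKey: "db1:3306"}}
	flushParallel(writes, 4, func(w instanceWrite) {
		fmt.Println("writing", w.instanceKey)
	})
}
```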
With these changes it should be easier to see where the bottleneck is and to be able to adjust the configuration "dynamically" to ensure the required performance is achieved.
The two graphs above show the issue seen, together with a normal situation. Changing the orchestrator configuration to talk to a local orchestrator backend resolves the problem, but any orchestrator server in the cluster should be able to write properly to the backend.
The solution is not yet fully clear, but dynamic adjustment of the parameters will make it much easier to monitor the effect of changes without restarting the active node to see which settings are better.