[0.12.0] Too much write OPS #6058
Are you batching your points? When a write comes in to Influx, the points in the write are stored in both the in-memory TSM cache (for query performance) and the on-disk WAL (for permanent storage, to be eventually snapshotted and compacted into TSM files). The WAL is grouped by retention policy and database, both of which are fixed per batch; therefore one POST to /write or one OpenTSDB write action should result in approximately one write to the filesystem, regardless of whether your batch was 1 point or 1000 points. Whether the filesystem is flushed to disk immediately or batched up later is dependent on your filesystem, operating system, and disk controllers – all things outside the control of InfluxDB.
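For illustration, here is a minimal sketch of a batched write in Go, assuming a local InfluxDB on port 8086; the database name "mydb" and measurement "cpu" are invented. One POST to /write carries the whole batch in line protocol, which is the pattern described above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Build one line-protocol payload containing many points.
	// "mydb" and "cpu" are placeholder names for this sketch.
	var lines []string
	for i := 0; i < 1000; i++ {
		lines = append(lines, fmt.Sprintf("cpu,host=server%02d value=%d", i%10, i))
	}
	body := strings.Join(lines, "\n")

	// One POST to /write => one batch => roughly one WAL write,
	// regardless of whether the batch holds 1 or 1000 points.
	resp, err := http.Post("http://localhost:8086/write?db=mydb", "text/plain", strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```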
@mark-rushakoff one question about the subscriber.
@simnv something seems weird about your disk activity. Is that a network drive? HDD or SSD? I'm not aware of other reports of that kind of heavy disk activity so I'm inclined to believe something is unusual about your setup. @earthnut it doesn't appear that there's any guarantee about the order of points sent to subscribers vs. flushed to shards: https://github.com/influxdata/influxdb/blob/d024ca2/cluster/points_writer.go#L207-L230
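As a rough sketch of why ordering isn't guaranteed (illustrative Go only, not the code at the link above): the batch is handed to a subscriber channel and to shard writers concurrently, so either side may observe the points first.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative sketch: a batch is handed to a subscriber channel and to a
// shard writer concurrently, so there is no ordering guarantee between the
// two paths.
func writeBatch(points []string, subCh chan<- []string) {
	// Non-blocking hand-off to the subscriber path.
	select {
	case subCh <- points:
	default: // subscriber busy; in this sketch the points are simply skipped
	}

	// The shard write runs in its own goroutine.
	done := make(chan struct{})
	go func() {
		fmt.Println("shard stored", len(points), "points")
		close(done)
	}()
	<-done
}

func main() {
	subCh := make(chan []string, 1)
	go func() {
		for batch := range subCh {
			fmt.Println("subscriber got", len(batch), "points")
		}
	}()

	writeBatch([]string{"cpu value=1", "cpu value=2"}, subCh)

	// Give the subscriber goroutine a moment; its print may come before or
	// after the shard print above — that is the point of the sketch.
	time.Sleep(100 * time.Millisecond)
}
```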
Now, after upgrading to 0.12, I see not only high WAL write ops, but also high write ops on the filesystem that holds the data directory.
Attached strace to the influxdb process. Most IOPS are for WAL files (3k+ write IOPS, understandable) and for the meta/meta.dbtmp file (2k+ write IOPS, 4754 bytes each time). The last part is strange to me.
Dug a little deeper. Every operation leads to a flush; I made several diffs of this and moved the WAL around to check. So, in the end, data is flushed on every batch of points received. And it is flushed not only to the WAL; it also fires a metadata update and flush. Can someone explain to me what the point of that is? Why make constant writes to disk, damaging it in the process, instead of syncing at sane intervals, like once every second? Why should I have to resort to ugly workarounds like tmpfs and rsync to make those updates less frequent? I hope I just didn't configure InfluxDB right, but judging from the sources, it works like that.
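What this question is asking for, roughly, is interval-based syncing rather than a sync per batch. A toy sketch of that idea in Go (not InfluxDB code; the file name, the fake batches, and the one-second interval are arbitrary):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Toy illustration of syncing once per interval instead of once per batch.
// Incoming batches are written to the OS buffer immediately, but only made
// durable by a periodic fsync.
func main() {
	f, err := os.OpenFile("wal.tmp", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	batches := make(chan []byte)
	go func() {
		for i := 0; i < 5; i++ {
			batches <- []byte(fmt.Sprintf("batch %d\n", i))
			time.Sleep(200 * time.Millisecond)
		}
		close(batches)
	}()

	ticker := time.NewTicker(1 * time.Second) // sync at most once per second
	defer ticker.Stop()

	for {
		select {
		case b, ok := <-batches:
			if !ok {
				f.Sync() // final sync before exit
				return
			}
			f.Write(b) // buffered by the OS; not yet durable
		case <-ticker.C:
			f.Sync() // one fsync per interval instead of per batch
		}
	}
}
```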
Updated to 0.12.1, and the problem is gone. Storing 9k+ metrics per second, I see only 25 write operations per second to the WAL partition and occasional writes to the data partition. Thanks!
@simnv awesome, thanks for the update! |
I have a test setup using the opentsdb input to collect data from several Bosun scollectors feeding a little above 8k metrics per second. I have two continuous queries, one running every ten minutes and the other every hour, both just storing mean values into other retention policies.
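The two continuous queries might look roughly like the sketch below; the database name "telemetry", the retention policies "rp_10m" and "rp_1h", and the measurement "cpu" are invented for illustration, and the queries are created here through the /query endpoint.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Hypothetical names: database "telemetry", target retention policies
	// "rp_10m" and "rp_1h", measurement "cpu". Only the shape of the CQs
	// matters here ("store mean values into other retentions").
	cqs := []string{
		`CREATE CONTINUOUS QUERY cq_10m ON telemetry BEGIN
		   SELECT mean(value) INTO telemetry."rp_10m"."cpu" FROM "cpu" GROUP BY time(10m)
		 END`,
		`CREATE CONTINUOUS QUERY cq_1h ON telemetry BEGIN
		   SELECT mean(value) INTO telemetry."rp_1h"."cpu" FROM "cpu" GROUP BY time(1h)
		 END`,
	}

	for _, q := range cqs {
		// Submit each CREATE CONTINUOUS QUERY statement to the /query endpoint.
		resp, err := http.PostForm("http://localhost:8086/query", url.Values{"q": {q}})
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
		fmt.Println("created CQ, status:", resp.Status)
	}
}
```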
My problem is that influx constantly makes write operations on disk, about 4k ops. It doesn't consume much memory, but those writes are annoying: why is an application that I use to monitor one virtual platform the most resource-consuming application on that platform?
Tried playing with the WAL limits, setting them two and then ten times higher, and tried other parameters, with no effect. The writes keep happening.
For now I just placed the WAL on tmpfs, and it works quite normally: almost no visible disk activity, and data is stored as usual. I rsync it to disk every minute so it persists across reboots; I don't know if that's good practice though.
In the attached graph, read ops are positive values and write ops are negative.
Is this normal behavior for Influx? How can I tune Influx to make those write operations occur less often?