[WIP] Graphite Disk Usage Calculation #261
Comments
@r0h4n @brainfunked @shtripat Please review.
@ltrilety @r0h4n @nthomas-redhat @Tendrl/qe @jjkabrown1 Have we done any Graphite disk usage estimates for the following sizes (I tried to come up with 3 "sizes" based on some typical deployments): Small
Medium
Large
In Tendrl/commons#819 @ltrilety mentioned "1 day of metrics for 6 gluster servers [Small] takes about 10G."
@julienlim, the formula for the calculation is already provided in #261 (comment), as below:
Size may vary depending on the number of disks, LVMs, etc., so I would recommend calculating this on a per-deployment basis. What do you think?
@nthomas-redhat @julienlim @shtripat It's great that we have a formula for the 180-day period, but from what I see we should simplify it, as it's not easy to read. We could use the deployments in #261 (comment) and provide some numbers.
@nthomas-redhat @shtripat @ltrilety @jjkabrown1 It's good to have a formula for the 180-day period, but we'll need to adjust it according to the retention policies. That said, this formula is too cumbersome for someone to calculate by hand. We need to provide an easy-to-use calculator (think Ceph's pgcalc, or some kind of spreadsheet) where the user can input a few numbers (e.g. # nodes in cluster, # clusters, # volumes, # bricks, and how long to retain data) and get an estimate. @ltrilety As to the deployment sizes, I took a first stab at coming up with something, but it needs further discussion and tweaking. Suggestions?
@cloudbehl, @r0h4n, let us sync up and put together guidelines for possible standard configurations.
@nthomas-redhat ack!
@nthomas-redhat Please provide the change in Graphite disk size requirements for the scenarios below:
A note for the assessment: don't forget that un-manage throws off the count, as it takes all data from Graphite and saves it on
For standard cluster sizes, please see below:
Small configuration: up to 8 nodes. Recommendation:
Medium configuration: 9 - 16 nodes. Recommendation:
Large configuration. Recommendation:
Graphite Disk Usage Calculation
Whisper storage utilization
Per data point: 12 bytes
Per metric: 12 bytes * no of data points
so for 60s:180d retention (60 * 24 * 180 data points) * 12 bytes = 3110400 (~ 2.97 MB)
or for 10s:180d retention (6 * 60 * 24 * 180 data points) * 12 bytes = 18662400 (~ 17.8 MB)
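The arithmetic above can be scripted. The sketch below is an illustration, not Tendrl code. It assumes Whisper's standard on-disk layout, in which each file also carries a 16-byte header plus 12 bytes of archive info per archive on top of the 12-byte points. With that header, one 60s:180d file is 3110428 bytes, which is exactly one quarter of the 12441712-byte Node figure quoted later in this issue. The function names and the restriction to unit-suffixed values (e.g. `60s`, `180d`) are assumptions.

```python
# Sketch: estimate the size of one Whisper (.wsp) file from a retention string.
# Assumes Whisper's layout: 16-byte header + 12 bytes of archive info per
# archive + 12 bytes per data point. Only unit-suffixed values are handled.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
HEADER_BYTES = 16
ARCHIVE_INFO_BYTES = 12
POINT_BYTES = 12


def to_seconds(value: str) -> int:
    """Convert a value such as '60s' or '180d' to seconds."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def whisper_file_size(retentions: str) -> int:
    """Bytes for one metric, e.g. '60s:180d' or '10s:1d,60s:180d'."""
    archives = [r.split(":") for r in retentions.split(",")]
    # Each archive stores (retention / precision) points.
    points = sum(to_seconds(keep) // to_seconds(step) for step, keep in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


print(whisper_file_size("60s:180d"))  # 3110428 (3110400 bytes of points + 28-byte header)
print(whisper_file_size("10s:180d"))  # 18662428
```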
The calculations below are based on Tendrl’s default storage retention policy, in which every metric stores data points at a 60-second interval for 180 days.
There are currently two trees to enable Grafana navigation:
Cluster -> Volume -> Node -> Brick -> Block Device
Cluster -> Node -> Brick -> Block Device
Cluster -> Volume -> Node -> Brick -> Block Device
This tree contains all the cluster-specific information for Volumes, Nodes, Bricks and Block Devices. It does NOT contain node-specific information: Nodes contain information only as it relates to the cluster, such as rebalance information.
Block Device
Size on disk: 37325481 (~36 MB)
Structure:
├── disk_octets
│ ├── read.wsp
│ └── write.wsp
├── disk_ops
│ ├── read.wsp
│ └── write.wsp
├── disk_time
│ ├── read.wsp
│ └── write.wsp
├── mount_utilization
│ ├── percent_used.wsp
│ ├── total.wsp
│ └── used.wsp
└── utilization
├── percent_used.wsp
├── total.wsp
└── used.wsp
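To check figures like this against a live Graphite server, one can sum the .wsp files under a metric subtree. A minimal sketch using only the Python standard library; the directory to inspect is passed on the command line, since the Carbon storage location varies by installation and none is assumed here. It reports apparent file sizes, which match allocated space as long as the Whisper files were not created sparse.

```python
import os
import sys


def whisper_usage(root: str) -> tuple[int, int]:
    """Return (number of .wsp files, total bytes) found under root."""
    count = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wsp"):
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python whisper_usage.py <path to a Block Device, Brick, Node ... directory>
    files, size = whisper_usage(sys.argv[1])
    print(f"{files} metrics, {size} bytes (~{size / 1024 ** 2:.1f} MB)")
```

For the Block Device subtree above, this should report 12 metrics.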
Brick
Size on disk: 102648857 (~98 MB per brick) + (no of devices * (36 MB per device))
Structure:
├── connections_count.wsp
├── device
│ └── vda
│ ├── disk_octets
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_ops
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_time
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── mount_utilization
│ │ ├── percent_used.wsp
│ │ ├── total.wsp
│ │ └── used.wsp
│ └── utilization
│ ├── percent_used.wsp
│ ├── total.wsp
│ └── used.wsp
├── entry_ops.wsp
├── fop
│ ├── GETXATTR
│ │ ├── hits.wsp
│ │ ├── latencyAvg.wsp
│ │ ├── latencyMax.wsp
│ │ └── latencyMin.wsp
│ ├── LOOKUP
│ │ ├── hits.wsp
│ │ ├── latencyAvg.wsp
│ │ ├── latencyMax.wsp
│ │ └── latencyMin.wsp
│ ├── OPENDIR
│ │ ├── hits.wsp
│ │ ├── latencyAvg.wsp
│ │ ├── latencyMax.wsp
│ │ └── latencyMin.wsp
│ └── READDIR
│ ├── hits.wsp
│ ├── latencyAvg.wsp
│ ├── latencyMax.wsp
│ └── latencyMin.wsp
├── healed_cnt.wsp
├── heal_failed_cnt.wsp
├── inode_ops.wsp
├── inode_utilization
│ ├── gauge-total.wsp
│ ├── gauge-used.wsp
│ └── percent-percent_bytes.wsp
├── iops
│ ├── gauge-read.wsp
│ └── gauge-write.wsp
├── lock_ops.wsp
├── read_write_ops.wsp
├── split_brain_cnt.wsp
├── status.wsp
└── utilization
├── gauge-total.wsp
├── gauge-used.wsp
└── percent-percent_bytes.wsp
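The brick figure can be reproduced from the file count in the structure above: 33 .wsp files outside the device subtree, with the fop directory alone contributing 4 operations × 4 statistics = 16. The sketch below assumes every file is a 60s:180d Whisper file of 3110428 bytes (12-byte points plus a 28-byte header); the small remaining difference from the measured 102648857 bytes is presumably filesystem or measurement overhead.

```python
# Per-file size for 60s:180d retention: 259200 points * 12 bytes + 28-byte header.
BYTES_PER_METRIC = 259200 * 12 + 28
MIB = 1024 ** 2

# .wsp files per brick in the Cluster -> Volume tree, excluding the device subtree.
BRICK_METRICS = {
    "connections_count": 1,
    "entry_ops": 1,
    "fop": 4 * 4,  # GETXATTR/LOOKUP/OPENDIR/READDIR x hits/latencyAvg/latencyMax/latencyMin
    "healed_cnt": 1,
    "heal_failed_cnt": 1,
    "inode_ops": 1,
    "inode_utilization": 3,
    "iops": 2,
    "lock_ops": 1,
    "read_write_ops": 1,
    "split_brain_cnt": 1,
    "status": 1,
    "utilization": 3,
}

brick_files = sum(BRICK_METRICS.values())
brick_bytes = brick_files * BYTES_PER_METRIC
print(brick_files, brick_bytes, round(brick_bytes / MIB, 1))  # 33 102644124 97.9
```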
Node
Size on disk: 12441712 (~12 MB per host) + (no of bricks * (98 MB per brick)) + (no of devices * (36 MB per device))
Structure:
├── bricks
│ └── |root|gluster_bricks|vol1_b2
│ ├── connections_count.wsp
│ ├── device
│ │ └── vda
│ │ ├── disk_octets
│ │ │ ├── read.wsp
│ │ │ └── write.wsp
│ │ ├── disk_ops
│ │ │ ├── read.wsp
│ │ │ └── write.wsp
│ │ ├── disk_time
│ │ │ ├── read.wsp
│ │ │ └── write.wsp
│ │ ├── mount_utilization
│ │ │ ├── percent_used.wsp
│ │ │ ├── total.wsp
│ │ │ └── used.wsp
│ │ └── utilization
│ │ ├── percent_used.wsp
│ │ ├── total.wsp
│ │ └── used.wsp
│ ├── inode_utilization
│ │ ├── gauge-total.wsp
│ │ ├── gauge-used.wsp
│ │ └── percent-percent_bytes.wsp
│ ├── status.wsp
│ └── utilization
│ ├── gauge-total.wsp
│ ├── gauge-used.wsp
│ └── percent-percent_bytes.wsp
├── rebalance_bytes.wsp
├── rebalance_failures.wsp
├── rebalance_files.wsp
└── rebalance_skipped.wsp
Volume
Size on disk: 46656545 (~44.5 MB per volume) + (no of hosts * (12 MB per host)) + (no of bricks * (98 MB per brick)) + (no of devices * (36 MB per device))
Structure:
├── brick_count
│ ├── down.wsp
│ ├── total.wsp
│ └── up.wsp
├── geo_rep_session
│ ├── down.wsp
│ ├── partial.wsp
│ ├── total.wsp
│ └── up.wsp
├── nodes
│ ├── dhcp43-54_lab_eng_blr_redhat_com
│ │ ├── bricks
│ │ │ └── |root|gluster_bricks|vol1_b2
│ │ │ ├── connections_count.wsp
│ │ │ ├── device
│ │ │ │ └── vda
│ │ │ │ ├── disk_octets
│ │ │ │ │ ├── read.wsp
│ │ │ │ │ └── write.wsp
│ │ │ │ ├── disk_ops
│ │ │ │ │ ├── read.wsp
│ │ │ │ │ └── write.wsp
│ │ │ │ ├── disk_time
│ │ │ │ │ ├── read.wsp
│ │ │ │ │ └── write.wsp
│ │ │ │ ├── mount_utilization
│ │ │ │ │ ├── percent_used.wsp
│ │ │ │ │ ├── total.wsp
│ │ │ │ │ └── used.wsp
│ │ │ │ └── utilization
│ │ │ │ ├── percent_used.wsp
│ │ │ │ ├── total.wsp
│ │ │ │ └── used.wsp
│ │ │ ├── inode_utilization
│ │ │ │ ├── gauge-total.wsp
│ │ │ │ ├── gauge-used.wsp
│ │ │ │ └── percent-percent_bytes.wsp
│ │ │ ├── status.wsp
│ │ │ └── utilization
│ │ │ ├── gauge-total.wsp
│ │ │ ├── gauge-used.wsp
│ │ │ └── percent-percent_bytes.wsp
│ │ ├── rebalance_bytes.wsp
│ │ ├── rebalance_failures.wsp
│ │ ├── rebalance_files.wsp
│ │ └── rebalance_skipped.wsp
│ └── dhcp43-83_lab_eng_blr_redhat_com
│ ├── rebalance_bytes.wsp
│ ├── rebalance_failures.wsp
│ ├── rebalance_files.wsp
│ └── rebalance_skipped.wsp
├── pcnt_used.wsp
├── rebal_status.wsp
├── snap_count.wsp
├── state.wsp
├── status.wsp
├── subvol_count.wsp
├── usable_capacity.wsp
└── used_capacity.wsp
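Putting the Cluster -> Volume figures together, a per-volume estimate is a direct transcription of the Volume formula above. In this sketch the byte constants are the measured values from this issue (60s:180d retention); hosts, bricks and devices are counted for one volume only, and the function name is illustrative.

```python
MIB = 1024 ** 2

# Measured sizes from this issue (60s:180d retention), in bytes.
VOLUME_BYTES = 46656545   # volume-level metrics
NODE_BYTES = 12441712     # per host, rebalance metrics only
BRICK_BYTES = 102648857   # per brick, excluding its device
DEVICE_BYTES = 37325481   # per block device


def volume_tree_bytes(hosts: int, bricks: int, devices: int) -> int:
    """Estimated Whisper usage of one volume in the Cluster -> Volume tree."""
    return (VOLUME_BYTES
            + hosts * NODE_BYTES
            + bricks * BRICK_BYTES
            + devices * DEVICE_BYTES)


# Hypothetical 3-host volume with one brick per host, each on its own device.
print(round(volume_tree_bytes(hosts=3, bricks=3, devices=3) / MIB, 1))  # 480.6
```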
Cluster -> Node -> Brick -> Block Device
This tree contains all the cluster specific information for Nodes, Bricks and Block Devices.
Block Device
Size on disk: 37325481 (~36 MB)
Structure:
├── disk_octets
│ ├── read.wsp
│ └── write.wsp
├── disk_ops
│ ├── read.wsp
│ └── write.wsp
├── disk_time
│ ├── read.wsp
│ └── write.wsp
├── mount_utilization
│ ├── percent_used.wsp
│ ├── total.wsp
│ └── used.wsp
└── utilization
├── percent_used.wsp
├── total.wsp
└── used.wsp
Brick - Without file operations
Size on disk: 40435965 (~39 MB per brick) + (no of devices * (36 MB per device))
├── device
│ └── vda
│ ├── disk_octets
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_ops
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_time
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── mount_utilization
│ │ ├── percent_used.wsp
│ │ ├── total.wsp
│ │ └── used.wsp
│ └── utilization
│ ├── percent_used.wsp
│ ├── total.wsp
│ └── used.wsp
├── entry_ops.wsp
├── inode_ops.wsp
├── inode_utilization
│ ├── gauge-total.wsp
│ ├── gauge-used.wsp
│ └── percent-percent_bytes.wsp
├── iops
│ ├── gauge-read.wsp
│ └── gauge-write.wsp
├── lock_ops.wsp
├── read_write_ops.wsp
├── status.wsp
└── utilization
├── gauge-total.wsp
├── gauge-used.wsp
└── percent-percent_bytes.wsp
Brick - With file operations
Size on disk: 90203242 (~86 MB per brick) + (no of devices * (36 MB per device))
├── device
│ └── vda
│ ├── disk_octets
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_ops
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_time
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── mount_utilization
│ │ ├── percent_used.wsp
│ │ ├── total.wsp
│ │ └── used.wsp
│ └── utilization
│ ├── percent_used.wsp
│ ├── total.wsp
│ └── used.wsp
├── entry_ops.wsp
├── fop
│ ├── GETXATTR
│ │ ├── hits.wsp
│ │ ├── latencyAvg.wsp
│ │ ├── latencyMax.wsp
│ │ └── latencyMin.wsp
│ ├── LOOKUP
│ │ ├── hits.wsp
│ │ ├── latencyAvg.wsp
│ │ ├── latencyMax.wsp
│ │ └── latencyMin.wsp
│ ├── OPENDIR
│ │ ├── hits.wsp
│ │ ├── latencyAvg.wsp
│ │ ├── latencyMax.wsp
│ │ └── latencyMin.wsp
│ └── READDIR
│ ├── hits.wsp
│ ├── latencyAvg.wsp
│ ├── latencyMax.wsp
│ └── latencyMin.wsp
├── inode_ops.wsp
├── inode_utilization
│ ├── gauge-total.wsp
│ ├── gauge-used.wsp
│ └── percent-percent_bytes.wsp
├── iops
│ ├── gauge-read.wsp
│ └── gauge-write.wsp
├── lock_ops.wsp
├── read_write_ops.wsp
├── status.wsp
└── utilization
├── gauge-total.wsp
├── gauge-used.wsp
└── percent-percent_bytes.wsp
Node
Size on disk: 401282895 (~382 MB per host) + (no of LVM disks * (24 MB per disk)) + (no of virtual disks * (30 MB per disk)) + (no of bricks * (86 MB per brick)) + (no of devices * (36 MB per device))
.
├── aggregation-memory-sum
│ └── memory.wsp
├── aggregation-swap-sum
│ └── swap.wsp
├── brick_count
│ ├── down.wsp
│ ├── total.wsp
│ └── up.wsp
├── bricks
│ └── |root|bricks|v1
│ ├── device
│ │ └── vda
│ │ ├── disk_octets
│ │ │ ├── read.wsp
│ │ │ └── write.wsp
│ │ ├── disk_ops
│ │ │ ├── read.wsp
│ │ │ └── write.wsp
│ │ ├── disk_time
│ │ │ ├── read.wsp
│ │ │ └── write.wsp
│ │ ├── mount_utilization
│ │ │ ├── percent_used.wsp
│ │ │ ├── total.wsp
│ │ │ └── used.wsp
│ │ └── utilization
│ │ ├── percent_used.wsp
│ │ ├── total.wsp
│ │ └── used.wsp
│ ├── entry_ops.wsp
│ ├── inode_ops.wsp
│ ├── inode_utilization
│ │ ├── gauge-total.wsp
│ │ ├── gauge-used.wsp
│ │ └── percent-percent_bytes.wsp
│ ├── iops
│ │ ├── gauge-read.wsp
│ │ └── gauge-write.wsp
│ ├── lock_ops.wsp
│ ├── read_write_ops.wsp
│ ├── status.wsp
│ └── utilization
│ ├── gauge-total.wsp
│ ├── gauge-used.wsp
│ └── percent-percent_bytes.wsp
├── cpu
│ ├── percent-idle.wsp
│ ├── percent-interrupt.wsp
│ ├── percent-nice.wsp
│ ├── percent-softirq.wsp
│ ├── percent-steal.wsp
│ ├── percent-system.wsp
│ ├── percent-user.wsp
│ └── percent-wait.wsp
├── df-boot
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── df-dev
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── df-dev-shm
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── df-root
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── df-run
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── df-run-user-0
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── df-sys-fs-cgroup
│ ├── df_complex-free.wsp
│ ├── df_complex-reserved.wsp
│ ├── df_complex-used.wsp
│ ├── df_inodes-free.wsp
│ ├── df_inodes-reserved.wsp
│ ├── df_inodes-used.wsp
│ ├── percent_bytes-free.wsp
│ ├── percent_bytes-reserved.wsp
│ ├── percent_bytes-used.wsp
│ ├── percent_inodes-free.wsp
│ ├── percent_inodes-reserved.wsp
│ └── percent_inodes-used.wsp
├── disk-dm-0
│ ├── disk_io_time
│ │ ├── io_time.wsp
│ │ └── weighted_io_time.wsp
│ ├── disk_octets
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_ops
│ │ ├── read.wsp
│ │ └── write.wsp
│ └── disk_time
│ ├── read.wsp
│ └── write.wsp
├── disk-vda
│ ├── disk_io_time
│ │ ├── io_time.wsp
│ │ └── weighted_io_time.wsp
│ ├── disk_merged
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_octets
│ │ ├── read.wsp
│ │ └── write.wsp
│ ├── disk_ops
│ │ ├── read.wsp
│ │ └── write.wsp
│ └── disk_time
│ ├── read.wsp
│ └── write.wsp
├── interface-eth0
│ ├── if_dropped
│ │ ├── rx.wsp
│ │ └── tx.wsp
│ ├── if_errors
│ │ ├── rx.wsp
│ │ └── tx.wsp
│ ├── if_octets
│ │ ├── rx.wsp
│ │ └── tx.wsp
│ └── if_packets
│ ├── rx.wsp
│ └── tx.wsp
├── memory
│ ├── memory-buffered.wsp
│ ├── memory-cached.wsp
│ ├── memory-free.wsp
│ ├── memory-slab_recl.wsp
│ ├── memory-slab_unrecl.wsp
│ ├── memory-used.wsp
│ ├── percent-buffered.wsp
│ ├── percent-cached.wsp
│ ├── percent-free.wsp
│ ├── percent-slab_recl.wsp
│ ├── percent-slab_unrecl.wsp
│ └── percent-used.wsp
├── ping
│ ├── ping-10_70_42_151.wsp
│ ├── ping_droprate-10_70_42_151.wsp
│ └── ping_stddev-10_70_42_151.wsp
├── status.wsp
└── swap
├── percent-cached.wsp
├── percent-free.wsp
├── percent-used.wsp
├── swap-cached.wsp
├── swap-free.wsp
├── swap_io-in.wsp
├── swap_io-out.wsp
└── swap-used.wsp
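The Node formula can likewise be written as a small helper. This sketch uses the rounded MB figures quoted in this issue and adds a hypothetical `file_ops` flag that switches the per-brick cost between the 86 MB (with file operations) and 39 MB (without) brick variants measured above; the formula as written assumes file operations are enabled.

```python
def node_tree_host_mb(lvm_disks: int, virtual_disks: int, bricks: int,
                      devices: int, file_ops: bool = True) -> int:
    """Approximate Whisper usage (MB) of one host in the Cluster -> Node tree."""
    per_brick = 86 if file_ops else 39  # brick with / without the fop subtree
    return (382                   # host-level metrics (cpu, memory, df-*, swap, ...)
            + lvm_disks * 24      # e.g. disk-dm-0
            + virtual_disks * 30  # e.g. disk-vda
            + bricks * per_brick
            + devices * 36)


# The host shown above: one LVM disk, one virtual disk, one brick on one device.
print(node_tree_host_mb(1, 1, 1, 1))                  # 558
print(node_tree_host_mb(1, 1, 1, 1, file_ops=False))  # 511
```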
Single cluster (approximate utilization of one cluster)
Size on disk: 49767242 (~48 MB per cluster) + (no of hosts * (~382 MB per host)) + (no of LVM disks * (24 MB per disk)) + (no of virtual disks * (30 MB per disk)) + (no of bricks * (86 MB per brick)) + (no of devices * (36 MB per device)) + (no of volumes * (~44.5 MB per volume)) + (no of hosts * (12 MB per host)) + (no of bricks * (98 MB per brick)) + (no of devices * (36 MB per device))
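As suggested in the discussion, this formula is easier to use as a calculator. The sketch below groups the terms that appear twice (hosts: 382 + 12 MB, bricks: 86 + 98 MB, devices: 36 + 36 MB) and scales the result linearly by the number of data points when a different single-archive retention is used. That scaling, the `Deployment` fields and the function name are assumptions for illustration, and the small per-file header is ignored.

```python
from dataclasses import dataclass

BASELINE_POINTS = 180 * 24 * 60  # 60s:180d, the retention the MB figures assume


@dataclass
class Deployment:
    hosts: int
    volumes: int
    bricks: int
    devices: int
    lvm_disks: int = 0
    virtual_disks: int = 0


def cluster_mb(d: Deployment, step_seconds: int = 60, retention_days: int = 180) -> float:
    """Approximate Whisper usage (MB) of one cluster across both Grafana trees."""
    baseline = (48                         # cluster-level metrics
                + d.hosts * (382 + 12)     # Node tree + Volume tree host metrics
                + d.lvm_disks * 24
                + d.virtual_disks * 30
                + d.bricks * (86 + 98)     # brick appears in both trees
                + d.devices * (36 + 36)    # device appears in both trees
                + d.volumes * 44.5)
    # Scale by data points per metric relative to the 60s:180d baseline.
    points = retention_days * 86400 // step_seconds
    return baseline * points / BASELINE_POINTS


# Hypothetical 6-host cluster: 1 volume, 6 bricks on 6 devices, 1 LVM + 1 virtual disk per host.
small = Deployment(hosts=6, volumes=1, bricks=6, devices=6, lvm_disks=6, virtual_disks=6)
print(cluster_mb(small))                     # 4316.5
print(cluster_mb(small, retention_days=90))  # 2158.25
```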