Carbon.conf #4

Closed · rbirnie opened this issue Oct 28, 2013 · 8 comments

rbirnie commented Oct 28, 2013

Hi,

Would it also be possible to see your carbon.conf, to get an idea of which settings need to be set across the different caches and relays?

@obfuscurity (Owner)

Here is the one I think you're referring to. IIRC it was configured for 0.9.10 and will probably not be suitable as a drop-in replacement for your environment, but it's something to work from. In this environment we had Backstop running on Heroku, accepting a heavy stream of metrics and distributing it to the pool of relays below.

```ini
[cache:a]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7102
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:b]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2203
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2204
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7202
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:c]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2303
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2304
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7302
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:d]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2403
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2404
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7402
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:e]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2503
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2504
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7502
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:f]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2603
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2604
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7602
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:g]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2703
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2704
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7702
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:h]
USER = carbon
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 5000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2803
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2804
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7802
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[relay:a]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2113
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2114
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b, 127.0.0.1:2304:c, 127.0.0.1:2404:d, 127.0.0.1:2504:e, 127.0.0.1:2604:f, 127.0.0.1:2704:g, 127.0.0.1:2804:h

[relay:b]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2213
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2214
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b, 127.0.0.1:2304:c, 127.0.0.1:2404:d, 127.0.0.1:2504:e, 127.0.0.1:2604:f, 127.0.0.1:2704:g, 127.0.0.1:2804:h

[relay:c]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2313
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2314
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b, 127.0.0.1:2304:c, 127.0.0.1:2404:d, 127.0.0.1:2504:e, 127.0.0.1:2604:f, 127.0.0.1:2704:g, 127.0.0.1:2804:h

[relay:d]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2413
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2414
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b, 127.0.0.1:2304:c, 127.0.0.1:2404:d, 127.0.0.1:2504:e, 127.0.0.1:2604:f, 127.0.0.1:2704:g, 127.0.0.1:2804:h
```
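
For anyone adapting this: with multiple cache instances, graphite-web also has to know every cache query port. A minimal local_settings.py sketch matching the instances above might look like the following, assuming the webapp runs on the same host as the caches (the host:port:instance entries must line up with the CACHE_QUERY_PORT values):

```python
# graphite-web local_settings.py (sketch; assumes the webapp runs on the
# same box as the caches above, ports/instances copied from carbon.conf)
CARBONLINK_HOSTS = [
    "127.0.0.1:7102:a", "127.0.0.1:7202:b", "127.0.0.1:7302:c",
    "127.0.0.1:7402:d", "127.0.0.1:7502:e", "127.0.0.1:7602:f",
    "127.0.0.1:7702:g", "127.0.0.1:7802:h",
]
```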

@obfuscurity (Owner)

Contrast that with this version, which IMHO is much more typical of a "scale-out" (on a single box) across multiple cores. We had HAProxy running in front of the relays at different "layers" (in front of the replication and fanout layers). This was running in production on 0.9.12.

```ini
[cache:1]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_PORT = 7102
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:2]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2203
PICKLE_RECEIVER_PORT = 2204
CACHE_QUERY_PORT = 7202
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:3]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2303
PICKLE_RECEIVER_PORT = 2304
CACHE_QUERY_PORT = 7302
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:4]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2403
PICKLE_RECEIVER_PORT = 2404
CACHE_QUERY_PORT = 7402
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:5]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2503
PICKLE_RECEIVER_PORT = 2504
CACHE_QUERY_PORT = 7502
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

[cache:6]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2603
PICKLE_RECEIVER_PORT = 2604
CACHE_QUERY_PORT = 7602
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False

# Special cache with AUTOFLUSH enabled.
# Only for use with bulk data loads.
# Do not point MetricsD or StatsD at this.
# The fsyncs would be overwhelming.
[cache:7]
USER = carbon
MAX_UPDATES_PER_SECOND = 1000
MAX_CREATES_PER_MINUTE = 500
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
CACHE_QUERY_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2703
PICKLE_RECEIVER_PORT = 2704
CACHE_QUERY_PORT = 7702
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = True

[aggregator:1]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2123
PICKLE_RECEIVER_PORT = 2124
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 6
USE_FLOW_CONTROL = True
FORWARD_ALL = True
DESTINATIONS = 127.0.0.1:2104:1, 127.0.0.1:2204:2, 127.0.0.1:2304:3, 127.0.0.1:2404:4, 127.0.0.1:2504:5, 127.0.0.1:2604:6

[aggregator:2]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2223
PICKLE_RECEIVER_PORT = 2224
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 6
USE_FLOW_CONTROL = True
DESTINATIONS = 127.0.0.1:2104:1, 127.0.0.1:2204:2, 127.0.0.1:2304:3, 127.0.0.1:2404:4, 127.0.0.1:2504:5, 127.0.0.1:2604:6

# Note that our relay instance ids don't match up
# with our port numbering (2x14) because the first
# traditional carbon-relay port (2114) is in use
# by the HAProxy frontend.
#
# We are also using a 2nd HAProxy to distribute
# across two fanout relays (relay:5, relay:6)

# These relays act as our "replication" layer,
# sending a duplicate feed to the "fanout" relays below.

[relay:1]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2213
PICKLE_RECEIVER_PORT = 2214
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
USE_FLOW_CONTROL = True
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2614:0, 172.16.6.24:2114:0

[relay:2]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2313
PICKLE_RECEIVER_PORT = 2314
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
USE_FLOW_CONTROL = True
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2614:0, 172.16.6.24:2114:0

[relay:3]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2413
PICKLE_RECEIVER_PORT = 2414
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
USE_FLOW_CONTROL = True
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2614:0, 172.16.6.24:2114:0

[relay:4]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2513
PICKLE_RECEIVER_PORT = 2514
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
USE_FLOW_CONTROL = True
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2614:0, 172.16.6.24:2114:0

# These relays act as our "fanout" relays to the
# local carbon-aggregators.

[relay:5]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2713
PICKLE_RECEIVER_PORT = 2714
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
USE_FLOW_CONTROL = True
DESTINATIONS = 127.0.0.1:2124:1, 127.0.0.1:2224:2

[relay:6]
USER = carbon
LINE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2813
PICKLE_RECEIVER_PORT = 2814
MAX_QUEUE_SIZE = 10000
MAX_DATAPOINTS_PER_MESSAGE = 500
RELAY_METHOD = consistent-hashing
USE_FLOW_CONTROL = True
DESTINATIONS = 127.0.0.1:2124:1, 127.0.0.1:2224:2
```
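
The aggregator instances above also rely on an aggregation-rules.conf, which isn't shown in this issue. For context, the rule format looks like the following; the metric patterns here are purely illustrative, not the ones from this environment:

```
# aggregation-rules.conf (illustrative only; these patterns are made up)
# Format: output_template (frequency) = method input_pattern
<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests
<env>.applications.<app>.all.latency  (60) = avg <env>.applications.<app>.*.latency
```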

@anatolijd

Ports 2114 and 2614: I assume they're used by two HAProxy frontends, right?
localhost:2114 is an HAProxy round-robin to the local relay:[1-4] instances,
and
localhost:2614 is an HAProxy round-robin to the local relay:[5-6] instances.

Am I getting this right?

@obfuscurity (Owner)

Yup
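
To make that layout concrete, a rough HAProxy sketch could look like the following. Only the frontend ports (2114, 2614) and the relay ports come from this thread; the section names, balance settings, and the choice of the relays' pickle receiver ports as backends are assumptions:

```
# haproxy.cfg sketch (illustrative; only the port layout comes from the thread)
listen carbon_replication_relays
    bind 0.0.0.0:2114
    mode tcp
    balance roundrobin
    server relay1 127.0.0.1:2214 check
    server relay2 127.0.0.1:2314 check
    server relay3 127.0.0.1:2414 check
    server relay4 127.0.0.1:2514 check

listen carbon_fanout_relays
    bind 127.0.0.1:2614
    mode tcp
    balance roundrobin
    server relay5 127.0.0.1:2714 check
    server relay6 127.0.0.1:2814 check
```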



punnie commented May 22, 2014

Hey, sorry to dig up such an old thing, but I'm running into problems with a similar aggregator setup. I wonder if you could at least tell me where I'm going wrong.

I have two relays picking up metrics from an AMQP queue, similar to relay:[5,6]. Those relays distribute their load between two aggregators, similar to aggregator:[1,2], using the consistent-hashing method.

My aggregators are misbehaving with this. My take is that by using consistent-hashing to distribute the load between two aggregators, I'm not feeding each one all the metrics it needs to aggregate properly according to the rule, because those metrics get "evenly" split between the two aggregators; they don't all arrive at the same aggregator for it to operate on.

My aggregators jam in this setup, but I attribute the jamming to the fact that I stop feeding metrics into the system at a given point. Even so, I think that if I didn't stop, the aggregates would be calculated wrongly, split between the two aggregators.

If you've read this to the end, thank you so much already. I'd really appreciate a heads-up on this; it's starting to make a small dent in my sanity.

@obfuscurity (Owner)

@punnie First off, sorry for the late reply. I just now saw your question. Second, I think that's a reasonable assumption. You could use REPLICATION_FACTOR = 2 to duplicate all metrics to both, but that presumably defeats your purpose of running two aggregators. Ideally you'd want to use relay rules for just the metrics being aggregated, and C-H for everything else, but I don't think carbon-relay supports this sort of hybrid mode.
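
For the duplicate-everything option, a rough sketch (reusing the fanout relay and aggregator ports from the config earlier in this thread) would be:

```ini
# Sketch only: a fanout relay that copies every metric to both aggregators.
# With two destinations and REPLICATION_FACTOR = 2, consistent-hashing sends
# each datapoint to both, so neither aggregator misses input, at the cost of
# no longer splitting the load between them.
[relay:5]
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2124:1, 127.0.0.1:2224:2
```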


punnie commented Jul 18, 2014

@obfuscurity Thank you very much for taking the time to reply.

Just for the sake of closure, I've solved this issue with the help of graphite-project/carbon#32. It implements a new relay method called "aggregated-consistent-hashing", which distributes metrics across aggregators by hashing on the destination (aggregated) metric name instead of the source name. This ultimately means that every aggregator performs its own set of aggregation operations and always receives every metric it needs to perform them.
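
For later readers, a minimal sketch of that relay method in carbon.conf might look like this (the instance name and receiver ports are illustrative, and the aggregator destinations reuse the ports from earlier in the thread):

```ini
# Sketch only: relay using the aggregated-consistent-hashing method from
# graphite-project/carbon#32. The relay hashes on the aggregated metric name
# produced by the aggregation rules (not the source name), so all datapoints
# feeding a given rule land on the same aggregator.
[relay:amqp]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2913
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2914
RELAY_METHOD = aggregated-consistent-hashing
DESTINATIONS = 127.0.0.1:2124:1, 127.0.0.1:2224:2
```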

Again, thanks for the help, and for pushing the envelope 🍻

@obfuscurity (Owner)

@punnie Very cool, I wasn't even aware of that feature. :)
