metricdefinition refactor #199
Correct. Using an auto-increment ID or UUID for each metric would require MT to keep an index and co-ordinate this index with all other nodes.
I really like this idea. However, sending a metric and getting back an id is not really possible, as metric ingestion is asynchronous. But since the metric ID is deterministic we don't need to ask for the ID: the sender can just compute it. So I propose that we use two metric ingestion "commands", STORE and INDEX. STORE simply sends the ID+ts+val. The two commands should be independent of each other, so that data can still be stored even if an INDEX has not been sent. But without the INDEX it will not be possible to query the data.
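To make the STORE/INDEX split concrete, here is a minimal Go sketch of how an ingester could route the two commands off a leading byte. The command names come from this thread; the byte values, the `handle` function, and the framing are assumptions, not an actual metrictank wire format.

```go
package main

import "fmt"

// Hypothetical command bytes for the two ingestion "commands" described
// above; the values and framing are assumptions, not a real wire format.
const (
	cmdStore byte = 0x01 // STORE: id + ts + value only
	cmdIndex byte = 0x02 // INDEX: full metric definition, required for querying
)

// handle routes a raw message on its leading command byte. The two paths are
// independent: points can still be persisted even if no INDEX has been seen
// yet for that id; they just can't be queried until one arrives.
func handle(msg []byte) {
	if len(msg) == 0 {
		return
	}
	switch msg[0] {
	case cmdStore:
		fmt.Println("store datapoint, payload bytes:", len(msg)-1)
	case cmdIndex:
		fmt.Println("update index, payload bytes:", len(msg)-1)
	default:
		fmt.Println("unknown command:", msg[0])
	}
}

func main() {
	handle([]byte{cmdStore, 0xAB, 0xCD})
	handle([]byte{cmdIndex, 0x01})
}
```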
we most urgently need an improvement in our kafka data format, for the following reasons:
later we may be able to use the same format for the carbon-relay-ng -> tsdb-gw hop
I am very eager to discuss and help with this endeavor. We see a significant amount of memory being used when replaying data on startup, requiring us to set our resource limits 3x higher than they would otherwise need to be. Of course, going faster and using less CPU/disk/network bandwidth is excellent as well. Could having a "nullable"
without thinking about it too much, I think it all boils down to cramming the metric id, ts and value into as small a number of bytes as possible. I think a good starting point may be a new message format like:
this would give us messages of 32B or 40B. potentially we can also batch points together in single messages if they have the same timestamp.
That's fair. Considering the rate at which we are publishing data, any savings would be good. A quick sample of our (tag heavy) data stream shows that our average message size is 258.8B, so going down to 40B would be huge for us. I, for one, would prefer they lived in the same topic, as we currently have to match up the data topic and the clustering topic, and adding a third becomes costly when you consider how many partitions we have for large clusters (and how little traffic the
It seems like a good idea to allocate the first byte as a header that can identify the packet format (similar to @woodsaj's proposal of STORE and INDEX above): then you can mix and match in the same topic without any issues, and you have the ability to iterate on the format later. That said, you can likely get away with using only one of the available packet types if you define the packet format as containing
The timestamp and orgId only need 4 bytes (uint32), but we also use 1 byte for the msg version. So that brings the message size down to 29B or 33B.

I also agree with @shanson7 that the messages need to be in the same topic. Using the first byte as a message version makes this pretty straightforward: if the initial version is 0x0, then it should be trivial to handle versioned and non-versioned messages during a transition to newer code.

To better support these streamlined messages I think we are going to need to move the TSDB ingestion into Metrictank. So users send directly to MT, then MT sends to kafka. With this approach we only need to send the full payload once, then track the orgId+metricId in a large concurrentMap. The benefit of this approach is that when a metricDef is deleted from the index, we can broadcast the delete to other MT nodes to have them delete from their Map as well.

If we have data sent to tsdb-gw, and then sent into kafka, there is no real way for us to notify the tsdb-gw's that a metricDef has been deleted. So the tsdb-gw's will need to track the metricIds seen and the last time a full payload was sent, so they can periodically send full payloads. If a metricDef is deleted while data is still being sent, then metrics will be lost for up to the full-payload send frequency.
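For reference, the 29B/33B figures work out as 1 (version) + 16 (id) + 4 (ts as uint32) + 8 (value as float64) = 29 bytes, plus 4 more for a uint32 orgId = 33 bytes. Below is a minimal encoding sketch under those assumptions; the field order, the `point` struct, and the big-endian byte order are illustrative choices, not the format metrictank actually settled on.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// point is a hypothetical in-memory form of the streamlined message:
// 1B version + 16B id + 4B ts + 8B value = 29B, or 33B with a 4B orgId.
type point struct {
	org uint32
	id  [16]byte
	ts  uint32
	val float64
}

// initial version 0x0, per the comment above; assumes legacy messages
// never start with this byte so both formats can share a topic.
const formatVersion byte = 0x0

// encode packs a point into the assumed 33-byte layout (big-endian).
func encode(p point) []byte {
	buf := make([]byte, 33)
	buf[0] = formatVersion
	binary.BigEndian.PutUint32(buf[1:5], p.org)
	copy(buf[5:21], p.id[:])
	binary.BigEndian.PutUint32(buf[21:25], p.ts)
	binary.BigEndian.PutUint64(buf[25:33], math.Float64bits(p.val))
	return buf
}

func main() {
	p := point{org: 1, ts: 1500000000, val: 3.14}
	fmt.Println(len(encode(p)), "bytes") // 33
}
```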
Having MT ingest then forward to Kafka also works great for users that want to send carbon directly to MT but also have clustering. |
Not worried about the year 2038 problem? :)
EDIT: I guess unsigned int makes this a year 2106 problem.
I have come to realize that this isn't true. Ideally, we want to push the optimized format up the stack to the metric sources, i.e. carbon-relay-ng should send the optimized format. So let's just keep this simple for now by adding support to MT for processing two message formats. As the work here requires major refactoring of the MT ingestion code, I would also like to address #741 at the same time.
https://drive.google.com/file/d/15qMNQcLD7fwaA58yZP87REkT380TrO9W/view?usp=sharing
Here is what I am thinking. The basic implementation would be to make "AggMetric" an interface with two variants: a TempAggMetric and a FullAggMetric. Thoughts?
@woodsaj that's an interesting idea. How would that work with aggregations? I guess the aggregations would have to be generated when a TempAggMetric is converted to a FullAggMetric because before that the correct schema cannot be determined? I'm trying to imagine the startup procedure in a setup like ours. Currently when MT has replayed the kafka backlog it announces that it is ready to handle queries. But in a case like what you illustrate, even if it has replayed the backlog there might be certain queries that it can't handle yet because it might not have the necessary MetricDefinitions for all the data it has replayed. Does that mean that MT can't announce to the rest of the cluster that it is ready before it has all the MDs, in order to ensure that we never serve incomplete data due to a restart?
When the TempAggMetrics are converted to FullAggMetrics, all of the buffered datapoints will be fed in, so the aggregations will be created as normal.
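A minimal sketch of that conversion, assuming an AggMetric interface with the two variants named in this thread; the method set and the `promote` helper are simplified placeholders, not metrictank's real AggMetric surface.

```go
package main

// point is a bare timestamp/value pair.
type point struct {
	ts  uint32
	val float64
}

// aggMetric is the hypothetical interface both variants would satisfy.
type aggMetric interface {
	Add(ts uint32, val float64)
}

// tempAggMetric buffers raw points while the metric definition (and thus the
// schema/aggregation config) is still unknown.
type tempAggMetric struct {
	buf []point
}

func (t *tempAggMetric) Add(ts uint32, val float64) {
	t.buf = append(t.buf, point{ts, val})
}

// fullAggMetric stands in for the normal chunk + rollup pipeline.
type fullAggMetric struct {
	points []point // placeholder for chunks and rollup aggregators
}

func (f *fullAggMetric) Add(ts uint32, val float64) {
	f.points = append(f.points, point{ts, val})
}

// promote converts a temp metric once the definition is known, replaying the
// buffered points so aggregations are built exactly as if the points had
// arrived on the normal path.
func promote(t *tempAggMetric) *fullAggMetric {
	f := &fullAggMetric{}
	for _, p := range t.buf {
		f.Add(p.ts, p.val)
	}
	return f
}

func main() {
	var m aggMetric = &tempAggMetric{}
	m.Add(1500000000, 1.0)
	_ = promote(m.(*tempAggMetric))
}
```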
Requiring full messages every 3 hours or so seems reasonable and avoids all of these problems.
I was thinking about how we can avoid tsdbgw needing to keep a list. e.g. for every incoming metric, it can instead do something like: if
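The condition got cut off above, but one possible stateless scheme along those lines is to derive each id's full-payload slot from a hash of the id, so the gateway never has to remember what it already sent and the full payloads are spread across the window. This is only an assumed interpretation, sketched in Go; `shouldSendFull` and `fullResendEvery` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fullResendEvery is the assumed resend window (~3h, per the comment above).
const fullResendEvery = 3 * 3600

// shouldSendFull decides statelessly whether this datapoint should carry a
// full metric definition: each id gets a fixed interval-wide slot inside the
// resend window, derived from a hash of the id, so the gateway keeps no
// per-metric list and roughly one full payload per id goes out per window.
func shouldSendFull(id string, ts, interval uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(id))
	slot := h.Sum32() % fullResendEvery
	pos := ts % fullResendEvery
	return pos >= slot && pos < slot+interval
}

func main() {
	fmt.Println(shouldSendFull("1.abcdef", 1500000000, 10))
}
```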
I think MT should announce readiness as usual (based on priority, aka ingestion lag). I need to think a bit more about the schematic, will get back to it soon.

dependency analysis

FYI, I looked through our code and paraphrased all dependencies between the components/properties.

ReorderBuffer: needs interval so that it can bucketize, which allows keeping the size static, and allows discarding points early if they would overwrite another anyway (meaning we could potentially create another ROB that does not need the interval, but is less efficient)
intervaldetector:
Aggmetric:
schema: needs name and interval
aggregations config: only needs name
id: needs metric, unit, mtype, interval, tags
index: needs basically all properties of a metric before it can add it
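To illustrate the "id needs metric, unit, mtype, interval, tags" dependency (and why, per the earlier comment, the sender can compute the id itself), here is a hedged sketch of a deterministic id; the `metricID` function and its concatenation/hash recipe are illustrative assumptions, not metrictank's actual scheme.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strings"
)

// metricID shows why the id depends on exactly name, unit, mtype, interval
// and tags: changing any of them yields a different series. The format
// string and use of md5 here are illustrative, not the real recipe.
func metricID(orgID int, name, unit, mtype string, interval int, tags []string) string {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	sum := md5.Sum([]byte(fmt.Sprintf("%s.%s.%s.%d.%s",
		name, unit, mtype, interval, strings.Join(sorted, ";"))))
	return fmt.Sprintf("%d.%x", orgID, sum)
}

func main() {
	fmt.Println(metricID(1, "some.metric", "ms", "gauge", 10, []string{"dc=us-east"}))
}
```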
I think the diagram makes sense for the most part. 2 comments:
below is a list of problems i'm seeing with our metric data across the entire stack.
it covers problems in architectural design, performance, and metrics2.0 compatibility.
i may update this in the future as i think of more. in the meantime, happy to discuss.
I'd like to start addressing these in the medium term (e.g. after high prio stuff such as kafka)
the id: effectively a [16]byte but needlessly implemented as a more costly string. these id's don't compress as well as sequential numbers but i think that's a reasonable tradeoff.
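As a rough illustration of the [16]byte point: assuming the hash portion of an id is a 32-character hex string, it can be decoded into a fixed-size array and used as a cheap map key. The `parseIDHash` helper and the "<org>.<hash>" split it assumes are hypothetical.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// parseIDHash converts the hex hash portion of an id string into a fixed
// [16]byte, halving its in-memory size and making it usable as a map key
// without allocation. Assumes a 32-character hex hash (e.g. an md5 sum).
func parseIDHash(hexHash string) ([16]byte, error) {
	var out [16]byte
	if hex.DecodedLen(len(hexHash)) != len(out) {
		return out, errors.New("unexpected id length")
	}
	if _, err := hex.Decode(out[:], []byte(hexHash)); err != nil {
		return out, err
	}
	return out, nil
}

func main() {
	id, err := parseIDHash("0123456789abcdef0123456789abcdef")
	fmt.Println(id, err)
}
```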
implementation ideas/proposals: