While the current schema is good for getting started and for small-to-medium workloads, it does not deliver sub-second queries once data gets larger. We have been mentioning on Gitter and elsewhere for at least several months that stores like Cassandra and, more recently, Elasticsearch are what you should use for larger data sets.
The current schema isn't designed to make certain queries fast, for example the span names or dependencies queries. However, it is designed to be easy to troubleshoot. Unlike blob-store approaches, it has a two-table schema that is relatively easy to understand and query.
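To make the two-table point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table names zipkin_spans and zipkin_annotations follow the v1 schema, but the column lists are trimmed for illustration and are an assumption, not the real DDL. It shows why a query like "span names for a service" needs a join: the service name lives on annotation rows, while the span name lives on span rows.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Create trimmed, illustrative versions of the two v1 tables."""
    conn.executescript(
        """
        CREATE TABLE zipkin_spans (
            trace_id  INTEGER NOT NULL,
            id        INTEGER NOT NULL,
            name      TEXT    NOT NULL,
            parent_id INTEGER,
            start_ts  INTEGER,
            duration  INTEGER,
            PRIMARY KEY (trace_id, id)
        );
        CREATE TABLE zipkin_annotations (
            trace_id              INTEGER NOT NULL,
            span_id               INTEGER NOT NULL,
            a_key                 TEXT    NOT NULL,
            a_timestamp           INTEGER,
            endpoint_service_name TEXT
        );
        """
    )

def span_names(conn: sqlite3.Connection, service_name: str) -> list:
    """Span names for one service: join spans to their annotations."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.name
        FROM zipkin_spans s
        JOIN zipkin_annotations a
          ON a.trace_id = s.trace_id AND a.span_id = s.id
        WHERE a.endpoint_service_name = ?
        ORDER BY s.name
        """,
        (service_name,),
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical sample data: one trace with a client call into a database.
conn.executemany("INSERT INTO zipkin_spans VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "get /users", None, 1000, 50),
    (1, 2, "select users", 1, 1010, 20),
])
conn.executemany("INSERT INTO zipkin_annotations VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "sr", 1000, "frontend"),
    (1, 1, "ss", 1050, "frontend"),
    (1, 2, "cs", 1010, "frontend"),
    (1, 2, "sr", 1012, "userdb"),
])
print(span_names(conn, "frontend"))
```

Both tables are plain rows you can inspect with ad hoc SQL, which is the troubleshooting benefit; the cost is that this join grows with the number of annotation rows.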
The problem is that a couple of people have run into trouble because they expected it to have been designed for performance. For example, @yzhang226 was surprised that queries were slow (several seconds) and was expecting tens of thousands of spans per second. Also, @mansu found that queries were slow and feels that browser caching is still insufficient.
At the very least, we should document that the Zipkin v1 MySQL schema is not designed for performance, and that Cassandra or Elasticsearch should be used instead. If we make this clearer, I suspect people will stop expecting it to be fast.
There are areas beyond the span names query that are slow, for example the dependencies endpoint. By documenting explicitly that this is a small-to-medium-scale solution, those looking for larger-scale solutions won't consider it. Basically, the documentation substitutes for one-on-one conversations.
We can keep an issue open, perhaps this one, to enumerate the concerns. I don't expect a performant option to look exactly like the current one, as it will at least need more tables. We must be considerate of people who are already using MySQL and not break them. For example, we can propose an alternate implementation if we decide we have enough people working on Zipkin to support more MySQL options. Ideally we can couple this with model v2, so folks don't have to break their schema twice.
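As a rough illustration of what "more tables" could mean, here is a sketch of one hypothetical approach: a small lookup table of (service, span name) pairs maintained at write time, so the names query reads a tiny table instead of joining over annotations. The service_span_name table and the store_span/span_names helpers are invented for this example and are not a proposed Zipkin design; sqlite3 stands in for MySQL, and the v1 tables are trimmed to a few assumed columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Trimmed stand-ins for the existing v1 tables (columns assumed).
    CREATE TABLE zipkin_spans (trace_id INTEGER, id INTEGER, name TEXT);
    CREATE TABLE zipkin_annotations (
        trace_id INTEGER, span_id INTEGER, a_key TEXT,
        endpoint_service_name TEXT);
    -- Hypothetical extra table: one row per (service, span name) pair.
    CREATE TABLE service_span_name (
        service_name TEXT NOT NULL,
        span_name    TEXT NOT NULL,
        PRIMARY KEY (service_name, span_name));
    """
)

def store_span(conn, trace_id, span_id, name, annotations):
    """Write a span and its annotations, then refresh the lookup table."""
    with conn:  # commits the span and its lookup rows together
        conn.execute("INSERT INTO zipkin_spans VALUES (?, ?, ?)",
                     (trace_id, span_id, name))
        for key, service in annotations:
            conn.execute(
                "INSERT INTO zipkin_annotations VALUES (?, ?, ?, ?)",
                (trace_id, span_id, key, service))
            # Repeated pairs hit the primary key and are skipped,
            # so this table stays small no matter how many spans arrive.
            conn.execute(
                "INSERT OR IGNORE INTO service_span_name VALUES (?, ?)",
                (service, name))

def span_names(conn, service):
    """Read only the lookup table; no join over annotation rows."""
    cur = conn.execute(
        "SELECT span_name FROM service_span_name"
        " WHERE service_name = ? ORDER BY span_name", (service,))
    return [r[0] for r in cur]

# Hypothetical load: 100 identical traces, 400 annotation rows in total.
for trace_id in range(100):
    store_span(conn, trace_id, 1, "get /users",
               [("sr", "frontend"), ("ss", "frontend")])
    store_span(conn, trace_id, 2, "select users",
               [("cs", "frontend"), ("sr", "userdb")])

print(span_names(conn, "frontend"))
print(conn.execute("SELECT COUNT(*) FROM service_span_name").fetchone()[0])
```

The trade-off is extra write work per span in exchange for cheap reads. Because the existing tables are left untouched, an approach like this could in principle sit alongside the current schema rather than breaking it. A real design would also need to handle data expiry, which this sketch ignores. MySQL's rough equivalent of `INSERT OR IGNORE` is `INSERT IGNORE`.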
In the meantime, we can leave this issue open rather than rehashing this one-on-one with each new person on the project.
Copied from original issue: openzipkin/zipkin-java#233
#228 relates to this. For example, getting service names across 30k traces with 5M rows in the annotations table takes over 2 seconds to return.
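A small sketch of why the service names query scales with table size rather than with the number of services: the distinct names are derived from the annotation rows themselves. This uses sqlite3 as a stand-in for MySQL, a trimmed zipkin_annotations table with assumed columns, and made-up data far smaller than the 5M rows mentioned above; it illustrates the shape of the problem and is not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed stand-in for zipkin_annotations (column set is an assumption).
conn.execute(
    "CREATE TABLE zipkin_annotations ("
    " trace_id INTEGER, span_id INTEGER, a_key TEXT,"
    " endpoint_service_name TEXT)"
)

# Hypothetical data: 300 traces x 4 spans x 4 annotations = 4800 rows,
# yet only three distinct services.
services = ["frontend", "backend", "db"]
rows = [
    (t, s, key, services[s % len(services)])
    for t in range(300)
    for s in range(4)
    for key in ("cs", "sr", "ss", "cr")
]
conn.executemany("INSERT INTO zipkin_annotations VALUES (?, ?, ?, ?)", rows)

def service_names(conn):
    # No index exists here, so SQLite reads every annotation row to
    # collapse thousands of rows into a handful of names.
    cur = conn.execute(
        "SELECT DISTINCT endpoint_service_name FROM zipkin_annotations"
        " WHERE endpoint_service_name IS NOT NULL"
        " ORDER BY endpoint_service_name"
    )
    return [r[0] for r in cur]

total = conn.execute("SELECT COUNT(*) FROM zipkin_annotations").fetchone()[0]
print(total, service_names(conn))
```

Adding traces grows the row count the query must process while the answer stays three names long, which is the pattern behind multi-second responses on large annotation tables.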
Just a note so we don't forget: if we look into altering the table structure for the MySQL option, it might be wise to also test against the TokuDB engine. This engine uses fractal tree indexes for much faster insertions and fast searches on tables with very large row sets. I think it might be very much in line with Zipkin's database usage pattern and make the MySQL option scale better.
The refactor intentionally named the artifact with a v1 suffix to allow for a new implementation in MySQL, or even Postgres, which is frequently asked about. Since we updated the README, I think this is closeable. Thanks!
From @adriancole on May 15, 2016 10:7