mysql option is well-known to be not performant, yet people sometimes ask about it #1223

codefromthecrypt · 2016-08-10T06:30:46Z

From @adriancole on May 15, 2016 10:7

While the current schema is good for getting started and for workloads of a certain size, it does not deliver sub-second queries once the data gets larger. For at least several months, we have been mentioning on Gitter and elsewhere that stores like Cassandra, and more recently Elasticsearch, are what you should use when you have larger data sets.

The current schema isn't designed to make certain queries fast, for example the span names or dependencies queries. However, it is designed to be easy to troubleshoot. Unlike blob-store approaches, it has a two-table schema that's relatively easy to comprehend and query.

The problem is that a couple of people have run into trouble because they expected it to have been designed for performance. For example, @yzhang226 was surprised that queries were slow (several seconds) and was expecting tens of thousands of spans/second. Also, @mansu found that queries were slow, and feels browser caching is still insufficient.

At the very least, we should document that the zipkin v1 mysql schema is not designed for performance, and that people who need performance should use cassandra or elasticsearch instead. If we make this clearer, I suspect people will stop expecting it to perform well.

There are areas beyond the span names query that are slow, for example the dependencies endpoint. If we document more explicitly that this is a small-to-medium-sized solution, those looking for larger-scale solutions won't consider it. Basically, this substitutes for one-on-one conversations.

We can keep an issue open, perhaps this one, to enumerate the concerns. I don't expect a performant option to look exactly like the current one, as it will at least need more tables. We must be considerate of people who are already using mysql, and not break them. For example, we can propose an alternate impl if we decide we have enough people working on zipkin to support more mysql options. Ideally we can couple this with model v2, so folks don't have to break schema twice.

In the meantime, we can leave this issue open rather than rehashing this one-on-one with each person new to the project.

Copied from original issue: openzipkin/zipkin-java#233

codefromthecrypt · 2016-08-10T06:30:47Z

#228 relates to this. For example, if you get service names across 30k traces and 5M rows in the annotations table, it will take over 2 seconds to return.

codefromthecrypt · 2016-08-10T06:30:48Z

From @basvanbeek on May 25, 2016 14:28

Just a note so we won't forget: if we look into altering the table structure for the MySQL option, it might be wise to also test against the TokuDB engine. This DB engine uses fractal tree indexes for much faster insertions and speedy searches on tables with very large row sets. I think this engine might be very much in line with Zipkin's database usage pattern and make the MySQL option scale better.

jcchavezs · 2018-06-07T08:19:20Z

Is this still a valid issue? Does Zipkin still persist data in MySQL with the V1 model? If not, does V2 have decent performance?

codefromthecrypt · 2018-06-07T14:26:11Z

The refactor intentionally named the artifact with a v1 suffix to allow for a new impl in mysql or even postgres, which is frequently asked about. Since we updated the README, I think this is closeable. Thanks!

codefromthecrypt closed this as completed Jun 7, 2018
