mysql option is well-known to be not performant, yet people sometimes ask about it #1223

Closed
codefromthecrypt opened this issue Aug 10, 2016 · 4 comments

Comments

@codefromthecrypt
Member

From @adriancole on May 15, 2016 10:7

While the current schema is good for getting started and for workloads of a certain size, it does not answer queries in under a second once the data gets larger. We have been saying in gitter and elsewhere for at least several months that stores like cassandra, and recently elasticsearch, are what you should use when you have larger data sets.

The current schema isn't designed to make certain queries fast, for example span names or the dependencies. However, it is designed to be easy to troubleshoot. Unlike blob-store approaches, it has a two-table schema that's relatively easy to comprehend and query.

The problem is that a couple of people have run into trouble because they expected the schema to have been designed for performance. For example, @yzhang226 was surprised that queries were slow (several seconds) and was expecting tens of thousands of spans/second. Also, @mansu found that queries were slow, and feels browser caching is still insufficient.

At the very least, we should document that the zipkin v1 mysql schema is not designed for performance, and that anyone who needs performance should use cassandra or elasticsearch instead. If we make this clearer, I suspect people will stop expecting it to be fast.

Areas beyond the span names query are slow as well, for example the dependencies endpoint. If we document more explicitly that this is a small-to-medium size solution, those looking for larger scale solutions won't consider it. Basically, this substitutes for one-on-one conversations.

We can keep an issue open, perhaps this one, to enumerate the concerns. I don't expect a performant option to look exactly like the current one, as it will at least need more tables. We must be considerate of people who are already using mysql, and not break them. For example, we can propose an alternate implementation if we decide we have enough people working on zipkin to support more mysql options. Ideally we can couple this with model v2, so folks don't have to go through a breaking schema change twice.

In the meantime, we can leave this issue open rather than rehashing this one-on-one with each person new to the project.

Copied from original issue: openzipkin/zipkin-java#233

@codefromthecrypt
Member Author

#228 relates to this. For example, if you get service names across 30k traces and 5M rows in the annotations table, it takes over 2 seconds to return.
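To make the shape of that query concrete, here is a minimal sketch. It uses SQLite as a stand-in for MySQL, and every table and column name in it (`annotations`, `service_names`, `service_name`) is an assumption made up for illustration, not the actual Zipkin DDL. It contrasts reading service names with a `DISTINCT` over the annotations rows against reading them from a small extra table kept up to date on write, which is one possible reading of "it will at least need more tables". It does not reproduce the timings mentioned above.

```python
import sqlite3

# Hypothetical, simplified stand-in for the schema discussed in this issue.
# Table and column names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE annotations (
        trace_id     INTEGER NOT NULL,
        span_id      INTEGER NOT NULL,
        service_name TEXT    NOT NULL
    );
    -- Hypothetical extra table: one row per distinct service name.
    CREATE TABLE service_names (
        service_name TEXT PRIMARY KEY
    );
""")


def record_annotation(trace_id, span_id, service_name):
    """Store one annotation row and remember its service name."""
    conn.execute(
        "INSERT INTO annotations VALUES (?, ?, ?)",
        (trace_id, span_id, service_name),
    )
    # INSERT OR IGNORE is SQLite syntax; it keeps the lookup table deduplicated.
    conn.execute(
        "INSERT OR IGNORE INTO service_names VALUES (?)",
        (service_name,),
    )


def service_names_by_scan():
    """Derive service names from every annotation row (work grows with row count)."""
    rows = conn.execute(
        "SELECT DISTINCT service_name FROM annotations ORDER BY service_name"
    )
    return [row[0] for row in rows]


def service_names_by_lookup():
    """Read service names from the small extra table (work grows with service count)."""
    rows = conn.execute(
        "SELECT service_name FROM service_names ORDER BY service_name"
    )
    return [row[0] for row in rows]


# Populate 1000 annotation rows spread across three made-up services.
for i in range(1000):
    record_annotation(i // 10, i, ("frontend", "backend", "db")[i % 3])
```

Both functions return the same list. The difference is how many rows each has to read: the first depends on the number of annotations, the second only on the number of services. The trade-off is an extra write per annotation and a second table to keep consistent.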

@codefromthecrypt
Member Author

From @basvanbeek on May 25, 2016 14:28

Just a note so we don't forget: if we look into altering the table structure for the MySQL option, it might be wise to also test against the TokuDB engine. This DB engine uses fractal tree indexes for much faster insertions and speedy searches on tables with very large row sets. I think this engine might be well in line with Zipkin's database usage pattern and make the MySQL option scale better.

@jcchavezs
Contributor

Is this still a valid issue? Does Zipkin still persist data in MySQL with the V1 model? If not, does V2 have decent performance?

@codefromthecrypt
Member Author

codefromthecrypt commented Jun 7, 2018 via email
