-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A new trie based router #1
base: master
Are you sure you want to change the base?
Conversation
@pixeltrix do you mind having a look over this and giving me some advice as to next steps before I open a PR on rails/rails? I'm asking you as it looks like you imported Journey into Rails and probably are responsible for maintaining it? |
@stevegraham interesting, though it's a little difficult to follow because of all the Journey removal stuff - it might be easier for investigation to see it by itself similar to how Journey used to be and then we can benchmark it easier. Some initial notes:
I'm a bit busy this week but I'll take a deeper look at the weekend. |
@pixeltrix Thanks for coming back to me. To answer your questions:
|
Just found out Rails forks are blacklisted on Travis 😢 |
Don't worry, just try running the tests in 1.9.3, 2.0 and 2.1. All other rubies aren't supported 'officially' by Rails team right now. |
It looks like it's also missing some features too, like graphviz output, etc. I don't mind removing that stuff (since we don't use it), but we should compare apples to apples. I'm not super clear on why we need a rewrite though. AFAICT, a Trie has the same algorithmic complexity as a DFA, O(n)+C where I've pushed a few commits that give a 7% speed increase to the current router. I used this benchmark: require 'benchmark/ips'
require 'action_dispatch'
arr = ["foo", "bar", "baz", "qux", "quux", "corge", "grault", "garply"]
paths = arr.permutation(3).map { |a| "/#{a.join '/'}" }
set = ActionDispatch::Routing::RouteSet.new(ActionDispatch::TestRequest)
class HelloWorld
def initialize(response)
@response = response
end
def call(env)
[ 200, { 'Content-Type' => 'text/plain' }, [@response] ]
end
end
set.draw do
paths.each do |path|
get path, to: HelloWorld.new(path)
end
end
Benchmark.ips do |x|
x.report('omg') do
request = ActionDispatch::TestRequest.new
request.path = paths.first
set.call request.env
end
end Before:
After:
A couple notes though. I switched to benchmark/ips because it's important to see the sttdev. I saw a 7% increase, but both runs had a 5% stddev. 7% is outside the stddev, so it should be a win. @stevegraham you should show stddevs on your benchmarks too. :-) Other thing, I tried to stay true to the benchmark @stevegraham posted, even though it looks like there is a bug in it. Description says:
But if you read the code: paths.each do |path|
request = ActionDispatch::TestRequest.new
request.path = paths.first
set.call request.env
end
end It's always matching the first route: Anyway, I'm not really excited to replace the router with an algorithm that has the same complexity as the current one because I don't understand the advantages. I'd be more excited to rm the code we don't use. |
OH HAI @tenderlove. TIL about
I think the HTML/GraphViz/etc output is the only thing missing. I mean functionally this router does everything Rails asks of Journey. Visualisation could be added to the trie router, but as you say, who uses it?
As I say Journey has been in Rails for sometime now and that no one else has written the code that yielded a 7% increase suggests to me that Journey is quite impenetrable to anyone coming along wanting to contribute to Rails. One advantage is this router is simply IMO vastly simpler to understand: The path is tokenised with the regular expression /[^./]/. Starting as the root node, a segment is shifted from the token array, a child node matching the token is returned, and we continue until either the token array is empty (a match) or there are no child nodes returned (no match). I appreciate that Journey works in a similar way but the addition of the many types of node (terminal, literal, dummy, slash, symbol, dot, unary, group, star, binary, etc) greatly add to the cognitive overhead. Also I find it very difficult (especially compared to a trie) to follow how the simulator moves between nodes in my head. Journey has hardly been touched since it was imported into the Rails source tree despite the low hanging fruit that's been there all along. The bus factor of Journey is too high.
Both a trie and DFA operate O(n) with respect to the length of the path being matched. This is because the whole string needs to be considered if there is a match. They are both asymptotically O(log n) with respect to runtime growth against route set growth. The length of the string k '/foo/bar.baz' has minimal effect on runtime when measuring the effect of the size of the route set n, and it remains a constant factor so we omit that here when talking about that. Despite a trie and DFA having notionally similar algorithmic complexity, Journey has a a lot of constant factors that contribute to lesser performance. I updated my local rails with your latest Journey commits and ran the ips benchmark as per yours, on both branches:
Oh LOL. Sorry about that. I was using Anyway, let me fix up the benchmark: require 'bundler/setup'
Bundler.setup
require 'benchmark'
require './lib/action_dispatch'
arr = ["foo", "bar", "baz", "qux", "quux", "corge", "grault", "garply"]
paths = arr.permutation(3).map { |a| "/#{a.join '/'}" }
set = ActionDispatch::Routing::RouteSet.new(ActionDispatch::TestRequest)
class HelloWorld
def initialize(response)
@response = response
end
def call(env)
[ 200, { 'Content-Type' => 'text/plain' }, [@response] ]
end
end
set.draw do
paths.each do |path|
get path, to: HelloWorld.new(path)
end
end
times = 5.times.map do
Benchmark.measure do
1000.times do
paths.each do |path|
request = ActionDispatch::TestRequest.new
request.path = path
set.call request.env
end
end
end
end
sum = times.reduce(0) { |memo, obj| memo + obj.real }
mean = sum / times.length.to_f
variance = times.reduce(0) { |memo, obj| memo + (obj.real - mean ) ** 2 }
variance = variance / (times.length - 1).to_f
puts times.map(&:real)
puts "Mean elapsed real time: #{mean} seconds"
puts "Standard Deviation: #{ Math.sqrt variance }" I ran both branches a few times before I measured so my machine was at a stable state (temperature, etc) in case this affected the results. This is what happened when I measured. N.B. These are the results for master with your recent optimisations included:
It's still faster than Journey. Notice however the standard deviation is double. This is down to the first run taking longer (although still faster than Journey). There is some caching going on in each of the nodes. If we omit that first run, i.e. when the cache is warm, standard deviation is 0.04907844326093347 (lower than Journey) |
Seems like good enough reasons for me! Aaron Patterson
|
@guilleiguaran the Rails test suite is green for me with Rubies |
@pixeltrix AFAICT Rails only generates URLs using Here is a benchmark for drawing the route set. times = 5.times.map do |i|
set.clear!
Benchmark.measure do
set.draw do
paths.each do |path|
get path, to: HelloWorld.new(path)
end
end
end
end rails/rails stevegraham/rails |
Sorry, I just have to pick one nit that is bugging me:
Just because nobody has tackled this low hanging fruit does not mean the code is impenetrable. It could also mean that the router is an extremely small percentage of the request / response time, so nobody has bothered. Remember that a web app does more than route requests! |
I agree with @tenderlove on this point. Also it can mean the code is very stable and isolated and doesn't require any change. |
Hey @tenderlove. It's hard to tell just by reading text, but it looks like I might have offended you. I really didn't mean to cause offence or bug you. Sorry for that. To clarify, and I can only speak for myself but I did find Journey impenetrable. To be clear when I say that I am not referring to your code but more the idea of how it works. Hope this clears things up. |
@rafaelfranca it could mean that, but with respect there have been changes and they have been mostly rather superficial. Anyway this is not solely about a faster router, it's also about simplification, especially conceptually. Consider:
So we have people removing unused parameters from method signatures, meanwhile there were these performance improvements waiting to be made. Everyone loves improved performance, so what does this tell us? IMO that the difference between what @tenderlove knows and the rest of us is significant and this part of Rails could be simplified from a comprehension point of view. I want to stress that performance was not the main reason I did this, it's purely a side effect. I wanted to add some functionality to the Rails router for myself and found it impossible to know where to begin. This was exacerbated by there being no documentation, n.b. Journey's documentation, which consists of "Too complex right now. :(" and also not being able to get answers to simple questions about it. |
@stevegraham I think we are trying to solve these problems with a too radical solution. It does need a better documentation, it also needs some improvements, and we aware of this. In fact we are planning to work on these two issues during this Summer as a Google Summer of Code project. |
To be clear, I'm not against the change and agree with all your points. I'd only take a smooth path to solve the current problems. For example, to aid the lack of documentation we are going to work on Journey to document its usage and behavior. To aid the unneeded complexity we are going to change Journey to be more Rails specific (Journey was implemented to be a generic router, so it has a lot of features that are not needed to Rails codebase). About code complexity, you may say your solution is simpler but maybe you think this because you implemented it and have the knowledge about that part of code. It is the same case of the current implementation. I'm not 100% sure if changing the implementation would make this part of code easier to everyone understand. I believe we can solve the problems you pointed without changing the implementation. If it is not possible or we think it is only waste of time I'd love to have this implementation inside Rails. Finally, I like to thank you for pointing these problems and actively work to fix they. We really appreciate it. ❤️ |
@stevegraham I'm not offended at all. I'm actually really excited about this PR, I just want you to be more rigorous in your assertions (especially when performance is on the line). For example, you started said your solution "has O(log n) runtime", but didn't qualify what that means (is Journey's algorithms are from undergrad compiler courses. I'd expect most undergrads to get the concepts of NFA, DFA state machines, and AST -> state machine translation techniques (the different node types are used in the AST -> state machine translation but not during route matching). But I also think it might be overkill for this problem. That is why your PR is exciting to me. We may not need a sledge hammer to solve this issue. Unfortunately the "Too complex right now. :(" comment came from the fact that Rack::Mount was a premature extraction from Rails. The API isn't that complex, but it's extremely specific to Rails and that is why I didn't want to document it. Journey was under the requirement that it be API compliant with Rack::Mount. But since Journey has been sucked back in to Rails, that requirement is now removed. We are free to make it good. Anyway, I am looking forward to this work, but I want to see more rigorous work put in to the assertions you make. And I am willing to do what it takes to help! I think @rafaelfranca is concerned about maintenance. Change is dangerous. But I am willing to work with that too. Maybe there is a middleground we can find? The trie and the DFA have essentially the same space / time complexity, so the algorithms to implement them should be very similar. Maybe there is something we can do to meet in the middle? Maybe the racc parser is overkill? Can we replace it first? WDYT? |
@stevegraham the balance of performance is what's important - recognition is a a small part of what a Rails app does on every request and much more important are things like generation, record querying, etc. Given that our time is limited we've spent time optimising those areas so other areas have been left - for example I made a change in ccb0301 to get a 35% boost in generating optimized url paths which I'm worried that you've undone by your changes to |
Personally I think having a proper parser grammar for string expressions is a good thing so would be loathe to see it removed |
This commit completely replaces Journey as the router in Rails. The reasoning being it is a lot less code and also IMO conceptually easier to understand than NFAs and GTGs. It also performs significantly better than Journey. I wrote a simple benchmark containing 336 routes that each respective router was required to match 1000 times, with the averaged time of 5 runs taken as the result. Journey completed this in 39.97 seconds and this router in 32.61 seconds. The simple case is a path is split on forward slashes and each segment is a child of the preceding segment with the route object stored at the terminal leaf node. The result is tree-like structure with O(log n) runtime characteristics as the route table grows. In practice with optional segments, more than one node on the path may point to the same route object. This is so /:controller matches as well as /:controller/:action does given the path /:controller(:/action(/:id))(.:format) The code is straightforward to understand if you seek further details.
FYI: R3 Ruby Gem https://github.com/tonytonyjan/rr3 |
Warning! This is a massive change. Opening a PR to bring this work to wider attention and open discussion, not necessarily to propose a merge at this point in time
A new router for Rails
TL;DR I propose a new router for Rails that is drastically smaller in size while retaining full compatibility with the existing routing API
Why?
Recently I had a need to add some functionality to the Rails router. Unfortunately I had great difficulty in understanding how Journey worked. All of Journey is :nodoc: and the author was unresponsive to basic questions such as would the API be stable for the foreseeable future, i.e. is it safe to write code that extends the router and not have it break when a new version of Rails is released.
Not being able to get basic questions answered, the difficulty in understanding it, and lack of documentation led me to consider writing a compatible router with less code. My reasoning was if it was simpler it would weigh in as less code making it:
Journey has been in Rails since 3.2, and that doesn't appear to have been touched more than superficially since then suggests it is daunting to approach for the uninitiated. Especially when I was able to remove circa 2150 loc with a first stab at an alternative router.
How it works
As I thought about an appropriate data structure I decided a trie would be a natural fit and it has O(log n) runtime and space characteristics. Instead of tokenising on each character as one might normally do with a trie, I thought tokenising on each path segment would be clearer, e.g. when one is in the debugger and inspecting the routing tree.
Consider the following canonical path pattern
/:controller(/:action(/:id))(.:format)
The new router will tokenise at the segment boundaries and annotate each segment with if it is optional or not. e.g.:controller
is required and the remaining are not. The trie will then find or instantiate a node for:controller
, look for a child node for:action
or initialise one and append to the collection. This will be repeated until the path is exhausted.Each node in the trie can point to a route object. Any route object can pointed at by multiple nodes. This is because a route object can be reachable by more than one path e.g. in the above example when optional segments are not present, or in the case of separate matching path, e.g a glob route.
Results
2087 LOC removed from Rails
Full compatibility with existing API. Tests are green.
ActionPack test suite execution time from ~32 seconds to ~28 seconds on my machine.
Benchmarks
I prepared the following benchmark and ran it against master and my branch. It takes 336 distinct paths and creates GET routes for them mapped to a simple Rack app. The router is asked to match each route 1000 times. This process is then repeated 5 times and an average time is taken.
As you can see, so far my router is faster. Anyway the point is, it is not slower AND it amounts to significantly less lines of code. So I would like to kick off a serious discussion about replacing the existing router with this one. Maybe I can start by making it into a Rails plugin?
Cheers
S