Soften JVM overload failures #358
Conversation
```diff
@@ -320,10 +321,10 @@ encode_commit() ->
 %% @doc Encode a delete operation into a mochijson2 compatible term.
 -spec encode_delete(delete_op()) -> term().
 encode_delete({key,Key}) ->
-    Query = ?YZ_RK_FIELD_S ++ ":" ++ ibrowse_lib:url_encode(binary_to_list(Key)),
+    Query = ?YZ_RK_FIELD_S ++ ":\"" ++ ibrowse_lib:url_encode(binary_to_list(Key)) ++ "\"",
```
I wonder if we are breaking UTF-8 here? I don't think we have tests for indexing/searching/deleting UTF-8 keys. I'll write up a separate issue.
This solves the reserved word problem by using a phrase query, but the more appropriate solution down the line is to use the raw or term query parsers.
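To illustrate the difference (my own sketch, using a key literally named OR; only the quoted form is what this patch generates):

```
_yz_rk:OR             fails to parse: OR is a boolean operator in the default parser
_yz_rk:"OR"           phrase query, as generated by this patch
{!term f=_yz_rk}OR    term query parser, bypasses query syntax entirely
```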
@rzezeski Awesome, I've read these notes. So, this is informative and tells me that if I want good Solr performance, I need to architect for it, i.e. highly clocked machines. I'll give the patch set a shot. Do I apply this against the 2.0.0-pre20 download?
@wbrown I'm not sure if it will apply cleanly to the pre20 download, but you can certainly try.
```diff
@@ -104,7 +104,7 @@
     <dynamicField name="*_set" type="string" indexed="true" stored="false" multiValued="true" />

     <!-- catch-all field -->
-    <dynamicField name="*" type="text_general" indexed="true" stored="false" multiValued="true" />
+    <dynamicField name="*" type="text_general" indexed="false" stored="false" multiValued="true" />
```
If we are not going to index the catch-all field then set the type to ignored and drop the other attributes so they are inherited from the type. IMO this makes the drop semantic obvious. It may also prevent unnecessary analysis, since the ignored type uses StrField (I would hope indexed=false prevents analysis, but use a non-analyzed type just in case).
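A sketch of what that would look like, borrowing the ignored type from the stock Solr example schema (not what this PR currently ships):

```xml
<!-- non-indexed, non-stored, non-analyzed type: fields of this type are dropped -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true" />

<!-- catch-all field inherits everything from the type -->
<dynamicField name="*" type="ignored" />
```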
My reading of this is we should NOT raise the limit to 4. If Solr is
This sentence makes me want to set the max to 1. If Solr can't keep up
I think this is the crux of the issue. In the future I would like ...

@coderoshi Did the exceeded maxWarmingSearchers messages cause any actual failures, or just log noise?
@rzezeski no, it didn't cause any failures, it was just filling the logs. Here's a script I was running: https://gist.github.com/coderoshi/329b93ee7987b1beecdf

I later ran some tests directly against Solr in order to take riak/yz out of the equation, for straight Solr tuning:

```ruby
bh = big_hash.clone
bh['_yz_id'] = SecureRandom.hex.to_s
doc = {"commit" => {}, "add" => {"doc" => bh}}.to_json
File.open("tmp/#{bh['_yz_id']}.json", 'w+') {|f| f.write(doc)}
`curl 'http://192.168.1.135:10034/solr/herp/update' -H'Content-type:application/json' --data-binary @tmp/#{bh['_yz_id']}.json 2> /dev/null`
```
Putting this together as a way to verify my understanding of the
Are you sure this applies across all cores? Each core gets its own
My understanding is it was bumped to avoid logging noise. AFAICT running http://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
So my reasoning for zeroing out ...

As for bumping up ...

I do believe we should keep autowarmCount at zero; however, I'm not so strident about keeping maxWarmingSearchers at 4. Increasing the size didn't seem to make much of a difference in my tests, so I figured, what the hell? If log noise doesn't bother anyone, I'm fine with dropping ...
I agree. Given our 1s auto commit, I think auto warming may be hurting ...

The real solution, in the future, is to provide APIs for manipulating ...
@rzezeski I grabbed the patch from this commit, and applied it to the riak 2.0beta1 sources that went up for download. I even have a nice Arch Linux packaging script that will build me a Riak 2.0 beta1 package with that patch installed.

Unfortunately, I ended up beating my head against the desk for hours. What had worked before doesn't appear to work anymore, and it possibly has to do with the bucket types stuff. It worked in the past by setting the bucket's index_search value to the index and schema in question. So I got the latest bleeding-edge Python client that had bucket types support added, worked my way through some of the bugs and lack of documentation there, and transitioned my indexing engine classes to use bucket types.

I have indexes being created in Solr, buckets being assigned to bucket types, and bucket objects being grabbed from the bucket type object to ensure that we use the Yokozuna indexing. But I am absolutely not seeing any documents going into Yokozuna/Solr. My setup steps are:
```bash
bts=(obs entity raw)
for bt in "${bts[@]}";
do
    bin/riak-admin bucket-type create $bt '{"props":{}}'
    bin/riak-admin bucket-type activate $bt
done
```
```python
### create our schema if it doesn't exist
try:
    self.riak.get_search_schema(schemaName)
except riak.RiakError:
    self.riak.create_search_schema(schemaName,
                                   open(schemaFile).read())

### our bucket type is the same name as our bucket name
indexedBucketType = self.riak.bucket_type( self.bucketName )
self.bucket = indexedBucketType.bucket( self.bucketName )

### create our index in Yokozuna if it doesn't exist
if not self.bucketName in [ x['name']
                            for x in self.riak.list_search_indexes() ]:
    self.riak.create_search_index( RKeyString( self.bucketName ),
                                   RKeyString( schemaName ),
                                   self.replicas )
    print "* Index created for", self.bucketName
    time.sleep(5)

### Set the bucket type to our desired n_val.
if indexedBucketType.get_property( "n_val", None ) != self.replicas:
    indexedBucketType.set_property( "n_val", self.replicas )

### set our bucket type's index to our desired index
try:
    searchIndex = indexedBucketType.get_property("search_index", None)
except KeyError:
    indexedBucketType.set_property("search_index", self.bucketName)
    searchIndex = indexedBucketType.get_property("search_index", None)
if searchIndex != self.bucketName:
    indexedBucketType.set_property("search_index", self.bucketName)
```

Am I missing any obvious steps here? When I store to an indexed bucket, I ensure that I acquire the bucket object via the bucket type:

```python
if indexEngine:
    self.getBucket = self.riak.bucket_type( bucketName ).bucket
else:
    self.getBucket = self.riak.bucket
self.bucket = self.getBucket(self.name)
```

When I double-check the object representation with print, it is giving the expected representation -- a bucket type of ...
There seem to be dialyzer issues on develop, so I'm going to figure those out first before getting the final review done on this issue.
@rzezeski @coderoshi So, reporting in now that I've successfully gotten this to work. It was some subtle issues brought up by the bleeding-edge version of the Python client that I had to use to get the bucket types support.

It is feeling like for high load situations like mine, JVM tuning is absolutely essential. I took the JVM settings that were outlined in the initial comment opening this PR and applied them. To make it work, I had to enable Huge Pages and allocate about 2048 huge pages at 2MB each. This is with a slightly reduced ... Things are looking better now:
The above numbers put me in the ballpark of ... I'm seeing client-side put latency of ... Looking at the network side of the picture:
So, about ... We'll see how this does once I've fully reloaded my dataset thus far into the new database. It's been my observation that Yokozuna usually starts out great, and then slows down. A part of this improvement is that I've improved my selection and versioning algorithms since I last used Yokozuna, and had been having good performance with ...

Thank you -- and let me know if there's anything I can test or check out here. I am quite motivated to see a performant Yokozuna in the 2.0 release. I'll keep you guys in the loop as I rebuild my dataset.
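A minimal sketch of the huge-pages step mentioned above, assuming Linux with 2 MB pages (the sysctl knob and the JVM flag are standard; the page count is simply the number quoted above):

```
# reserve 2048 x 2MB huge pages so a JVM started with -XX:+UseLargePages
# can back its heap with them; persist via /etc/sysctl.conf if it helps
sysctl -w vm.nr_hugepages=2048
```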
Thanks for the detailed report @wbrown.

Jon Meredith
Update after the load test/data reloading had been running for about 14 hours. Shortly before I went to bed, I went ahead and started a second client feeding another set of data in. I'm at about 30 million keys indexed now. With two observation engine clients and a GeoIP database client populating Riak over Gigabit on a single workstation, my client-side latency stats are:
This is with a current rate of about
Latency measured by Riak is pretty good, though 95th and 99th percentile times aren't great at
With 30 million keys, that's enough to outsize the ... Riak with Bitcask is at ...

So, things are overall improved versus my first attempt, which basically caused Riak, Yokozuna, and Solr to crash and burn. However, there are some confusing flies in the ointment:
Looking closer:
We can see that the connections have completely turned over in a ...

A side question: I'm really big on metrics. You cannot do performance analysis without metrics. Do we have any Yokozuna latency metrics exposed anywhere, or do I need to set up Riak on OmniOS for ...
@wbrown First, thank you so much for all the great information from
Yes this is what I was trying to explain in my |
My findings are that a higher timeout doesn't prevent the ... I also found that these changes don't solve connection churn, as that ... Overall, the 60 second timeout did make things quieter, since it makes ...
In general, I think timeouts need to be looked at more closely ... With that, I'm going to consider the timeouts reviewed so that I can ...
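For anyone following the thread, the knob being discussed here is ibrowse's socket inactivity timeout, not the per-request response timeout. A rough Erlang illustration of the same setting passed per request (the actual workaround in the commits below sets it through ibrowse's application environment instead; the option name is from the ibrowse docs as I recall them, so treat it as an assumption, and the URL/index are illustrative):

```erlang
%% Sketch: keep pooled sockets alive for 60s of inactivity instead of the
%% default 10s, reducing TIME-WAIT churn from constant reconnects.
Url  = "http://localhost:8093/solr/my_index/update",
Body = <<"{}">>,
Opts = [{inactivity_timeout, 60000}],   %% socket idle timeout, in ms
ibrowse:send_req(Url, [{"Content-Type", "application/json"}], post, Body, Opts, 60000).
```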
The riak tests look good:
I'm not aware of `ON` being a reserved word in Lucene query syntax. Given the original reporter mentioned `OR`, I've updated the test to use that.
I don't recall
+1 61d632 List of the things I reviewed:
+1 61d6326
Soften JVM overload failures Reviewed-by: rzezeski
@borshop merge |
Add `json_obj_valgen` to the Yokozuna BB driver. I used this to test indexing "large" JSON objects while investigating #358. The config line to use this value generator looks like so: ``` {value_generator, {function, yz_driver, json_obj_valgen, [<NumFields>]}}. ``` This will generate a static JSON object (i.e. generated once at start up) which contains `<NumFields>` integer fields. All workers will index the identical object. Here is an example config from one of my testing runs: ``` {mode, max}. {concurrent, 32}. {driver, yz_driver}. {code_paths, ["/root/work/yz-bb/misc/bench"]}. {secure, false}. {bucket, {<<"data">>, <<"largeobj">>}}. {index, <<"largeobj">>}. {pb_conns, [{"10.0.1.201", 8087}]}. {http_conns, []}. {duration, infinity}. {key_generator, {to_binstr, "~B", {partitioned_sequential_int, 0, 1000}}}. {value_generator, {function, yz_driver, json_obj_valgen, [4000]}}. {operations, [{load_pb, 1}]}. ``` This uses 32 workers to write the same JSON object containing 4000 integer fields over the key space 0 to 999.
A combination of ibrowse's inefficient load balancing algorithm and default socket inactivity timeout of 10 seconds can cause TIME-WAIT load. It becomes worse as the pool size or pipeline sizes are increased. This patch is a temporary workaround to reduce socket churn and thus TIME-WAIT load. It does so by increasing the inactivity timeout to 60 seconds across the board. Below is a chart showing the amount of socket churn in connections per minute for the different timeout values, both at idle and while under load from basho bench. These numbers were calculated by a DTrace script which counted the number of new connections being accepted on port 8093.

| Timeout | Socket Churn At Idle | Socket Churn Under Load |
|---------|----------------------|-------------------------|
| 10s     | ~59 conns/min        | ~29 conns/min           |
| 60s     | ~9 conns/min         | ~6 conns/min            |

The timeout is set via application env because ibrowse has the absolute most complex configuration management code I have ever seen and this was the easiest way to make sure the timeout is set correctly. This is just a workaround until after 2.0, when other HTTP clients and pools may be tested. ibrowse seems to have many issues; this is just one of them. For more background see the following issues: #367 #358 #330 #320
A combination of ibrowse's inefficient load balancing algorithm and default socket inactivity timeout of 10 seconds can cause TIME-WAIT load. It becomes worse as the pool size or pipeline sizes are increased. This patch is a temporary workaround to reduce socket churn and thus TIME-WAIT load. It does so by increasing the inactivity timeout to 600 seconds across the board. Below is a chart showing the amount of socket churn in connections per minute for the different timeout values, both at idle and while under load from basho bench. These numbers were calculated by a DTrace script which counted the number of new connections being accepted on port 8093.

| Timeout | Socket Churn At Idle | Socket Churn Under Load |
|---------|----------------------|-------------------------|
| 10s     | ~59 conns/min        | ~29 conns/min           |
| 600s    | ~9 conns/min         | ~6 conns/min            |

The timeout is set via application env because ibrowse has the absolute most complex configuration management code I have ever seen and this was the easiest way to make sure the timeout is set correctly. This is just a workaround until after 2.0, when other HTTP clients and pools may be tested. ibrowse seems to have many issues; this is just one of them. For more background see the following issues: #367 #358 #330 #320
After a week of replicating #330, I've concluded that there is little that we can do about an overloaded Solr JVM. What we can do is reduce the flood of failure messages by making some simple adjustments in key configuration files:
solrconfig.xml
Suggested by http://wiki.apache.org/solr/SolrPerformanceProblems, slow or overly frequent commits can overload the cache warmer. Since all the other autowarmCounts were set to 0, I followed suit with the fieldValueCache.

Since all cache auto warmers were set to zero, I also bumped up the default maxWarmingSearchers from 2 (suitable for read-only slaves, as mentioned in the config comments) to 4. I found that on high loads of large objects I'd get an "exceeded limit of maxWarmingSearchers=X" message, which this resolved.
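As a sketch of where these live (both are standard solrconfig.xml settings inside the query section; the cache class and size are illustrative, not necessarily what Yokozuna ships):

```xml
<query>
  <!-- no autowarming on the fieldValueCache, matching the other caches -->
  <fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="0" />

  <!-- allow more concurrent warming searchers under heavy commit load -->
  <maxWarmingSearchers>4</maxWarmingSearchers>
</query>
```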
default_schema.xml

For objects with many fields, I set the catch-all field's indexed attribute to false by default, as per #316. This helped tremendously in reducing incidental field indexing.
jetty.xml

I bumped up the maxIdleTime from 50 seconds to 60, since that's the default timeout for many of our PB clients. It's also important that this value matches the ibrowse client timeout, explained below.

The acceptors I kept at their default, which is 2. Solr recommends that you have 2 acceptors per CPU, so this is something that we should document. A better long-term option is to make this a riak.conf setting, or possibly even to set the acceptor count automatically by interrogating the system, but I'm unsure this is something we want to tackle for beta.
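For reference, the change is a one-liner inside the connector definition in the jetty.xml that ships with Solr (sketch only; the surrounding connector element is whatever the shipped file already contains):

```xml
<!-- idle connection timeout, in ms: was 50000 (50s); 60000 matches the
     default PB client timeout and the ibrowse timeout discussed below -->
<Set name="maxIdleTime">60000</Set>
```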
yz_solr.erl

Issue #320 was rife with org.eclipse.jetty.io.EofException. This is because ibrowse, by default, times out at 30 seconds. Since Jetty defaulted to 50 seconds, Solr was complaining about the unexpectedly closed socket. This at least stops that complaint.
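To make the relationship with jetty.xml concrete, this is roughly the shape of the call in yz_solr.erl (a sketch, not the patch verbatim; the macro and function names are illustrative). The final argument of ibrowse:send_req/6 is the request timeout in milliseconds, and the idea is that it lines up with Jetty's maxIdleTime so neither side abandons the socket while the other still expects a response:

```erlang
%% Sketch: raise the per-request timeout from ibrowse's 30s default to 60s,
%% matching Jetty's maxIdleTime, so Jetty stops logging EofException when
%% ibrowse hangs up mid-request.
-define(YZ_SOLR_TIMEOUT, 60000).

index(Url, Headers, JsonBody) ->
    ibrowse:send_req(Url, Headers, post, JsonBody, [], ?YZ_SOLR_TIMEOUT).
```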
ibrowse.conf

I made no changes to ibrowse.conf. Though we could increase the number of pooled and open sockets, similar to this:

{dest, "localhost", 10014, 100, 100000, []}.

it doesn't resolve the underlying problem of a slow and overloaded JVM. It might be an advanced setting in isolated cases, but in all of my tests increasing the connection pool just prolonged the inevitable crash from a JVM that wasn't indexing large objects fast enough.
riak.conf

Followed many JVM tuning suggestions for larger heaps (http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning), from tweaking page sizes, settings per heap space, ratios, etc.

I found some values that allowed me to support slightly larger object sizes on my test cluster, for my particular test set (large JSON objects, 1-2M with 40-10k indexes of various types, updated by 8-25 client threads), but expanding those options into the Yokozuna defaults seems too pedantic.
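As a sketch of where those experiments happen (the setting name below is the riak.conf key as I recall it from the 2.0 configs, so verify against your local file; the flags are illustrative GC-tuning examples in the spirit of the linked page, not values this PR ships):

```
## riak.conf -- illustrative only; size the heap and GC for your own workload
search.solr.jvm_options = -Xms4g -Xmx4g -XX:+UseLargePages -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
```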
I analyzed opened sockets in my tests, and when the JVM takes a long time, the connection remains active until the process is complete. If the data coming in is too large and fast, with insufficient JVM resources to manage the data being indexed in a timely manner, you'll see ports in use. The only solution I could see to this problem is ensuring that the JVM runs fast enough that the connections aren't used for more than a few seconds, which depends entirely on Solr being fast.
Other change
There's also a minor change in here. While making these _yz_solr changes, Dmitri found an error where his key was named for the state of Oregon, causing the delete query to fail on a reserved word (_yz_rk:OR), so I wrapped it in quotes.
Documentation
Issue #330 gives us a glimpse of some problems that an ill-fitting JVM can cause. We're going to need to find and share plenty of documentation about how/where a user can go to investigate their resource requirements and troubleshoot problems: basho/basho_docs#1012.