
Ensure instance is started before checking for core existence #98

Closed · wants to merge 1 commit into from

Conversation

mark-dce

@cbeer - does this seem like a plausible diagnosis to you...

It looks to me like the solr_wrapper and rake hydra:server issues reported in #69 and #92 are timing related. From what I can see, the client.exists? check is sometimes issued before the service is fully started. The behavior is highly dependent on the system you're running on: I can reproduce the failure more reliably on an under-spec'd VM than on my local development laptop.

If I modify the core check in lib/solr_wrapper/client.rb to make the call multiple times and print the responses, as in the following code, I frequently see an empty response on the first call or two:

    # Debugging variant of Client#core?: issue the core STATUS call three
    # times, two seconds apart, and print each response so we can see when
    # the core first becomes visible.
    def core?(name)
      response = conn.get('admin/cores?action=STATUS&wt=json&core=' + name)
      puts "\nFirst call to Client#core? response status: #{JSON.parse(response.body)['status'][name]}"

      sleep 2
      response = conn.get('admin/cores?action=STATUS&wt=json&core=' + name)
      puts "\nSecond call to Client#core? response status: #{JSON.parse(response.body)['status'][name]}"

      sleep 2
      response = conn.get('admin/cores?action=STATUS&wt=json&core=' + name)
      puts "\nThird call to Client#core? response status: #{JSON.parse(response.body)['status'][name]}\n"

      # An empty status hash means the core does not (yet) exist.
      !JSON.parse(response.body)['status'][name].empty?
    end

This gives me output like:

$ solr_wrapper
Starting Solr 6.5.0 on port 8985 ... 
First call to Client#core? response status: {}

Second call to Client#core? response status: {"name"=>"hydra-development", "instanceDir"=>"/home/vagrant/home/hyrax-demo/tmp/solr-development/server/solr/hydra-development", "dataDir"=>"/home/vagrant/home/hyrax-demo/tmp/solr-development/server/solr/hydra-development/data/", "config"=>"solrconfig.xml", "schema"=>"schema.xml", "startTime"=>"2017-04-20T05:24:41.814Z", "uptime"=>3394, "index"=>{"numDocs"=>0, "maxDoc"=>0, "deletedDocs"=>0, "indexHeapUsageBytes"=>0, "version"=>2, "segmentCount"=>0, "current"=>true, "hasDeletions"=>false, "directory"=>"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/home/vagrant/home/hyrax-demo/tmp/solr-development/server/solr/hydra-development/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@670ab1a0; maxCacheMB=48.0 maxMergeSizeMB=4.0)", "segmentsFile"=>"segments_1", "segmentsFileSizeInBytes"=>71, "userData"=>{}, "sizeInBytes"=>71, "size"=>"71 bytes"}}

Third call to Client#core? response status: {"name"=>"hydra-development", "instanceDir"=>"/home/vagrant/home/hyrax-demo/tmp/solr-development/server/solr/hydra-development", "dataDir"=>"/home/vagrant/home/hyrax-demo/tmp/solr-development/server/solr/hydra-development/data/", "config"=>"solrconfig.xml", "schema"=>"schema.xml", "startTime"=>"2017-04-20T05:24:41.814Z", "uptime"=>5407, "index"=>{"numDocs"=>0, "maxDoc"=>0, "deletedDocs"=>0, "indexHeapUsageBytes"=>0, "version"=>2, "segmentCount"=>0, "current"=>true, "hasDeletions"=>false, "directory"=>"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/home/vagrant/home/hyrax-demo/tmp/solr-development/server/solr/hydra-development/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@670ab1a0; maxCacheMB=48.0 maxMergeSizeMB=4.0)", "segmentsFile"=>"segments_1", "segmentsFileSizeInBytes"=>71, "userData"=>{}, "sizeInBytes"=>71, "size"=>"71 bytes"}}

It seems like the instance status check should account for this, but I wonder whether Solr is reporting a running status before it can give a reasonable response to more complex calls.
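
As an illustration of what "accounting for this" might look like: poll the STATUS endpoint until the core appears or we give up. This is only a sketch, not the current solr_wrapper API; the retries and sleep_interval parameters are assumptions of mine.

    # Sketch only: a core? that retries while Solr finishes starting up.
    # The retry count and sleep interval are arbitrary assumptions.
    def core?(name, retries: 5, sleep_interval: 1)
      retries.times do
        response = conn.get('admin/cores?action=STATUS&wt=json&core=' + name)
        status = JSON.parse(response.body)['status'][name]
        return true unless status.nil? || status.empty?
        sleep sleep_interval
      end
      false
    end

Note this conflates "core not visible yet" with "core genuinely absent", so a legitimate false answer now costs retries * sleep_interval seconds; checking that the instance itself is started first, as this PR does, avoids that ambiguity.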

I have no idea how I'd write a timing-related test; if you have any ideas, I'd be happy to try to add one. I'm also not sure whether I've found the best spot to check that the service is started. It reliably fixes this particular issue for me, but it might not address other startup-timing problems.
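
For what it's worth, one way to make the race deterministic in a test might be to stub the HTTP connection so the first STATUS response is empty and a later one is populated, then assert that the check eventually sees the core. A rough RSpec sketch, assuming the retrying core? sketched above and that Client.new takes the Solr URL:

    require 'json'

    RSpec.describe SolrWrapper::Client do
      it 'sees the core once Solr finishes starting' do
        empty     = double(body: { 'status' => { 'hydra-development' => {} } }.to_json)
        populated = double(body: { 'status' => { 'hydra-development' => { 'name' => 'hydra-development' } } }.to_json)

        client = described_class.new('http://localhost:8985/solr/')
        # Stub the connection: the first call races the startup and sees an
        # empty status; the retry sees the core.
        conn = double('conn')
        allow(client).to receive(:conn).and_return(conn)
        allow(conn).to receive(:get).and_return(empty, populated)

        expect(client.core?('hydra-development', retries: 2, sleep_interval: 0)).to be true
      end
    end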

Behavior on my test system before implementing the proposed change:
attempting to start solr_wrapper fails four times and succeeds on the 5th attempt:
https://gist.github.com/mark-dce/0a6bb38adde0d5b7645ed58778158575

After adding a check to see if the instance is started, solr_wrapper appears to start reliably every time:
https://gist.github.com/mark-dce/7aaede62802adea79ab826a48af23ad8
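
Conceptually the change is just a guard before the existence check, something like the following shape (started? and create_core here are illustrative assumptions on my part, not necessarily the real solr_wrapper API):

    # Sketch of the idea behind this PR: wait until the instance reports
    # itself started before asking Solr about core existence.
    def wait_until_started(timeout: 30)
      deadline = Time.now + timeout
      until started?
        raise "Solr did not start within #{timeout} seconds" if Time.now > deadline
        sleep 1
      end
    end

    def create_core(name)
      wait_until_started            # guard against the startup race
      return if client.core?(name)  # safe to trust the answer now
      # ... proceed with core creation ...
    end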

cbeer (Owner) commented May 8, 2017

Fixed by #102?

cbeer closed this May 8, 2017