Solr Container Scaling #4762

Closed
danmcp opened this issue Jun 20, 2018 · 21 comments

@danmcp
Contributor

danmcp commented Jun 20, 2018

There has been some initial work to allow scaling for postgres and glassfish.

#4599
#4626

A similar project needs to be undertaken for Solr. I would expect the implementation to use a StatefulSet and perhaps an Operator.

More details on scaling solr:

https://lucene.apache.org/solr/guide/6_6/introduction-to-scaling-and-distribution.html

@pdurbin
Member

pdurbin commented Jul 17, 2018

@thaorell
Contributor

After some trial and error, I am now working on making Solr a headless service with two nodes (a master and a slave).
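As a rough illustration of why a headless service suits this setup: each StatefulSet pod behind a headless service gets a stable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local`. The sketch below only computes those names. The StatefulSet name `solr`, the service name `solr-headless`, the namespace `dataverse`, the default `cluster.local` domain, and the convention that ordinal 0 is the master are all assumptions for illustration, not taken from the actual work.

```python
def solr_pod_hostnames(statefulset, service, namespace, replicas,
                       cluster_domain="cluster.local"):
    """Return the stable DNS name Kubernetes assigns to each StatefulSet pod
    that sits behind a headless service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def replication_roles(hostnames):
    """Split pod hostnames into roles.

    Assumption: the pod with ordinal 0 acts as the master and every other
    pod acts as a slave. This is an illustrative convention only.
    """
    master, *slaves = hostnames
    return {"master": master, "slaves": slaves}


# Hypothetical names, chosen only for this example.
hosts = solr_pod_hostnames("solr", "solr-headless", "dataverse", replicas=2)
roles = replication_roles(hosts)
print(roles)
```

Because the names are stable across pod restarts, a slave could be pointed at the master by hostname rather than by IP address.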

@pdurbin pdurbin self-assigned this Jul 20, 2018
@pdurbin
Member

pdurbin commented Jul 23, 2018

@thaorell and I discussed this issue a bit at http://irclog.iq.harvard.edu/dataverse/2018-07-20 and my quick take is that the Solr docs recommend SolrCloud, but there are concerns that adding ZooKeeper to the mix will complicate things.

@pdurbin pdurbin removed their assignment Jul 23, 2018
@thaorell
Contributor

thaorell commented Aug 3, 2018

Currently, the following work has been done:

  1. Making Solr a StatefulSet
  2. Configuring master-slave replication for better scalability (for more information, see https://lucene.apache.org/solr/guide/6_6/index-replication.html#index-replication)
  3. Setting up backup and restoration for the master pod in OpenShift
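For readers unfamiliar with it, replication status, backup, and restore in this setup all go through Solr's replication handler at `/solr/<core>/replication`. The following is a minimal sketch that only builds the request URLs and sends nothing. The host `solr-0.solr-headless`, the core name `collection1`, the backup location `/var/solr/backup`, and the backup name `nightly` are assumptions for illustration and are not taken from the pull request.

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command, **params):
    """Build a URL for Solr's replication handler.

    `command` is one of the handler's commands, such as "details",
    "backup", "restore", or "restorestatus".
    """
    query = urlencode({"command": command, **params})
    return f"{base_url}/solr/{core}/replication?{query}"


# Hypothetical master pod address and core name.
MASTER = "http://solr-0.solr-headless:8983"
CORE = "collection1"

details_url = replication_url(MASTER, CORE, "details")
backup_url = replication_url(
    MASTER, CORE, "backup", location="/var/solr/backup", name="nightly"
)
restore_url = replication_url(
    MASTER, CORE, "restore", location="/var/solr/backup", name="nightly"
)

for url in (details_url, backup_url, restore_url):
    print(url)
```

The `location` would presumably point at a directory on a persistent volume mounted into the master pod, so that a backup survives a pod restart.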

@pdurbin
Member

pdurbin commented Aug 7, 2018

@thaorell great! Are you close to making a pull request? Are you blocked in any way? Please let us know how we can help.

@thaorell
Contributor

thaorell commented Aug 7, 2018

@pdurbin I think I am ready for a pull request. One question, though: I have two new files, solrconfig_master.xml and solrconfig_slave.xml. Should these be in conf/solr or conf/docker/solr?

@thaorell
Contributor

thaorell commented Aug 7, 2018

As I mentioned to @pdurbin, I also wrote some docs on how to configure persistent volumes on Kubernetes so Solr can back up and restore its index (Glassfish and Postgres can follow suit later if needed).

@pdurbin
Member

pdurbin commented Aug 7, 2018

I have two new files solrconfig_master.xml and solrconfig_slave.xml, should these be in conf/solr or conf/docker/solr?

Let's have @matthew-a-dunlap comment on this because he's actively working on Solr config files for #4836.

Awesome to hear about the backup and restore! As a developer, I run Solr in a very non-fancy way. I'm quick to reinstall it entirely. I have so little data on my laptop that for me it's quick to delete all the data out of Solr and reindex my installation of Dataverse. Real backup and restore sounds like a great feature for production installations of Dataverse.

@matthew-a-dunlap
Contributor

matthew-a-dunlap commented Aug 7, 2018

I've started making some changes to our solr setup in #4836. solrconfig.xml has changed somewhat (I went back to a clean slate) and is definitely going to change more.

More importantly, I changed our Solr installation steps based on recommendations from folks in the Solr IRC channel. Our code in develop points to the installation folder for its templates, which could lead to unforeseen consequences.

I'm not sure what the best next step is. I can probably update the configs you've created @thaorell as I go, but you may need to test them in the end. Hopefully the solrconfig.xml won't change again much after this story (and if maintaining it becomes a pain we can start using the programmatic API configurations to consolidate what we do).

@thaorell
Contributor

thaorell commented Aug 7, 2018

Thanks @matthew-a-dunlap, I will create a PR soon so you can see the files. Ideally these files (for either the standalone or the distributed deployment case) should be very similar.

@matthew-a-dunlap
Contributor

Sounds great! @thaorell I realized I didn't answer your initial question about the config placement. Maybe put them in the dockers folder for now; if we bring scaling into our normal deployments, we can move them then.

@thaorell
Contributor

thaorell commented Aug 8, 2018

@matthew-a-dunlap when you finish with #4836, I would appreciate it if you could let me know so I can fix my solrconfig_master.xml and solrconfig_slave.xml.

@matthew-a-dunlap
Contributor

matthew-a-dunlap commented Aug 8, 2018

@thaorell We decided for #4836 to keep it simple and only fix highlighting in schema.xml. The boosting fix and the Solr best-practice changes are being put off for #4938. That should mean we don't have any conflicts, as you don't seem to be touching schema.xml.

@djbrooke
Contributor

djbrooke commented Aug 9, 2018

@thaorell - sending this back your way after talking with @matthew-a-dunlap. Let us know when @danmcp's feedback is implemented and we'll take a look in code review. Thanks!

@thaorell
Contributor

@djbrooke I have implemented the changes according to the feedback.

@pdurbin
Member

pdurbin commented Aug 14, 2018

@thaorell I just added a review to pull request #4924 and requested some minor changes (removing comments), provided that I understand what you've implemented. Overall, this looks great! Thanks!

@pdurbin
Member

pdurbin commented Aug 14, 2018

Looking good as of d538fac. Moving to QA. Thanks!

@Himanshusoni9

One more issue: backing up Solr collection data based on a condition or data filter. There is no provision for that.

The Solr backup API does not work with a query:
//http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackupName&location=C:\Users\DELL\Downloads\SOLR_BACKUP&collection=myCondCollection&query=text:cellphone

I am trying to perform backups of our Solr data subject to a particular condition.

To provide some context, let's say a Solr collection consists of 100 records, of which 70 contain the text "mobile" and the remaining 30 contain the text "cellphone". My objective is to take a backup of the collection that contains only the records with the text "cellphone"; that is, we want to create a backup file that reflects these 30 specific records only.

I would greatly appreciate it if you could share insights on the best practices or methods to achieve this selective backup based on a condition. If there are specific parameters or commands we should be utilizing, kindly provide the necessary guidance. Additionally, any documentation or references you could point us to would be immensely helpful.

Thank you in advance for your time and assistance. We value your expertise and look forward to implementing an efficient solution based on your recommendations.

@poikilotherm
Contributor

poikilotherm commented Nov 27, 2023

This is not possible via the Backup/Restore API.

Please find a list of supported command options for RESTORE here:
https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#restore
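One possible workaround, which is a different technique from Backup/Restore and not something the API offers, is to export only the matching documents through the regular `/select` handler using cursor-based paging, and then index them into a separate collection. The sketch below only builds the URL for one page of such an export. The host, the collection name `myCondCollection` (borrowed from the question), the page size, and the unique key field `id` are assumptions. Note that only stored (or docValues) fields can be retrieved this way, so this is not a full substitute for a real backup.

```python
from urllib.parse import urlencode


def export_page_url(base_url, collection, query, cursor_mark="*",
                    rows=500, unique_key="id"):
    """Build the /select URL for one page of a cursor-based export.

    Cursor paging requires a sort that includes the unique key field.
    The first request uses cursorMark="*"; each response carries a
    nextCursorMark value to pass to the following request.
    """
    params = {
        "q": query,
        "sort": f"{unique_key} asc",
        "rows": rows,
        "cursorMark": cursor_mark,
        "wt": "json",
    }
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host; query taken from the question above.
first_page = export_page_url(
    "http://localhost:8983", "myCondCollection", "text:cellphone"
)
print(first_page)
```

Paging would stop once a response returns a `nextCursorMark` equal to the `cursorMark` that was sent.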

@Himanshusoni9

This is not possible via the Backup/Restore API.

Please find a list of supported command options for RESTORE here: https://solr.apache.org/guide/solr/latest/deployment-guide/collection-management.html#restore
@poikilotherm
One more question on Backup/Restore: can we restore Solr 8.11 data to Solr 9.4, considering the change in Lucene version from 8.9.0 to 9.8.0? Do we need to reindex it?

@pdurbin
Member

pdurbin commented Nov 28, 2023

@Himanshusoni9 hi! This question is probably better asked at https://groups.google.com/g/dataverse-community or https://chat.dataverse.org instead of an old, closed issue. ❤️

8 participants