Solr Container Scaling #4762
Comments
From discussion at http://irclog.iq.harvard.edu/dataverse/2018-07-17 we might want to watch https://www.youtube.com/watch?v=PC8mYweMgV4&t=434s |
After some trial and error, I am now working on making Solr a headless service with 2 nodes (a master and a slave). |
@thaorell and I discussed this issue a bit at http://irclog.iq.harvard.edu/dataverse/2018-07-20 and my quick take is that the Solr docs recommend SolrCloud, but there are concerns that adding Zookeeper to the mix will complicate things. |
Currently, the following work has been done:
|
@thaorell great! Are you close to making a pull request? Are you blocked in any way? Please let us know how we can help. |
@pdurbin I think I am ready for a pull request. One question, though: I have two new files, solrconfig_master.xml and solrconfig_slave.xml. Should these be in conf/solr or conf/docker/solr? |
As I mentioned to @pdurbin, I also wrote some docs about how to configure persistent volumes on Kubernetes so Solr can back up and restore its index (glassfish and postgres will follow suit later if needed). |
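For readers who want to see what backing up and restoring an index to a persistent volume might look like, here is a minimal sketch. It assumes Solr's replication handler commands `backup` and `restore`, which are not named in this thread. The service host `solr`, the core name `collection1`, the mount path `/var/solr/backup`, and the snapshot name `nightly` are all hypothetical. The code only builds the request URLs; it does not contact a server.

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command, **params):
    """Build a URL for a core's replication handler (commands such as backup, restore)."""
    query = urlencode({"command": command, **params, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


# Hypothetical values: adjust to the actual service name, core, and volume mount.
BASE_URL = "http://solr:8983/solr"
CORE = "collection1"
BACKUP_DIR = "/var/solr/backup"  # assumed mount point of the persistent volume

backup_url = replication_url(BASE_URL, CORE, "backup", location=BACKUP_DIR, name="nightly")
restore_url = replication_url(BASE_URL, CORE, "restore", location=BACKUP_DIR, name="nightly")

print(backup_url)
print(restore_url)
```

Because the snapshot is written under the volume mount, it should survive a pod restart, which is the point of pairing backups with persistent volumes.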
Let's have @matthew-a-dunlap comment on this because he's actively working on Solr config files for #4836. Awesome to hear about the backup and restore! As a developer, I run Solr in a very non-fancy way. I'm quick to reinstall it entirely. I have so little data on my laptop that for me it's quick to delete all the data out of Solr and reindex my installation of Dataverse. Real backup and restore sounds like a great feature for production installations of Dataverse. |
I've started making some changes to our Solr setup in #4836. solrconfig.xml has changed somewhat (I went back to a clean slate) and is definitely going to change more. More importantly, I changed our Solr installation steps based upon recommendations from folks in the Solr IRC. Our code in develop is pointing to the installation folder for its templates, which could lead to unforeseen consequences. I'm not sure what the best next step is. I can probably update the configs you've created, @thaorell, as I go, but you may need to test them in the end. Hopefully solrconfig.xml won't change much again after this story (and if maintaining it becomes a pain, we can start using the programmatic API configurations to consolidate what we do). |
Thanks @matthew-a-dunlap, I will create a PR soon so you can see the files. Ideally these files (for both the standalone and distributed deployment cases) should be very similar. |
Sounds great! @thaorell, I realized I didn't answer your initial question about the config placement. Maybe put them in the docker folder for now; if we bring scaling into our normal deployments, we can move them then. |
@matthew-a-dunlap when you finish with #4836, I would appreciate it if you could let me know so I can fix my solrconfig_master.xml and solrconfig_slave.xml. |
@thaorell - sending this back your way after talking with @matthew-a-dunlap. Let us know when @danmcp's feedback is implemented and we'll take a look in code review. Thanks! |
@djbrooke I have implemented the feedback. |
Looking good as of d538fac. Moving to QA. Thanks! |
One more issue: backing up Solr collection data based on a condition or data filter. There is no provision for this, because the Solr Backup API does not work with a query. I am trying to back up our Solr data with a particular condition in mind. To provide some context, let's say a Solr collection consists of 100 records, of which 70 contain the text "mobile" and the remaining 30 contain the text "cellphone." My objective is to take a backup that contains only the records with the text "cellphone"; essentially, I want a backup file that reflects only these 30 records. I would greatly appreciate any insights on best practices or methods for this kind of selective backup. If there are specific parameters or commands I should be using, please share them. Any documentation or references you could point me to would also be very helpful. Thank you in advance for your time and assistance. |
This is not possible via the Backup/Restore API. Please find a list of supported command options for RESTORE here: |
|
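Since the Backup/Restore API cannot filter by query, one possible workaround (not proposed in this thread) is to export the matching documents with an ordinary `/select` query and cursor-based paging. The sketch below assumes the field is named `text`, the unique key is `id`, and the core is `collection1`; all three are illustrative. Note that such an export contains only stored fields, so it is a data dump that would have to be re-indexed, not an index snapshot.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, core, query, cursor="*", rows=500, sort="id asc"):
    """Build a /select URL; cursorMark paging requires a sort on the unique key."""
    params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def export_matching(base_url, core, query, fetch=fetch_json):
    """Yield every document matching the query, one page at a time."""
    cursor = "*"
    while True:
        page = fetch(select_url(base_url, core, query, cursor))
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means there are no more results
            break
        cursor = next_cursor
```

A caller could then write the result to a file, for example `json.dump(list(export_matching("http://localhost:8983/solr", "collection1", "text:cellphone")), out_file)`, where the host is again an assumption.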
@Himanshusoni9 hi! This question is probably better asked at https://groups.google.com/g/dataverse-community or https://chat.dataverse.org instead of an old, closed issue. ❤️ |
There has been some initial work to allow scaling for postgres and glassfish.
#4599
#4626
A similar project needs to be undertaken for Solr. I would expect the implementation to use a StatefulSet and perhaps an Operator.
More details on scaling solr:
https://lucene.apache.org/solr/guide/6_6/introduction-to-scaling-and-distribution.html
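To illustrate why a StatefulSet behind a headless service suits a master/slave layout: each pod gets a stable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`, so a slave can point at the master by a fixed address. The sketch below is only an illustration. The names `solr`, `solr-headless`, and `collection1`, and the convention that ordinal 0 is the master, are assumptions rather than anything decided in this issue.

```python
def pod_dns(statefulset, ordinal, service, namespace="default", cluster_domain="cluster.local"):
    """Return the stable DNS name Kubernetes assigns to a StatefulSet pod behind a headless service."""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def replication_master_url(master_host, core, port=8983):
    """Return the URL a slave would poll on the master's replication handler."""
    return f"http://{master_host}:{port}/solr/{core}/replication"


# Hypothetical names; ordinal 0 is treated as the master by convention only.
master_host = pod_dns("solr", 0, "solr-headless")
master_url = replication_master_url(master_host, "collection1")

print(master_url)
```

With names like these, the slave's configuration could reference the master URL directly, and it would stay valid even if the master pod is rescheduled.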