-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-8542: Integrate Learning to Rank into Solr #4
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,330 @@ | ||
Apache Solr Learning to Rank | ||
======== | ||
|
||
This is the main [learning to rank integrated into solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp) | ||
repository. | ||
[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank) | ||
|
||
Apache Solr Learning to Rank (LTR) provides a way for you to extract features | ||
directly inside Solr for use in training a machine learned model. You can then | ||
deploy that model to Solr and use it to rerank your top X search results. | ||
|
||
|
||
# Changes to solrconfig.xml | ||
```xml | ||
<config> | ||
... | ||
|
||
<!-- Query parser used to rerank top docs with a provided model --> | ||
<queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" /> | ||
|
||
<!-- Transformer that will encode the document features in the response. | ||
For each document the transformer will add the features as an extra field | ||
in the response. The name of the field we will be the the name of the | ||
transformer enclosed between brackets (in this case [features]). | ||
In order to get the feature vector you will have to | ||
specify that you want the field (e.g., fl="*,[features]) --> | ||
<transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory" /> | ||
|
||
|
||
<!-- Component that hooks up managed resources for features and models --> | ||
<searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/> | ||
<requestHandler name="/query" class="solr.SearchHandler"> | ||
<lst name="defaults"> | ||
<str name="echoParams">explicit</str> | ||
<str name="wt">json</str> | ||
<str name="indent">true</str> | ||
<str name="df">id</str> | ||
</lst> | ||
<arr name="last-components"> | ||
<!-- Use the component in your requestHandler --> | ||
<str>ltrComponent</str> | ||
</arr> | ||
</requestHandler> | ||
|
||
<query> | ||
... | ||
|
||
<!-- Cache for storing and fetching feature vectors --> | ||
<cache name="QUERY_DOC_FV" | ||
class="solr.search.LRUCache" | ||
size="4096" | ||
initialSize="2048" | ||
autowarmCount="4096" | ||
regenerator="solr.search.NoOpRegenerator" /> | ||
</query> | ||
|
||
</config> | ||
|
||
``` | ||
|
||
|
||
# Build the plugin | ||
In the solr/contrib/ltr directory run | ||
`ant dist` | ||
|
||
# Install the plugin | ||
In your solr installation, navigate to your collection's lib directory. | ||
In the solr install example, it would be solr/collection1/lib. | ||
If lib doesn't exist you will have to make it, and then copy the plugin's jar there. | ||
|
||
`cp lucene-solr/solr/dist/solr-ltr-X.Y.Z-SNAPSHOT.jar mySolrInstallPath/solr/myCollection/lib` | ||
|
||
Restart your collection using the admin page and you are good to go. | ||
You can find more detailed instructions [here](https://wiki.apache.org/solr/SolrPlugins). | ||
|
||
|
||
# Defining Features | ||
In the learning to rank plugin, you can define features in a feature space | ||
using standard Solr queries. As an example: | ||
|
||
###### features.json | ||
```json | ||
[ | ||
{ "name": "isBook", | ||
"type": "org.apache.solr.ltr.feature.impl.SolrFeature", | ||
"params":{ "fq": ["{!terms f=category}book"] } | ||
}, | ||
{ | ||
"name": "documentRecency", | ||
"type": "org.apache.solr.ltr.feature.impl.SolrFeature", | ||
"params": { | ||
"q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)" | ||
} | ||
}, | ||
{ | ||
"name":"originalScore", | ||
"type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature", | ||
"params":{} | ||
}, | ||
{ | ||
"name" : "userTextTitleMatch", | ||
"type" : "org.apache.solr.ltr.feature.impl.SolrFeature", | ||
"params" : { "q" : "{!field f=title}${user_text}" } | ||
} | ||
] | ||
``` | ||
|
||
Defines four features. Anything that is a valid Solr query can be used to define | ||
a feature. | ||
|
||
### Filter Query Features | ||
The first feature isBook fires if the term 'book' matches the category field | ||
for the given examined document. Since in this feature q was not specified, | ||
either the score 1 (in case of a match) or the score 0 (in case of no match) | ||
will be returned. | ||
|
||
### Query Features | ||
In the second feature (documentRecency) q was specified using a function query. | ||
In this case the score for the feature on a given document is whatever the query | ||
returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated | ||
2 years ago, etc..) . If both an fq and q is used, documents that don't match | ||
the fq will receive a score of 0 for the documentRecency feature, all other | ||
documents will receive the score specified by the query for this feature. | ||
|
||
### Original Score Feature | ||
The third feature (originalScore) has no parameters, and uses the | ||
OriginalScoreFeature class instead of the SolrFeature class. Its purpose is | ||
to simply return the score for the original search request against the current | ||
matching document. | ||
|
||
### External Features | ||
Users can specify external information that can to be passed in as | ||
part of the query to the ltr ranking framework. In this case, the | ||
fourth feature (userTextPhraseMatch) will be looking for an external field | ||
called 'user_text' passed in through the request, and will fire if there is | ||
a term match for the document field 'title' from the value of the external | ||
field 'user_text'. See the "Run a Rerank Query" section for how | ||
to pass in external information. | ||
|
||
### Custom Features | ||
Custom features can be created by extending from | ||
org.apache.solr.ltr.ranking.Feature, however this is generally not recommended. | ||
The majority of features should be possible to create using the methods described | ||
above. | ||
|
||
# Defining Models | ||
Currently the Learning to Rank plugin supports 2 main types of | ||
ranking models: [Ranking SVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf) | ||
and [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf) | ||
|
||
### Ranking SVM | ||
Currently only a linear ranking svm is supported. Use LambdaMART for | ||
a non-linear model. If you'd like to introduce a bias set a constant feature | ||
to the bias value you'd like and make a weight of 1.0 for that feature. | ||
|
||
###### model.json | ||
```json | ||
{ | ||
"type":"org.apache.solr.ltr.ranking.RankSVMModel", | ||
"name":"myModelName", | ||
"features":[ | ||
{ "name": "userTextTitleMatch"}, | ||
{ "name": "originalScore"}, | ||
{ "name": "isBook"} | ||
], | ||
"params":{ | ||
"weights": { | ||
"userTextTitleMatch": 1.0, | ||
"originalScore": 0.5, | ||
"isBook": 0.1 | ||
} | ||
|
||
} | ||
} | ||
``` | ||
|
||
This is an example of a toy Ranking SVM model. Type specifies the class to be | ||
using to interpret the model (RankSVMModel in the case of Ranking SVM). | ||
Name is the model identifier you will use when making request to the ltr | ||
framework. Features specifies the feature space that you want extracted | ||
when using this model. All features that appear in the model params will | ||
be used for scoring and must appear in the features list. You can add | ||
extra features to the features list that will be computed but not used in the | ||
model for scoring, which can be useful for logging. | ||
Params are the Ranking SVM parameters. | ||
|
||
Good library for training SVM's (https://www.csie.ntu.edu.tw/~cjlin/liblinear/ , | ||
https://www.csie.ntu.edu.tw/~cjlin/libsvm/) . You will need to convert the | ||
libSVM model format to the format specified above. | ||
|
||
### LambdaMART | ||
|
||
###### model2.json | ||
```json | ||
{ | ||
"type":"org.apache.solr.ltr.ranking.LambdaMARTModel", | ||
"name":"lambdamartmodel", | ||
"features":[ | ||
{ "name": "userTextTitleMatch"}, | ||
{ "name": "originalScore"} | ||
], | ||
"params":{ | ||
"trees": [ | ||
{ | ||
"weight" : 1, | ||
"tree": { | ||
"feature": "userTextTitleMatch", | ||
"threshold": 0.5, | ||
"left" : { | ||
"value" : -100 | ||
}, | ||
"right": { | ||
"feature" : "originalScore", | ||
"threshold": 10.0, | ||
"left" : { | ||
"value" : 50 | ||
}, | ||
"right" : { | ||
"value" : 75 | ||
} | ||
} | ||
} | ||
}, | ||
{ | ||
"weight" : 2, | ||
"tree": { | ||
"value" : -10 | ||
} | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
This is an example of a toy LambdaMART. Type specifies the class to be using to | ||
interpret the model (LambdaMARTModel in the case of LambdaMART). Name is the | ||
model identifier you will use when making request to the ltr framework. | ||
Features specifies the feature space that you want extracted when using this | ||
model. All features that appear in the model params will be used for scoring and | ||
must appear in the features list. You can add extra features to the features | ||
list that will be computed but not used in the model for scoring, which can | ||
be useful for logging. Params are the LambdaMART specific parameters. In this | ||
case we have 2 trees, one with 3 leaf nodes and one with 1 leaf node. | ||
|
||
A good library for training LambdaMART ( http://sourceforge.net/p/lemur/wiki/RankLib/ ). | ||
You will need to convert the RankLib model format to the format specified above. | ||
|
||
# Deploy Models and Features | ||
To send features run | ||
|
||
`curl -XPUT 'http://localhost:8983/solr/collection1/schema/fstore' --data-binary @/path/features.json -H 'Content-type:application/json'` | ||
|
||
To send models run | ||
|
||
`curl -XPUT 'http://localhost:8983/solr/collection1/schema/mstore' --data-binary @/path/model.json -H 'Content-type:application/json'` | ||
|
||
|
||
# View Models and Features | ||
`curl -XGET 'http://localhost:8983/solr/collection1/schema/fstore'` | ||
`curl -XGET 'http://localhost:8983/solr/collection1/schema/mstore'` | ||
|
||
|
||
# Run a Rerank Query | ||
Add to your original solr query | ||
`rq={!ltr model=myModelName reRankDocs=25}` | ||
|
||
The model name is the name of the model you sent to solr earlier. | ||
The number of documents you want reranked, which can be larger than the | ||
number you display, is reRankDocs. | ||
|
||
### Pass in external information for external features | ||
Add to your original solr query | ||
`rq={!ltr reRankDocs=3 model=externalmodel efi.field1='text1' efi.field2='text2'}` | ||
|
||
Where "field1" specifies the name of the customized field to be used by one | ||
or more of your features, and text1 is the information to be pass in. As an | ||
example that matches the earlier shown userTextTitleMatch feature one could do: | ||
|
||
`rq={!ltr reRankDocs=3 model=externalmodel efi.user_text='Casablanca' efi.user_intent='movie'}` | ||
|
||
# Extract features | ||
To extract features you need to use the feature vector transformer + set the | ||
fv parameter to true (this required parameter will be removed in the future). | ||
For now you need to also use a dummy model with all the features you want to | ||
extract inside the features parameter list of the model (this limitation will | ||
also be changed in the future so you can extract features without a dummy model). | ||
|
||
`fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}` | ||
|
||
## Test the plugin with solr/example/techproducts in 6 steps | ||
|
||
Solr provides some simple example of indices. In order to test the plugin with | ||
the techproducts example please follow these steps | ||
|
||
1. compile solr and the examples | ||
|
||
cd solr | ||
ant dist | ||
ant example | ||
|
||
2. run the example | ||
|
||
./bin/solr -e techproducts | ||
|
||
3. stop it and install the plugin: | ||
|
||
./bin/solr stop | ||
#create the lib folder | ||
mkdir example/techproducts/solr/techproducts/lib | ||
# install the plugin in the lib folder | ||
cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/ | ||
# replace the original solrconfig with one importing all the ltr componenet | ||
cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/ | ||
|
||
4. run the example again | ||
|
||
./bin/solr -e techproducts | ||
|
||
5. index some features and a model | ||
|
||
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json' | ||
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json' | ||
|
||
6. have fun ! | ||
|
||
# access to the default feature store | ||
http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ | ||
# access to the model store | ||
http://localhost:8983/solr/techproducts/schema/mstore | ||
# perform a query using the model, and retrieve the features | ||
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
<?xml version="1.0"?> | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one or more | ||
contributor license agreements. See the NOTICE file distributed with | ||
this work for additional information regarding copyright ownership. | ||
The ASF licenses this file to You under the Apache License, Version 2.0 | ||
the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
|
||
<project name="ltr" default="default"> | ||
|
||
<description> | ||
Learning to Rank Package | ||
</description> | ||
|
||
<import file="../contrib-build.xml"/> | ||
</project> |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ant example is deprecated in the current master branch,
we should point that with recent releases,
ant server
is necessary!