The IBM Watson Retrieve and Rank service helps users find the most relevant information for their queries by using a combination of search and machine learning algorithms to detect "signals" in the data. You load your data into the service, which is built on top of Apache Solr, and train a machine learning model. Then use the trained model to provide improved results to users.
Give it a try! Click the button below to fork into IBM DevOps Services and deploy your own copy of this application on Bluemix.
View a demo of this app.
This application uses publicly available test data called the Cranfield collection. The collection contains abstracts of aerodynamics journal articles, a set of questions about aerodynamics, and labels that mark how relevant each article is to each question. Some questions are held out of the training data, so you can use them to validate the performance of the trained ranker. The demo uses this held-out subset of questions.
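To make validation with held-out questions concrete, here is a minimal sketch of one common relevance metric, precision at k, for a single question. The metric choice, the class name `PrecisionAtK`, and the document IDs are illustrative assumptions, not necessarily how the demo measures performance. It also treats any labeled-relevant article as simply "relevant", which simplifies the graded Cranfield labels.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionAtK {

    /** Fraction of the top k ranked documents that are labeled relevant (hypothetical helper). */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int limit = Math.min(k, ranked.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        // Divide by k, not by limit, so a short result list is not rewarded.
        return (double) hits / k;
    }

    public static void main(String[] args) {
        // Made-up IDs for one held-out question; not taken from the Cranfield data.
        List<String> ranked = Arrays.asList("doc12", "doc7", "doc33", "doc5");
        Set<String> relevant = new HashSet<>(Arrays.asList("doc7", "doc5", "doc90"));
        System.out.println(precisionAtK(ranked, relevant, 4)); // prints 0.5
    }
}
```

Averaging this value over all held-out questions gives a single number you could compare before and after training the ranker.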
Ensure that you have the following prerequisites before you start:
- A Bluemix account. If you don't have one, sign up
- Java Development Kit version 1.7 or later
- Eclipse IDE for Java EE Developers
- Apache Maven, version 3.1 or later
- Git
- WebSphere Liberty Profile server, if you want to run the app in your local environment
- Create a Bluemix account. Sign up in Bluemix or use an existing account. Watson services in beta are free to use.
- Download and install the Cloud Foundry CLI tool.
- Edit the manifest.yml file and change <application-name> to something unique.
applications:
- services:
  - retrieve-and-rank-service
  name: <application-name>
  path: webApp.war
  memory: 512M
The name you use determines your initial application URL, for example, <application-name>.mybluemix.net.
- Connect to Bluemix with the command-line tool.
$ cf api https://api.ng.bluemix.net
$ cf login -u <your-user-ID>
- Create the Retrieve and Rank service in Bluemix.
$ cf create-service retrieve_and_rank standard retrieve-and-rank-service
- Download and install Apache Maven.
- Build the project. You need to use Apache Maven to build the WAR file.
$ mvn install
- Push it live!
$ cf push -p target/webApp.war
- Load the Cranfield collection into the service and train a ranker with the Cranfield data. For a walkthrough, see the tutorial Getting started with the Retrieve and Rank service. As you complete the tutorial, save this information:
- Solr cluster ID: The unique identifier of the Apache Solr Cluster that you create.
- Collection name: The name you give to the Solr collection when you create it.
- Ranker ID: The unique identifier of the ranker you create.
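The three saved values work together: the cluster ID and collection name locate the Solr index, and the ranker ID selects the trained model that reorders its results. The sketch below shows how they might combine into one ranked-search request URL. The `/v1/solr_clusters/{cluster}/solr/{collection}/fcselect` path and the `ranker_id`, `q`, and `wt` parameters follow the Retrieve and Rank API reference as we understand it, so verify them there; the class name and the example service URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FcselectUrl {

    /**
     * Builds a ranked-search URL from the saved tutorial values (hypothetical helper).
     * Assumes clusterId and collection are already URL-safe, so only the query
     * parameters are encoded.
     */
    static String build(String serviceUrl, String clusterId, String collection,
                        String rankerId, String question)
            throws UnsupportedEncodingException {
        return serviceUrl
                + "/v1/solr_clusters/" + clusterId
                + "/solr/" + collection
                + "/fcselect?ranker_id=" + URLEncoder.encode(rankerId, "UTF-8")
                + "&q=" + URLEncoder.encode(question, "UTF-8")
                + "&wt=json";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical service URL; the real one is the "url" credential of your service.
        String url = build("https://example.com/retrieve-and-rank/api",
                "xxxxxxxx_ca0e_zzzz_zzzz_95zzz3aa2404", "ga",
                "F131F6-rank-10", "boundary layer transition");
        System.out.println(url);
    }
}
```

`URLEncoder` produces form encoding, so spaces in the question become `+`, which Solr accepts in query strings.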
- Use the values from the tutorial to specify environment variables in your app:
  - Navigate to the application dashboard in Bluemix.
  - Click the Retrieve and Rank application you created earlier.
  - Click Environment Variables.
  - Click USER-DEFINED.
  - Add the following three environment variables with the values that you copied from the tutorial:
    - CLUSTER_ID
    - COLLECTION_NAME
    - RANKER_ID
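Cloud Foundry exposes user-defined variables to the running app as ordinary environment variables. A minimal sketch of reading and validating them in Java follows; the class name `RankerConfig` is hypothetical, and the sample's `RetrieveAndRankResource.java` may load these values differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RankerConfig {

    static final String[] REQUIRED = {"CLUSTER_ID", "COLLECTION_NAME", "RANKER_ID"};

    /** Copies the required variables out of env, failing fast if any is missing or blank. */
    static Map<String, String> load(Map<String, String> env) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing environment variable: " + name);
            }
            config.put(name, value.trim());
        }
        return config;
    }

    public static void main(String[] args) {
        try {
            // System.getenv() returns the process environment as an unmodifiable map.
            System.out.println(load(System.getenv()));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Taking the map as a parameter, rather than calling `System.getenv()` inside `load`, keeps the helper easy to test with a hand-built map.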
To run the application in your local environment, note that it uses the WebSphere Liberty Profile runtime as its server, so you need to download and install Liberty as part of the steps below.
- Copy the values of CLUSTER_ID, COLLECTION_NAME, and RANKER_ID from your retrieve-and-rank-service service in Bluemix to RetrieveAndRankResource.java.
You can use the following command to see the credentials:
$ cf env <application-name>
Example output:
System-Provided:
{
  "VCAP_SERVICES": {
    "retrieve-and-rank": [
      {
        "credentials": {
          "url": "<url>",
          "password": "<password>",
          "username": "<username>"
        },
        "label": "retrieve-and-rank",
        "name": "retrieve-and-rank-service",
        "plan": "standard"
      }
    ]
  }
}

User-Provided:
CLUSTER_ID: xxxxxxxx_ca0e_zzzz_zzzz_95zzz3aa2404
COLLECTION_NAME: ga
RANKER_ID: F131F6-rank-10
You also need to copy the username, password, and url values from the credentials section.
- Create a Liberty profile server in Eclipse.
- Add the application to the server.
- Start the server.
- Go to http://localhost:9080/webApp to see the running application.
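The username and password copied from VCAP_SERVICES are what the app presents when it calls the service. Assuming the service accepts HTTP Basic authentication, which is how Watson services on Bluemix typically used these credentials, a minimal sketch of building the `Authorization` header value looks like this. The class name is hypothetical, and `java.util.Base64` requires Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuth {

    /** Returns an HTTP Basic Authorization header value for the given credentials. */
    static String header(String username, String password) {
        String token = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials; substitute the username and password from VCAP_SERVICES.
        System.out.println(header("user", "pass")); // prints Basic dXNlcjpwYXNz
    }
}
```

Base64 is an encoding, not encryption, so send this header only over HTTPS and avoid committing real credentials to source control.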
To troubleshoot your Bluemix application, the most useful source of information is the log files. To see them, run the following command:
$ cf logs <application-name> --recent
This sample code is licensed under Apache 2.0.
Full license text is available in LICENSE.
See CONTRIBUTING.
- Retrieve and Rank service documentation
- Configuring the Retrieve and Rank service
- Retrieve and Rank API reference
Find more open source projects on the IBM GitHub page.