
Commit 81f2918

committed
docs upgrade for UR v0.7.2
1 parent 217da9e commit 81f2918

11 files changed

+311
-27
lines changed

debugging_with_intellij_idea.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 # Debugging with IntelliJ IDEA
 
+Unfortunately, as of Apache PredictionIO it is no longer possible to debug your template with IntelliJ. The following instructions will work for older PredictionIO and IntelliJ. Please complain to the Apache PredictionIO mailing list if this causes you as much pain as it does us.
+
+# Deprecated Debugging Instructions
+
 It is possible to run your template engine with IntelliJ IDEA. This makes the engine specific commands accessible for debugging, like `pio train`, `pio deploy`, and queries made to a deployed engine.
 
 ## Prerequisites

doclist.js

Lines changed: 8 additions & 0 deletions
@@ -78,6 +78,14 @@ DocList = [
     title: 'Advanced Tuning',
     template: 'ur_advanced_tuning'
   },
+  {
+    title: 'Use Cases',
+    template: 'ur_use_cases'
+  },
+  {
+    title: 'Business Rules',
+    template: 'ur_biz_rules'
+  },
   {
     title: 'Model Debugging',
     template: 'ur_elasticsearch_debugging'

docs_html_partials.js

Lines changed: 17 additions & 11 deletions
@@ -13,7 +13,7 @@ DocsHtmlPartials = [
   },
   {
     name: "pioversionnum",
-    template: "0.11.0",
+    template: "0.12.1",
     ismd: false,
     shouldLoad: false
   },
@@ -25,7 +25,7 @@ DocsHtmlPartials = [
   },
   {
     name: "urversionnum",
-    template: "0.6.0",
+    template: "0.7.2",
     ismd: false,
     shouldLoad: false
   },
@@ -55,55 +55,61 @@ DocsHtmlPartials = [
   },
   {
     name: "hdfsversionnum",
-    template: "2.7.2",
+    template: "2.8.2",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "sparkversionnum",
-    template: "1.6.3",
+    template: "2.x",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "elasticsearchversionnum",
-    template: "1.7.6",
+    template: "5.x",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "hbaseversionnum",
-    template: "1.2.6",
+    template: "1.3.2",
+    ismd: false,
+    shouldLoad: false
+  },
+  {
+    name: "scalaversionnum",
+    template: "2.11.x",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "hdfsdownload",
-    template: "<a href='http://www.eu.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz'>Hadoop 2.7.2</a>",
+    template: "<a href='http://www.eu.apache.org/dist/hadoop/common/hadoop-2.8.2/hadoop-2.8.2.tar.gz'>Hadoop 2.8.2</a>",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "sparkdownload",
-    template: "<a href='http://www.us.apache.org/dist/spark/spark-1.6.3/spark-1.6.3-bin-hadoop2.6.tgz'>Spark 1.6.3</a>",
+    template: "<a href='http://www.us.apache.org/dist/spark/spark-2.1.2/spark-2.1.2-bin-hadoop2.7.tgz'>Spark 2.1.2</a>",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "esdownload",
-    template: "<a href='https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.6.tar.gz'>Elasticsearch 1.7.6</a>",
+    template: "<a href='https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-5.6.4.tar.gz'>Elasticsearch 5.6.4</a>",
     ismd: false,
     shouldLoad: false
   },
   {
     name: "hbasedownload",
-    template: "<a href='http://www-us.apache.org/dist/hbase/1.2.6/hbase-1.2.6-bin.tar.gz'>HBase 1.2.6</a>",
+    template: "<a href='http://www-us.apache.org/dist/hbase/1.3.2/hbase-1.3.2-bin.tar.gz'>HBase 1.3.2</a>",
     ismd: false,
     shouldLoad: false
   },
   {
     name: 'pio_version',
-    template: 'PredictionIO-v0.11.0-incubating',
+    template: 'PredictionIO-v0.12.1',
     ismd: true,
     shouldLoad: false
   },

pio_by_actionml.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # PredictionIO and ActionML
 
-ActionML supports and directly commits code to Apache PredictionIO beginning with Apache PredictionIO-{{> pioversionnum}}. All added features that we created in our fork have been contributed and merged into Apache PredictionIO so we are now in sync. We may have some extra howtos, documents and certainly templates but check the [Apache site](http://predictionio.incubator.apache.org/) for more information.
+ActionML supports and directly commits code to Apache PredictionIO beginning with Apache PredictionIO-0.10.0 and continuing to the present release, PredictionIO-{{> pioversionnum}}. We may have some extra howtos, documents, and certainly templates, but check the [Apache site](http://predictionio.incubator.apache.org/) for more information.
 
-For help installing Apache PredictionIO-{{> pioversionnum}} please follow [these instructions](/docs/install) to install or upgrade from the ActionML version.
+For help installing Apache PredictionIO-{{> pioversionnum}} please follow [these instructions](/docs/install) to install or upgrade. Be aware that upgrading can **erase your data**, so please back up before any upgrade.
 
 For a description of past versions see the [history](/docs/pio_versions)

pio_start_stop.md

Lines changed: 5 additions & 1 deletion
@@ -108,6 +108,10 @@ Shutdown is in the opposite order of startup but if the startup is automated the
 /usr/local/hadoop/sbin/stop-dfs.sh
 ```
 
+## PIO Events Accumulate Forever
+
+PIO by default will continue to accumulate events forever, which will eventually make even Big Data fans balk at storage costs and will cause model training to take longer and longer. The answer to this is to trim and/or compress the PIO EventStore for a specific dataset. This can be done by using a template made for this purpose called the [DB Cleaner](/docs/db_cleaner_template).
+
 ## Monitoring
 
-See [**Monitoring PredictionIO**](pio_monitoring)
+See [**Monitoring PredictionIO**](/docs/pio_monitoring)

pio_versions.md

Lines changed: 0 additions & 5 deletions
@@ -1,8 +1,3 @@
 # PredictionIO-{{> pioversion}}
 
 ActionML is a direct contributor to the Apache PredictionIO project. The current stable release is {{> pioversion}}. Install from [here](/docs/install) or use one of several methods described on the [Apache PredictionIO site](http://predictionio.incubator.apache.org/install/)
-
-# PIO Events Accumulate Forever
-
-PIO by default will continue to accumulate events forever, which will eventually make even Big Data fans balk at storage costs and it will cause model training to take longer and longer. The answer to this is to trim and/or compress the PIO EventStore for a specific dataset. This can be done by a simple mod to your template code described below or can be run as a separate job using a template made for this purpose called the [/docs/db_cleaner_template]().

ur.md

Lines changed: 12 additions & 4 deletions
@@ -1,8 +1,12 @@
 # The Universal Recommender
 
-The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user taste indicators&mdash;it is called the Correlated Cross-Occurrence algorithm. Unlike the matrix factorization embodied in things like MLlib's ALS, The UR's CCO algorithm is able to **ingest any number of user actions, events, profile data, and contextual information**. It then serves results in a fast and scalable way. It also supports item properties for filtering and boosting recommendations and can therefor be considered a hybrid collaborative filtering and content-based recommender.
+The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user taste indicators&mdash;the Correlated Cross-Occurrence (CCO) algorithm. Unlike the matrix factorization embodied in things like MLlib's ALS, the UR's CCO algorithm is able to **ingest any number of user actions, events, profile data, and contextual information**. It then serves results in a fast and scalable way. It also supports item properties for filtering and boosting recommendations and can therefore be considered a hybrid collaborative filtering and content-based recommender.
 
-The use of multiple **types** of data fundamentally changes the way a recommender is used and, when employed correctly, will provide a significant increase in quality of recommendations vs. using only one user event. Most recommenders, for instance, can only use "purchase" events. Using all we know about a user and their context allows us to much better predict their preferences.
+The use of multiple **types** of data fundamentally changes the way a recommender is used and, when employed correctly, will provide a significant increase in recommendation quality vs. using only one "conversion event". Most recommenders, for instance, can only use one indicator of user taste, such as a "purchase" event. Using all we know about a user and their context allows us to much better predict their preferences.
+
+Not only does this data give lift to recommendation quality, but it allows users who have few or no conversions to get recommendations. Therefore it can be used in places where conversions are not as common. It also allows us to enrich preference indicators by extracting entities from text, or by learning topics and inferring preferences when users read something from a topic.
+
+Even though this may sound complex, the Universal Recommender can be used in more typical cases with no complex setup.
 
 ## Quick Start
 
@@ -15,12 +19,16 @@ There is a reason we call this recommender "universal" and it's because of the n
 * **Personalized Recommendations**: "just for you", when you have user history
 * **Similar Item Recommendations**: "people who liked this also like these"
 * **Shopping Cart Recommendations**: more generally item-set recommendations. This can be applied to wishlists, watchlists, likes&mdash;any set of items that may go together. Some also call this "complementary purchase" recommendations.
-* **Popular Items**: These can even be the primary form of recommendation if desired for some applications since serveral forms are supported. By default if a user has no recommendations popular items will backfill to achieve the number required.
+* **Popular Items**: These can even be the primary form of recommendation if desired for some applications since several forms are supported. By default, if a user has no recommendations, popular items will backfill to achieve the number required.
 * **Hybrid Collaborative Filtering and Content-based Recommendations**: since item properties can boost or filter recommendations, a smooth blend of usage and content can be achieved.
 * **Recommendations with Business Rules**: The UR allows filters and boosts based on user-defined properties that can be attached to items. So things like availability, categories, tags, location, or other user-defined properties can be used to rule in or out items to be recommended.
 
+## Simple Configuration
+
+All of the above use cases can be very simple to configure and set up. If you have an E-Commerce application, you may be able to get away with one type of input data and some item properties to get all of the benefits. If you have more complex needs, read the [Use Cases](ur_use_cases.md) section for tips.
+
 ## The Correlated Cross-Occurrence Algorithm (CCO)
 
 For most of the history of recommenders, data scientists could only find ways to use one type of user-preference indicator. To be sure, this was one type per application, but there is so much more we know from user behavior that was going unused. Correlated Cross-Occurrence (CCO) was developed to discover which behavior of a given user correlates to the type of action you want to recommend. If you want to recommend ***buy***, ***play***, ***watch***, or ***read***, is it possible that other things known about a user correlate to this recommended action&mdash;things like a ***pageview***, a ***like***, a ***category preference***, the ***location*** logged in from, the ***device*** used, item detail ***views***, or ***anything else*** known about the user? Furthermore, how would we test for correlation?
 
-Enter the Log-Likelihood Ratio (LLR)&mdash;a probabilistic test for correlation between 2 events. This is super important because there is no linear relationship between the **event-types**. The correlation is at the indiviual user and event level and this is where LLR excels. To illustrate this ask yourself in an E-commerce situation is a product view 1/2 of a buy? You might think so but if the user viewed 2 things and bought one of them the correlation is 100% for one of the views and 0% for the other. So some view data is useful in predicting purchases and others are useless. LLR is a very well respected test for this type of correlation.
+Enter the Log-Likelihood Ratio (LLR)&mdash;a probabilistic test for correlation between 2 events. This is super important because there is no linear relationship between the **event-types**. The correlation is at the individual user and event level, and this is where LLR excels. To illustrate, ask yourself: in an E-commerce situation, is a product view 1/2 of a buy? You might think so, but if the user viewed 2 things and bought one of them, the correlation is 100% for one of the views and 0% for the other. So some view data is useful in predicting purchases and some is useless. LLR is a very well respected test for this type of correlation.
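The LLR test itself is small enough to sketch. The following is a generic implementation of Dunning's G² log-likelihood ratio over a 2x2 cross-occurrence contingency table, with illustrative counts; it is not the UR's production code, which relies on Apache Mahout for the CCO computation:

```python
from math import log

def llr_2x2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users who did both the conversion action and the other action
    k12: users who did only the other action
    k21: users who did only the conversion action
    k22: users who did neither
    """
    def h(*counts):
        # "entropy" in raw-count form: sum of c * ln(c / total)
        total = sum(counts)
        return sum(c * log(c / total) for c in counts if c > 0)

    return 2 * (h(k11, k12, k21, k22)      # whole matrix
                - h(k11 + k12, k21 + k22)  # row totals
                - h(k11 + k21, k12 + k22)) # column totals

# A perfectly independent table scores ~0: no correlation between events
print(abs(llr_2x2(10, 10, 10, 10)) < 1e-9)  # True
# Strong co-occurrence scores high: the two event-types are correlated
print(llr_2x2(100, 1, 1, 100) > 50)         # True
```

High-scoring pairs are the "correlated cross-occurrences" the UR keeps in its model; independent pairs are discarded as noise.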

ur_biz_rules.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# Business Rules

Everyone has seen apps like Amazon and Netflix, which show user recommendations but also may narrow recommendations down to a specific category or genre based on the user's location in the app, or to fill a special row in the UI. This is done by applying Business Rules based on item properties. Most recommenders do not have this ability, so an app must get many recommendations and then filter out the ones that have the wrong properties. This is built into the UR in a most efficient and simple way.

First, the input can be as simple as a "buy", where we know a user-id and an item-id. This is sufficient to make the recommendation, but to use business rules we must also set properties for items, like category or genre in this use case. We send the JSON:

```
{
    "event": "buy",
    "entityType": "user",
    "entityId": "John Doe",
    "targetEntityType": "item",
    "targetEntityId": "some-item",
    "eventTime": "ISO-encoded-datetime"
}
```

To set the item property so that "some-item" has category = "Electronics" we send the JSON:

```
{
    "event": "$set",              <-- special reserved event name
    "entityType": "item",         <-- must be "item"
    "entityId": "some-item",      <-- same type of id as in the "buy"
    "properties": {               <-- an object may have several properties
        "category": ["Electronics"]  <-- an array allows several categories
    },
    "eventTime": "ISO-encoded-datetime"
}
```
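A minimal sketch of building these two payloads in Python may help. The helper name and the commented-out endpoint details (host, port, access key) are assumptions for illustration; only the payload fields mirror the JSON above:

```python
import json

def make_event(event, entity_type, entity_id, **extra):
    """Build a PIO-style event payload like the JSON shown above."""
    payload = {"event": event, "entityType": entity_type, "entityId": entity_id}
    payload.update(extra)
    return payload

buy = make_event("buy", "user", "John Doe",
                 targetEntityType="item", targetEntityId="some-item")
set_props = make_event("$set", "item", "some-item",
                       properties={"category": ["Electronics"]})

# Each payload is POSTed to the EventServer, e.g. (endpoint details assumed):
# requests.post("http://localhost:7070/events.json?accessKey=YOUR_KEY",
#               data=json.dumps(buy),
#               headers={"Content-Type": "application/json"})
print(json.dumps(set_props))
```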
## Inclusion Business Rule

Once the UR has trained on this data, a simple query will return recommendations for "John Doe" that are all in the "Electronics" category:

```
{
    "user": "John Doe",
    "fields": [{
        "name": "category",
        "values": ["Electronics"],
        "bias": -1
    }]
}
```

This is called an *inclusion rule* since no item will be returned as a recommendation unless it includes the correct category. The `"bias": -1` tells the recommender to include no other recommendations.

## Exclusion Business Rule

Imagine that instead of including recs from the "Electronics" category, all you want to do is **exclude** "Toys":

```
{
    "user": "John Doe",
    "fields": [{
        "name": "category",
        "values": ["Toys"],
        "bias": 0
    }]
}
```

This is called an *exclusion rule* since `"bias": 0` excludes matching items.
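The effect of these two rules can be sketched as a filter over candidate recommendations. The items, scores, and helper below are hypothetical; in the UR the filtering actually happens inside the Elasticsearch query, not in application code:

```python
# Candidate recs as (item, score, properties) tuples -- hypothetical data
candidates = [
    ("tv",    4.2, {"category": ["Electronics"]}),
    ("lego",  3.9, {"category": ["Toys"]}),
    ("phone", 3.1, {"category": ["Electronics"]}),
    ("novel", 2.8, {"category": ["Books"]}),
]

def apply_rule(recs, name, values, bias):
    """Sketch of inclusion (bias == -1) and exclusion (bias == 0) rules."""
    def matches(props):
        return any(v in props.get(name, []) for v in values)
    if bias == -1:   # inclusion: keep only items matching a value
        return [r for r in recs if matches(r[2])]
    if bias == 0:    # exclusion: drop items matching a value
        return [r for r in recs if not matches(r[2])]
    return recs

print([r[0] for r in apply_rule(candidates, "category", ["Electronics"], -1)])
# ['tv', 'phone']
print([r[0] for r in apply_rule(candidates, "category", ["Toys"], 0)])
# ['tv', 'phone', 'novel']
```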
## Boost Business Rules with Logical ANDs and ORs

Inclusion and exclusion rules are dangerous because they can lead to no recommendations being returned. They do not limit items, they limit recommended items, so if there is not enough data to return a recommendation with category = "Electronics" while excluding all with category = "Toys", we might use a boost instead. The `bias` will be > 1.0 for a positive boost, and 0 < bias < 1.0 to de-boost or disfavor something by its properties.

To boost "Electronics" AND de-boost "Toys" we would send the query:

```
{
    "user": "John Doe",
    "fields": [{
        "name": "category",
        "values": ["Electronics"],
        "bias": 10.0
    }, {
        "name": "category",
        "values": ["Toys"],
        "bias": 0.001
    }]
}
```

This does 2 things:
- Any recommendation matching category "Electronics" will have its score multiplied by 10. This will greatly increase its rank and may raise it above all other items, but if there are no recommendations with category "Electronics", other items will still be returned.
- AND if a recommended item matches the category "Toys" it will have its score multiplied by 0.001, greatly decreasing its overall rank so that it may not be returned within the number of recs requested.

This query shows how to create queries that will not disqualify all recommendations. You are not guaranteed that the recs match "Electronics" and do not match "Toys", but where the matching rules with `"bias": -1` and `"bias": 0` would have led to no results, this query will fall back to other items, avoiding returning no recs at all. Use inclusion and exclusion rules only if you know for sure that you don't want non-matching recommendations.

The other thing this shows is how to combine rules. Including 2 rules as different fields will AND them logically.
If we wanted to include recs from "Electronics" OR "Toys" we would send:

```
{
    "user": "John Doe",
    "fields": [{
        "name": "category",
        "values": ["Electronics", "Toys"],
        "bias": 10.0
    }]
}
```

This will boost by 10 the score of any recommended item that matches either category, and boost by 20 anything matching both. Given enough possible recommendations, this will return recommendations matching both categories if it can.
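The boost arithmetic described above can be sketched as follows. This is a simplification under the stated semantics (each matching value of a rule applies the bias once, so matching both values of a 10x rule yields 20x); the UR's real scores come from Elasticsearch's boosting and will differ numerically:

```python
def boosted_score(score, item_categories, rules):
    """Apply boost rules to one rec's score.

    rules: list of (values, bias) pairs. Each rule multiplies the score
    by bias once per matching value, so a 10.0 rule matched on both of
    its values gives a 20x boost, as described in the text.
    """
    for values, bias in rules:
        n_matches = sum(1 for v in values if v in item_categories)
        if n_matches:
            score *= bias * n_matches
    return score

# AND-style: boost Electronics, de-boost Toys (two separate rules)
and_rules = [(["Electronics"], 10.0), (["Toys"], 0.001)]
print(boosted_score(1.0, ["Electronics"], and_rules))  # 10.0
print(boosted_score(1.0, ["Toys"], and_rules))         # 0.001

# OR-style: one rule, either category boosts 10x, both boost 20x
or_rule = [(["Electronics", "Toys"], 10.0)]
print(boosted_score(1.0, ["Electronics", "Toys"], or_rule))  # 20.0
```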
# WARNING

Business rules can be very effective in broadening recommendations when showing them from several categories. They can be used to exclude items that are not "Available" or "In-stock". But be aware that you are creating a bias in recs. You are bending the rules used to find the best thing for the user. Unless there is a hard rule for not showing something, try to use boosts. And when using boosts, try to find a prominent place to show un-biased recommendations. That way you are using the rules in such a way that they do not exclude what the recommender thinks are the best items for the user.
