layout | title | parent | has_children | nav_order |
---|---|---|---|---|
default | UBI index schemas | User behavior insights | false | 7 |
User Behavior Insights (UBI) logging is a matter of linking and indexing queries and their results to user interactions (events) with your application.
UBI is not functional unless the links between the following fields are consistently maintained within your UBI-enabled application:
- `client_id` represents a unique user with their client application.
- `object_id` represents an ID for whatever item the user is searching for, such as `epc`, `isbn`, `ssn`, or `handle`.
- `object_id_field` tells us the type of `object_id`, that is, the actual label (`epc`, `isbn`, `ssn`, or `handle`) for each `object_id`.
- `query_id` is a unique ID for the raw query language executed and the resulting `object_id`s (hits) that the query returned.
- `action_name`, though not technically an ID, tells us what exact user action (such as `click`, `add_to_cart`, `watch`, `view`, or `purchase`) was taken (or not) with a given `object_id`.
To summarize: the `query_id` signals the beginning of a `client_id`'s Search Journey every time a user queries the search index, the `action_name` tells us how the user is interacting with the query results within the application, and `event_attributes.object.object_id` refers to the precise query result that the user interacts with.
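The linkage summarized above can be illustrated with a minimal sketch. The records, IDs, and the helper `actions_by_query` are hypothetical and exist only for illustration; the field names follow the schemas described on this page.

```python
from collections import defaultdict

# Hypothetical records; all IDs and values are made up for illustration.
queries = [
    {"query_id": "q-1", "client_id": "c-42",
     "query_response_objects_ids": ["isbn-111", "isbn-222"]},
    {"query_id": "q-2", "client_id": "c-42",
     "query_response_objects_ids": ["isbn-333"]},
]

events = [
    {"query_id": "q-1", "client_id": "c-42", "action_name": "click",
     "event_attributes": {"object": {"object_id": "isbn-222",
                                     "object_id_field": "isbn"}}},
    {"query_id": "q-1", "client_id": "c-42", "action_name": "add_to_cart",
     "event_attributes": {"object": {"object_id": "isbn-222",
                                     "object_id_field": "isbn"}}},
]


def actions_by_query(queries, events):
    """Group (action_name, object_id) pairs under the query that produced them."""
    returned = {q["query_id"]: set(q["query_response_objects_ids"]) for q in queries}
    journeys = defaultdict(list)
    for event in events:
        object_id = event["event_attributes"]["object"]["object_id"]
        # Keep only events whose object was actually a hit of the linked query.
        if object_id in returned.get(event["query_id"], set()):
            journeys[event["query_id"]].append((event["action_name"], object_id))
    return dict(journeys)


journeys = actions_by_query(queries, events)
```

If any of the three links (`query_id`, `client_id`, `object_id`) is missing or inconsistent, a join like this one silently drops the event, which is why the fields must be maintained consistently.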
- Search Client: in charge of searching and then receiving objects from some document index in OpenSearch. (1, 2, 5, and 7 in the following sections)
- User Behavior Insights plugin: if activated in the `ext.ubi` stanza of the search request, manages the UBI Queries store in the background, indexing each underlying DSL index query with a unique `query_id` along with all returned `object_id`s, and then passing the `query_id` back to the Search Client so that events can be linked to this query. (3, 4, and 5 in the following sections)
- Objects: whatever items the user is searching for with the queries. Activating UBI involves mapping your real-world objects (using their identifiers, such as `isbn` or `ssn`) to the `object_id` fields in the schemas.
- The Search Client, if separate from the UBI Client, forwards the indexed `query_id` to the UBI Client. Note: We break out the roles of search and UBI event indexing here, but many implementations will likely use the same OpenSearch client instance for both searching and index writing. (6 in the following section)
- The UBI Client then indexes all user events with this `query_id` until a new search is performed and a new `query_id` is generated by User Behavior Insights and passed back to the UBI Client.
- If the UBI Client interacts with a result object, such as through `onClick`, that `object_id`, the `onClick` `action_name`, and the `query_id` are all indexed together, signaling the causal link between the search and the object. (8 and 9 in the following section)
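The flow described in the list above can be simulated in memory. This is a sketch under stated assumptions, not the plugin's implementation: `build_search_request`, `fake_search`, and `UbiClient` are hypothetical names, the keys placed inside the `ext.ubi` stanza and the shape of the response are assumptions, and `fake_search` only stands in for OpenSearch and the UBI plugin.

```python
import uuid


def build_search_request(user_text, client_id, object_id_field):
    """Build a search body that opts in to UBI through the ext.ubi stanza."""
    return {
        "query": {"match": {"title": user_text}},  # "title" is a hypothetical field
        "ext": {
            # Assumed stanza contents; check the plugin documentation for exact keys.
            "ubi": {"client_id": client_id, "object_id_field": object_id_field}
        },
    }


def fake_search(request):
    """Stand-in for OpenSearch plus the UBI plugin (steps 2 to 5)."""
    query_id = str(uuid.uuid4())     # the plugin would generate and store this
    hits = ["isbn-111", "isbn-222"]  # made-up object_ids
    return {"ext": {"ubi": {"query_id": query_id}}, "hits": hits}


class UbiClient:
    """Tags every event with the most recent query_id (steps 6, 8, and 9)."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.query_id = None
        self.indexed_events = []  # stand-in for the UBI Events store

    def on_search(self, query_id):
        # A new search starts a new journey; later events use this query_id.
        self.query_id = query_id

    def track(self, action_name, object_id):
        event = {
            "client_id": self.client_id,
            "query_id": self.query_id,
            "action_name": action_name,
            "event_attributes": {"object": {"object_id": object_id}},
        }
        self.indexed_events.append(event)
        return event


ubi = UbiClient("c-42")
first = fake_search(build_search_request("dune", "c-42", "isbn"))
ubi.on_search(first["ext"]["ubi"]["query_id"])
click = ubi.track("click", "isbn-222")

second = fake_search(build_search_request("dune messiah", "c-42", "isbn"))
ubi.on_search(second["ext"]["ubi"]["query_id"])
purchase = ubi.track("purchase", "isbn-111")
```

Keeping the current `query_id` as state on the UBI Client means each event is attributed to the search that preceded it, which matches the "until a new search is performed" rule.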
{% comment %} The mermaid source is converted into an png under .../images/ubi/ubi-schema-interactions.png
graph LR
style L fill:none,stroke-dasharray: 5 5
subgraph L["`*Legend*`"]
style ss height:150px
subgraph ss["Standard Search"]
direction LR
style ln1a fill:blue
ln1a[ ]--->ln1b[ ];
end
subgraph ubi-leg["UBI data flow"]
direction LR
ln2a[ ].->|"`**UBI interaction**`"|ln2b[ ];
style ln1c fill:red
ln1c[ ]-->|<span style="font-family:Courier New">query_id</span> flow|ln1d[ ];
end
end
linkStyle 0 stroke-width:2px,stroke:#0A1CCF
linkStyle 2 stroke-width:2px,stroke:red
%%{init: {
"flowchart": {"htmlLabels": false},
}
}%%
graph TB
User--1) <i>raw search string</i>-->Search;
Search--2) <i>search string</i>-->Docs
style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
subgraph OS[OpenSearch Cluster fa:fa-database]
style E stroke-width:1px,stroke:red
E[( <b>UBI Events</b> )]
style Docs stroke-width:1px,stroke:#0A1CCF
style Q stroke-width:1px,stroke:red
Docs[(Document Index)] -."3) {<i>DSL</i>...} & [<i>object_id's</i>,...]".-> Q[( <b>UBI Queries</b> )];
Q -.4) <span style="font-family:Courier New">query_id</span>.-> Docs ;
end
Docs -- "5) <i>return</i> both <span style="font-family:Courier New">query_id</span> & [<i>objects</i>,...]" --->Search ;
Search-.6) <span style="font-family:Courier New">query_id</span>.->U;
Search --7) [<i>results</i>, ...]--> User
style *client-side* stroke-width:1px, stroke:#D35400
subgraph "`*client-side*`"
style User stroke-width:4px, stroke:#EC636
User["`**User**`" fa:fa-user]
App
Search
U
style App fill:#D35400,opacity:.35, stroke:#0A1CCF, stroke-width:2px
subgraph App[       UserApp fa:fa-store]
style Search stroke-width:2px, stroke:#0A1CCF
Search( Search Client )
style U stroke-width:1px,stroke:red
U( <b>UBI Client</b> )
end
end
User -.8) <i>selects</i> <span style="font-family:Courier New">object_id:123</span>.->U;
U-."9) <i>index</i> event:{<span style="font-family:Courier New">query_id, onClick, object_id:123</span>}".->E;
linkStyle 1,2,0,6 stroke-width:2px,fill:none,stroke:#0A1CCF
linkStyle 3,4,5,8 stroke-width:2px,fill:none,stroke:red
{% endcomment %}
There are 2 separate stores for UBI:
All underlying query information and results (`object_id`s) are stored in the **UBI Queries** store and remain largely invisible in the background.
The only obvious difference will be in the `ubi` stanza of the JSON response, which could cause index bloat if one forgets that this is enabled.
UBI Queries schema: Since UBI manages the **UBI Queries** store, the developer should never have to write directly to this store (except for importing data).

- `timestamp` (events and queries): A UNIX timestamp of when the query was received.
- `query_id` (events and queries): A unique ID of the query, provided by the client or generated automatically. The same query text issued multiple times would generate a different `query_id` each time.
- `client_id` (events): A user/client ID provided by the client application.
- `query_response_objects_ids` (queries): An array of the `object_id`s. This could be the same ID as the `_id`, but it is meant to be the externally valid ID of the document/item/product.
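For data imports, a record shaped like the schema above could be sketched as follows. The class name `UbiQueryRecord` is hypothetical, and storing the timestamp as whole seconds is an assumption; the schema only says "UNIX timestamp".

```python
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class UbiQueryRecord:
    client_id: str
    query_response_objects_ids: list
    # Generated automatically unless the client supplies one.
    query_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Seconds since the UNIX epoch; the unit is an assumption.
    timestamp: int = field(default_factory=lambda: int(time.time()))


# The same query issued twice yields two records with different query_ids.
a = asdict(UbiQueryRecord("c-42", ["isbn-111", "isbn-222"]))
b = asdict(UbiQueryRecord("c-42", ["isbn-111", "isbn-222"]))
```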
The second store, **UBI Events**, is the event store that the client side indexes events to directly, linking the event `action_name`, `object_id`s, and `query_id`s together with any other important event information.
Since this schema is dynamic, the developer can add any new fields and structures (such as user information or geolocation information) at index time that are not in the current UBI Events schema:

- `application` (size 100): name of the application tracking UBI events (for example, `amazon-shop` or `ABC-microservice`).
- `action_name` (size 100): any name you want to call your event, such as `click`, `watch`, `purchase`, and `add_to_cart`; these could also be mapped to any common JavaScript events or debugging events.
  TODO: How to formalize? A list of standard ones and then custom ones.
- `query_id` (size 100): ID for some query. Either the client provides this, or the `query_id` is generated at index time by the **UBI Plugin**.
- `client_id`: The `client_id` must be consistent in both the **UBI Queries** and **UBI Events** stores.
- `timestamp`: UTC-based UNIX epoch time.
- `message_type` (size 100): originally thought of in terms of `ERROR`, `INFO`, and `WARN`, but could be anything else useful, such as `QUERY` or `CONVERSION`. Can be used to group `action_name` values together in logical bins. This grouping may be better handled as backend logic during analysis.
- `message` (size 256): optional text message for the log entry. For example, with a `message_type` of `INFO`, people might expect informational or debug-type text in this field, but with a `message_type` of `QUERY`, we would expect the text to be more about what the user is searching for.
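A client might check the "(size N)" notes above before indexing an event. The sketch below treats each size as a maximum character count, which is an assumption; `MAX_LENGTHS` and `oversized_fields` are hypothetical names.

```python
# Limits copied from the "(size N)" notes; reading them as character
# limits is an assumption, not something the schema states.
MAX_LENGTHS = {
    "application": 100,
    "action_name": 100,
    "query_id": 100,
    "message_type": 100,
    "message": 256,
}


def oversized_fields(event):
    """Return the sorted names of string fields longer than their listed size."""
    return sorted(
        name
        for name, limit in MAX_LENGTHS.items()
        if isinstance(event.get(name), str) and len(event[name]) > limit
    )


ok_event = {
    "application": "amazon-shop",
    "action_name": "click",
    "query_id": "q-1",
    "message_type": "QUERY",
    "message": "user searched for a paperback",
}
bad_event = dict(ok_event, message="x" * 257)
```

Here `oversized_fields(ok_event)` is empty, while `oversized_fields(bad_event)` reports only `message`.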
`event_attributes` has dynamic mapping, meaning that if events are indexed with many custom fields, the index could bloat quickly with many new fields.
{: .warning}
`event_attributes` is a structure that describes any important context about the event. It has 2 primary structures, `position` and `object`, and is extensible so that any other relevant custom information about the event can be stored, such as timing information or individual user or session information.
The two primary structures in `event_attributes`:
- `event_attributes.position`: structure that contains information on the location of the event origin, such as screen x,y coordinates or the n-th object out of 10 results.
  - `event_attributes.position.ordinal` tracks the nth item within a list that a user could select or click (for example, selecting the 3rd element could be `event{onClick, results[2]}` with zero-based indexing).
  - `event_attributes.position.{x,y}` tracks x and y values that the client defines.
  - `event_attributes.position.page_depth` tracks the page depth of results.
  - `event_attributes.position.scroll_depth` tracks the scroll depth of page results.
  - `event_attributes.position.trail` is a text field for tracking the path/trail that a user took to get to this location.
- `event_attributes.object`: contains identifying information about the object returned from the query that the user interacts with (for example, a book, a product, or a post). The `object` structure has two ways to refer to the object, with `object_id` being the ID that links prior queries to this object:
  - `event_attributes.object.internal_id` is a unique ID that OpenSearch can use internally to index the object; think of the `_id` field in the indexes.
  - `event_attributes.object.object_id` is the ID that a user could look up to find the object instance within the document corpus. Examples include `ssn`, `isbn`, and `ean`. Variants need to be incorporated in the `object_id`, so for a t-shirt that is red, you would need the SKU-level ID as the `object_id`. Initializing UBI requires mapping from the Document Index's primary key to this `object_id`.
  - `event_attributes.object.object_id_field` indicates the type/class of object _and_ the ID field of the search index.
  - `event_attributes.object.description` is an optional description of the object.
  - `event_attributes.object.object_detail` is optional text for further data object details.
  - extensible fields: any new fields by any other names in the `object` that one indexes will dynamically expand this schema to that use case.
{: .warning}
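The `position` and `object` structures described above can be assembled with a small helper. This is an illustrative sketch: `make_event_attributes` is a hypothetical function, and the values (including the custom `session_id` field) are made up.

```python
def make_event_attributes(object_id, object_id_field, ordinal=None,
                          internal_id=None, **extra):
    """Assemble event_attributes with its position and object structures."""
    obj = {"object_id": object_id, "object_id_field": object_id_field}
    if internal_id is not None:
        obj["internal_id"] = internal_id
    attributes = {"object": obj}
    if ordinal is not None:
        attributes["position"] = {"ordinal": ordinal}
    # Every extra key becomes a new dynamically mapped field, so add sparingly.
    attributes.update(extra)
    return attributes


attrs = make_event_attributes(
    "isbn-222", "isbn", ordinal=3, internal_id="doc-17", session_id="s-9"
)
minimal = make_event_attributes("isbn-222", "isbn")
```

Routing custom fields through a single helper like this keeps them visible in one place, which makes it easier to notice when the dynamic mapping is growing.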