Mustang solves the problem of efficiently indexing Boolean expressions (both Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF)) in a high-dimensional multi-valued attribute space. The goal is to rapidly find the set of Boolean expressions that evaluate to true for a given assignment of values to attributes. A solution to this problem has applications in online advertising (where a Boolean expression represents an advertiser’s user targeting requirements, and an assignment of values to attributes represents the characteristics of a user visiting an online page) and in general any targeting system (where a Boolean expression represents a targeting criteria, and an assignment of values to attributes represents an event).
Mustang presents a novel solution based on the inverted list data structure that enables us to index arbitrarily complex DNF and CNF Boolean expressions over multi-valued attributes. An interesting aspect of our solution is that, by virtue of leveraging inverted lists traditionally used for ranked information retrieval, we can efficiently return the top-N matching Boolean expressions. This capability enables applications such as ranked targeting systems, where only the top targeting criterias that match an event are desired. For example, in online advertising there is a limit on the number of advertisements that can be shown on a given page and only the “best” advertisements can be displayed.
<dependency>
<groupId>com.phonepe</groupId>
<artifactId>mustang</artifactId>
<version>2.2.3</version>
</dependency>
Mustang allows indexing Boolean Expressions in high-dimensional multi-valued attribute space.
`Criteria` represents the boolean expressions in one of the two normalized forms.
-
DNF : Disjunctive Normal Form, which is a disjunction of conjunctions
(A ∈ {a1, a2} ∧ B ∉ {b1, b2} ∧ C ∈ {c1}) ∨ (A ∈ {a1, a3} ∧ D ∉ {d1})
-
CNF : Conjunctive Normal Form, which is a conjunction of disjunctions
(A ∈ {a1, a2} ∨ B ∉ {b1, b2} ∨ C ∈ {c1}) ∧ (A ∈ {a1, a3} ∨ D ∉ {d1})
Composition
is a set of Predicate
(s). Depending upon how the constituent results are considered, it could be either :
Conjunction(∧)
is satisfied only when all constituent predicates evaluate to true.Disjunction(∨)
is satisfied when any of the constituent predicates evaluate to true.
Predicate
is a conditional and has the Detail
that needs to be satisfied.
INCLUDED
to indicate inclusion.EXCLUDED
to indicate exclusion.
Further, Mustang allows for logical grouping of Criteria
(s) when indexing through identification by a name.
Criteria
of any form can be indexed into an index-group. And searches are always directed to a specific index-group.
Detail
holds the information about the Caveat
that needs to be satisfied. Detail
can be predominantly of three types :
EqualityDetail
to enforceEQUALITY
caveat.RegexDetail
to enforceREGEX
caveat.RangeDetail
to enforceRANGE
caveat. Supports all flavors - greater_than, greater_than_equals, less_than, less_than_equals and between (both open & closed).VersioningDetail
to enforce easier version checks.
Below table summarizes Caveat
support across data types -
Caveat |
Data Types Supported |
---|---|
EQUALITY |
String, Number, Boolean |
REGEX |
String |
RANGE |
Number |
VERSIONING |
String |
ObjectMapper mapper = new ObjectMapper();
MustangEngine engine = MustangEngine.builder().mapper(mapper).build();
Criteria dnf = DNFCriteria.builder()
.id("C1") // id we would get back should this criteria match a given assignment
.conjunction(Conjunction.builder()
.predicate(IncludedPredicate.builder()
.lhs("$.a")
.detail(EqualityDetail.builder()
.values(Sets.newHashSet("A1", "A2", "A3"))
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.n")
.detail(RangeDetail.builder() // example for greater_than_equals
.lowerBound(3)
.includeLowerBound(true)
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.x")
.detail(RangeDetail.builder() // example for less_than
.upperBound(3)
.build())
.build())
.build())
.build();
Criteria cnf = CNFCriteria.builder()
.id("C2") // id we would get back should this criteria match a given assignment
.disjunction(Disjunction.builder()
.predicate(IncludedPredicate.builder()
.lhs("$.a")
.detail(EqualityDetail.builder()
.values(Sets.newHashSet("A1", "A2"))
.build())
.build())
.predicate(ExcludedPredicate.builder()
.lhs("$.b")
.detail(RegexDetail.builder()
.regex("B.?")
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.n")
.detail(EqualityDetail.builder()
.values(Sets
.newHashSet(0.000000000000001, 0.000000000000002, 0.000000000000003))
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.x")
.detail(RangeDetail.builder() // Example for lesser_than_equals
.upperBound(7)
.includeUpperBound(true)
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.p")
.detail(EqualityDetail.builder()
.values(Sets.newHashSet(true))
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.v")
.detail(VersioningDetail.builder()
.check(CheckType.ABOVE)
.baseVersion("1.2.3.4-alpha") // good coverage across formats
.build())
.build())
.build())
.build();
Index a single criteria
engine.add("index_name", criteria)
OR
Multiple criteria(s) at once.
engine.add("index_name", Arrays.asList(criteria1, criteria2, ...));
An assignment is a set of attribute name and value pairs. Json is a very good example of multiple-level K-V pairs.
Example : JsonNode event = { "a" : "A1", "b" : "B3", "n" : 5, "p" : true }
First we need to build the context -
EvaluationContext context = EvaluationContext.builder().node(event).build();
And search it in the required index -
Set<String> searchResults = engine.search("index_name",context);
which returns a set of id(s) of all matching criteria(s) in an ordered manner (More on this in the topN section).
At times, an ordered list is not required, in which case, we can skip the scoring part as below.
Set<String> searchResults = engine.search("index_name",context, false);
We would need to supply the weights for each of the predicates
to arrive at a notion of scores for any Criteria
.
These are then leveraged to sort rank the top N criteria.
Score of a criteria - E
reflects its relevance wrt to an assignment - S
.
If E is a conjunction of ∈ and ∉ predicates, the score of E is defined as
Scoreconj(E,S) = \sum {(A,v) \in IN(E) \cap S} w{E}(A,v) * w_{S}(A,v)
where
- IN (E ) is the set of all attribute name and value pairs in the ∈ predicates of E (we ignore scoring ∉ predicates)
- wE (A, v) is the weight of the pair (A, v) in E
- wS (A, v) is the weight of the pair (A, v) in S
Scores of different Criteria
are defined as below :
- Score of a
DNFCriteria
is defined as the maximum of the scores of the conjunctions. - Score of a
CNFCriteria
is defined as sum of the scores of the disjunctions.
Update the already indexed criteria.
engine.update("index_name", criteria)
PS :
update
is NOT limited to only already indexedcriteria
. Ifcriteria
is not already indexed, behavior will be akin toadd
.- Successive
add
operations to index a givencriteria
(identified bycriteriaId
) are not allowed. - For changes in any Criteria thats already indexed to reflect in the index,
update
is the way to go. - Post the
update
operation, for all practical purposes, only the newer version ofcriteria
will be considered for searches.
Delete an already indexed criteria.
engine.delete("index_name", criteria)
PS :
delete
is limited to only already indexedcriteria
.- Post the
delete
operation, for all practical purposes, deletedcriteria
will not be considered for searches.
Mustang provides support for scanning a list of Criteria
against a context
and arriving at the satisfying ones.
List<Criteria> matchingCriterias = engine.scan(criterias, context);
A specific Criteria
can also be evaluated against a given context
to pull out the result.
boolean result = evaluate(criteria, context);
At times we may need to update/delete a bunch of Criteria
s. Also, we may not know which all Criteria
s have already been indexed that needs deletion. In such cases, it is recommended to go for building a new index ground-up and replace it with the existing required index. So, one can build up a temporary index and replace this temporary index with the existing / old index. Index replacement is an atomic operation. Creation of a temporary index would need extra head room in the heap but wouldn't hold onto the extra memory post replacement.
replace(oldIndex, newIndex);
Ratification is a predictable way of identifying anomalies in search results for a given index. Its a very detailed process that looks out for discrepancies between the search results and the scan results for all possible Query
combinations. As the size of the index grows, needless to say, this will take more time and hence should be used judiciously and sparingly. Suggested way is to invoke ratification when changes done onto an index (such as add
,update
,delete
,replace
) are SUSPECT.
engine.ratify(indexName); // This triggers the ratification process in the background
RatificationResult result = engine.getRatificationResult(indexName); // Check back the results after a while
To aid in debugging, there is an export & import functionality provided on the index group.
String indexGroup = remoteMustangEngine.exportIndexGroup("index_name");
localMustangEngine.importIndexGroup(indexGroup);
// Try out searches on this index group now.
Set<String> searchResults = localMustangEngine.search("index_name",context);