Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(search): Add searchable annotation to maps #3136

Merged
merged 15 commits into from
Sep 8, 2021
Merged
13 changes: 8 additions & 5 deletions docs/modeling/extending-the-metadata-model.md
Original file line number Diff line number Diff line change
Expand Up @@ -167,11 +167,11 @@ The Aspect has four key components: its properties, the @Aspect annotation, the
references to other entities, of type Urn or optionally `<Entity>Urn`
- The @Aspect annotation. This is used to declare that the record is an Aspect and can be included in an entity’s
Snapshot. Unlike the other two annotations, @Aspect is applied to the entire record rather than a specific field.
Note, you can mark an aspect as a timeseries aspect. Check out
this [doc](metadata-model.md#timeseries-aspects) for details.
- The @Searchable annotation. This annotation can be applied to any primitive field to indicate that it should be
indexed in Elasticsearch and can be searched on. For a complete guide on using the search annotation, see the
annotation docs further down in this document.
Note, you can mark an aspect as a timeseries aspect. Check out this [doc](metadata-model.md#timeseries-aspects) for
details.
- The @Searchable annotation. This annotation can be applied to any primitive field or a map field to indicate that it
should be indexed in Elasticsearch and can be searched on. For a complete guide on using the search annotation, see
the annotation docs further down in this document.
- The @Relationship annotations. These annotations create edges between the Snapshot’s Urn and the destination of the
annotated field when the snapshots are ingested. @Relationship annotations must be applied to fields of type Urn. In
the case of DashboardInfo, the `charts` field is an Array of Urns. The @Relationship annotation cannot be applied
Expand Down Expand Up @@ -398,6 +398,9 @@ ranking.
Now, when Datahub ingests Dashboards, it will index the Dashboard’s title in Elasticsearch. When a user searches for
Dashboards, that query will be used to search on the title index and matching Dashboards will be returned.

Note, when @Searchable annotation is applied to a map, it will convert it into a list with "key.toString()
=value.toString()" as elements. This allows us to index map fields, while not increasing the number of columns indexed.

#### @Relationship

This annotation is applied to fields inside an Aspect. This annotation creates edges between an Entity’s Urn and the
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ public class SearchableFieldSpecExtractor implements SchemaVisitor {
private final List<SearchableFieldSpec> _specs = new ArrayList<>();
private final Map<String, String> _searchFieldNamesToPatch = new HashMap<>();

private static final String MAP = "map";

public List<SearchableFieldSpec> getSpecs() {
return _specs;
}
Expand All @@ -38,45 +40,59 @@ public void callbackOnContext(TraverserContext context, DataSchemaTraverse.Order

final DataSchema currentSchema = context.getCurrentSchema().getDereferencedDataSchema();

// First, check properties for primary annotation definition.
final Map<String, Object> properties = context.getEnclosingField().getProperties();
final Object primaryAnnotationObj = properties.get(SearchableAnnotation.ANNOTATION_NAME);

if (primaryAnnotationObj != null) {
validatePropertiesAnnotation(currentSchema, primaryAnnotationObj, context.getTraversePath().toString());
}

// Next, check resolved properties for annotations on primitives.
final Map<String, Object> resolvedProperties = FieldSpecUtils.getResolvedProperties(currentSchema);
final Object resolvedAnnotationObj = resolvedProperties.get(SearchableAnnotation.ANNOTATION_NAME);
final Object annotationObj = getAnnotationObj(context);

if (resolvedAnnotationObj != null) {
if (annotationObj != null) {
if (currentSchema.getDereferencedDataSchema().isComplex()) {
final ComplexDataSchema complexSchema = (ComplexDataSchema) currentSchema;
if (isValidComplexType(complexSchema)) {
extractSearchableAnnotation(resolvedAnnotationObj, currentSchema, context);
extractSearchableAnnotation(annotationObj, currentSchema, context);
}
} else if (isValidPrimitiveType((PrimitiveDataSchema) currentSchema)) {
extractSearchableAnnotation(resolvedAnnotationObj, currentSchema, context);
extractSearchableAnnotation(annotationObj, currentSchema, context);
} else {
throw new ModelValidationException(String.format("Invalid @Searchable Annotation at %s", context.getSchemaPathSpec().toString()));
throw new ModelValidationException(
String.format("Invalid @Searchable Annotation at %s", context.getSchemaPathSpec().toString()));
}
}
}
}

private void extractSearchableAnnotation(
final Object annotationObj,
final DataSchema currentSchema,
private Object getAnnotationObj(TraverserContext context) {
final DataSchema currentSchema = context.getCurrentSchema().getDereferencedDataSchema();

// First, check properties for primary annotation definition.
final Map<String, Object> properties = context.getEnclosingField().getProperties();
final Object primaryAnnotationObj = properties.get(SearchableAnnotation.ANNOTATION_NAME);

if (primaryAnnotationObj != null) {
validatePropertiesAnnotation(currentSchema, primaryAnnotationObj, context.getTraversePath().toString());
// Unfortunately, annotations on collections always need to be a nested map (byproduct of making overrides work)
// As such, for annotation maps, we make it a single entry map, where the key has no meaning
if (currentSchema.getDereferencedType() == DataSchema.Type.MAP && primaryAnnotationObj instanceof Map
&& !((Map) primaryAnnotationObj).isEmpty()) {
return ((Map<?, ?>) primaryAnnotationObj).entrySet().stream().findFirst().get().getValue();
}
}

// Check if the path has map in it. Individual values of the maps (actual maps are caught above) can be ignored
if (context.getTraversePath().contains(MAP)) {
return null;
}

// Next, check resolved properties for annotations on primitives.
final Map<String, Object> resolvedProperties = FieldSpecUtils.getResolvedProperties(currentSchema);
return resolvedProperties.get(SearchableAnnotation.ANNOTATION_NAME);
}

private void extractSearchableAnnotation(final Object annotationObj, final DataSchema currentSchema,
final TraverserContext context) {
final PathSpec path = new PathSpec(context.getSchemaPathSpec());
final SearchableAnnotation annotation =
SearchableAnnotation.fromPegasusAnnotationObject(
annotationObj,
FieldSpecUtils.getSchemaFieldName(path),
SearchableAnnotation.fromPegasusAnnotationObject(annotationObj, FieldSpecUtils.getSchemaFieldName(path),
currentSchema.getDereferencedType(), path.toString());
if (_searchFieldNamesToPatch.containsKey(annotation.getFieldName())
&& !_searchFieldNamesToPatch.get(annotation.getFieldName()).equals(context.getSchemaPathSpec().toString())) {
if (_searchFieldNamesToPatch.containsKey(annotation.getFieldName()) && !_searchFieldNamesToPatch.get(
annotation.getFieldName()).equals(context.getSchemaPathSpec().toString())) {
throw new ModelValidationException(
String.format("Entity has multiple searchable fields with the same field name %s",
annotation.getFieldName()));
Expand All @@ -97,7 +113,8 @@ public SchemaVisitorTraversalResult getSchemaVisitorTraversalResult() {
}

private Boolean isValidComplexType(final ComplexDataSchema schema) {
return DataSchema.Type.ENUM.equals(schema.getDereferencedDataSchema().getDereferencedType());
return DataSchema.Type.ENUM.equals(schema.getDereferencedDataSchema().getDereferencedType())
|| DataSchema.Type.MAP.equals(schema.getDereferencedDataSchema().getDereferencedType());
}

private Boolean isValidPrimitiveType(final PrimitiveDataSchema schema) {
Expand All @@ -107,34 +124,32 @@ private Boolean isValidPrimitiveType(final PrimitiveDataSchema schema) {
private void validatePropertiesAnnotation(DataSchema currentSchema, Object annotationObj, String pathStr) {

// If primitive, assume the annotation is well formed until resolvedProperties reflects it.
if (currentSchema.isPrimitive() || currentSchema.getDereferencedType().equals(DataSchema.Type.ENUM)) {
if (currentSchema.isPrimitive() || currentSchema.getDereferencedType().equals(DataSchema.Type.ENUM) || currentSchema
.getDereferencedType()
.equals(DataSchema.Type.MAP)) {
return;
}

// Required override case. If the annotation keys are not overrides, they are incorrect.
if (!Map.class.isAssignableFrom(annotationObj.getClass())) {
throw new ModelValidationException(String.format(
"Failed to validate @%s annotation declared inside %s: Invalid value type provided (Expected Map)",
SearchableAnnotation.ANNOTATION_NAME,
pathStr
));
SearchableAnnotation.ANNOTATION_NAME, pathStr));
}

Map<String, Object> annotationMap = (Map<String, Object>) annotationObj;

if (annotationMap.size() == 0) {
throw new ModelValidationException(
String.format("Invalid @Searchable Annotation at %s. Annotation placed on invalid field of type %s. Must be placed on primitive field.",
pathStr,
currentSchema.getType()));
throw new ModelValidationException(String.format(
"Invalid @Searchable Annotation at %s. Annotation placed on invalid field of type %s. Must be placed on primitive field.",
pathStr, currentSchema.getType()));
}

for (String key : annotationMap.keySet()) {
if (!key.startsWith(Character.toString(PathSpec.SEPARATOR))) {
throw new ModelValidationException(
String.format("Invalid @Searchable Annotation at %s. Annotation placed on invalid field of type %s. Must be placed on primitive field.",
pathStr,
currentSchema.getType()));
throw new ModelValidationException(String.format(
"Invalid @Searchable Annotation at %s. Annotation placed on invalid field of type %s. Must be placed on primitive field.",
pathStr, currentSchema.getType()));
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@ private static FieldType getDefaultFieldType(DataSchema.Type schemaDataType) {
case INT:
case FLOAT:
return FieldType.COUNT;
case MAP:
return FieldType.KEYWORD;
default:
return FieldType.TEXT;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,12 @@ private void validateTestEntityInfo(final AspectSpec testEntityInfo) {
assertEquals(new TestEntityInfo().schema().getFullName(), testEntityInfo.getPegasusSchema().getFullName());

// Assert on Searchable Fields
assertEquals(7, testEntityInfo.getSearchableFieldSpecs().size());
assertEquals(8, testEntityInfo.getSearchableFieldSpecs().size());
assertEquals("customProperties", testEntityInfo.getSearchableFieldSpecMap().get(
new PathSpec("customProperties").toString()).getSearchableAnnotation().getFieldName());
assertEquals(SearchableAnnotation.FieldType.KEYWORD, testEntityInfo.getSearchableFieldSpecMap().get(
new PathSpec("customProperties").toString())
.getSearchableAnnotation().getFieldType());
assertEquals("textFieldOverride", testEntityInfo.getSearchableFieldSpecMap().get(
new PathSpec("textField").toString()).getSearchableAnnotation().getFieldName());
assertEquals(SearchableAnnotation.FieldType.TEXT, testEntityInfo.getSearchableFieldSpecMap().get(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import com.linkedin.metadata.models.AspectSpec;
import com.linkedin.metadata.models.EntitySpec;
import com.linkedin.metadata.models.FieldSpec;
import com.linkedin.util.Pair;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
Expand All @@ -21,6 +22,7 @@
public class FieldExtractor {

private static final String ARRAY_WILDCARD = "*";
private static final int MAX_VALUE_LENGTH = 200;

private FieldExtractor() {
}
Expand All @@ -40,7 +42,17 @@ public static <T extends FieldSpec> Map<T, List<Object>> extractFields(RecordTem
long numArrayWildcards = getNumArrayWildcards(fieldSpec.getPath());
// Not an array field
if (numArrayWildcards == 0) {
extractedFields.put(fieldSpec, Collections.singletonList(value.get()));
// For maps, convert it into a list of the form key=value (Filter out long values)
if (value.get() instanceof Map) {
extractedFields.put(fieldSpec, ((Map<?, ?>) value.get()).entrySet()
.stream()
.map(entry -> new Pair<>(entry.getKey().toString(), entry.getValue().toString()))
.filter(entry -> entry.getValue().length() < MAX_VALUE_LENGTH)
.map(entry -> entry.getKey() + "=" + entry.getValue())
.collect(Collectors.toList()));
} else {
extractedFields.put(fieldSpec, Collections.singletonList(value.get()));
}
} else {
List<Object> valueList = (List<Object>) value.get();
// If the field is a nested list of values, flatten it
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ public static void setValue(final SearchableFieldSpec fieldSpec, final List<Obje
return;
}

if (isArray) {
if (isArray || valueType == DataSchema.Type.MAP) {
ArrayNode arrayNode = JsonNodeFactory.instance.arrayNode();
fieldValues.forEach(value -> getNodeForValue(valueType, value, fieldType).ifPresent(arrayNode::add));
searchDocument.set(fieldName, arrayNode);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,11 @@
import com.datahub.test.TestEntityKey;
import com.datahub.test.TestEntitySnapshot;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.linkedin.common.urn.TestEntityUrn;
import com.linkedin.common.urn.Urn;
import com.linkedin.data.template.StringArray;
import com.linkedin.data.template.StringMap;


public class TestEntityUtil {
Expand All @@ -37,6 +39,7 @@ public static TestEntityInfo getTestEntityInfo(Urn urn) {
ImmutableList.of(new SimpleNestedRecord2().setNestedArrayStringField("nestedArray1"),
new SimpleNestedRecord2().setNestedArrayStringField("nestedArray2")
.setNestedArrayArrayField(new StringArray(ImmutableList.of("testNestedArray1", "testNestedArray2"))))));
testEntityInfo.setCustomProperties(new StringMap(ImmutableMap.of("key1", "value1", "key2", "value2")));
return testEntityInfo;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,5 +43,6 @@ public void testExtractor() {
assertEquals(result.get(nameToSpec.get("nestedIntegerField")), ImmutableList.of(1));
assertEquals(result.get(nameToSpec.get("nestedArrayStringField")), ImmutableList.of("nestedArray1", "nestedArray2"));
assertEquals(result.get(nameToSpec.get("nestedArrayArrayField")), ImmutableList.of("testNestedArray1", "testNestedArray2"));
assertEquals(result.get(nameToSpec.get("customProperties")), ImmutableList.of("key1=value1", "key2=value2"));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,13 @@ public void testMappingsBuilder() {
Map<String, Object> result = MappingsBuilder.getMappings(TestEntitySpecBuilder.getSpec());
assertEquals(result.size(), 1);
Map<String, Object> properties = (Map<String, Object>) result.get("properties");
assertEquals(properties.size(), 11);
assertEquals(properties.size(), 12);
assertEquals(properties.get("urn"), ImmutableMap.of("type", "keyword"));
assertTrue(properties.containsKey("browsePaths"));
// KEYWORD
assertEquals(properties.get("keyPart3"), ImmutableMap.of("type", "keyword", "normalizer", "keyword_normalizer"));

assertEquals(properties.get("customProperties"),
ImmutableMap.of("type", "keyword", "normalizer", "keyword_normalizer"));
// TEXT
Map<String, Object> nestedArrayStringField = (Map<String, Object>) properties.get("nestedArrayStringField");
assertEquals(nestedArrayStringField.get("type"), "keyword");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,10 +25,11 @@ public void testQueryBuilder() {
assertEquals(keywordQuery.queryString(), "testQuery");
assertEquals(keywordQuery.analyzer(), "custom_keyword");
Map<String, Float> keywordFields = keywordQuery.fields();
assertEquals(keywordFields.size(), 7);
assertEquals(keywordFields.size(), 8);
assertEquals(keywordFields.get("keyPart1").floatValue(), 10.0f);
assertFalse(keywordFields.containsKey("keyPart3"));
assertEquals(keywordFields.get("textFieldOverride").floatValue(), 1.0f);
assertEquals(keywordFields.get("customProperties").floatValue(), 1.0f);
QueryStringQueryBuilder textQuery = (QueryStringQueryBuilder) shouldQueries.get(1);
assertEquals(textQuery.queryString(), "testQuery");
assertEquals(textQuery.analyzer(), "word_delimited");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,10 @@ public void testSearchRequestHandler() {
HighlightBuilder highlightBuilder = sourceBuilder.highlighter();
List<String> fields =
highlightBuilder.fields().stream().map(HighlightBuilder.Field::name).collect(Collectors.toList());
assertEquals(fields.size(), 14);
assertEquals(fields.size(), 16);
List<String> highlightableFields =
ImmutableList.of("keyPart1", "textArrayField", "textFieldOverride", "foreignKey", "nestedForeignKey",
"nestedArrayStringField", "nestedArrayArrayField");
"nestedArrayStringField", "nestedArrayArrayField", "customProperties");
highlightableFields.forEach(field -> {
assertTrue(fields.contains(field));
assertTrue(fields.contains(field + ".*"));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,5 +7,10 @@ record CustomProperties {
/**
* Custom property bag.
*/
@Searchable = {
"/*": {
"queryByDefault": true
}
}
customProperties: map[string, string] = { }
}
Original file line number Diff line number Diff line change
@@ -1,14 +1,15 @@
namespace com.datahub.test

import com.linkedin.common.Urn
import com.linkedin.common.CustomProperties

/**
* Info associated with a Test Entity
*/
@Aspect = {
"name": "testEntityInfo"
}
record TestEntityInfo {
record TestEntityInfo includes CustomProperties {

@Searchable = {
"fieldName": "textFieldOverride",
Expand Down