Skip to content

Commit

Permalink
materialize: boilerplate validator recommends explicit projections am…
Browse files Browse the repository at this point in the history
…ong ambiguous fields

A common case for user-defined projections are to change the capitalization of a
field. For systems that do not allow identical-except-for-capitalization named
fields, this is unfortunate because the user-defined explicit projection is
currently getting marked as optional even if it would otherwise be recommended,
since the field names only differ in capitalization.

This change has us recommend explicit projections if they would otherwise be
recommended even if their field name is ambiguous with others. It is relatively
simple to do so since the Flow build process will not allow projected field
names that differ only in capitalization, so we know if there is an explicit
projection it's the only one with those case-insensitive characters.
  • Loading branch information
williamhbaker committed Oct 17, 2024
1 parent 5c2a4cb commit 0008448
Show file tree
Hide file tree
Showing 27 changed files with 56 additions and 23 deletions.
12 changes: 12 additions & 0 deletions materialize-boilerplate/.snapshots/TestValidate
Original file line number Diff line number Diff line change
Expand Up @@ -139,6 +139,7 @@ table already exists with incompatible proposed spec:
new materialization with ambiguous fields:
{"Field":"FIRSTBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'FIRSTBADFIELD' would be materialized as 'firstbadfield', which is ambiguous with fields [firstBadField,firstbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"SECONDBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'SECONDBADFIELD' would be materialized as 'secondbadfield', which is ambiguous with fields [secondBadField,secondbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"THIRDBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'THIRDBADFIELD' would be materialized as 'thirdbadfield', which is ambiguous with fields [tHiRdBaDfIeLd,thirdBadField,thirdbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"_meta/flow_truncated","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"The projection has a single scalar type"}
{"Field":"firstBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'firstBadField' would be materialized as 'firstbadfield', which is ambiguous with fields [FIRSTBADFIELD,firstbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"firstbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'firstbadfield' would be materialized as 'firstbadfield', which is ambiguous with fields [FIRSTBADFIELD,firstBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
Expand All @@ -148,10 +149,14 @@ new materialization with ambiguous fields:
{"Field":"key","Type":2,"TypeString":"LOCATION_REQUIRED","Reason":"All Locations that are part of the collections key are required"}
{"Field":"secondBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'secondBadField' would be materialized as 'secondbadfield', which is ambiguous with fields [SECONDBADFIELD,secondbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"secondbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'secondbadfield' would be materialized as 'secondbadfield', which is ambiguous with fields [SECONDBADFIELD,secondBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"tHiRdBaDfIeLd","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"The projection has a single scalar type"}
{"Field":"thirdBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'thirdBadField' would be materialized as 'thirdbadfield', which is ambiguous with fields [THIRDBADFIELD,tHiRdBaDfIeLd,thirdbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"thirdbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'thirdbadfield' would be materialized as 'thirdbadfield', which is ambiguous with fields [THIRDBADFIELD,tHiRdBaDfIeLd,thirdBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}

table already exists with a column for an ambiguous field for a new materialization:
{"Field":"FIRSTBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'FIRSTBADFIELD' would be materialized as 'firstbadfield', which is ambiguous with fields [firstBadField,firstbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"SECONDBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'SECONDBADFIELD' would be materialized as 'secondbadfield', which is ambiguous with fields [secondBadField,secondbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"THIRDBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'THIRDBADFIELD' would be materialized as 'thirdbadfield', which is ambiguous with fields [tHiRdBaDfIeLd,thirdBadField,thirdbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"_meta/flow_truncated","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"The projection has a single scalar type"}
{"Field":"firstBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'firstBadField' would be materialized as 'firstbadfield', which is ambiguous with fields [FIRSTBADFIELD,firstbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"firstbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'firstbadfield' would be materialized as 'firstbadfield', which is ambiguous with fields [FIRSTBADFIELD,firstBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
Expand All @@ -161,10 +166,14 @@ table already exists with a column for an ambiguous field for a new materializat
{"Field":"key","Type":1,"TypeString":"FIELD_REQUIRED","Reason":"This field is a key in the current materialization"}
{"Field":"secondBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'secondBadField' would be materialized as 'secondbadfield', which is ambiguous with fields [SECONDBADFIELD,secondbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"secondbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'secondbadfield' would be materialized as 'secondbadfield', which is ambiguous with fields [SECONDBADFIELD,secondBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"tHiRdBaDfIeLd","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"The projection has a single scalar type"}
{"Field":"thirdBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'thirdBadField' would be materialized as 'thirdbadfield', which is ambiguous with fields [THIRDBADFIELD,tHiRdBaDfIeLd,thirdbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"thirdbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'thirdbadfield' would be materialized as 'thirdbadfield', which is ambiguous with fields [THIRDBADFIELD,tHiRdBaDfIeLd,thirdBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}

update an existing materialization with ambiguous fields:
{"Field":"FIRSTBADFIELD","Type":5,"TypeString":"FIELD_FORBIDDEN","Reason":"Flow collection field 'FIRSTBADFIELD' is ambiguous with fields already being materialized as 'firstbadfield' in the destination. Consider using an alternate, unambiguous projection of this field to allow it to be materialized"}
{"Field":"SECONDBADFIELD","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'SECONDBADFIELD' would be materialized as 'secondbadfield', which is ambiguous with fields [secondBadField,secondbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"THIRDBADFIELD","Type":5,"TypeString":"FIELD_FORBIDDEN","Reason":"Flow collection field 'THIRDBADFIELD' is ambiguous with fields already being materialized as 'thirdbadfield' in the destination. Consider using an alternate, unambiguous projection of this field to allow it to be materialized"}
{"Field":"_meta/flow_truncated","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"The projection has a single scalar type"}
{"Field":"firstBadField","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"This location is part of the current materialization"}
{"Field":"firstbadfield","Type":5,"TypeString":"FIELD_FORBIDDEN","Reason":"Flow collection field 'firstbadfield' is ambiguous with fields already being materialized as 'firstbadfield' in the destination. Consider using an alternate, unambiguous projection of this field to allow it to be materialized"}
Expand All @@ -174,6 +183,9 @@ update an existing materialization with ambiguous fields:
{"Field":"key","Type":1,"TypeString":"FIELD_REQUIRED","Reason":"This field is a key in the current materialization"}
{"Field":"secondBadField","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'secondBadField' would be materialized as 'secondbadfield', which is ambiguous with fields [SECONDBADFIELD,secondbadfield]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"secondbadfield","Type":4,"TypeString":"FIELD_OPTIONAL","Reason":"Flow collection field 'secondbadfield' would be materialized as 'secondbadfield', which is ambiguous with fields [SECONDBADFIELD,secondBadField]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields"}
{"Field":"tHiRdBaDfIeLd","Type":5,"TypeString":"FIELD_FORBIDDEN","Reason":"Flow collection field 'tHiRdBaDfIeLd' is ambiguous with fields already being materialized as 'thirdbadfield' in the destination. Consider using an alternate, unambiguous projection of this field to allow it to be materialized"}
{"Field":"thirdBadField","Type":5,"TypeString":"FIELD_FORBIDDEN","Reason":"Flow collection field 'thirdBadField' is ambiguous with fields already being materialized as 'thirdbadfield' in the destination. Consider using an alternate, unambiguous projection of this field to allow it to be materialized"}
{"Field":"thirdbadfield","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"This location is part of the current materialization"}

field names over the length limit are forbidden:
{"Field":"_meta/flow_truncated","Type":3,"TypeString":"LOCATION_RECOMMENDED","Reason":"The projection has a single scalar type"}
Expand Down
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
6 changes: 6 additions & 0 deletions materialize-boilerplate/testdata/generate-spec-proto.sh
Original file line number Diff line number Diff line change
Expand Up @@ -33,4 +33,10 @@ flowctl raw build \
--db-path ${TEMP_DIR}/build.db \
--source $1

errors=$(sqlite3 ${TEMP_DIR}/build.db "select * from errors")
if [[ -n "$errors" ]]; then
echo "$errors" >&2
exit 1
fi

sqlite3 ${TEMP_DIR}/build.db "select writefile('${OUTPUT}', spec) from built_materializations;"
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,12 @@ collections:
secondBadField: { type: string }
secondbadfield: { type: string }
SECONDBADFIELD: { type: string }
thirdBadField: { type: string }
thirdbadfield: { type: string }
THIRDBADFIELD: { type: string }
required: [key]
projections:
tHiRdBaDfIeLd: /THIRDBADFIELD
key: [/key]

materializations:
Expand All @@ -29,3 +34,4 @@ materializations:
key: {}
goodField: {}
firstBadField: {}
thirdbadfield: {} # Note: not the explicit projection
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
55 changes: 32 additions & 23 deletions materialize-boilerplate/validate.go
Original file line number Diff line number Diff line change
Expand Up @@ -136,19 +136,24 @@ func (v Validator) validateNewBinding(boundCollection pf.CollectionSpec, deltaUp
}
sawRoot = true
} else if ambiguousFields := v.ambiguousFields(p, boundCollection.Projections); len(ambiguousFields) > 0 && c.Type == pm.Response_Validated_Constraint_LOCATION_RECOMMENDED {
// Any fields that would be ambiguous to materialize should be marked as optional if
// they would otherwise be recommended. Only one of these fields should be selected to
// materialize.
c = &pm.Response_Validated_Constraint{
Type: pm.Response_Validated_Constraint_FIELD_OPTIONAL,
Reason: fmt.Sprintf(
// See identical "reason" text in validateMatchesExistingBinding for optional
// ambiguous field constraints. These two messages should be kept in sync.
"Flow collection field '%s' would be materialized as '%s', which is ambiguous with fields [%s]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields",
p.Field,
v.is.translateField(p.Field),
strings.Join(ambiguousFields, ","),
),
if p.Explicit {
// User-defined projections of ambiguous fields are allowed as-is.
} else {
// Any other fields that would be ambiguous to materialize are
// marked as optional if they would otherwise be recommended.
// Only one of these fields should be selected to materialize,
// and that will require manual field selection.
c = &pm.Response_Validated_Constraint{
Type: pm.Response_Validated_Constraint_FIELD_OPTIONAL,
Reason: fmt.Sprintf(
// See identical "reason" text in validateMatchesExistingBinding for optional
// ambiguous field constraints. These two messages should be kept in sync.
"Flow collection field '%s' would be materialized as '%s', which is ambiguous with fields [%s]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields",
p.Field,
v.is.translateField(p.Field),
strings.Join(ambiguousFields, ","),
),
}
}
}

Expand Down Expand Up @@ -227,16 +232,20 @@ func (v Validator) validateMatchesExistingBinding(
} else if c.Type == pm.Response_Validated_Constraint_LOCATION_RECOMMENDED || c.Type == pm.Response_Validated_Constraint_FIELD_OPTIONAL {
// None of these ambiguous fields have been selected yet, so it's still possible to
// pick one.
c = &pm.Response_Validated_Constraint{
Type: pm.Response_Validated_Constraint_FIELD_OPTIONAL,
Reason: fmt.Sprintf(
// See identical "reason" text in validateNewBinding for optional ambiguous
// field constraints. These two messages should be kept in sync.
"Flow collection field '%s' would be materialized as '%s', which is ambiguous with fields [%s]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields",
p.Field,
v.is.translateField(p.Field),
strings.Join(ambiguousFields, ","),
),
if p.Explicit {
// User-defined projections of ambiguous fields are allowed as-is.
} else {
c = &pm.Response_Validated_Constraint{
Type: pm.Response_Validated_Constraint_FIELD_OPTIONAL,
Reason: fmt.Sprintf(
// See identical "reason" text in validateNewBinding for optional ambiguous
// field constraints. These two messages should be kept in sync.
"Flow collection field '%s' would be materialized as '%s', which is ambiguous with fields [%s]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields",
p.Field,
v.is.translateField(p.Field),
strings.Join(ambiguousFields, ","),
),
}
}
}
} else if existingField, err := v.is.GetField(path, p.Field); err == nil {
Expand Down

0 comments on commit 0008448

Please sign in to comment.