-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support for Consistent SQL Query Generation #1015
Conversation
Nit: I think we should have a changelog entry for this since it was a community request. Also it affects output, however slightly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the most part the changes look reasonable apart from a few things that don't seem right (the enum.value call, the semantic_model.name removal, and the info log with no corresponding action).
Given the extent of the snapshot changes due to renames (several files will simply move to new locations and then get updated) I don't see how we can evaluate the output difference from them.
Do you have concrete examples you can point to/selectively add to this PR that illustrate the expected nature of these changes? It'd be nice to double-check things like "most prefixes aren't changing" and "semantic model names aren't really important" and "numbering is/is not shifting".
@@ -9,9 +9,61 @@ class IdPrefix(Enum): | |||
TODO: Move all ID prefixes here. | |||
""" | |||
|
|||
DATAFLOW_NODE_AGGREGATE_MEASURES_ID_PREFIX = "am" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit is a thing of beauty.
@@ -423,8 +424,8 @@ def create_sql_source_data_set(self, semantic_model: SemanticModel) -> SemanticM | |||
all_entity_instances: List[EntityInstance] = [] | |||
|
|||
all_select_columns: List[SqlSelectColumn] = [] | |||
from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") | |||
|
|||
# from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. For context, when I rewrite lines, I often comment out the old line, write the revised line, then remove the commented line. In this case, I forgot to remove the commented line.
metricflow/dag/id_prefix.py
Outdated
@@ -67,3 +67,7 @@ class IdPrefix(Enum): | |||
EXEC_PLAN_PREFIX = "ep" | |||
|
|||
MF_DAG = "mfd" | |||
|
|||
TIME_SPINE_SOURCE = "spine" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this change? Wouldn't keeping it time_spine_src
reduce snapshot thrash considerably?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it in the interest of making the aliases shorter. I thought the snapshot thrash was fine.
dag_id=DagId.from_str( | ||
IdGeneratorRegistry.for_class(self.__class__).create_id(IdPrefix.GROUP_BY_ITEM_RESOLUTION_DAG.value) | ||
), | ||
dag_id=DagId.from_str(PrefixIdGenerator.create_next_id(IdPrefix.GROUP_BY_ITEM_RESOLUTION_DAG.value)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right, or should it be create_next_id(IdPrefix.GROUP_BY_ITEM_RESOLUTION_DAG)
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is incorrect and has been fixed in a later commit that rolled in the fixes for the type-checker errors.
metricflow/dag/id_prefix.py
Outdated
@@ -67,3 +67,7 @@ class IdPrefix(Enum): | |||
EXEC_PLAN_PREFIX = "ep" | |||
|
|||
MF_DAG = "mfd" | |||
|
|||
TIME_SPINE_SOURCE = "spine" | |||
SEMANTIC_MODEL_SOURCE = "src" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. This is really nondescript. It's also not used as a prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") | ||
|
||
# from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") | ||
from_source_alias = PrefixIdGenerator.create_next_id(IdPrefix.SEMANTIC_MODEL_SOURCE).str_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big change since the constant now formats in src
instead of {semantic_model.name}_src
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought it simplified the SQL output with shorter aliases. What do you think?
@@ -376,6 +388,7 @@ def __init__( | |||
@log_call(module_name=__name__, telemetry_reporter=_telemetry_reporter) | |||
def query(self, mf_request: MetricFlowQueryRequest) -> MetricFlowQueryResult: # noqa: D | |||
logger.info(f"Starting query request:\n{indent(mf_pformat(mf_request))}") | |||
logger.info(f"Setting ID generation to start at: {MetricFlowEngine._QUERY_ID_START_VALUE}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reset call anywhere that I can find. Either remove the logging (and the constant, which is otherwise unused) or add the reset, whichever is appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I was shuffling some things around and did not complete the change. Please see the added commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe it's simpler to see if I separated out the snapshot changes. I've added commits that revert all of them except the numbering change and we can take a look at the other changes in a later PR.
@@ -423,8 +424,8 @@ def create_sql_source_data_set(self, semantic_model: SemanticModel) -> SemanticM | |||
all_entity_instances: List[EntityInstance] = [] | |||
|
|||
all_select_columns: List[SqlSelectColumn] = [] | |||
from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") | |||
|
|||
# from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. For context, when I rewrite lines, I often comment out the old line, write the revised line, then remove the commented line. In this case, I forgot to remove the commented line.
from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") | ||
|
||
# from_source_alias = IdGeneratorRegistry.for_class(self.__class__).create_id(f"{semantic_model.name}_src") | ||
from_source_alias = PrefixIdGenerator.create_next_id(IdPrefix.SEMANTIC_MODEL_SOURCE).str_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought it simplified the SQL output with shorter aliases. What do you think?
metricflow/dag/id_prefix.py
Outdated
@@ -67,3 +67,7 @@ class IdPrefix(Enum): | |||
EXEC_PLAN_PREFIX = "ep" | |||
|
|||
MF_DAG = "mfd" | |||
|
|||
TIME_SPINE_SOURCE = "spine" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it in the interest of making the aliases shorter. I thought the snapshot thrash was fine.
metricflow/dag/id_prefix.py
Outdated
@@ -67,3 +67,7 @@ class IdPrefix(Enum): | |||
EXEC_PLAN_PREFIX = "ep" | |||
|
|||
MF_DAG = "mfd" | |||
|
|||
TIME_SPINE_SOURCE = "spine" | |||
SEMANTIC_MODEL_SOURCE = "src" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dag_id=DagId.from_str( | ||
IdGeneratorRegistry.for_class(self.__class__).create_id(IdPrefix.GROUP_BY_ITEM_RESOLUTION_DAG.value) | ||
), | ||
dag_id=DagId.from_str(PrefixIdGenerator.create_next_id(IdPrefix.GROUP_BY_ITEM_RESOLUTION_DAG.value)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is incorrect and has been fixed in a later commit that rolled in the fixes for the type-checker errors.
@@ -376,6 +388,7 @@ def __init__( | |||
@log_call(module_name=__name__, telemetry_reporter=_telemetry_reporter) | |||
def query(self, mf_request: MetricFlowQueryRequest) -> MetricFlowQueryResult: # noqa: D | |||
logger.info(f"Starting query request:\n{indent(mf_pformat(mf_request))}") | |||
logger.info(f"Setting ID generation to start at: {MetricFlowEngine._QUERY_ID_START_VALUE}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - I was shuffling some things around and did not complete the change. Please see the added commits.
b32103a
to
5b66b01
Compare
MetricFlowEngine
5b66b01
to
44236bd
Compare
cfe58ae
to
f51b845
Compare
c1a817c
to
0f903ff
Compare
4e569e8
to
5c652ff
Compare
0f903ff
to
05105e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, the updated form of this is a lot easier to reason about in terms of the changes I'm seeing.
pass | ||
|
||
|
||
class StaticIdPrefix(IdPrefix, Enum, metaclass=EnumMetaClassHelper): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see.
The other option is to make IdPrefix a protocol, but we've been over the tradeoffs there and I think having it be a superclass makes sense even though enum-type subclasses are a bit weird.
afe1d19
to
7840539
Compare
The majority of prefixes used for ID generation were previously listed as constants in a module. For improved encapsulation / structure, this commit moves them to the enum IdPrefix.
Instead of using IdGeneratorRegistry.for_class(), this commit updates those callsites to use PrefixIdGenerator instead. This also moves cases where strings were used to the IdPrefix enum.
There were some cases where the DagId could be passed into the initializer, but it was seldom used. This mades the ID generation more automated and also makes use of IdPrefix instead of strings.
05105e8
to
f4b1f1e
Compare
Resolves #1020
Description
This PR includes updates to make the SQL generated by
MetricFlowEnigne
more consistent. Previously, they were not consistent because there were SQL elements (e.g. sub-query aliases) were generated via a counter that carried state from query to query. This PR adds facilities for resetting the ID generation counters, and also streamlines / simplifies how IDs are handled. Please view by commit.