feat(cli): Make consistent use of DataHubGraphClientConfig #10466
Conversation
```diff
@@ -541,6 +546,9 @@ def _relationships_endpoint(self):
     def _aspect_count_endpoint(self):
         return f"{self.config.server}/aspects?action=getCount"
 
+    def _session(self) -> Session:
+        return super()._session
```
this doesn't look right
I don't think it's necessary, unless you want _session to be a method instead of an attr
I wasn't able to pull attributes from the super class in the DataHubGraph object, so I built this function to surface that. At least the auto-import in IntelliJ wasn't finding it.
What would be the right way to surface this super's attribute?
you should be able to do graph._session without any changes
if you really want a method, add a @property decorator
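For reference, the pattern being suggested can be sketched like this (the class and attribute names below are illustrative stand-ins, not the actual DataHub ones):

```python
class Emitter:
    def __init__(self) -> None:
        # The parent class stores the session object as a plain attribute.
        self._session_obj = object()


class Graph(Emitter):
    @property
    def _session(self):
        # A read-only property that surfaces the inherited attribute,
        # so callers can write `graph._session` as if it were an attr.
        return self._session_obj
```

With this, `Graph()._session` resolves through the property rather than through an overriding method.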
```python
    return get_url_and_token()[1]


def get_session_and_host():
```
to reduce the amount of code changes, you could make this call get_default_graph() and then return the tuple
basically keep it as a compatibility shim
Doing that causes a cyclical import dependency. It was my first implementation attempt.
you can put the import inside the get_session_and_host method to avoid that issue
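The resulting compatibility shim might look roughly like the sketch below. The stand-in module simulates the graph client so the import-at-call-time pattern is visible in isolation; the module and attribute names are illustrative, not the exact datahub API:

```python
import sys
import types

# Stand-in for the graph-client module that would otherwise create a
# module-level import cycle with the module defining the shim.
client_mod = types.ModuleType("graph_client")
client_mod.get_default_graph = lambda: types.SimpleNamespace(
    _session="session",
    config=types.SimpleNamespace(server="http://localhost:8080"),
)
sys.modules["graph_client"] = client_mod


def get_session_and_host():
    # Compatibility shim: importing at call time (not module load time)
    # breaks the circular dependency between the two modules.
    from graph_client import get_default_graph

    graph = get_default_graph()
    return graph._session, graph.config.server
```

Because the import only runs when the shim is called, the two modules can each be imported at top level without triggering the cycle.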
This is a GREAT contribution for code quality. @hsheth2 what needs to happen to get this in?
some minor nits, but otherwise lgtm
apologies for not reviewing this earlier - I thought it was still in progress given the WIP title
```python
logger.debug(
    f"Failed to retrieve config from file {config_utils.DATAHUB_CONFIG_PATH}: {e}. This isn't fatal."
)
config_dict = {**previous_config, **config.dict()}
```
this merge logic is strange and probably hurts correctness
This function already existed in metadata-ingestion/src/datahub/cli/config_utils.py; I simply moved it to the client. It is used when users call datahub init and fill out a form to configure connectivity to DataHub. A user will provide partial information, like the GMS host and token, and that information should merge with other configs like retry_max_times.
What is your concern with correctness here? Can you provide more details?
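The merge in question relies on standard dict unpacking, where later entries win; a minimal illustration with made-up values:

```python
previous_config = {"server": "http://old-gms:8080", "retry_max_times": 3}
new_values = {"server": "http://new-gms:8080", "token": "abc"}

# Keys supplied by the user overwrite the stored ones; settings the
# user did not touch (retry_max_times) survive the merge.
config_dict = {**previous_config, **new_values}
```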
if it's existing behavior then we can leave it as-is. however, I do think we should be overwriting here - having the write_gms_config method not fully overwrite is highly unexpected
Walkthrough

The updates to the Datahub CLI enhance configuration management.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Environment
    participant DatahubAPI
    User->>CLI: Execute CLI Command
    CLI->>Environment: Read ~/.datahubenv
    Environment-->>CLI: Get Configurations
    CLI->>DatahubAPI: Call with Configurations
    DatahubAPI-->>CLI: Response
    CLI-->>User: Display Result
```
Assessment against linked issues
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (2)
metadata-ingestion/src/datahub/cli/config_utils.py (1)
33-38: Ensure proper handling of `None` return type from `get_client_config`.

The function `get_client_config` might return `None` if there is an error in loading the YAML configuration. However, the following usages do not handle this case and assume it always returns a dictionary:

metadata-ingestion/src/datahub/cli/lite_cli.py
- Lines around `client_config_dict = get_client_config()`
- Lines around `cli_config = get_client_config()`

metadata-ingestion/src/datahub/ingestion/graph/client.py
- Lines around `client_config_dict = config_utils.get_client_config()`
- Lines around `previous_config = config_utils.get_client_config()`

Please add checks to handle the `None` return type to avoid potential runtime errors.

Analysis chain
LGTM! But verify the function usage in the codebase.
The code changes are approved.
However, ensure that all function calls to `get_client_config` are correctly updated to handle the new return type.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify all function calls to `get_client_config` are correctly updated to handle the new return type.
rg --type python -A 5 'get_client_config'
# Length of output: 72

rg --type py -A 5 'get_client_config'
# Length of output: 3313

# Description: Verify that the return value of `get_client_config` is handled properly in the identified files.
rg --type py -A 5 'get_client_config' metadata-ingestion/src/datahub/cli/lite_cli.py
rg --type py -A 5 'get_client_config' metadata-ingestion/src/datahub/ingestion/graph/client.py
# Length of output: 1346
```
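One way callers could guard against the `Optional` return, sketched here with a stand-in loader rather than the real `config_utils` implementation:

```python
from typing import Any, Dict, Optional


def get_client_config() -> Optional[Dict[str, Any]]:
    # Stand-in for the real loader, which returns None when the YAML
    # file cannot be read or parsed.
    return None


def load_client_config_safely() -> Dict[str, Any]:
    # Fall back to an empty dict instead of assuming a dict came back.
    config = get_client_config()
    return config if config is not None else {}
```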
metadata-ingestion/src/datahub/ingestion/graph/client.py (1)
1791-1806: Consider adding type hints for variables.

Adding type hints for `host`, `port`, `token`, `protocol`, and `url` can improve code readability and maintainability.

```python
def get_details_from_env() -> Tuple[Optional[str], Optional[str]]:
    host: Optional[str] = os.environ.get(ENV_METADATA_HOST)
    port: Optional[str] = os.environ.get(ENV_METADATA_PORT)
    token: Optional[str] = os.environ.get(ENV_METADATA_TOKEN)
    protocol: str = os.environ.get(ENV_METADATA_PROTOCOL, "http")
    url: Optional[str] = os.environ.get(ENV_METADATA_HOST_URL)
    if port is not None:
        url = f"{protocol}://{host}:{port}"
        return url, token
    # The reason for using host as URL is backward compatibility
    # If port is not being used we assume someone is using host env var as URL
    if url is None and host is not None:
        logger.warning(
            f"Do not use {ENV_METADATA_HOST} as URL. Use {ENV_METADATA_HOST_URL} instead"
        )
    return url or host, token
```
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (26)
- docs/how/updating-datahub.md (1 hunks)
- metadata-ingestion/src/datahub/cli/cli_utils.py (9 hunks)
- metadata-ingestion/src/datahub/cli/config_utils.py (2 hunks)
- metadata-ingestion/src/datahub/cli/delete_cli.py (2 hunks)
- metadata-ingestion/src/datahub/cli/get_cli.py (2 hunks)
- metadata-ingestion/src/datahub/cli/ingest_cli.py (4 hunks)
- metadata-ingestion/src/datahub/cli/lite_cli.py (3 hunks)
- metadata-ingestion/src/datahub/cli/migrate.py (6 hunks)
- metadata-ingestion/src/datahub/cli/migration_utils.py (3 hunks)
- metadata-ingestion/src/datahub/cli/put_cli.py (1 hunks)
- metadata-ingestion/src/datahub/cli/timeline_cli.py (2 hunks)
- metadata-ingestion/src/datahub/entrypoints.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/graph/client.py (5 hunks)
- metadata-ingestion/src/datahub/ingestion/run/pipeline_config.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/sink/datahub_rest.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/metadata/lineage.py (2 hunks)
- metadata-ingestion/src/datahub/upgrade/upgrade.py (3 hunks)
- metadata-ingestion/tests/unit/test_cli_utils.py (5 hunks)
- smoke-test/tests/cli/datahub_cli.py (5 hunks)
- smoke-test/tests/cli/datahub_graph_test.py (2 hunks)
- smoke-test/tests/delete/delete_test.py (4 hunks)
- smoke-test/tests/lineage/test_lineage.py (3 hunks)
- smoke-test/tests/patch/test_dataset_patches.py (7 hunks)
- smoke-test/tests/telemetry/telemetry_test.py (3 hunks)
- smoke-test/tests/timeline/timeline_test.py (1 hunks)
- smoke-test/tests/utils.py (2 hunks)
Files skipped from review due to trivial changes (1)
- metadata-ingestion/src/datahub/entrypoints.py
Additional context used
Markdownlint
docs/how/updating-datahub.md
83-83: Bare URL used (MD034, no-bare-urls)
Additional comments not posted (66)
metadata-ingestion/src/datahub/cli/config_utils.py (2)
Line range hint 15-18: LGTM! The function correctly persists the configuration dictionary to a YAML file.
21-22: LGTM! The function correctly checks the environment variable to determine if the configuration should be skipped.
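An environment-variable skip check of this kind typically follows the shape below; the variable name here is hypothetical, not necessarily the one used in `config_utils`:

```python
import os

# Hypothetical variable name; the real one lives in config_utils.
ENV_SKIP_CONFIG = "DATAHUB_SKIP_CONFIG"


def should_skip_config() -> bool:
    # Treat common truthy spellings as "skip reading the config file".
    return os.environ.get(ENV_SKIP_CONFIG, "").lower() in ("true", "1", "yes")
```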
smoke-test/tests/telemetry/telemetry_test.py (2)
13-22: LGTM! The function correctly initializes a client and enhances the context in which aspects are fetched.
31-40: LGTM! The function correctly initializes a client and enhances the context in which aspects are fetched.
metadata-ingestion/src/datahub/cli/get_cli.py (1)
48-58: LGTM! The function correctly initializes a client and enhances the context in which aspects are fetched.
metadata-ingestion/tests/unit/test_cli_utils.py (8)
20-20: LGTM! The change to use `get_details_from_env()` directly is correct.
27-27: LGTM! The change to use `get_details_from_env()` directly is correct.
39-39: LGTM! The change to use `get_details_from_env()` directly is correct.
52-52: LGTM! The change to use `get_details_from_env()` directly is correct.
62-62: LGTM! The change to use `get_details_from_env()` directly is correct.
Line range hint 29-34: LGTM! The change to use `get_default_graph()` and `client.get_aspect()` is correct.
41-41: LGTM! The change to use `client.get_aspect()` is correct.
61-62: LGTM! The change to use `client.get_aspect()` is correct.
metadata-ingestion/src/datahub/cli/put_cli.py (1)
49-50: LGTM! The change to instantiate the client using `get_default_graph()` is correct.
smoke-test/tests/cli/datahub_cli.py (8)
29-34: LGTM! The change to use `get_default_graph()` and `client.get_aspect()` is correct.
41-41: LGTM! The change to use `client.get_aspect()` is correct.
61-62: LGTM! The change to use `client.get_aspect()` is correct.
74-76: LGTM! The change to use `get_default_graph()` and `client.get_aspect()` is correct.
81-81: LGTM! The change to use `client.get_aspect()` is correct.
90-90: LGTM! The change to use `client.get_aspect()` is correct.
106-107: LGTM! The change to use `client.get_aspect()` is correct.
109-109: LGTM! The change to use `client.get_aspect()` is correct.
smoke-test/tests/delete/delete_test.py (4)
69-73: LGTM! Verify the `get_aspects_for_entity` function usage.

The repeated call to `get_aspects_for_entity` with the new parameters is consistent with the previous context. However, ensure that the `get_aspects_for_entity` function correctly handles these parameters.
88-99: LGTM! Verify the `get_aspects_for_entity` function usage.

The final assertions using `get_aspects_for_entity` with the new parameters are consistent with the previous context. However, ensure that the `get_aspects_for_entity` function correctly handles these parameters.
41-43: LGTM! Verify the new client object usage.

The introduction of the new client object and deriving the session and host from it enhances encapsulation and clarity. However, ensure that the `get_datahub_graph` function correctly initializes the client object and its attributes.

Verification successful: the `get_datahub_graph` function correctly initializes the client object and its attributes (smoke-test/tests/delete/delete_test.py, lines 41-43).

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the `get_datahub_graph` function initializes the client object correctly.
rg --type python -A 10 'def get_datahub_graph'
# Length of output: 79

rg --type py -A 10 'def get_datahub_graph'
# Length of output: 575

# Description: Verify the `get_default_graph` function initializes the client object correctly.
rg --type py -A 10 'def get_default_graph'
# Length of output: 901

# Description: Verify `load_client_config` provides the correct configuration for the DataHubGraph.
rg --type py -A 10 'def load_client_config'
# Length of output: 1116

# Description: Verify `config_utils.get_client_config` retrieves the correct configuration.
rg --type py -A 10 'def get_client_config'
# Length of output: 692
```
47-58: LGTM! Verify the `get_aspects_for_entity` function usage.

The updated calls to `get_aspects_for_entity` now include the `session` and `gms_host` parameters, which improves context. However, ensure that the `get_aspects_for_entity` function correctly handles these parameters.

Verification successful: the `get_aspects_for_entity` function in metadata-ingestion/src/datahub/cli/cli_utils.py correctly handles the `session` and `gms_host` parameters; it passes them on to other functions like `get_entity` and `get_latest_timeseries_aspect_values`, ensuring they are used appropriately.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the `get_aspects_for_entity` function handles the new parameters correctly.
rg --type python -A 10 'def get_aspects_for_entity'
# Length of output: 84

# Locate Python files and search for the function definition within them.
fd -e py -x rg 'def get_aspects_for_entity' {}
# Length of output: 48689

# Extract and review the function definitions from the identified files.
rg -A 20 'def get_aspects_for_entity' metadata-ingestion/src/datahub/ingestion/graph/client.py
rg -A 20 'def get_aspects_for_entity' metadata-ingestion/src/datahub/cli/cli_utils.py
# Length of output: 2370
```
smoke-test/tests/cli/datahub_graph_test.py (2)
1-2: Import `Optional` from `typing`.

The import of `Optional` is necessary for the updated type annotation.
41-43: LGTM! Verify the new client object usage and type annotation.

The renaming of the `graph` variable to `client` enhances clarity. The updated type annotation for `schema_metadata` handles optional return types. However, ensure that the `DataHubGraph` initialization and `get_aspect_v2` method handle the new client object and optional return type correctly.

Verification successful: the `DataHubGraph` initialization and `get_aspect_v2` method correctly handle the new client object and the optional return type.

- `DataHubGraph` class implementation: metadata-ingestion/src/datahub/ingestion/graph/client.py
- `get_aspect_v2` method definition and usage: metadata-ingestion/src/datahub/ingestion/graph/client.py, smoke-test/tests/cli/datahub_graph_test.py, and other test files.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the `DataHubGraph` initialization and `get_aspect_v2` method handle the new client object and optional return type correctly.
rg --type python -A 10 'class DataHubGraph'
rg --type python -A 10 'def get_aspect_v2'
# Length of output: 152

rg 'class DataHubGraph' -A 10
rg 'def get_aspect_v2' -A 10
rg 'get_aspect_v2' -A 10
# Length of output: 15064
```
metadata-ingestion/src/datahub/ingestion/run/pipeline_config.py (3)
3-7: Import `root_validator` and `validator` from `pydantic`.

The import of `root_validator` and `validator` is necessary for the new validators.
106-121: LGTM! Verify the new root validator implementation.

The new root validator `default_sink_is_datahub_rest` sets a default sink configuration if none is provided, ensuring environment variables are resolved appropriately. However, ensure that the `load_client_config` function and `config_loader.resolve_env_variables` method handle the configuration correctly.

Verification successful: `load_client_config` is responsible for loading and parsing the client configuration, and it handles validation errors appropriately; `resolve_env_variables` recursively resolves environment variables within a configuration dictionary.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the `load_client_config` function and `config_loader.resolve_env_variables` method handle the configuration correctly.
rg --type python -A 10 'def load_client_config'
rg --type python -A 10 'def resolve_env_variables'
# Length of output: 164

rg -A 10 'def load_client_config'
rg -A 10 'def resolve_env_variables'
# Length of output: 2292
```
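The behavior being verified, recursively resolving environment variables inside a nested config, can be sketched as follows. This is an illustration of the idea, not the actual `config_loader` code:

```python
import os
from typing import Any


def resolve_env_variables(config: Any) -> Any:
    # Walk dicts and lists recursively; expand ${VAR}/$VAR references
    # in string leaves and pass every other value through unchanged.
    if isinstance(config, dict):
        return {key: resolve_env_variables(value) for key, value in config.items()}
    if isinstance(config, list):
        return [resolve_env_variables(item) for item in config]
    if isinstance(config, str):
        return os.path.expandvars(config)
    return config
```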
123-133: LGTM! Verify the new validator implementation.

The new validator `datahub_api_should_use_rest_sink_as_default` ensures that a valid API configuration is always provided when using the default sink. However, ensure that the `DatahubClientConfig.parse_obj_allow_extras` method handles the configuration correctly.

Verification successful: `parse_obj_allow_extras` correctly handles the configuration by temporarily allowing extra fields and then parsing the object; the implementation covers both Pydantic version 2 and earlier versions.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the `DatahubClientConfig.parse_obj_allow_extras` method handles the configuration correctly.
rg --type python -A 10 'def parse_obj_allow_extras'
# Length of output: 84

rg -A 10 'def parse_obj_allow_extras'
# Length of output: 1099

rg -A 20 'def parse_obj_allow_extras'
# Length of output: 2120
```
smoke-test/tests/utils.py (1)
123-123: LGTM! Simplifies the function implementation.

The change to use `get_default_graph` simplifies the function by reducing dependency on explicit configuration. However, ensure that `get_default_graph` is correctly used throughout the codebase.

Verification successful: the usage of `get_default_graph` is consistent and correct throughout the codebase, including:
smoke-test/tests/utils.py
smoke-test/tests/telemetry/telemetry_test.py
smoke-test/tests/lineage/test_lineage.py
smoke-test/tests/patch/test_dataset_patches.py
smoke-test/tests/cli/datahub_cli.py
smoke-test/tests/cli/user_groups_cmd/test_group_cmd.py
metadata-ingestion/src/datahub/entrypoints.py
metadata-ingestion/src/datahub/ingestion/graph/client.py
metadata-ingestion/src/datahub/ingestion/source/metadata/lineage.py
metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py
metadata-ingestion/src/datahub/ingestion/run/pipeline.py
metadata-ingestion/src/datahub/integrations/assertion/common.py
metadata-ingestion/src/datahub/cli/put_cli.py
metadata-ingestion/src/datahub/cli/migrate.py
metadata-ingestion/src/datahub/cli/ingest_cli.py
metadata-ingestion/src/datahub/cli/migration_utils.py
metadata-ingestion/src/datahub/cli/timeline_cli.py
metadata-ingestion/src/datahub/api/entities/forms/forms.py
metadata-ingestion/src/datahub/api/entities/structuredproperties/structuredproperties.py
metadata-ingestion/src/datahub/cli/delete_cli.py
metadata-ingestion/src/datahub/cli/state_cli.py
metadata-ingestion/src/datahub/cli/get_cli.py
metadata-ingestion/src/datahub/cli/specific/datacontract_cli.py
metadata-ingestion/src/datahub/cli/specific/structuredproperties_cli.py
metadata-ingestion/src/datahub/cli/specific/forms_cli.py
metadata-ingestion/src/datahub/cli/specific/assertions_cli.py
metadata-ingestion/src/datahub/cli/specific/user_cli.py
metadata-ingestion/src/datahub/cli/exists_cli.py
metadata-ingestion/src/datahub/cli/specific/group_cli.py
metadata-ingestion/src/datahub/cli/specific/dataset_cli.py
metadata-ingestion/src/datahub/cli/check_cli.py
metadata-ingestion/src/datahub/cli/specific/dataproduct_cli.py
metadata-ingestion/examples/perf/lineage_perf_example.py
Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the usage of `get_default_graph` in the codebase.
rg --type python -A 5 'get_default_graph'
# Length of output: 72

rg --glob '*.py' -A 5 'get_default_graph'
# Length of output: 52166
```
smoke-test/tests/timeline/timeline_test.py (1)
182-188: LGTM! Improves session and server configuration management.

The change to use `get_datahub_graph` improves the function's ability to manage sessions and server configurations dynamically. However, ensure that `get_datahub_graph` is correctly used throughout the codebase.

Verification successful: the usage of `get_datahub_graph` is consistent and correct throughout the codebase, including:
smoke-test/tests/utils.py
smoke-test/tests/timeline/timeline_test.py
smoke-test/tests/delete/delete_test.py
Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the usage of `get_datahub_graph` in the codebase.
rg --type python -A 5 'get_datahub_graph'
# Length of output: 72

rg -t py -A 5 'get_datahub_graph'
# Length of output: 5723
```
metadata-ingestion/src/datahub/cli/timeline_cli.py (1)
66-68: LGTM! Improves session and server configuration management.

The change to use `get_default_graph` improves the function's ability to manage sessions and server configurations dynamically. However, ensure that `get_default_graph` is correctly used throughout the codebase.

Verified: consistent usage of `get_default_graph`. The function is consistently used across the codebase, confirming its correct integration; the lines in metadata-ingestion/src/datahub/cli/timeline_cli.py are correctly utilizing `get_default_graph`.

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify the usage of `get_default_graph` in the codebase.
rg --type python -A 5 'get_default_graph'
# Length of output: 72

fd -e py -x rg 'get_default_graph' {}
# Length of output: 4470
```
metadata-ingestion/src/datahub/cli/migration_utils.py (3)
246-246: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured.
276-288: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The return type change to `Iterable[RelatedEntity]` improves type safety.
292-304: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The return type change to `Iterable[RelatedEntity]` improves type safety.
metadata-ingestion/src/datahub/ingestion/source/metadata/lineage.py (1)
214-215: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured.
smoke-test/tests/patch/test_dataset_patches.py (5)
75-84: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The additional assertions improve test robustness.
100-101: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The additional assertions improve test robustness.
119-120: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The additional assertions improve test robustness.
158-158: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The additional assertions improve test robustness.
219-219: LGTM! The change to use `get_default_graph` ensures that the client instance is correctly configured. The additional assertions improve test robustness.
metadata-ingestion/src/datahub/ingestion/sink/datahub_rest.py (1)
Line range hint 51-51: Verify the removal of `set_env_variables_override_config`.

The removal of `set_env_variables_override_config` suggests a change in how environment variables are managed. Ensure that this functionality is no longer needed or is handled elsewhere.
metadata-ingestion/src/datahub/cli/lite_cli.py (2)
48-50: Verify the new usage of `get_client_config`. The change in the function call suggests a shift in how the client configuration is retrieved. Ensure that the new usage is appropriate and does not introduce any issues.
312-314: Verify the new usage of `get_client_config`. The change in the function call suggests a shift in how the client configuration is retrieved. Ensure that the new usage is appropriate and does not introduce any issues.
metadata-ingestion/src/datahub/cli/cli_utils.py (5)
97-100: Verify the new parameters in `post_rollback_endpoint`. The addition of `session` and `gms_host` parameters suggests a shift towards explicit dependency injection. Ensure that the new parameters are used appropriately and do not introduce any issues.
133-136: Verify the new parameters in `get_entity`. The addition of `session` and `gms_host` parameters suggests a shift towards explicit dependency injection. Ensure that the new parameters are used appropriately and do not introduce any issues.
161-165: Verify the new parameters in `post_entity`. The addition of `session` and `gms_host` parameters suggests a shift towards explicit dependency injection. Ensure that the new parameters are used appropriately and do not introduce any issues.
207-210: Verify the new parameters in `get_latest_timeseries_aspect_values`. The addition of `session` and `gms_host` parameters suggests a shift towards explicit dependency injection. Ensure that the new parameters are used appropriately and do not introduce any issues.
230-233: Verify the new parameters in `get_aspects_for_entity`. The addition of `session` and `gms_host` parameters suggests a shift towards explicit dependency injection. Ensure that the new parameters are used appropriately and do not introduce any issues.
metadata-ingestion/src/datahub/upgrade/upgrade.py (2)
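The direction the review describes, passing `session` and `gms_host` explicitly instead of reading module-level state, can be sketched like this (the signature is illustrative, not the exact one in `cli_utils`):

```python
from typing import Any, List, Optional


def get_aspects_for_entity(
    session: Any,
    gms_host: str,
    entity_urn: str,
    aspects: Optional[List[str]] = None,
) -> Any:
    # Both the HTTP session and the host are injected by the caller,
    # which makes the function trivial to exercise with a fake session.
    url = f"{gms_host}/entities/{entity_urn}"
    return session.get(url, params={"aspects": aspects or []})
```

Because nothing is read from globals, a test can pass any object with a `get` method as the session.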
130-132: LGTM! But verify the function usage in the codebase.

The changes to use `load_client_config` instead of `cli_utils.get_url_and_token` enhance maintainability and consistency. However, ensure that all function calls to `get_server_version_stats` handle the new approach correctly.

Verification successful: the function `get_server_version_stats` is only used within the same file where it is defined (metadata-ingestion/src/datahub/upgrade/upgrade.py).

Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Verify all function calls to `get_server_version_stats` handle the new approach correctly.
rg --type python -A 5 'get_server_version_stats'
# Length of output: 79

rg --type py -A 5 'get_server_version_stats'
# Length of output: 1052
```
103-113: LGTM! But verify the function usage in the codebase.

The changes to make the `token` parameter optional enhance flexibility. The conditional construction of headers is a good practice.

However, ensure that all function calls to `get_server_config` handle the optional `token` parameter correctly.

metadata-ingestion/src/datahub/cli/migrate.py (2)
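The optional-token header construction praised in the `get_server_config` comment above can be sketched as follows. This is a minimal illustration only; the helper name and the exact header set are assumptions, not DataHub's actual implementation:

```python
from typing import Dict, Optional


def build_request_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build HTTP headers, adding Authorization only when a token is supplied."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

With `token=None` the `Authorization` header is simply absent, so unauthenticated deployments need no special-casing at the call site.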
152-162: LGTM!

The changes to use the `graph` object for retrieving URNs and aspects enhance maintainability and consistency.

406-435: LGTM!

The addition of the `batch_get_ids` function improves efficiency and enhances code clarity with type hints.

metadata-ingestion/src/datahub/cli/ingest_cli.py (3)
430-432: LGTM!

The changes to use the `get_default_graph` method for obtaining the client instance enhance encapsulation and streamline the process of interacting with the DataHub API.

481-483: LGTM!

The changes to use the `get_default_graph` method for obtaining the client instance enhance encapsulation and streamline the process of interacting with the DataHub API.

Line range hint 531-549: LGTM!

The changes to use the `get_default_graph` method for obtaining the client instance and the removal of the connectivity check enhance encapsulation and streamline the process of interacting with the DataHub API.

metadata-ingestion/src/datahub/cli/delete_cli.py (2)
126-126: Initialize the client correctly.

The client initialization using `get_default_graph()` is correct and enhances modularity.
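The `get_default_graph()` pattern being approved here can be illustrated with a small stand-in. This is only a sketch of the shared-client idea: `_DummyGraph` and the module-level cache are illustrative, not DataHub's real `DataHubGraph`:

```python
from typing import Optional


class _DummyGraph:
    """Stand-in for a configured graph client (the real one wraps an HTTP session)."""

    def __init__(self, server: str, token: Optional[str] = None) -> None:
        self.server = server
        self.token = token


_graph_cache: Optional[_DummyGraph] = None


def get_default_graph() -> _DummyGraph:
    """Lazily build one shared client so every CLI command talks to the
    server through the same, fully-configured object."""
    global _graph_cache
    if _graph_cache is None:
        # In the real CLI this configuration would come from ~/.datahubenv.
        _graph_cache = _DummyGraph(server="http://localhost:8080")
    return _graph_cache
```

Because every command goes through one accessor, settings such as SSL verification only need to be honored in one place.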
143-145: Ensure the client session and server configuration are correctly used.

The updated function call to `cli_utils.post_rollback_endpoint` correctly uses the client session and server configuration, enhancing configurability.

docs/how/updating-datahub.md (1)

83-83: Document the new configuration functionality clearly.

The documentation accurately describes the new functionality, allowing settings in `~/.datahubenv` to be respected across all CLI calls. Ensure users are aware of the new `disable_ssl_verification` setting.

Tools

Markdownlint

83-83: Bare URL used (MD034, no-bare-urls)
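Per the PR description, a `~/.datahubenv` that exercises the new setting might look like the fragment below. The field values are placeholders; the `gms` nesting follows the `DatahubConfig(gms=DatahubClientConfig(...))` structure seen later in this review, and `disable_ssl_verification` is the newly documented option:

```yaml
gms:
  server: http://localhost:8080
  token: <your-personal-access-token>
  disable_ssl_verification: true
```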
smoke-test/tests/lineage/test_lineage.py (2)

848-848: Initialize the graph correctly.

The graph initialization using `get_default_graph()` is correct and simplifies the code.

893-893: Initialize the graph correctly.

The graph initialization using `get_default_graph()` is correct and simplifies the code.

metadata-ingestion/src/datahub/ingestion/graph/client.py (3)

93-97: LGTM! Environment variable constants are correctly defined.

The new environment variable constants are straightforward and correctly defined.
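These constants feed server-URL resolution. A sketch of the precedence logic, including the deprecation warning discussed later in this review — the variable names follow DataHub's `DATAHUB_GMS_*` conventions, but the helper itself is an illustration, not the code under review:

```python
import warnings
from typing import Dict, Optional

ENV_METADATA_HOST_URL = "DATAHUB_GMS_URL"
ENV_METADATA_HOST = "DATAHUB_GMS_HOST"


def resolve_server_url(env: Dict[str, str]) -> Optional[str]:
    """Prefer the explicit URL variable; tolerate (but warn about) a full URL
    passed via the host variable, which is the deprecated spelling."""
    url = env.get(ENV_METADATA_HOST_URL)
    if url:
        return url
    host = env.get(ENV_METADATA_HOST)
    if host and host.startswith(("http://", "https://")):
        warnings.warn(
            f"Using {ENV_METADATA_HOST} as a URL is deprecated; "
            f"set {ENV_METADATA_HOST_URL} instead."
        )
        return host
    return None
```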
1838-1845: LGTM! The function is straightforward and includes proper logging.

The `ensure_datahub_config` function is well-structured and includes proper logging for user feedback.

1847-1864: Review the merge logic for potential correctness issues.

The merge logic for the configuration dictionary may produce incorrect results when keys conflict. Ensure that the merging process does not overwrite necessary configurations.
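The concern here is dict-merge precedence: in `{**a, **b}`, keys from `b` win on conflicts. A minimal, library-free illustration (the key names are made up for the example):

```python
def merge_configs(previous: dict, incoming: dict) -> dict:
    """Merge two flat config dicts.

    With {**previous, **incoming}, values from `incoming` win on conflicts,
    so settings already persisted in `previous` can be silently overwritten.
    Flipping the order preserves `previous` instead.
    """
    return {**previous, **incoming}


previous = {"server": "https://old-gms", "disable_ssl_verification": True}
incoming = {"server": "https://new-gms", "token": "abc"}

merged = merge_configs(previous, incoming)
```

Non-conflicting keys (`disable_ssl_verification`, `token`) survive either way; only the ordering of conflicting keys such as `server` is at stake.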
# TODO: Replace with client.emit, requires figuring out the correct subclass of _Aspect to create from the data
status = post_entity(
    client._session,
    client.config.server,
Reminder: Address the TODO comment.
The TODO comment indicates that further improvements are planned.
Do you want me to generate the code to replace `post_entity` with `client.emit`, or open a GitHub issue to track this task?
Yes, generate the code.
Unable to process this comment: maximum number of iterations reached.
Analysis chain
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for subclasses of _Aspect in the datahub.metadata.schema_classes module.
ast-grep --lang python --pattern $'class $_(_Aspect):' --json
Length of output: 65
Script:
#!/bin/bash
# Search for classes in the datahub.metadata.schema_classes module to identify potential aspect classes.
ast-grep --lang python --pattern $'class $_:' --json
Length of output: 56
Script:
#!/bin/bash
# Search for all class definitions in the datahub directory to identify potential aspect classes.
ast-grep --lang python --pattern $'class $_:' --json
Length of output: 56
Script:
#!/bin/bash
# Use rg to search for class definitions in the datahub directory to identify potential aspect classes.
rg 'class ' --glob '**/*.py'
Length of output: 189687
Script:
#!/bin/bash
# Examine the datahub.metadata.schema_classes module and its imports to identify the correct aspect classes.
rg 'import' metadata-ingestion/src/datahub/metadata/schema_classes.py
rg 'from' metadata-ingestion/src/datahub/metadata/schema_classes.py
Length of output: 333
Actionable comments posted: 1
Outside diff range, codebase verification and nitpick comments (3)
metadata-ingestion/src/datahub/ingestion/graph/client.py (3)
1791-1806: Improve the logger warning message.

The logger warning should be more explicit about the potential issues with using the host variable as a URL directly.

- logger.warning(f"Do not use {ENV_METADATA_HOST} as URL. Use {ENV_METADATA_HOST_URL} instead")
+ logger.warning(f"Using {ENV_METADATA_HOST} as URL is deprecated and may lead to unexpected behavior. Please use {ENV_METADATA_HOST_URL} instead.")
595-597: Remove the commented-out function.

The commented-out function is not needed and should be removed to improve code readability.

- # def _session(self) -> Session:
- #     return super()._session
1809-1833: Ensure `config_override` is defined.

The `config_override` dictionary is used without being defined in the function or elsewhere in the main codebase, which might lead to issues if it is not defined. Please define or import `config_override` to avoid runtime errors.

- File: metadata-ingestion/src/datahub/ingestion/graph/client.py
- Lines: 1809-1833

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that `config_override` is defined elsewhere in the codebase.
# Test: Search for the definition of `config_override`. Expect: Definition of `config_override`.
rg --type python 'config_override\s*='

Length of output: 71

Script:

#!/bin/bash
# Description: Verify that `config_override` is defined elsewhere in the codebase.
# Test: Search for the definition of `config_override`. Expect: Definition of `config_override`.
rg -t py 'config_override\s*='

Length of output: 249
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Files selected for processing (1)

- metadata-ingestion/src/datahub/ingestion/graph/client.py (5 hunks)

Additional comments not posted (3)

metadata-ingestion/src/datahub/ingestion/graph/client.py (3)

1836-1843: LGTM! The code changes are approved.

1784-1786: LGTM! The code changes are approved.

1778-1782: LGTM! The code changes are approved.
def write_gms_config(
    host: str, token: Optional[str], merge_with_previous: bool = True
) -> None:
    config = DatahubConfig(gms=DatahubClientConfig(server=host, token=token))
    if merge_with_previous:
        try:
            previous_config = config_utils.get_client_config()
            assert isinstance(previous_config, dict)
        except Exception as e:
            # ok to fail on this
            previous_config = {}
            logger.debug(
                f"Failed to retrieve config from file {config_utils.DATAHUB_CONFIG_PATH}: {e}. This isn't fatal."
            )
        config_dict = {**previous_config, **config.dict()}
    else:
        config_dict = config.dict()
    config_utils.persist_datahub_config(config_dict)
Review the merge logic.
The merge logic might lead to unexpected behavior if the existing configuration contains conflicting settings. Consider overwriting the existing configuration by default.
- config_dict = {**previous_config, **config.dict()}
+ config_dict = {**config.dict(), **previous_config}
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def write_gms_config(
    host: str, token: Optional[str], merge_with_previous: bool = True
) -> None:
    config = DatahubConfig(gms=DatahubClientConfig(server=host, token=token))
    if merge_with_previous:
        try:
            previous_config = config_utils.get_client_config()
            assert isinstance(previous_config, dict)
        except Exception as e:
            # ok to fail on this
            previous_config = {}
            logger.debug(
                f"Failed to retrieve config from file {config_utils.DATAHUB_CONFIG_PATH}: {e}. This isn't fatal."
            )
        config_dict = {**config.dict(), **previous_config}
    else:
        config_dict = config.dict()
    config_utils.persist_datahub_config(config_dict)
* feat(forms) Handle deleting forms references when hard deleting forms (datahub-project#10820) * refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2) (datahub-project#10764) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> * fix(ingestion/airflow-plugin): pipeline tasks discoverable in search (datahub-project#10819) * feat(ingest/transformer): tags to terms transformer (datahub-project#10758) Co-authored-by: Aseem Bansal <asmbansal2@gmail.com> * fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on (datahub-project#10752) Co-authored-by: Aseem Bansal <asmbansal2@gmail.com> * feat(forms) Add java SDK for form entity PATCH + CRUD examples (datahub-project#10822) * feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples (datahub-project#10823) * feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files (datahub-project#10824) * feat(forms) Add CRUD endpoints to GraphQL for Form entities (datahub-project#10825) * add flag for includeSoftDeleted in scroll entities API (datahub-project#10831) * feat(deprecation) Return actor entity with deprecation aspect (datahub-project#10832) * feat(structuredProperties) Add CRUD graphql APIs for structured property entities (datahub-project#10826) * add scroll parameters to openapi v3 spec (datahub-project#10833) * fix(ingest): correct profile_day_of_week implementation (datahub-project#10818) * feat(ingest/glue): allow ingestion of empty databases from Glue (datahub-project#10666) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * feat(cli): add more details to get cli (datahub-project#10815) * fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (datahub-project#10836) * fix(ingestion): fix datajob patcher (datahub-project#10827) * fix(smoke-test): add suffix in temp file creation (datahub-project#10841) * feat(ingest/glue): add 
helper method to permit user or group ownership (datahub-project#10784) * feat(): Show data platform instances in policy modal if they are set on the policy (datahub-project#10645) Co-authored-by: Hendrik Richert <hendrik.richert@swisscom.com> * docs(patch): add patch documentation for how implementation works (datahub-project#10010) Co-authored-by: John Joyce <john@acryl.io> * fix(jar): add missing custom-plugin-jar task (datahub-project#10847) * fix(): also check exceptions/stack trace when filtering log messages (datahub-project#10391) Co-authored-by: John Joyce <john@acryl.io> * docs(): Update posts.md (datahub-project#9893) Co-authored-by: Hyejin Yoon <0327jane@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * chore(ingest): update acryl-datahub-classify version (datahub-project#10844) * refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI (datahub-project#10828) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(restli): log aspect-not-found as a warning rather than as an error (datahub-project#10834) * fix(ingest/nifi): remove duplicate upstream jobs (datahub-project#10849) * fix(smoke-test): test access to create/revoke personal access tokens (datahub-project#10848) * fix(smoke-test): missing test for move domain (datahub-project#10837) * ci: update usernames to not considered for community (datahub-project#10851) * env: change defaults for data contract visibility (datahub-project#10854) * fix(ingest/tableau): quote special characters in external URL (datahub-project#10842) * fix(smoke-test): fix flakiness of auto complete test * ci(ingest): pin dask dependency for feast (datahub-project#10865) * fix(ingestion/lookml): liquid template resolution and view-to-view cll (datahub-project#10542) * feat(ingest/audit): add client id and version in system metadata props (datahub-project#10829) * 
chore(ingest): Mypy 1.10.1 pin (datahub-project#10867) * docs: use acryl-datahub-actions as expected python package to install (datahub-project#10852) * docs: add new js snippet (datahub-project#10846) * refactor(ingestion): remove company domain for security reason (datahub-project#10839) * fix(ingestion/spark): Platform instance and column level lineage fix (datahub-project#10843) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(ingestion/tableau): optionally ingest multiple sites and create site containers (datahub-project#10498) Co-authored-by: Yanik Häni <Yanik.Haeni1@swisscom.com> * fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser (datahub-project#10874) * fix(manage-tokens): fix manage access token policy (datahub-project#10853) * Batch get entity endpoints (datahub-project#10880) * feat(system): support conditional write semantics (datahub-project#10868) * fix(build): upgrade vercel builds to Node 20.x (datahub-project#10890) * feat(ingest/lookml): shallow clone repos (datahub-project#10888) * fix(ingest/looker): add missing dependency (datahub-project#10876) * fix(ingest): only populate audit stamps where accurate (datahub-project#10604) * fix(ingest/dbt): always encode tag urns (datahub-project#10799) * fix(ingest/redshift): handle multiline alter table commands (datahub-project#10727) * fix(ingestion/looker): column name missing in explore (datahub-project#10892) * fix(lineage) Fix lineage source/dest filtering with explored per hop limit (datahub-project#10879) * feat(conditional-writes): misc updates and fixes (datahub-project#10901) * feat(ci): update outdated action (datahub-project#10899) * feat(rest-emitter): adding async flag to rest emitter (datahub-project#10902) Co-authored-by: Gabe Lyons <gabe.lyons@acryl.io> * feat(ingest): add snowflake-queries source (datahub-project#10835) * fix(ingest): improve `auto_materialize_referenced_tags_terms` error handling 
(datahub-project#10906) * docs: add new company to adoption list (datahub-project#10909) * refactor(redshift): Improve redshift error handling with new structured reporting system (datahub-project#10870) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * feat(ui) Finalize support for all entity types on forms (datahub-project#10915) * Index ExecutionRequestResults status field (datahub-project#10811) * feat(ingest): grafana connector (datahub-project#10891) Co-authored-by: Shirshanka Das <shirshanka@apache.org> Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(gms) Add Form entity type to EntityTypeMapper (datahub-project#10916) * feat(dataset): add support for external url in Dataset (datahub-project#10877) * docs(saas-overview) added missing features to observe section (datahub-project#10913) Co-authored-by: John Joyce <john@acryl.io> * fix(ingest/spark): Fixing Micrometer warning (datahub-project#10882) * fix(structured properties): allow application of structured properties without schema file (datahub-project#10918) * fix(data-contracts-web) handle other schedule types (datahub-project#10919) * fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error (datahub-project#10866) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * Add feature flag for view defintions (datahub-project#10914) Co-authored-by: Ethan Cartwright <ethan.cartwright@acryl.io> * feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction (datahub-project#10884) * fix(airflow): add error handling around render_template() (datahub-project#10907) * feat(ingestion/sqlglot): add optional `default_dialect` parameter to sqlglot lineage (datahub-project#10830) * feat(mcp-mutator): new mcp mutator plugin (datahub-project#10904) * fix(ingest/bigquery): changes helper function to decode unicode scape sequences (datahub-project#10845) * feat(ingest/postgres): fetch table sizes for profile (datahub-project#10864) 
* feat(ingest/abs): Adding azure blob storage ingestion source (datahub-project#10813) * fix(ingest/redshift): reduce severity of SQL parsing issues (datahub-project#10924) * fix(build): fix lint fix web react (datahub-project#10896) * fix(ingest/bigquery): handle quota exceeded for project.list requests (datahub-project#10912) * feat(ingest): report extractor failures more loudly (datahub-project#10908) * feat(ingest/snowflake): integrate snowflake-queries into main source (datahub-project#10905) * fix(ingest): fix docs build (datahub-project#10926) * fix(ingest/snowflake): fix test connection (datahub-project#10927) * fix(ingest/lookml): add view load failures to cache (datahub-project#10923) * docs(slack) overhauled setup instructions and screenshots (datahub-project#10922) Co-authored-by: John Joyce <john@acryl.io> * fix(airflow): Add comma parsing of owners to DataJobs (datahub-project#10903) * fix(entityservice): fix merging sideeffects (datahub-project#10937) * feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S (datahub-project#10938) Co-authored-by: John Joyce <john@Johns-MBP.lan> * chore() Set a default lineage filtering end time on backend when a start time is present (datahub-project#10925) Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> Co-authored-by: John Joyce <john@Johns-MBP.lan> * Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. 
(datahub-project#10939) * docs: add learning center to docs (datahub-project#10921) * doc: Update hubspot form id (datahub-project#10943) * chore(airflow): add python 3.11 w/ Airflow 2.9 to CI (datahub-project#10941) * fix(ingest/Glue): column upstream lineage between S3 and Glue (datahub-project#10895) * fix(ingest/abs): split abs utils into multiple files (datahub-project#10945) * doc(ingest/looker): fix doc for sql parsing documentation (datahub-project#10883) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(ingest/bigquery): Adding missing BigQuery types (datahub-project#10950) * fix(ingest/setup): feast and abs source setup (datahub-project#10951) * fix(connections) Harden adding /gms to connections in backend (datahub-project#10942) * feat(siblings) Add flag to prevent combining siblings in the UI (datahub-project#10952) * fix(docs): make graphql doc gen more automated (datahub-project#10953) * feat(ingest/athena): Add option for Athena partitioned profiling (datahub-project#10723) * fix(spark-lineage): default timeout for future responses (datahub-project#10947) * feat(datajob/flow): add environment filter using info aspects (datahub-project#10814) * fix(ui/ingest): correct privilege used to show tab (datahub-project#10483) Co-authored-by: Kunal-kankriya <127090035+Kunal-kankriya@users.noreply.github.com> * feat(ingest/looker): include dashboard urns in browse v2 (datahub-project#10955) * add a structured type to batchGet in OpenAPI V3 spec (datahub-project#10956) * fix(ui): scroll on the domain sidebar to show all domains (datahub-project#10966) * fix(ingest/sagemaker): resolve incorrect variable assignment for SageMaker API call (datahub-project#10965) * fix(airflow/build): Pinning mypy (datahub-project#10972) * Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in datahub-project#10939. 
(datahub-project#10974) * fix(ingest/test): Fix for mssql integration tests (datahub-project#10978) * fix(entity-service) exist check correctly extracts status (datahub-project#10973) * fix(structuredProps) casing bug in StructuredPropertiesValidator (datahub-project#10982) * bugfix: use anyOf instead of allOf when creating references in openapi v3 spec (datahub-project#10986) * fix(ui): Remove ant less imports (datahub-project#10988) * feat(ingest/graph): Add get_results_by_filter to DataHubGraph (datahub-project#10987) * feat(ingest/cli): init does not actually support environment variables (datahub-project#10989) * fix(ingest/graph): Update get_results_by_filter graphql query (datahub-project#10991) * feat(ingest/spark): Promote beta plugin (datahub-project#10881) Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * feat(ingest): support domains in meta -> "datahub" section (datahub-project#10967) * feat(ingest): add `check server-config` command (datahub-project#10990) * feat(cli): Make consistent use of DataHubGraphClientConfig (datahub-project#10466) Deprecates get_url_and_token() in favor of a more complete option: load_graph_config() that returns a full DatahubClientConfig. This change was then propagated across previous usages of get_url_and_token so that connections to DataHub server from the client respect the full breadth of configuration specified by DatahubClientConfig. I.e: You can now specify disable_ssl_verification: true in your ~/.datahubenv file so that all cli functions to the server work when ssl certification is disabled. 
Fixes datahub-project#9705 * fix(ingest/s3): Fixing container creation when there is no folder in path (datahub-project#10993) * fix(ingest/looker): support platform instance for dashboards & charts (datahub-project#10771) * feat(ingest/bigquery): improve handling of information schema in sql parser (datahub-project#10985) * feat(ingest): improve `ingest deploy` command (datahub-project#10944) * fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups (datahub-project#10920) - allow excluding soft-deleted entities in relationship-queries - exclude soft-deleted members of groups * fix(ingest/looker): downgrade missing chart type log level (datahub-project#10996) * doc(acryl-cloud): release docs for 0.3.4.x (datahub-project#10984) Co-authored-by: John Joyce <john@acryl.io> Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Pedro Silva <pedro@acryl.io> * fix(protobuf/build): Fix protobuf check jar script (datahub-project#11006) * fix(ui/ingest): Support invalid cron jobs (datahub-project#10998) * fix(ingest): fix graph config loading (datahub-project#11002) Co-authored-by: Pedro Silva <pedro@acryl.io> * feat(docs): Document __DATAHUB_TO_FILE_ directive (datahub-project#10968) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI (datahub-project#11011) * feat(ece): support custom ownership type urns in ECE generation (datahub-project#10999) * feat(assertion-v2): changed Validation tab to Quality and created new Governance tab (datahub-project#10935) * fix(ingestion/glue): Add support for missing config options for profiling in Glue (datahub-project#10858) * feat(propagation): Add models for schema field docs, tags, terms (datahub-project#2959) (datahub-project#11016) Co-authored-by: Chris Collins 
<chriscollins3456@gmail.com> * docs: standardize terminology to DataHub Cloud (datahub-project#11003) * fix(ingestion/transformer): replace the externalUrl container (datahub-project#11013) * docs(slack) troubleshoot docs (datahub-project#11014) * feat(propagation): Add graphql API (datahub-project#11030) Co-authored-by: Chris Collins <chriscollins3456@gmail.com> * feat(propagation): Add models for Action feature settings (datahub-project#11029) * docs(custom properties): Remove duplicate from sidebar (datahub-project#11033) * feat(models): Introducing Dataset Partitions Aspect (datahub-project#10997) Co-authored-by: John Joyce <john@Johns-MBP.lan> Co-authored-by: John Joyce <john@ip-192-168-1-200.us-west-2.compute.internal> * feat(propagation): Add Documentation Propagation Settings (datahub-project#11038) * fix(models): chart schema fields mapping, add dataHubAction entity, t… (datahub-project#11040) * fix(ci): smoke test lint failures (datahub-project#11044) * docs: fix learning center color scheme & typo (datahub-project#11043) * feat: add cloud main page (datahub-project#11017) Co-authored-by: Jay <159848059+jayacryl@users.noreply.github.com> * feat(restore-indices): add additional step to also clear system metadata service (datahub-project#10662) Co-authored-by: John Joyce <john@acryl.io> * docs: fix typo (datahub-project#11046) * fix(lint): apply spotless (datahub-project#11050) * docs(airflow): example query to get datajobs for a dataflow (datahub-project#11034) * feat(cli): Add run-id option to put sub-command (datahub-project#11023) Adds an option to assign run-id to a given put command execution. This is useful when transformers do not exist for a given ingestion payload, we can follow up with custom metadata and assign it to an ingestion pipeline. 
* fix(ingest): improve sql error reporting calls (datahub-project#11025) * fix(airflow): fix CI setup (datahub-project#11031) * feat(ingest/dbt): add experimental `prefer_sql_parser_lineage` flag (datahub-project#11039) * fix(ingestion/lookml): enable stack-trace in lookml logs (datahub-project#10971) * (chore): Linting fix (datahub-project#11015) * chore(ci): update deprecated github actions (datahub-project#10977) * Fix ALB configuration example (datahub-project#10981) * chore(ingestion-base): bump base image packages (datahub-project#11053) * feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size (datahub-project#11051) * fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag (datahub-project#11008) Co-authored-by: Harshal Sheth <hsheth2@gmail.com> * fix(ingestion/powerbi): fix issue with broken report lineage (datahub-project#10910) * feat(ingest/tableau): add retry on timeout (datahub-project#10995) * change generate kafka connect properties from env (datahub-project#10545) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> * fix(ingest): fix oracle cronjob ingestion (datahub-project#11001) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> * chore(ci): revert update deprecated github actions (datahub-project#10977) (datahub-project#11062) * feat(ingest/dbt-cloud): update metadata_endpoint inference (datahub-project#11041) * build: Reduce size of datahub-frontend-react image by 50-ish% (datahub-project#10878) Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com> * fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py (datahub-project#11063) * docs(ingest): update developing-a-transformer.md (datahub-project#11019) * feat(search-test): update search tests from datahub-project#10408 (datahub-project#11056) * feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped (datahub-project#11009) Co-authored-by: 
Deprecates `get_url_and_token()` in favor of a more complete option: `load_graph_config()`, which returns a full `DatahubClientConfig`.

This change was then propagated across previous usages of `get_url_and_token` so that connections from the client to the DataHub server respect the full breadth of configuration specified by `DatahubClientConfig`. For example, you can now specify `disable_ssl_verification: true` in your `~/.datahubenv` file so that all CLI calls to the server work when SSL certificate verification is disabled.

Fixes #9705
Checklist
Summary by CodeRabbit

New Features
- Configuration options can now be supplied via the `~/.datahubenv` file.

Bug Fixes

Refactor

Tests
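For reference, a `~/.datahubenv` enabling this behavior might look like the fragment below. This is a sketch only: the exact schema is defined by the DataHub CLI, and the key layout here is an assumption.

```yaml
# Hypothetical ~/.datahubenv sketch (key layout assumed, not verified)
gms:
  server: https://datahub.example.com
  token: <your-access-token>
disable_ssl_verification: true
```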