feat(ingest/sigma): Sigma connector integration #10037
Merged: hsheth2 merged 11 commits into datahub-project:master from shubhamjagtap639:Sigma-Connector-Integration on Apr 16, 2024
- cfd23a8 Code for sigma source integration (shubhamjagtap639)
- 652132a Address review comments (shubhamjagtap639)
- 1e34864 Add sigma dataset and workbook badge as tag (shubhamjagtap639)
- b3cb718 Merge branch 'master' into Sigma-Connector-Integration (shubhamjagtap639)
- 42321a3 Modify sigma workspace_pattern config description (shubhamjagtap639)
- 60fb768 Add sigma dataset upstream lineage code (shubhamjagtap639)
- 284f33c Merge branch 'master' into Sigma-Connector-Integration (shubhamjagtap639)
- 174bcc9 Address review comments (shubhamjagtap639)
- 5c0366b Update metadata-ingestion/docs/sources/sigma/sigma_pre.md (hsheth2)
- c42df20 Update datahub-web-react/src/app/ingest/source/builder/sources.json (hsheth2)
- 1f09f7d Merge branch 'master' into Sigma-Connector-Integration (hsheth2)
@@ -0,0 +1,74 @@
## Integration Details

This source extracts the following:

- Workspaces, and the workbooks within those workspaces, as Containers.
- Sigma Datasets as DataHub Datasets.
- Pages as DataHub dashboards, and the elements inside pages as charts.

## Configuration Notes

1. Refer to the [doc](https://help.sigmacomputing.com/docs/generate-api-client-credentials) to generate API client credentials.
2. Provide the generated Client ID and Secret in the recipe.

## Concept mapping

| Sigma                  | DataHub                                                       | Notes                            |
|------------------------|---------------------------------------------------------------|----------------------------------|
| `Workspace`            | [Container](../../metamodel/entities/container.md)            | SubType `"Sigma Workspace"`      |
| `Workbook`             | [Container](../../metamodel/entities/container.md)            | SubType `"Sigma Workbook"`       |
| `Page`                 | [Dashboard](../../metamodel/entities/dashboard.md)            |                                  |
| `Element`              | [Chart](../../metamodel/entities/chart.md)                    |                                  |
| `Dataset`              | [Dataset](../../metamodel/entities/dataset.md)                | SubType `"Sigma Dataset"`        |
| `User`                 | [User (a.k.a. CorpUser)](../../metamodel/entities/corpuser.md) | Optionally extracted             |

## Advanced Configurations

### Chart source platform mapping
To provide platform details (platform name, platform instance, and env) for all of a chart's external upstream data sources, use `chart_sources_platform_mapping` as shown below:

#### Example - For just one specific chart's external upstream data sources
```yml
chart_sources_platform_mapping:
  'workspace_name/workbook_name/chart_name_1':
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

  'workspace_name/folder_name/workbook_name/chart_name_2':
    data_source_platform: postgres
    platform_instance: cloud_instance
    env: DEV
```

#### Example - For all charts within one specific workbook
```yml
chart_sources_platform_mapping:
  'workspace_name/workbook_name_1':
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD

  'workspace_name/folder_name/workbook_name_2':
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```

#### Example - For all charts in all workbooks within one specific workspace
```yml
chart_sources_platform_mapping:
  'workspace_name':
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```

#### Example - All workbooks use the same connection
```yml
chart_sources_platform_mapping:
  '*':
    data_source_platform: snowflake
    platform_instance: new_instance
    env: PROD
```
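The examples above key the mapping by chart path, workbook path, workspace name, or `'*'`. The lookup logic itself is not part of this doc, so the following is only a minimal sketch of one plausible resolution order, assuming the most specific matching path wins and `'*'` is the fallback. The function name `resolve_platform_detail` and the sample `MAPPING` are hypothetical and are not taken from the connector.

```python
from typing import Dict, Optional

# Hypothetical mapping, shaped like the `chart_sources_platform_mapping` examples.
MAPPING: Dict[str, Dict[str, str]] = {
    "*": {"data_source_platform": "snowflake", "env": "PROD"},
    "workspace_name": {"data_source_platform": "snowflake", "env": "DEV"},
    "workspace_name/workbook_name/chart_name_1": {
        "data_source_platform": "postgres",
        "env": "DEV",
    },
}


def resolve_platform_detail(
    chart_path: str, mapping: Dict[str, Dict[str, str]]
) -> Optional[Dict[str, str]]:
    """Return the entry for the longest matching path prefix, else the '*' entry."""
    parts = chart_path.split("/")
    # Assumed precedence: try the full chart path first, then each shorter parent path.
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in mapping:
            return mapping[candidate]
    # No path-specific entry matched; fall back to the wildcard, if present.
    return mapping.get("*")
```

Under this assumed precedence, a chart-level entry overrides a workspace-level entry, which in turn overrides `'*'`. The connector may resolve entries differently.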
@@ -0,0 +1,25 @@
source:
  type: sigma
  config:
    # Coordinates
    api_url: "https://aws-api.sigmacomputing.com/v2"
    # Credentials
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"

    # Optional - filter for certain workspace names instead of ingesting everything.
    # workspace_pattern:
    #   allow:
    #     - workspace_name

    ingest_owner: true

    # Optional - mapping of sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.
    # chart_sources_platform_mapping:
    #   folder_path:
    #     data_source_platform: postgres
    #     platform_instance: cloud_instance
    #     env: DEV

sink:
  # sink configs
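The recipe above hard-codes placeholder credentials. As a hedged sketch, the same structure can be assembled in Python with the client ID and secret read from environment variables instead. The variable names `SIGMA_API_URL`, `SIGMA_CLIENT_ID`, and `SIGMA_CLIENT_SECRET`, the helper `build_sigma_recipe`, and the `console` sink are assumptions for illustration; the recipe leaves the sink unspecified.

```python
import os
from typing import Any, Dict, Mapping


def build_sigma_recipe(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build a recipe dict mirroring the YAML recipe, reading credentials from `env`."""
    return {
        "source": {
            "type": "sigma",
            "config": {
                # Falls back to the default API URL used in the recipe.
                "api_url": env.get(
                    "SIGMA_API_URL", "https://aws-api.sigmacomputing.com/v2"
                ),
                # Hypothetical variable names; a missing one raises KeyError.
                "client_id": env["SIGMA_CLIENT_ID"],
                "client_secret": env["SIGMA_CLIENT_SECRET"],
                "ingest_owner": True,
            },
        },
        # Placeholder sink; replace with the sink configs for your deployment.
        "sink": {"type": "console"},
    }
```

Raising `KeyError` on a missing credential is a deliberate choice in this sketch, so that a misconfigured environment fails before any API call is attempted.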
Empty file.
86 changes: 86 additions & 0 deletions
metadata-ingestion/src/datahub/ingestion/source/sigma/config.py
@@ -0,0 +1,86 @@
import logging
from dataclasses import dataclass
from typing import Dict, Optional

import pydantic

from datahub.configuration.common import AllowDenyPattern
from datahub.configuration.source_common import (
    EnvConfigMixin,
    PlatformInstanceConfigMixin,
)
from datahub.ingestion.source.state.stale_entity_removal_handler import (
    StaleEntityRemovalSourceReport,
)
from datahub.ingestion.source.state.stateful_ingestion_base import (
    StatefulIngestionConfigBase,
)

logger = logging.getLogger(__name__)


class Constant:
    """
    keys used in sigma plugin
    """

    # Rest API response key constants
    ENTRIES = "entries"
    FIRSTNAME = "firstName"
    LASTNAME = "lastName"
    EDGES = "edges"
    DEPENDENCIES = "dependencies"
    SOURCE = "source"
    WORKSPACEID = "workspaceId"
    PATH = "path"
    NAME = "name"
    URL = "url"
    ELEMENTID = "elementId"
    ID = "id"
    PARENTID = "parentId"
    TYPE = "type"
    DATASET = "dataset"
    WORKBOOK = "workbook"
    BADGE = "badge"
    NEXTPAGE = "nextPage"

    # Source Config constants
    DEFAULT_API_URL = "https://aws-api.sigmacomputing.com/v2"


@dataclass
class SigmaSourceReport(StaleEntityRemovalSourceReport):
    number_of_workspaces: int = 0

    def report_number_of_workspaces(self, number_of_workspaces: int) -> None:
        self.number_of_workspaces = number_of_workspaces


class PlatformDetail(PlatformInstanceConfigMixin, EnvConfigMixin):
    data_source_platform: str = pydantic.Field(
        description="A chart's data sources platform name.",
    )


class SigmaSourceConfig(
    StatefulIngestionConfigBase, PlatformInstanceConfigMixin, EnvConfigMixin
):
    api_url: str = pydantic.Field(
        default=Constant.DEFAULT_API_URL, description="Sigma API hosted URL."
    )
    client_id: str = pydantic.Field(description="Sigma Client ID")
    client_secret: str = pydantic.Field(description="Sigma Client Secret")
    # Sigma workspace identifier
    workspace_pattern: AllowDenyPattern = pydantic.Field(
        default=AllowDenyPattern.allow_all(),
        # Trailing space keeps the two concatenated sentences from running together.
        description="Regex patterns to filter Sigma workspaces in ingestion. "
        "Mention 'User Folder' if entities of 'My documents' need to be ingested.",
    )
    ingest_owner: Optional[bool] = pydantic.Field(
        default=True,
        description="Ingest Owner from source. This will override Owner info entered from UI",
    )
    chart_sources_platform_mapping: Dict[str, PlatformDetail] = pydantic.Field(
        default={},
        description="A mapping of the sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.",
    )
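The `workspace_pattern` field above filters workspaces by regex. As a rough illustration of allow/deny filtering, the sketch below uses a small stdlib stand-in rather than DataHub's `AllowDenyPattern`. The class `SimpleAllowDeny`, its "deny wins" rule, and the sample workspace names are assumptions made for this example only.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleAllowDeny:
    """Stdlib stand-in for an allow/deny regex filter; not DataHub's class."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Assumed rule: a deny match always wins; otherwise an allow pattern must match.
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


# Hypothetical workspace names; 'User Folder' is the name the config description
# says to mention for 'My documents' entities.
pattern = SimpleAllowDeny(allow=["Finance.*", "User Folder"], deny=[".*Sandbox"])
workspaces = ["Finance", "Finance Sandbox", "Marketing", "User Folder"]
selected = [w for w in workspaces if pattern.allowed(w)]
```

Because `re.match` anchors at the start of the string, a pattern such as `Finance.*` also selects any workspace whose name merely begins with `Finance`; a stricter pattern would end with `$`.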
73 changes: 73 additions & 0 deletions
metadata-ingestion/src/datahub/ingestion/source/sigma/data_classes.py
@@ -0,0 +1,73 @@
from datetime import datetime
from typing import Dict, List, Optional

from pydantic import BaseModel, root_validator

from datahub.emitter.mcp_builder import ContainerKey


class WorkspaceKey(ContainerKey):
    workspaceId: str


class WorkbookKey(ContainerKey):
    workbookId: str


class Workspace(BaseModel):
    workspaceId: str
    name: str
    createdBy: str
    createdAt: datetime
    updatedAt: datetime


class SigmaDataset(BaseModel):
    datasetId: str
    workspaceId: str
    name: str
    description: str
    createdBy: str
    createdAt: datetime
    updatedAt: datetime
    url: str
    path: str
    badge: Optional[str] = None

    @root_validator(pre=True)
    def update_values(cls, values: Dict) -> Dict:
        # The element lineage API reports this id as the source dataset id
        values["datasetId"] = values["url"].split("/")[-1]
        return values


class Element(BaseModel):
    elementId: str
    type: str
    name: str
    url: str
    vizualizationType: Optional[str] = None
    query: Optional[str] = None
    columns: List[str] = []
    upstream_sources: Dict[str, str] = {}


class Page(BaseModel):
    pageId: str
    name: str
    elements: List[Element] = []


class Workbook(BaseModel):
    workbookId: str
    workspaceId: str
    name: str
    createdBy: str
    updatedBy: str
    createdAt: datetime
    updatedAt: datetime
    url: str
    path: str
    latestVersion: int
    pages: List[Page] = []
    badge: Optional[str] = None
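The `SigmaDataset` pre-validator above overwrites `datasetId` with the last segment of `url`. The sketch below reproduces that derivation with a stdlib dataclass so it runs without pydantic. `DatasetStub`, `dataset_id_from_url`, and the sample URL are hypothetical, and the `rstrip("/")` is an addition of this sketch that the PR code does not perform.

```python
from dataclasses import dataclass


def dataset_id_from_url(url: str) -> str:
    """Return the last path segment of a dataset URL, ignoring a trailing slash."""
    # The PR code uses url.split("/")[-1]; rstrip("/") is added here as an assumption
    # so that a trailing slash does not produce an empty id.
    return url.rstrip("/").split("/")[-1]


@dataclass
class DatasetStub:
    """Simplified stand-in for `SigmaDataset`, keeping only the fields used here."""

    name: str
    url: str
    datasetId: str = ""

    def __post_init__(self) -> None:
        # Mirrors the pre-validator: overwrite datasetId with the URL's last segment.
        self.datasetId = dataset_id_from_url(self.url)


# Hypothetical URL, for illustration only.
stub = DatasetStub(name="Orders", url="https://app.sigmacomputing.com/acme/b/abc123")
```

Any `datasetId` passed in is discarded in favour of the URL-derived value, which matches the intent stated in the validator's comment.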
Review comment:
It would be better to keep this as a `List[str]`. That way, if someone has a `/` in their folder name, we still handle it correctly.

Reply:
We get the path attribute from the API as a string. If we later convert it to a list of strings and any folder name contains `/`, that folder name will still get split.
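The exchange above turns on whether a slash-delimited path string can be split back into folder names reliably. A small sketch of the ambiguity follows; the folder names and the helper `split_path` are invented for illustration and do not come from the Sigma API.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a slash-delimited path string into its segments."""
    return path.split("/")


# Hypothetical folder names; the second folder's name itself contains a slash.
folders = ["Acme", "Q1/Q2 Reports", "Revenue"]
joined = "/".join(folders)      # what a string-typed `path` would look like
recovered = split_path(joined)  # cannot tell which slashes were separators
```

Here `recovered` has four segments instead of the original three, which is the reply's point: once the API has flattened the path into a string, converting it to `List[str]` afterwards cannot restore the original folder boundaries.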