[Gh-1032] Feature flags for topics and confidentiality and custom confidentiality list (#1049)

### Feature or Bugfix
- Feature

### Detail

This feature adds flags to enable or disable the topics and confidentiality fields when creating a dataset. It also adds a way to provide custom confidentiality values to show in the UI.
Please note: if you use custom confidentiality values, you must map each of them to one of the existing data.all confidentiality levels (i.e. Unclassified, Secret, Official). Please refer to this section about setting up data.all for more details:
https://data-dot-all.github.io/dataall/deploy-aws/
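In the backend, these flags are read with dotted-path lookups such as `config.get_property('modules.datasets.features.confidentiality_dropdown', False)`. Judging by the call sites, the second argument is presumably the default used when the key is absent. The sketch below illustrates that lookup pattern on a plain dict. `get_property` here is a hypothetical re-implementation for illustration only, not the actual `dataall.base.config` code, and the example config values are made up.

```python
from collections.abc import Mapping
from typing import Any


def get_property(cfg: Mapping, path: str, default: Any = None) -> Any:
    """Hypothetical dotted-path lookup; NOT data.all's real config.get_property."""
    node: Any = cfg
    for key in path.split('.'):
        # Fall back to the default as soon as a key is missing or an
        # intermediate value is not a mapping.
        if not isinstance(node, Mapping) or key not in node:
            return default
        node = node[key]
    return node


# Made-up example shaped like the `modules > datasets > features` section of config.json.
example_config = {
    'modules': {
        'datasets': {
            'features': {
                'preview_data': True,
                'glue_crawler': True,
                'confidentiality_dropdown': True,
                'topics_dropdown': False,
            }
        }
    }
}

print(get_property(example_config, 'modules.datasets.features.confidentiality_dropdown', False))  # True
print(get_property(example_config, 'modules.datasets.features.topics_dropdown', False))  # False
print(get_property(example_config, 'modules.datasets.features.custom_confidentiality_mapping', {}))  # {}
```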

### Testing 

1. Tested on a local dev setup (all config switches, custom confidentiality values and standard confidentiality values)
2. Tested on an AWS account (all config switches, custom and standard confidentiality values)
3. Unit tests

### Things to do 

- [ ] README documentation explaining how to use the `custom_confidentiality_mapping` feature **(TODO when the PR is approved)**



### How to use this feature

`config.json` has already been updated with the new flags. To use a custom mapping, add the following section to `config.json` under `modules` > `datasets` > `features`:
```json
"custom_confidentiality_mapping": {
    "Public": "Unclassified",
    "Custom Confidentiality": "Official",
    "Custom Confidential": "Secret",
    "Another Confidentiality": "Official"
}
```
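Every custom label has to map onto one of the built-in levels, so it can help to sanity-check the mapping before deploying. The sketch below uses a hypothetical helper, `find_invalid_mappings`, which is not part of data.all. It assumes the only valid targets are the three levels named above (Unclassified, Official, Secret). The `"Top Secret"` entry is a deliberately invalid, made-up example.

```python
import json

# Built-in data.all levels that custom labels must map onto (per the description above).
BUILT_IN_LEVELS = {'Unclassified', 'Official', 'Secret'}


def find_invalid_mappings(mapping: dict) -> dict:
    """Hypothetical helper: return the custom labels whose target is not a built-in level."""
    return {label: level for label, level in mapping.items() if level not in BUILT_IN_LEVELS}


# The mapping from the example above, plus one deliberately invalid entry.
features_snippet = json.loads("""
{
    "custom_confidentiality_mapping": {
        "Public": "Unclassified",
        "Custom Confidentiality": "Official",
        "Custom Confidential": "Secret",
        "Another Confidentiality": "Official",
        "Top Secret": "TopSecret"
    }
}
""")

bad = find_invalid_mappings(features_snippet['custom_confidentiality_mapping'])
print(bad)  # {'Top Secret': 'TopSecret'}
```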


### Relates
- #1032

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? No
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization? No
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features? No
  - Do you use standard, proven implementations?
- Are the used keys controlled by the customer? Where are they stored? No
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: trajopadhye <tejas.rajopadhye@yahooinc.com>
TejasRGitHub and trajopadhye authored Feb 19, 2024
1 parent f58d878 commit b6449d1
Showing 17 changed files with 362 additions and 241 deletions.
6 changes: 3 additions & 3 deletions backend/dataall/modules/datasets/api/dataset/input_types.py
```diff
@@ -19,7 +19,7 @@
         gql.Argument(
             name='businessOwnerDelegationEmails', type=gql.ArrayType(gql.String)
         ),
-        gql.Argument('confidentiality', gql.Ref('ConfidentialityClassification')),
+        gql.Argument('confidentiality', gql.String),
         gql.Argument(name='stewards', type=gql.String),
         gql.Argument(name='autoApprovalEnabled', type=gql.Boolean)
     ],
@@ -36,7 +36,7 @@
         gql.Argument('businessOwnerDelegationEmails', gql.ArrayType(gql.String)),
         gql.Argument('businessOwnerEmail', gql.String),
         gql.Argument('language', gql.Ref('Language')),
-        gql.Argument('confidentiality', gql.Ref('ConfidentialityClassification')),
+        gql.Argument('confidentiality', gql.String),
         gql.Argument(name='stewards', type=gql.String),
         gql.Argument('KmsAlias', gql.NonNullableType(gql.String)),
         gql.Argument(name='autoApprovalEnabled', type=gql.Boolean)
@@ -103,7 +103,7 @@
         gql.Argument(
             name='businessOwnerDelegationEmails', type=gql.ArrayType(gql.String)
         ),
-        gql.Argument('confidentiality', gql.Ref('ConfidentialityClassification')),
+        gql.Argument('confidentiality', gql.String),
         gql.Argument(name='stewards', type=gql.String),
         gql.Argument(name='autoApprovalEnabled', type=gql.Boolean)
 
```
3 changes: 2 additions & 1 deletion backend/dataall/modules/datasets/api/dataset/resolvers.py
```diff
@@ -9,7 +9,7 @@
 from dataall.base.db.exceptions import RequiredParameter, InvalidInput
 from dataall.modules.dataset_sharing.db.share_object_models import ShareObject
 from dataall.modules.datasets_base.db.dataset_models import Dataset
-from dataall.modules.datasets_base.services.datasets_base_enums import DatasetRole
+from dataall.modules.datasets_base.services.datasets_base_enums import DatasetRole, ConfidentialityClassification
 from dataall.modules.datasets.services.dataset_service import DatasetService
 
 log = logging.getLogger(__name__)
@@ -201,6 +201,7 @@ def validate_creation_request(data):
         raise RequiredParameter('group')
     if not data.get('label'):
         raise RequiredParameter('label')
+    ConfidentialityClassification.validate_confidentiality_level(data.get('confidentiality', ''))
     if len(data['label']) > 52:
         raise InvalidInput(
             'Dataset name', data['label'], 'less than 52 characters'
```
2 changes: 1 addition & 1 deletion backend/dataall/modules/datasets/api/dataset/types.py
```diff
@@ -129,7 +129,7 @@
         ),
         gql.Field(name='topics', type=gql.ArrayType(gql.Ref('Topic'))),
         gql.Field(
-            name='confidentiality', type=gql.Ref('ConfidentialityClassification')
+            name='confidentiality', type=gql.String
         ),
         gql.Field(name='language', type=gql.Ref('Language')),
         gql.Field(
```
5 changes: 4 additions & 1 deletion backend/dataall/modules/datasets/cdk/dataset_stack.py
```diff
@@ -25,6 +25,8 @@
 from dataall.modules.datasets.aws.lf_dataset_client import LakeFormationDatasetClient
 from dataall.modules.datasets_base.db.dataset_models import Dataset
 from dataall.base.utils.cdk_nag_utils import CDKNagUtil
+from dataall.base.config import config
+
 
 logger = logging.getLogger(__name__)
 
@@ -535,7 +537,8 @@ def __init__(self, scope, id, target_uri: str = None, **kwargs):
         )
         trigger.node.add_dependency(job)
 
-        Tags.of(self).add('Classification', dataset.confidentiality)
+        if config.get_property('modules.datasets.features.confidentiality_dropdown', False):
+            Tags.of(self).add('Classification', dataset.confidentiality)
 
         TagsUtil.add_tags(stack=self, model=Dataset, target_type="dataset")
 
```
4 changes: 3 additions & 1 deletion backend/dataall/modules/datasets/indexers/dataset_indexer.py
```diff
@@ -1,4 +1,6 @@
 """Indexes Datasets in OpenSearch"""
+import re
+
 from dataall.core.environment.services.environment_service import EnvironmentService
 from dataall.core.organizations.db.organization_repositories import OrganizationRepository
 from dataall.modules.vote.db.vote_repositories import VoteRepository
@@ -34,7 +36,7 @@ def upsert(cls, session, dataset_uri: str):
                 'source': dataset.S3BucketName,
                 'resourceKind': 'dataset',
                 'description': dataset.description,
-                'classification': dataset.confidentiality,
+                'classification': re.sub('[^A-Za-z0-9]+', '', dataset.confidentiality),
                 'tags': [t.replace('-', '') for t in dataset.tags or []],
                 'topics': dataset.topics,
                 'region': dataset.region.replace('-', ''),
```
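The indexer now strips every run of non-alphanumeric characters from the confidentiality label before writing it to the `classification` field. This is presumably so that multi-word custom labels behave as clean filter values, much like the existing `tags` and `region` handling. The sketch below isolates just that normalization step. `normalize_classification` is a hypothetical wrapper around the same `re.sub` call used in the diff.

```python
import re


def normalize_classification(label: str) -> str:
    # Same pattern as the indexer change: remove every run of characters
    # that is not an ASCII letter or digit.
    return re.sub('[^A-Za-z0-9]+', '', label)


for label in ['Secret', 'Custom Confidentiality', 'Another-Confidentiality (v2)']:
    print(label, '->', normalize_classification(label))
# Secret -> Secret
# Custom Confidentiality -> CustomConfidentiality
# Another-Confidentiality (v2) -> AnotherConfidentialityv2
```

One consequence to keep in mind is that labels differing only in spacing or punctuation collapse to the same indexed value. For example, `Custom Confidential` and `Custom-Confidential` both become `CustomConfidential`.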
```diff
@@ -33,7 +33,7 @@ def paginate_active_columns_for_table(uri: str, filter=None):
             table: DatasetTable = DatasetTableRepository.get_dataset_table_by_uri(session, uri)
             dataset = DatasetRepository.get_dataset_by_uri(session, table.datasetUri)
             if (
-                dataset.confidentiality != ConfidentialityClassification.Unclassified.value
+                ConfidentialityClassification.get_confidentiality_level(dataset.confidentiality) != ConfidentialityClassification.Unclassified.value
             ):
                 ResourcePolicy.check_user_resource_permission(
                     session=session,
```
```diff
@@ -110,7 +110,7 @@ def _check_preview_permissions_if_needed(session, table_uri):
                 session, table_uri
             )
             dataset = DatasetRepository.get_dataset_by_uri(session, table.datasetUri)
-            if dataset.confidentiality != ConfidentialityClassification.Unclassified.value:
+            if ConfidentialityClassification.get_confidentiality_level(dataset.confidentiality) != ConfidentialityClassification.Unclassified.value:
                 ResourcePolicy.check_user_resource_permission(
                     session=session,
                     username=context.username,
```
```diff
@@ -85,7 +85,7 @@ def preview(table_uri: str):
             )
             dataset = DatasetRepository.get_dataset_by_uri(session, table.datasetUri)
             if (
-                dataset.confidentiality != ConfidentialityClassification.Unclassified.value
+                ConfidentialityClassification.get_confidentiality_level(dataset.confidentiality) != ConfidentialityClassification.Unclassified.value
             ):
                 ResourcePolicy.check_user_resource_permission(
                     session=session,
```
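The three permission checks above now compare the *resolved* level, not the raw stored label, against `Unclassified`. With `get_confidentiality_level` as written in this PR, a label missing from a configured custom mapping resolves to `None`. `None` is not equal to `'Unclassified'`, so the permission check still runs. The sketch below models that comparison with plain functions. `resolve_level` and `needs_permission_check` are hypothetical names, and the mapping is passed in explicitly instead of being read from `config.json`.

```python
from typing import Optional

UNCLASSIFIED = 'Unclassified'


def resolve_level(confidentiality: str, mapping: dict) -> Optional[str]:
    # Mirrors the PR's get_confidentiality_level: with no mapping the raw
    # value is used; with a mapping, unknown labels resolve to None.
    return confidentiality if not mapping else mapping.get(confidentiality, None)


def needs_permission_check(confidentiality: str, mapping: dict) -> bool:
    # Anything that does not resolve exactly to Unclassified (including None)
    # requires the resource-permission check.
    return resolve_level(confidentiality, mapping) != UNCLASSIFIED


mapping = {'Public': 'Unclassified', 'Custom Confidential': 'Secret'}
print(needs_permission_check('Public', mapping))               # False
print(needs_permission_check('Custom Confidential', mapping))  # True
print(needs_permission_check('Unclassified', mapping))         # True: not in the mapping
print(needs_permission_check('Unclassified', {}))              # False: no mapping configured
```

The third case shows that once a custom mapping is in place, even built-in names such as `Unclassified` must appear in it if they should still skip the check. Requiring the check is the safer failure mode for access control.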
```diff
@@ -1,4 +1,8 @@
 from dataall.base.api.constants import GraphQLEnumMapper
+from dataall.base.config import config
+from dataall.base.db.exceptions import InvalidInput
+
+custom_confidentiality_mapping = config.get_property('modules.datasets.features.custom_confidentiality_mapping', {})
 
 
 class DatasetRole(GraphQLEnumMapper):
@@ -22,6 +26,20 @@ class ConfidentialityClassification(GraphQLEnumMapper):
     Official = 'Official'
     Secret = 'Secret'
 
+    @staticmethod
+    def get_confidentiality_level(confidentiality):
+        return confidentiality if not custom_confidentiality_mapping else custom_confidentiality_mapping.get(
+            confidentiality, None)
+
+    @staticmethod
+    def validate_confidentiality_level(confidentiality):
+        if config.get_property('modules.datasets.features.confidentiality_dropdown', False):
+            confidentiality = ConfidentialityClassification.get_confidentiality_level(confidentiality)
+            if confidentiality not in [item.value for item in list(ConfidentialityClassification)]:
+                raise InvalidInput('Confidentiality Name', confidentiality,
+                                   'does not conform to the confidentiality classification. Hint: Check your confidentiality value OR check your mapping if you are using custom confidentiality values')
+        return True
+
 
 
 class Language(GraphQLEnumMapper):
     English = 'English'
```
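Validation runs only when `confidentiality_dropdown` is enabled. The submitted value is first resolved through the custom mapping (if one is configured) and then compared against the enum's values. The sketch below is a minimal, self-contained version of that flow. It assumes a local `Enum` and a plain `ValueError` in place of data.all's `GraphQLEnumMapper` and `InvalidInput`, and it passes the flag and mapping as arguments. `Classification`, `get_level` and `validate_level` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Classification(Enum):
    # Local stand-in for ConfidentialityClassification.
    Unclassified = 'Unclassified'
    Official = 'Official'
    Secret = 'Secret'


def get_level(confidentiality: str, mapping: dict) -> Optional[str]:
    # No mapping: use the raw value. With a mapping: unknown labels become None.
    return confidentiality if not mapping else mapping.get(confidentiality, None)


def validate_level(confidentiality: str, mapping: dict, dropdown_enabled: bool) -> bool:
    if dropdown_enabled:
        level = get_level(confidentiality, mapping)
        if level not in [item.value for item in Classification]:
            # The PR raises InvalidInput here; ValueError keeps the sketch standalone.
            raise ValueError(f'{confidentiality!r} resolves to {level!r}, which is not a known level')
    return True


mapping = {'Public': 'Unclassified', 'Custom Confidential': 'Secret'}
print(validate_level('Custom Confidential', mapping, dropdown_enabled=True))  # True
print(validate_level('Anything', mapping, dropdown_enabled=False))  # True: check skipped
try:
    validate_level('Typo Label', mapping, dropdown_enabled=True)
except ValueError as err:
    print('rejected:', err)
```

In the PR, `confidentiality` is reassigned before `InvalidInput` is raised. As a result, the error reports the resolved value (for an unmapped label, `None`) rather than what the user submitted. The sketch's message includes both.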
4 changes: 3 additions & 1 deletion config.json
```diff
@@ -24,7 +24,9 @@
                 }
             },
             "preview_data": true,
-            "glue_crawler": true
+            "glue_crawler": true,
+            "confidentiality_dropdown" : true,
+            "topics_dropdown" : true
         }
     },
     "worksheets": {
```
44 changes: 25 additions & 19 deletions frontend/src/modules/Catalog/views/Catalog.js
```diff
@@ -32,6 +32,7 @@ import {
   useSettings
 } from 'design';
 import { GlossarySearchWrapper, GlossarySearchResultItem } from '../components';
+import config from '../../../generated/config.json';
 
 const useStyles = makeStyles((theme) => ({
   mainSearch: {
@@ -171,7 +172,14 @@ const Catalog = () => {
   const classes = useStyles();
   const anchorRef = useRef(null);
   const [openMenu, setOpenMenu] = useState(false);
-  const [filterItems] = useState([
+  const dataFieldList = ['label', 'name', 'description', 'region', 'tags'];
+
+  if (config.modules.datasets.features.topics_dropdown === true)
+    dataFieldList.push('topics');
+  if (config.modules.datasets.features.confidentiality_dropdown === true)
+    dataFieldList.push('classification');
+
+  const filterItemsInit = [
     {
       title: 'Type',
       dataField: 'resourceKind',
@@ -184,25 +192,30 @@ const Catalog = () => {
       componentId: 'TagSensor',
       filterLabel: 'Tags'
     },
-    {
-      title: 'Topics',
-      dataField: 'topics',
-      componentId: 'TopicSensor',
-      filterLabel: 'Topics'
-    },
     {
       title: 'Region',
       dataField: 'region',
       componentId: 'RegionSensor',
       filterLabel: 'Region'
-    },
-    {
+    }
+  ];
+
+  if (config.modules.datasets.features.topics_dropdown === true)
+    filterItemsInit.push({
+      title: 'Topics',
+      dataField: 'topics',
+      componentId: 'TopicSensor',
+      filterLabel: 'Topics'
+    });
+  if (config.modules.datasets.features.confidentiality_dropdown === true)
+    filterItemsInit.push({
       title: 'Classification',
       dataField: 'classification',
       componentId: 'ClassificationSensor',
       filterLabel: 'Classification'
-    }
-  ]);
+    });
+
+  const [filterItems] = useState(filterItemsInit);
   const [listClass, setListClass] = useState(
     settings.theme === THEMES.LIGHT
       ? classes.lightListSearch
@@ -337,14 +350,7 @@ const Catalog = () => {
               fuzziness="AUTO"
               componentId="SearchSensor"
               filterLabel="text"
-              dataField={[
-                'label',
-                'name',
-                'description',
-                'region',
-                'topics',
-                'tags'
-              ]}
+              dataField={dataFieldList}
               placeholder="Search"
               />
             </Box>
```
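The Catalog view now derives both the full-text search fields and the optional filter sensors from the two flags. For illustration only, the same decision logic is translated into Python below; the real code is the JavaScript above. `build_search_config` is a hypothetical name. Only the optional filters are modelled, because part of the always-on filter list is collapsed in this diff.

```python
def build_search_config(features: dict):
    """Hypothetical Python rendering of the Catalog.js flag logic."""
    # Search fields that are always present (copied from the diff).
    data_fields = ['label', 'name', 'description', 'region', 'tags']
    # Only the flag-controlled filters; always-on ones (Type, Tags, Region, ...) are omitted.
    optional_filters = []

    # `=== true` in the JavaScript: only a literal boolean true enables a feature.
    if features.get('topics_dropdown') is True:
        data_fields.append('topics')
        optional_filters.append(
            {'title': 'Topics', 'dataField': 'topics', 'componentId': 'TopicSensor'})
    if features.get('confidentiality_dropdown') is True:
        data_fields.append('classification')
        optional_filters.append(
            {'title': 'Classification', 'dataField': 'classification',
             'componentId': 'ClassificationSensor'})
    return data_fields, optional_filters


fields, filters = build_search_config({'topics_dropdown': False, 'confidentiality_dropdown': True})
print(fields)  # ['label', 'name', 'description', 'region', 'tags', 'classification']
print([f['title'] for f in filters])  # ['Classification']
```

Because of the strict `=== true` comparison, a flag written as the string `"true"` in `config.json` would leave the feature hidden in the UI. The Python `is True` check mirrors that behaviour.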
56 changes: 31 additions & 25 deletions frontend/src/modules/Datasets/components/DatasetGovernance.js
```diff
@@ -9,6 +9,7 @@ import {
 } from '@mui/material';
 import PropTypes from 'prop-types';
 import { Label } from 'design';
+import { isFeatureEnabled } from 'utils';
 
 export const DatasetGovernance = (props) => {
   const { dataset } = props;
@@ -48,31 +49,36 @@ export const DatasetGovernance = (props) => {
           </Label>
         </Box>
       </CardContent>
-      <CardContent>
-        <Typography color="textSecondary" variant="subtitle2">
-          Classification
-        </Typography>
-        <Box sx={{ mt: 1 }}>
-          <Label color="primary">{dataset.confidentiality}</Label>
-        </Box>
-      </CardContent>
-      <CardContent>
-        <Typography color="textSecondary" variant="subtitle2">
-          Topics
-        </Typography>
-        <Box sx={{ mt: 1 }}>
-          {dataset.topics &&
-            dataset.topics.length > 0 &&
-            dataset.topics.map((t) => (
-              <Chip
-                sx={{ mr: 0.5, mb: 0.5 }}
-                key={t}
-                label={t}
-                variant="outlined"
-              />
-            ))}
-        </Box>
-      </CardContent>
+      {isFeatureEnabled('datasets', 'confidentiality_dropdown') && (
+        <CardContent>
+          <Typography color="textSecondary" variant="subtitle2">
+            Classification
+          </Typography>
+          <Box sx={{ mt: 1 }}>
+            <Label color="primary">{dataset.confidentiality}</Label>
+          </Box>
+        </CardContent>
+      )}
+      {isFeatureEnabled('datasets', 'topics_dropdown') && (
+        <CardContent>
+          <Typography color="textSecondary" variant="subtitle2">
+            Topics
+          </Typography>
+          <Box sx={{ mt: 1 }}>
+            {dataset.topics &&
+              dataset.topics.length > 0 &&
+              dataset.topics.map((t) => (
+                <Chip
+                  sx={{ mr: 0.5, mb: 0.5 }}
+                  key={t}
+                  label={t}
+                  variant="outlined"
+                />
+              ))}
+          </Box>
+        </CardContent>
+      )}
+
       <CardContent>
         <Typography color="textSecondary" variant="subtitle2">
           Tags
```