Limit Response info dataset queries #1665

Merged
merged 6 commits into from
Oct 28, 2024

noah-paige
Contributor

@noah-paige noah-paige commented Oct 25, 2024

Feature or Bugfix

  • Refactor

Detail

  • Change response types for the Dataset, Dashboard, and Shares modules:
    • S3 Datasets: getDataset, getDatasetTables, and getDatasetStorageLocation now return only the simplified Env / Org
      types
    • Same change for Redshift Datasets
    • Same change for Dataset Shares
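
A minimal sketch of the idea behind a simplified return type, in Python. The field names and the `simplify` helper are hypothetical and chosen for illustration only; they are not the actual data.all schema or resolver code.

```python
from typing import Any, Dict, Iterable

# Hypothetical allow-list: only these fields survive in the simplified type.
SIMPLIFIED_ENVIRONMENT_FIELDS = ("environmentUri", "label", "region")


def simplify(record: Dict[str, Any], allowed: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of record that contains only the allow-listed fields."""
    return {key: record[key] for key in allowed if key in record}


# Hypothetical full environment record, including fields that a team with
# share access should not need to see.
full_environment = {
    "environmentUri": "env-123",
    "label": "analytics-prod",
    "region": "eu-west-1",
    "roleArn": "arn:aws:iam::111122223333:role/example",
    "ownerTeam": "platform-admins",
}

simplified_environment = simplify(full_environment, SIMPLIFIED_ENVIRONMENT_FIELDS)
print(simplified_environment)
```

An allow-list is used here rather than a deny-list so that any field added to the full record later stays hidden unless it is explicitly exposed.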

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@noah-paige
Contributor Author

noah-paige commented Oct 25, 2024

Testing Completed:

  • All Dataset tabs load, and getDataset returns the correct information on the FE
  • All Dataset tabs load, and getDatasetTables returns the correct information on the FE
  • All Dataset tabs load, and getDatasetStorageLocation returns the correct information on the FE
  • All Dataset tabs load, and getRedshiftDataset returns the correct information on the FE
  • All Dataset tabs load, and getRedshiftDatasetTable returns the correct information on the FE
  • All Dashboard tabs load, and getDashboard returns the correct information on the FE
  • Shares list tabs load, and getShareRequestsFromMe and getShareRequestsToMe return the correct information on the FE
  • ShareObjectView loads, and getShareObject returns the correct information

@SofiaSazonova
Contributor

SofiaSazonova commented Oct 28, 2024

Error in query ListDatasets (/console/datasets) "Cannot query field 'organization' on type 'DatasetBase'."
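
For context, a message of this shape is what a GraphQL server reports when a query selects a field that the target type does not define. The toy check below sketches that behaviour in Python; the `SCHEMA` contents and `validate_selection` are hypothetical and do not reflect the actual data.all schema or GraphQL library.

```python
from typing import Dict, List, Set

# Hypothetical schema: type name -> fields the type exposes.
SCHEMA: Dict[str, Set[str]] = {
    "DatasetBase": {"datasetUri", "label", "environment"},
}


def validate_selection(type_name: str, selected: List[str]) -> List[str]:
    """Return one error message per selected field the type does not define."""
    known = SCHEMA[type_name]
    return [
        f"Cannot query field '{name}' on type '{type_name}'."
        for name in selected
        if name not in known
    ]


# A client query that still selects a field the type does not expose.
errors = validate_selection("DatasetBase", ["datasetUri", "organization"])
print(errors)
```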

@SofiaSazonova
Contributor

Is it also necessary to change Environment => SimplifiedEnvironment in the following queries?

ListDataPipelines
GetDataPipeline
ListOmicsRun
getSagemakerNotebook
ListSagemakerNotebooks
ListSagemakerStudioUsers
getSagemakerStudioUser

Contributor

@SofiaSazonova SofiaSazonova left a comment

An error and a question

@SofiaSazonova
Contributor

Also, a comment: for datasets, for example, if we change the name, the label doesn't change (or that was a bug). Will it work the same way for environments? If so, we risk displaying confusing data if we use label instead of name.

@noah-paige
Contributor Author

> Error in query ListDatasets (/console/datasets) "Cannot query field 'organization' on type 'DatasetBase'."

Good catch, resolved now.

@noah-paige
Contributor Author

For the other listed queries:

ListDataPipelines
GetDataPipeline
ListOmicsRun
getSagemakerNotebook
ListSagemakerNotebooks
ListSagemakerStudioUsers
getSagemakerStudioUser

The above could be addressed, but I left them out of this PR for the following reason:

  • The risk that this PR mitigates applies to resources that other (non-owner) teams can view (i.e. shareable resources)
    • This PR limits how much information a team can extract about the parent env/org of a dataset or dashboard that it has approved share access to
  • For other resources (pipelines, notebooks, omics runs, ML Studio), the user must already be part of a team that is invited to the env and org, so there is no issue with the user also extracting information about the parent env: they should already be able to see it
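
The scoping rule above can be sketched roughly as follows. This is an illustration of the reasoning only: `ResponseDetail`, `SHAREABLE_KINDS`, and `env_detail_for` are hypothetical names, not code from this PR.

```python
from enum import Enum


class ResponseDetail(Enum):
    FULL = "full"
    SIMPLIFIED = "simplified"


# Resource kinds that non-owner teams can view through an approved share
# (datasets and dashboards, per the reasoning above).
SHAREABLE_KINDS = {"dataset", "dashboard"}


def env_detail_for(resource_kind: str) -> ResponseDetail:
    """Pick how much parent env/org detail a query on this resource returns."""
    if resource_kind in SHAREABLE_KINDS:
        # Viewers may belong to teams outside the env/org: limit the detail.
        return ResponseDetail.SIMPLIFIED
    # Viewers must already be invited to the env/org: full detail is fine.
    return ResponseDetail.FULL


for kind in ("dataset", "dashboard", "notebook", "pipeline"):
    print(kind, env_detail_for(kind).value)
```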

@noah-paige
Contributor Author

> Also, a comment: for datasets, for example, if we change the name, the label doesn't change (or that was a bug). Will it work the same way for environments? If so, we risk displaying confusing data if we use label instead of name.

This is outside the scope of this PR, yes? I actually think we should prevent updates to the label for datasets, envs, or any other resources in data.all unless we are certain they have no impact on resources provisioned via CDK.

@SofiaSazonova
Contributor

> This is outside the scope of this PR, yes? I actually think we should prevent updates to the label for datasets, envs, or any other resources in data.all unless we are certain they have no impact on resources provisioned via CDK.

I guess so. I think it's quite a rare case anyway.

@noah-paige noah-paige self-assigned this Oct 28, 2024
@noah-paige noah-paige merged commit 8e947d9 into main Oct 28, 2024
9 checks passed
dlpzx pushed a commit that referenced this pull request Nov 6, 2024
@dlpzx dlpzx mentioned this pull request Nov 6, 2024
dlpzx added a commit that referenced this pull request Nov 8, 2024
### Feature or Bugfix
- Security

### Detail
* get-parameter CloudfrontDistributionDomainName from us-east-1 (#1687)
* Added Token Validations (#1682)
* add warning to untrust data.all account when removing an environment
(#1685)
* add custom domain support for apigw (#1679)
* Lambda Event Logs Handling (#1678)
* Upgrade Spark version to 3.3 (#1675) -
a0c63a4
* ES Search Query Collect All Response (#1631)
* Extend Tenant Perms Coverage (#1630)
* Limit Response info dataset queries (#1665)
* Add Removal Policy Retain to Bucket Policy IaC (#1660) 
* log API handler response only for LOG_LEVEL DEBUG. Set log level INFO
for prod deployments (#1662)
* Add permission checks to markNotificationAsRead + deleteNotification
(#1654)
* Added error view and unified utility to check tenant user (#1657)
* Userguide signout flow (#1629)

### Relates
- Security release

---------

Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com>
Co-authored-by: Petros Kalos <kalosp@amazon.com>
@dlpzx dlpzx deleted the fix/restrict-dataset-gql-types branch November 22, 2024 11:20