refactor data access part 1 models validators [Please do not merge] #2007
Context:
We are breaking PR #1967 into smaller PRs that are easier to review and work on. This branch is expected to merge into the #1967 branch, not dev.
This PR introduces the DataAccess model and validators.
Quick Summary about the model
The DataSource model should be used to decide where the files for a project are stored (determined by data_location) and how they can be accessed (determined by access_mechanism). A single project can have multiple DataSources.
About the fields
files_available - determines whether the files can be viewed/downloaded for the given type of data source. (@kshalot had notes about this field here: #1967 (comment))
email - for GCP group access, this stores the email of the group.
uri - the URI for the data on the external service. For S3 this would be of the form s3://<bucket_name>; for gsutil this would be of the form gs://<bucket_name>.
Quick Summary about validators
The validation is based on four aspects: required fields, forbidden fields, required access mechanisms, and forbidden access mechanisms.
Required Fields: For each data location (such as Google BigQuery, Google Cloud Storage, AWS Open Data, and AWS S3), certain fields must be present. For instance, Google BigQuery requires an 'email', while Google Cloud Storage, AWS Open Data, and AWS S3 require a 'uri'. If a required field is missing, a validation error is raised.
Forbidden Fields: Conversely, for certain data locations, some fields must not be present. For example, for 'Direct' data location, 'uri' and 'email' fields should not be present. If they are found, a validation error is raised.
Required Access Mechanisms: Each data location may also require one of several specified access mechanisms. For instance, Google BigQuery and Google Cloud Storage can require either a 'Google Group Email' or a 'Research Environment' access mechanism, while AWS Open Data and AWS S3 require an 'S3' access mechanism. If none of the acceptable access mechanisms are found, a validation error is raised.
Forbidden Access Mechanisms: Finally, some data locations forbid certain access mechanisms. Specifically, the 'Direct' data location forbids the 'Google Group Email', 'S3', and 'Research Environment' access mechanisms. If any of these are present, a validation error is raised.
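The four rule sets above can be sketched as plain lookup tables plus one checking function. This is only an illustrative, framework-free sketch and not the code in this PR: the string constants, the `DataSource` dataclass, and the `validate_data_source` helper are hypothetical names, and the sketch returns a list of messages where the real validators raise a validation error.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical constants standing in for the data_location choices.
DIRECT = "Direct"
GBQ = "Google BigQuery"
GCS = "Google Cloud Storage"
AWS_OPEN_DATA = "AWS Open Data"
AWS_S3 = "AWS S3"

# Hypothetical constants standing in for the access_mechanism choices.
GOOGLE_GROUP_EMAIL = "Google Group Email"
S3 = "S3"
RESEARCH_ENVIRONMENT = "Research Environment"

# Rule tables, transcribed from the description above.
REQUIRED_FIELDS = {GBQ: ["email"], GCS: ["uri"], AWS_OPEN_DATA: ["uri"], AWS_S3: ["uri"]}
FORBIDDEN_FIELDS = {DIRECT: ["uri", "email"]}
REQUIRED_ACCESS = {
    GBQ: [GOOGLE_GROUP_EMAIL, RESEARCH_ENVIRONMENT],
    GCS: [GOOGLE_GROUP_EMAIL, RESEARCH_ENVIRONMENT],
    AWS_OPEN_DATA: [S3],
    AWS_S3: [S3],
}
FORBIDDEN_ACCESS = {DIRECT: [GOOGLE_GROUP_EMAIL, S3, RESEARCH_ENVIRONMENT]}


@dataclass
class DataSource:
    """Simplified stand-in for the model; field names follow the description."""
    data_location: str
    access_mechanism: Optional[str] = None  # assumption: may be unset in this sketch
    files_available: bool = False
    email: Optional[str] = None
    uri: Optional[str] = None


def validate_data_source(source: DataSource) -> List[str]:
    """Return a list of rule violations (empty when the data source is valid)."""
    errors = []
    location = source.data_location

    # 1. Required fields must be non-empty.
    for name in REQUIRED_FIELDS.get(location, []):
        if not getattr(source, name):
            errors.append(f"'{name}' is required for {location}")

    # 2. Forbidden fields must be empty.
    for name in FORBIDDEN_FIELDS.get(location, []):
        if getattr(source, name):
            errors.append(f"'{name}' is not allowed for {location}")

    # 3. The access mechanism must be one of the acceptable ones, if any are listed.
    allowed = REQUIRED_ACCESS.get(location)
    if allowed and source.access_mechanism not in allowed:
        errors.append(f"{location} requires one of: {', '.join(allowed)}")

    # 4. The access mechanism must not be a forbidden one.
    if source.access_mechanism in FORBIDDEN_ACCESS.get(location, []):
        errors.append(f"'{source.access_mechanism}' is not allowed for {location}")

    return errors
```

For example, `validate_data_source(DataSource(data_location=AWS_S3, access_mechanism=S3))` reports the missing 'uri', while a Google BigQuery source with a group email and the 'Google Group Email' mechanism passes with no errors. Keeping the rules in tables means a new data location only needs new table entries, not new branching logic.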
Quick Note about the interface
This is included so that we can quickly create data sources and test whether the validators work.