Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain to alfred with post #384

Merged
merged 84 commits into from
Feb 1, 2024
Merged

Retain to alfred with post #384

merged 84 commits into from
Feb 1, 2024

Conversation

cccs-RyanK
Copy link
Member

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

url = request.url.replace(f"{request.host_url}api/v1/fission/", f"{API_HOST}/")
logger.info(f"Sending to {url}")
user = current_user
token = security_manager.get_on_behalf_of_access_token_with_cache(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to refactor this to avoid code duplication at some point, but not nooow!

data={{
email_ids: selectedData.advancedTypeData.harmonized_email_id,
dates: [
...(selectedData.advancedTypeData['TIMESTAMP WITH TIME ZONE'] ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know ALFRED retention but in our world TIMESTAMP WITH TIME ZONE is not an advanced data type. Just checking here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the var name so that it is just called type to be more generic

// format dates into datestrings that look like Y-m-d
const allDates = props.data.dates.map((d: string) => {
let date = new Date(Date.parse(d));
if (date.toLocaleDateString() === 'Invalid Date') {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks odd to me, there should be a better way to determine an invalid date but a typescript expert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right i changed it to use a more stable way i found online

allow_browser_login = True
include_route_methods = {"get"}
include_route_methods = {RouteMethod.GET, RouteMethod.POST}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still support GET?

}

url = request_url.replace(f'{request.host_url}api/v1/fission', f'{API_HOST}/')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this line should be combined with the block at line 72-73.

Copy link
Member

@cccs-rc cccs-rc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent you one question on Teams, but I'll hit the approve button!

Comment on lines +286 to +288
if i > 0:
sql += " OR "
sql += f"(time > TIMESTAMP '{datetimes[i].strftime('%Y-%m-%d')}' AND time < TIMESTAMP '{(datetimes[i] + timedelta(days=1)).strftime('%Y-%m-%d')}')"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An extra pair of brackets is required here to isolate all the date clauses together, otherwise you may end up with unexpected results if dates are different: IDs in (... , ...) AND (1st date where clause) OR (2nd date where clause) OR (3rd date where clause)

Comment on lines 203 to 216
# These lines were copied from the stage function in the RetentionInstance
# class in the alfred_client library, but does not block and wait for the
# retention to be in the 'ready' state. This async case will hopefully be
# added to the library in the future and this code can be replaced.
out_data_set: List[DataSetEntry] = DataSetMapper().map_data_set(data_set)
retention = alfred.create_retention(
f"Harmonized Email Dashboard Retention - {datetime.now()}"
)
retention = alfred.add_retention_dataset(
retention.uri,
out_data_set,
metadata_system,
mapping_name="hogwarts.harmonized.eml_metadata"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the lines that I added to return the retention url before it is in the ready state

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we good with that name, Harmonized Email Dashboard Retention - Timestamp? Technically, any dataset that has the advanced type can retain to Alfred, not necessary from the dashboard.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision was made to say: Retention initiated from Superset - Timestamp

FROM hogwarts_pb.harmonized.eml_metadata
WHERE id IN ('{ids_string}')
"""
# excluding v6 columns due to utf encoding errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can you raise a ticket to (eventually) fix this and include the ticket number here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did actually fix the function to properly encode the v6, they remain left out at the moment because they are not useful still. Should I still create a ticket for readding them when the useful src and dst fields become available?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesss, I think it'd be worthwhile.

@cccs-RyanK cccs-RyanK merged commit d15f00a into cccs-2.1 Feb 1, 2024
2 checks passed
@cccs-RyanK cccs-RyanK deleted the retain-to-alfred-with-post branch February 1, 2024 20:43
cccs-RyanK added a commit that referenced this pull request Feb 22, 2024
* changing alfred to use post instead of get and include dates

* Update Dockerfile

* small fixes to retain to alfred

* added content type header

* update base image

* bump timeout up

* fix var name

* better check for bad date

* taking out fission

* Update analytical-platform-requirements.txt

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update Dockerfile

* Update requirements.txt

* Update analytical-platform-requirements.txt

* remove path argument

* Update Dockerfile

* Update utils.py

* Update Dockerfile

* Update utils.py

* Update Dockerfile

* Update api.py

* Update Dockerfile

* Update api.py

* Update utils.py

* Update Dockerfile

* Update api.py

* Update Dockerfile

* moving to new api directory

* refactoring

* fixing schema

* update dockerfile image

* Update api.py

* Update superset/alfred/api.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/api.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* fixing logging statements where necessary

* bumping limit to 100

* removing cccs_formatting arg that is always true

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* cleaning up utils

* trino host env var

* Update superset/alfred/api.py

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update superset/alfred/api.py

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update superset-frontend/src/cccs-viz/plugins/plugin-chart-ag-grid/src/ag-grid/AGGridViz.tsx

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* remove unused variable

* limit is 100

* moved to constant

* syntax

* Update superset-frontend/src/cccs-viz/plugins/plugin-chart-ag-grid/src/ag-grid/AGGridViz.tsx

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update Dockerfile

* fixing date times

* adding more broad date column names and catching on errors parsing dates

* fixed date formatting

* Update Dockerfile

* fixed errors

* update dockerfile

* fixed list format

* update dockerfile

* fixed flattening array of arrays

* update docker

* count unique eml ids when retaining

* Update Dockerfile

* added extra brackets and fixed error message

* update docker file

* Update utils.py

* adding microsecond precision to json conversion

* updated dockerfile

* double quotes

* Update Dockerfile

* fix query

* added functionality to santize function

* Update Dockerfile

* testing async retention

* update docker

* fixed small issue

* update dockerfile

* added comments

* Update utils.py

---------

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>
Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>
cccs-RyanK added a commit that referenced this pull request Apr 9, 2024
* changing alfred to use post instead of get and include dates

* Update Dockerfile

* small fixes to retain to alfred

* added content type header

* update base image

* bump timeout up

* fix var name

* better check for bad date

* taking out fission

* Update analytical-platform-requirements.txt

* Update azure-pipelines.yml for Azure Pipelines

* Update azure-pipelines.yml for Azure Pipelines

* Update Dockerfile

* Update requirements.txt

* Update analytical-platform-requirements.txt

* remove path argument

* Update Dockerfile

* Update utils.py

* Update Dockerfile

* Update utils.py

* Update Dockerfile

* Update api.py

* Update Dockerfile

* Update api.py

* Update utils.py

* Update Dockerfile

* Update api.py

* Update Dockerfile

* moving to new api directory

* refactoring

* fixing schema

* update dockerfile image

* Update api.py

* Update superset/alfred/api.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/api.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* fixing logging statements where necessary

* bumping limit to 100

* removing cccs_formatting arg that is always true

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>

* cleaning up utils

* trino host env var

* Update superset/alfred/api.py

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update superset/alfred/api.py

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update superset/alfred/utils.py

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update superset-frontend/src/cccs-viz/plugins/plugin-chart-ag-grid/src/ag-grid/AGGridViz.tsx

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* remove unused variable

* limit is 100

* moved to constant

* syntax

* Update superset-frontend/src/cccs-viz/plugins/plugin-chart-ag-grid/src/ag-grid/AGGridViz.tsx

Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>

* Update Dockerfile

* fixing date times

* adding more broad date column names and catching on errors parsing dates

* fixed date formatting

* Update Dockerfile

* fixed errors

* update dockerfile

* fixed list format

* update dockerfile

* fixed flattening array of arrays

* update docker

* count unique eml ids when retaining

* Update Dockerfile

* added extra brackets and fixed error message

* update docker file

* Update utils.py

* adding microsecond precision to json conversion

* updated dockerfile

* double quotes

* Update Dockerfile

* fix query

* added functionality to santize function

* Update Dockerfile

* testing async retention

* update docker

* fixed small issue

* update dockerfile

* added comments

* Update utils.py

---------

Co-authored-by: cccs-rc <62034438+cccs-rc@users.noreply.github.com>
Co-authored-by: cccs-joel <62111795+cccs-joel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants