Auto ML assets #25466

MaksYermak · 2022-08-02T09:40:04Z

I have created links and updated system tests for Auto ML operators.

Co-authored-by: Wojciech Januszek januszek@google.com
Co-authored-by: Lukasz Wyszomirski wyszomirski@google.com
Co-authored-by: Maksim Yermakou maksimy@google.com

potiuk · 2022-08-02T19:46:11Z

Errors :(

potiuk · 2022-08-04T15:19:49Z

Rebased to account for Flask 2.2 errors fixed yesterday.

airflow/providers/google/cloud/links/automl.py

tests/system/providers/google/cloud/automl/example_automl_dataset.py

raphaelauv · 2022-08-09T22:19:09Z

There is a CSV file, tests/system/providers/google/cloud/automl/resources/bank-marketing.csv, of 45K char. Is that normal?

potiuk · 2022-08-10T11:48:19Z

Yeah, 45K lines of .csv file is NOT something we want. A few options:

What happens when you zip the file? How big is it going to get?
Do we REALLY need as big a file?
We could easily place it in our Amazon S3 bucket and download it for the test when needed; we could make it publicly available.

MaksYermak · 2022-08-11T13:02:13Z

This .csv is needed for training an AutoML model. To start the training, the .csv must contain more than 1000 rows. For our test I can reduce the file to 2100 rows. @potiuk what do you think about reducing the file size?
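For illustration only, trimming the file to a fixed number of data rows could be sketched as below. The function name `truncate_csv` and its parameters are hypothetical and not part of this PR, and the sketch assumes the first line of the file is a header row.

```python
import csv
from itertools import islice


def truncate_csv(src_path: str, dst_path: str, max_rows: int = 2100) -> int:
    """Copy the header and at most ``max_rows`` data rows; return the number of data rows written."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # assumption: the first row is a header
        if header is None:
            return 0  # empty input file, nothing to copy
        writer.writerow(header)
        written = 0
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written
```

Taking the first rows keeps the sketch simple; whether the first 2100 rows are representative enough for training is a separate question this thread does not settle.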

bhirsz · 2022-08-25T10:25:59Z

@potiuk Catching attention :) I think 2100 is okayish (not the best but certainly better than 50k). Please comment if you still think it should be stored in the external storage.

potiuk · 2022-08-25T12:55:49Z

Can we compress it (and dynamically decompress it during the test)? Just zipping it gives 20K instead of 160K. This file is unlikely to ever change, and it is completely uninteresting to see what's in it when you review the code, so there is no particular reason to keep a text file in Git.

It's not only the size that matters in this case. Keeping it as plain text has the really nasty effect that when you search for something in the source code in your IDE, you will likely find some matching words here, so keeping the file uncompressed makes it very prone to falling victim to search & replace.
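A minimal sketch of the "decompress during the test" idea, using only the standard library. The helper name `extract_resource` and the archive layout are assumptions made for illustration; this is not the code that was eventually merged.

```python
import zipfile
from pathlib import Path


def extract_resource(archive_path: str, member: str, dest_dir: str) -> Path:
    """Extract a single member of a zip archive into ``dest_dir`` and return its path."""
    with zipfile.ZipFile(archive_path) as archive:
        # ZipFile.extract creates dest_dir if needed and returns the extracted path.
        extracted = archive.extract(member, path=dest_dir)
    return Path(extracted)
```

A test could call such a helper in its setup step with a temporary directory, so the plain-text CSV only ever exists at run time and never shows up in source searches.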

MaksYermak · 2022-09-01T07:36:20Z

@potiuk I have done it

potiuk · 2022-09-09T02:12:04Z

Sorry for delay - been a bit busy.

No, it's not compressed - it's just bundled in a .tar now, not zipped (.tar-ing a single file kinda makes no sense). It still takes 170K instead of 20K (and this PR needs a rebase anyway).
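The difference between bundling and compressing can be checked with a small sketch like the one below. It packs the same bytes as a plain tar, a gzipped tar, and a deflated zip, and reports the archive sizes. The function name `archive_sizes` is hypothetical, and the data in the usage note is synthetic, not the CSV from this PR.

```python
import io
import tarfile
import zipfile


def archive_sizes(name: str, data: bytes) -> dict:
    """Return the archive size in bytes for plain tar, gzipped tar, and deflated zip."""
    sizes = {}
    # Plain tar ("w") only bundles; "w:gz" additionally gzip-compresses the stream.
    for label, mode in (("tar", "w"), ("tar.gz", "w:gz")):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode=mode) as tar:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        sizes[label] = len(buf.getvalue())
    # Zip with DEFLATE compresses each member individually.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    sizes["zip"] = len(buf.getvalue())
    return sizes
```

Calling `archive_sizes("data.csv", b"age,job,balance\n" * 10000)` should show the plain tar coming out slightly larger than the input (headers and padding), while the gzipped tar and the zip are far smaller.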

potiuk · 2022-10-24T22:54:27Z

Conflicts need to be resolved after string normalisation.

potiuk · 2022-10-31T04:08:33Z

Rebased to rebuild.

potiuk · 2022-12-04T23:01:05Z

Tests failing.

potiuk · 2023-01-17T12:56:20Z

Static check failures.

potiuk · 2023-01-18T09:17:43Z

Rebased - static checks fixed in main (a mysql python connector release was breaking mypy).

MrGeorgeOwl · 2023-01-23T10:33:38Z

@potiuk I think this PR can be merged. I can't do that because I am not the author of the PR and I don't have write access.

MaksYermak requested review from turbaszek and mik-laj as code owners August 2, 2022 09:40

boring-cyborg bot added the area:providers, area:system-tests, kind:documentation, and provider:google labels Aug 2, 2022

MaksYermak force-pushed the automl-assets branch from 5a2d3b5 to 6f56cfe on August 2, 2022 12:24

eladkal mentioned this pull request Aug 3, 2022

Extra links for Google Cloud operators #9941

Closed

34 tasks

MaksYermak force-pushed the automl-assets branch from cf3d1c1 to 93ef028 on August 3, 2022 13:03

potiuk force-pushed the automl-assets branch from 93ef028 to fdb3996 on August 4, 2022 15:19

josh-fell reviewed Aug 4, 2022

airflow/providers/google/cloud/links/automl.py (outdated, resolved)

josh-fell reviewed Aug 4, 2022

tests/system/providers/google/cloud/automl/example_automl_dataset.py (resolved)

MaksYermak force-pushed the automl-assets branch from fdb3996 to 7f0305d on August 9, 2022 14:49

josh-fell mentioned this pull request Aug 11, 2022

Life Science assets & system tests migration (AIP-47) #25548

Merged

MaksYermak force-pushed the automl-assets branch from 107f390 to 89c2f7c on August 23, 2022 09:06

MaksYermak force-pushed the automl-assets branch from 89c2f7c to f498f06 on August 31, 2022 14:29

MrGeorgeOwl force-pushed the automl-assets branch 3 times, most recently from 6adb962 to 076d91a on October 14, 2022 09:03

MrGeorgeOwl force-pushed the automl-assets branch from b1dbfa0 to 3616fd2 on October 25, 2022 15:17

MrGeorgeOwl force-pushed the automl-assets branch from 3616fd2 to a0c5ee8 on October 26, 2022 07:38

potiuk force-pushed the automl-assets branch 2 times, most recently from cfb9f8b to 9b863c6 on November 2, 2022 05:11

MrGeorgeOwl force-pushed the automl-assets branch from 9b863c6 to 4c1abb2 on November 29, 2022 17:07

MrGeorgeOwl force-pushed the automl-assets branch 2 times, most recently from c1bc806 to 0c94ca8 on December 19, 2022 15:40

potiuk force-pushed the automl-assets branch from 5ce3a67 to 6382186 on January 17, 2023 12:21

potiuk approved these changes Jan 17, 2023

MaksYermak and others added 10 commits January 18, 2023 10:17

Add links for Google AutoML operators

9430260

Delete 'trigger_rule' option from doc marker in example_dag

7f89459

Update unit tests for Google AutoML operators

82dd804

Delete Optional for project_id type

44b17fa

Revert "Delete Optional for project_id type"

52002b6

This reverts commit 7f0305d80ad162ee4e17a85870e88bdad5f27b18.

Make project_id required for links

2436ab7

Add compression for AutoML resources

6e3b256

Change dags to take data from resource data, refactor code to pass pre-commit checks

278944b

Reformat files with latest black and flake8 config

2232216

Remove old system tests for Auto ML

bda145d

potiuk force-pushed the automl-assets branch from 6382186 to bda145d on January 18, 2023 09:17

potiuk approved these changes Jan 23, 2023

potiuk merged commit 90e6277 into apache:main Jan 23, 2023

eladkal mentioned this pull request Feb 8, 2023

Status of testing Providers that were prepared on February 08, 2023 #29424

Closed

58 tasks
