AIP-47 Migrate ElasticSearch into new system tests #22811

Merged
merged 1 commit into from
Apr 13, 2022

Conversation

Bowrna
Contributor

@Bowrna Bowrna commented Apr 7, 2022

closes: #22445
relates: #22445



@Bowrna
Contributor Author

Bowrna commented Apr 7, 2022

I get the following error when I run the test using pytest in breeze environment for Elasticsearch. Could you share your insights on where I am going wrong? I have configured Elasticsearch to run in my local environment and updated the username and password for the elasticsearch_default conn_id.
@potiuk @mnojek

root@20f758b954e9:/opt/airflow/tests/system/providers/elasticsearch# pytest --system elasticsearch example_elasticsearch_query.py
============================= test session starts ==============================
platform linux -- Python 3.7.13, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: timeouts-1.2.1, httpx-0.20.0, asyncio-0.18.3, instafail-0.4.2, anyio-3.5.0, requests-mock-1.9.3, cov-3.0.0, flaky-3.7.0, rerunfailures-9.1.1, forked-1.4.0, xdist-2.5.0
asyncio: mode=strict
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collecting ... collected 1 item

example_elasticsearch_query.py::test_run <- tests/system/utils/__init__.py FAILED [100%]

=================================== FAILURES ===================================
___________________________________ test_run ___________________________________
ValueError: None is not a valid DagRunState

During handling of the above exception, another exception occurred:

    def test_run():
>       dag.clear(dag_run_state=State.NONE)

../../utils/__init__.py:22:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../../airflow/utils/session.py:71: in wrapper
    return func(*args, session=session, **kwargs)
../../../../airflow/models/dag.py:1856: in clear
    dag_run_state=dag_run_state,
../../../../airflow/models/taskinstance.py:289: in clear_task_instances
    dag_run_state = DagRunState(dag_run_state)  # Validate the state value.
/usr/local/lib/python3.7/enum.py:315: in __call__
    return cls.__new__(cls, value)
/usr/local/lib/python3.7/enum.py:569: in __new__
    raise exc
/usr/local/lib/python3.7/enum.py:553: in __new__
    result = cls._missing_(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <enum 'DagRunState'>, value = None

    @classmethod
    def _missing_(cls, value):
>       raise ValueError("%r is not a valid %s" % (value, cls.__name__))
E       ValueError: None is not a valid DagRunState

/usr/local/lib/python3.7/enum.py:582: ValueError
---------------------------- Captured stdout setup -----------------------------
========================= AIRFLOW ==========================
Home of the user: /root
Airflow home /root/airflow
Skipping initializing of the DB as it was initialized already.
You can re-initialize the database by adding --with-db-init flag when running tests.
=============================== warnings summary ===============================
../../../../airflow/configuration.py:398
  /opt/airflow/airflow/configuration.py:398: FutureWarning: The 'dag_default_view' setting in [webserver] has the old default value of 'tree'. This value has been changed to 'grid' in the running config, but please update your config before Apache Airflow 3.0.
    FutureWarning,

../../../../airflow/configuration.py:398
  /opt/airflow/airflow/configuration.py:398: FutureWarning: The 'log_filename_template' setting in [logging] has the old default value of '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'. This value has been changed to 'dag_id={{ ti.dag_id }}/run_id={{ ti.run_id }}/task_id={{ ti.task_id }}/{%% if ti.map_index >= 0 %%}map_index={{ ti.map_index }}/{%% endif %%}attempt={{ try_number }}.log' in the running config, but please update your config before Apache Airflow 3.0.
    FutureWarning,

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED example_elasticsearch_query.py::test_run - ValueError: None is not a v...
======================== 1 failed, 2 warnings in 1.15s =========================

@Bowrna Bowrna force-pushed the elasticsearch-new-system-test branch 2 times, most recently from 2ae1af1 to 7913827 Compare April 7, 2022 17:29
@Bowrna Bowrna marked this pull request as ready for review April 7, 2022 17:29
@Bowrna Bowrna requested a review from mik-laj as a code owner April 7, 2022 17:29
Comment on lines +29 to +30
DAG_ID = 'elasticsearch_dag'
CONN_ID = 'elasticsearch_default'
Contributor

Is there a particular reason you moved these to globals? They aren't being pulled from ENV and are each used only once, so it seems perfectly reasonable to me to keep them inlined as they were in the sample DAG.

Contributor Author

Contributor

My mistake. I forgot the AIP suggested moving them to globals. Carry on 😛

@Bowrna
Contributor Author

Bowrna commented Apr 9, 2022

I am not sure why static check fails here. If anyone knows how to fix this, can you tell me?

@Bowrna Bowrna changed the title AIP-47 es new system tests AIP-47 Migrate ElasticSearch into new system tests Apr 10, 2022
@joppevos
Contributor

joppevos commented Apr 10, 2022

I get the following error when I run the test using pytest in breeze environment for Elasticsearch. Could you share your insights on where I am going wrong?

First of all, good that you mention this issue! 👍
I got the same error, but only after running the tests for the second time. I think there is some state being written somewhere, but I have not figured out where yet.
My hacky solution is to run the tests within Breeze and, on every iteration, restart the container and re-run the tests.

This is far from ideal, so I'd also be interested in hearing others' opinions before I set out.

@Bowrna Bowrna force-pushed the elasticsearch-new-system-test branch from 7913827 to 69f77a7 Compare April 11, 2022 03:53
@bhirsz
Contributor

bhirsz commented Apr 11, 2022

This is far from ideal, so I'd also be interested in hearing others' opinions before I set out.

Unfortunately, my team also noticed this last week. One workaround was to restart the Breeze environment, as you mention. The other is to use a unique DAG id between runs (something along the lines of DAG_ID = f"name_{uuid.uuid4()}"). I will look into this issue and see how it can be fixed.
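A minimal sketch of the unique-DAG-id workaround described above, using only the standard library. The prefix elasticsearch_dag comes from this PR; the helper name unique_dag_id is hypothetical and exists only for illustration.

```python
import uuid


def unique_dag_id(prefix: str = "elasticsearch_dag") -> str:
    """Return a DAG id that differs on every call (hypothetical helper)."""
    # uuid4() is random, so repeated test runs do not reuse a DAG id
    # and therefore do not pick up state left behind by an earlier run.
    return f"{prefix}_{uuid.uuid4().hex}"


# Evaluated once per import of the test module.
DAG_ID = unique_dag_id()
```

The trade-off is that every run leaves a differently named DAG behind in the metadata database, so this is a workaround rather than a fix.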

    tags=["example", "elasticsearch"],
) as dag:
    # [START howto_elasticsearch_query]
    execute_query = show_tables()
Contributor

@bhirsz bhirsz Apr 11, 2022

Note that Sphinx will only take the content between the START and END markers. So your documentation will show:

execute_query = show_tables()

which does not tell the reader much. Unless that is what you intended, consider moving the markers to where you define show_tables:

# [START howto_elasticsearch_query]
@task(task_id='es_print_tables')
def show_tables():
(method body)
# [END howto_elasticsearch_query]

Member

Yep. Moving them to inside of the method works even better.
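To illustrate why marker placement matters, here is a rough sketch of marker-based extraction. It is not the actual Sphinx extension code; the function extract_between_markers and the EXAMPLE string are hypothetical, and the sketch assumes each marker sits on its own line.

```python
def extract_between_markers(source: str, name: str) -> str:
    """Return the lines strictly between '# [START name]' and '# [END name]'."""
    start = f"# [START {name}]"
    end = f"# [END {name}]"
    collected = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == start:
            inside = True
            continue
        if stripped == end:
            break
        if inside:
            collected.append(line)
    return "\n".join(collected)


# Hypothetical source text with the markers wrapped around the task definition.
EXAMPLE = '''\
# [START howto_elasticsearch_query]
@task(task_id='es_print_tables')
def show_tables():
    ...
# [END howto_elasticsearch_query]

execute_query = show_tables()
'''

snippet = extract_between_markers(EXAMPLE, "howto_elasticsearch_query")
```

With the markers placed this way, the extracted snippet contains the decorated function rather than only the call to show_tables().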

@bhirsz
Contributor

bhirsz commented Apr 11, 2022

@Bowrna @joppevos We used State.NONE to clear the state of the DAG, but it is restricted to tasks only:

# These are TaskState only
NONE = None

I think the default state for the DAG run should be used instead: DagRunState.QUEUED. I will update my PR with this change.
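A small sketch of why the validation step in the traceback rejects None. The DagRunState class below is a stand-in defined here for illustration, not an import from Airflow, and validate_dag_run_state is a hypothetical name for the DagRunState(dag_run_state) call shown in the traceback.

```python
from enum import Enum


class DagRunState(str, Enum):
    """Stand-in for Airflow's DagRunState; defined locally, not imported."""

    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


def validate_dag_run_state(value):
    """Mimic the validation call from the traceback: DagRunState(value)."""
    return DagRunState(value)


# A value that belongs to the enum passes validation.
queued = validate_dag_run_state("queued")

# None is not a member value, so the Enum lookup raises ValueError.
error_message = ""
try:
    validate_dag_run_state(None)
except ValueError as exc:
    error_message = str(exc)
```

Passing a member such as DagRunState.QUEUED avoids the lookup failure, which matches the change proposed above.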

@ferruzzi
Contributor

I am not sure why static check fails here. If anyone knows how to fix this, can you tell me?

Looks like it's a problem with main, not your code. You can make it retry by closing and reopening the PR, or it will try again when you add a new commit.

@Bowrna Bowrna closed this Apr 12, 2022
@Bowrna Bowrna reopened this Apr 12, 2022
@Bowrna Bowrna force-pushed the elasticsearch-new-system-test branch 2 times, most recently from 9415648 to f5fc2cf Compare April 12, 2022 16:18
@Bowrna Bowrna force-pushed the elasticsearch-new-system-test branch from f5fc2cf to 6ec75ff Compare April 13, 2022 06:51
Member

@potiuk potiuk left a comment

LGTM. @bhirsz ?

@bhirsz
Contributor

bhirsz commented Apr 13, 2022

LGTM. @bhirsz ?

@potiuk LGTM too. Good PR @Bowrna!

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Apr 13, 2022
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit a801ea3 into apache:main Apr 13, 2022
@Bowrna Bowrna mentioned this pull request Jun 3, 2022
Labels
full tests needed We need to run full set of tests for this PR to merge
Development

Successfully merging this pull request may close these issues.

Migrate Elasticsearch example DAGs to new design
5 participants