Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-3561] Improve queries #4368

Merged
merged 18 commits into from
Dec 27, 2018
Merged

[AIRFLOW-3561] Improve queries #4368

merged 18 commits into from
Dec 27, 2018

Conversation

ffinfo
Copy link
Contributor

@ffinfo ffinfo commented Dec 23, 2018

Make sure you have checked all steps below.

Jira

Description

This pull request will reduce interaction with the DagBag en improves database queries.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality

  • Passes flake8

@codecov-io
Copy link

codecov-io commented Dec 23, 2018

Codecov Report

Merging #4368 into master will increase coverage by 0.02%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4368      +/-   ##
==========================================
+ Coverage   78.14%   78.17%   +0.02%     
==========================================
  Files         202      204       +2     
  Lines       16486    16530      +44     
==========================================
+ Hits        12883    12922      +39     
- Misses       3603     3608       +5
Impacted Files Coverage Δ
airflow/models/__init__.py 92.81% <100%> (+0.04%) ⬆️
airflow/www_rbac/views.py 72.59% <100%> (+0.16%) ⬆️
airflow/www/views.py 69.31% <80%> (-0.08%) ⬇️
airflow/utils/db.py 32.28% <0%> (-0.26%) ⬇️
airflow/models/connection.py 88.6% <0%> (ø) ⬆️
airflow/models/dagpickle.py 94.11% <0%> (ø)
airflow/models/base.py 85.71% <0%> (ø)
airflow/jobs.py 77.41% <0%> (+0.02%) ⬆️
airflow/bin/cli.py 64.52% <0%> (+0.04%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c014324...4db22b1. Read the comment docs.

return self._default_view

def get_default_view(self):
"""This is only there for backward compatible jina2 templates"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing something here. Why can't the templates continue using the default_view property? Also, s/jina2/jinja2/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to do with https://issues.apache.org/jira/browse/AIRFLOW-3562.
Currently DAG is still used in other templates, can't change all those at once. Once all views has switched to DagModel this method can be removed again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps: fixing typo ;)

@@ -4201,6 +4222,8 @@ def sync_to_db(self, owner=None, sync_time=None, session=None):
orm_dag.owners = owner
orm_dag.is_active = True
orm_dag.last_scheduler_run = sync_time
orm_dag.default_view = self._default_view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test this in tests.models:DagTest.test_sync_to_db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, added a test

@@ -3249,6 +3259,17 @@ def __exit__(self, _type, _value, _tb):

# /Context Manager ----------------------------------------------

@property
def default_view(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test this property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one, added a test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior how differs from the get_default_view method. Maybe we can remove the get_default_view method completely: https://stackoverflow.com/questions/49134141/how-do-i-reference-an-object-property-in-a-method-call-in-jinja2

This would reduce the number of methods on the DAG object. Besides that, I believe that dag.default_view should return the same as dag.get_default_view() since the names are so similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko Agree there. I did now remove all usage of dag.default_view and switch it to dag.get_default_view()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko: Can't use just default_view because in DagModel this can be None and get_default_view has a fall back to the config when not set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@jmcarp
Copy link
Contributor

jmcarp commented Dec 26, 2018

Until #4339 is merged, we have to make duplicate changes to airflow/www/views and airflow/www_rbac/views.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggestions, apart from that it looks great!

@@ -3249,6 +3259,17 @@ def __exit__(self, _type, _value, _tb):

# /Context Manager ----------------------------------------------

@property
def default_view(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior how differs from the get_default_view method. Maybe we can remove the get_default_view method completely: https://stackoverflow.com/questions/49134141/how-do-i-reference-an-object-property-in-a-method-call-in-jinja2

This would reduce the number of methods on the DAG object. Besides that, I believe that dag.default_view should return the same as dag.get_default_view() since the names are so similar.

qry = (
session.query(ds.dag_id, ds.state, ds.count)
)
states = session.query(dr.dag_id, dr.state, sqla.func.count(dr.state)).group_by(dr.dag_id, dr.state)

data = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give this a bit more meaningful name? Something like dag_state_stats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Fokko
Copy link
Contributor

Fokko commented Dec 26, 2018

Two related CI issues:


======================================================================
3) ERROR: test_sync_to_db (tests.DagTest)
----------------------------------------------------------------------
   Traceback (most recent call last):
    .tox/py27-backend_mysql-env_docker/lib/python2.7/site-packages/mock/mock.py line 1305 in patched
      return func(*args, **keywargs)
    tests/models.py line 749 in test_sync_to_db
      self.assertIsNone(orm_dag._default_view)
   AttributeError: 'DagModel' object has no attribute '_default_view'
   -------------------- >> begin captured stdout << ---------------------
   [2018-12-26 08:08:11,955] {__init__.py:4219} INFO - Creating ORM DAG for dag
   [2018-12-26 08:08:11,963] {__init__.py:4219} INFO - Creating ORM DAG for dag.subtask
   
   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   airflow.models.DAG: INFO: Creating ORM DAG for dag
   airflow.models.DAG: INFO: Creating ORM DAG for dag.subtask
   --------------------- >> end captured logging << ---------------------
======================================================================
4) ERROR: test_sync_to_db_default_view (tests.DagTest)
----------------------------------------------------------------------
   Traceback (most recent call last):
    .tox/py27-backend_mysql-env_docker/lib/python2.7/site-packages/mock/mock.py line 1305 in patched
      return func(*args, **keywargs)
    tests/models.py line 781 in test_sync_to_db_default_view
      self.assertIsNotNone(orm_dag._default_view)
   AttributeError: 'DagModel' object has no attribute '_default_view'

@Fokko Fokko changed the title AIRFLOW-3561 - improve queries [AIRFLOW-3561] Improve queries Dec 26, 2018
@ffinfo
Copy link
Contributor Author

ffinfo commented Dec 26, 2018

Rbac is now also fixed

aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
* improve queries

* Adding field to the database

* Set length of field

* remove dagbag use in xcom call

* Fixing typo

* Adding test

* Remove default_view

* fixing test

* rename var

* Fixing rbac dag_stats

* Fixing rbac task_stats

* Fixing rbac code

* Fixing rbac xcom

* Fixing template

* Fixing default view call

* Added timezone to DagModel

* Fixing timezone

def __repr__(self):
return "<DAG: {self.dag_id}>".format(self=self)

@property
def timezone(self):
return settings.TIMEZONE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait up - this looks incorrect. Why was this removed?

The timezone of a DAG was previously defined as the timezone of the start_date. Is that still the case and if we do self.timezone = tz then this property accessor won't be used anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean removed, I mean added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is on the DagModel, not the DAG. Right.

In which case my question is: when/where is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a bit tricky. I have added this because in the html templates this variable is used. Some templates now takes a DagModel instead of a DAG. In the end I would like this to be in the complete webserver

kaxil pushed a commit that referenced this pull request Jan 19, 2019
* improve queries

* Adding field to the database

* Set length of field

* remove dagbag use in xcom call

* Fixing typo

* Adding test

* Remove default_view

* fixing test

* rename var

* Fixing rbac dag_stats

* Fixing rbac task_stats

* Fixing rbac code

* Fixing rbac xcom

* Fixing template

* Fixing default view call

* Added timezone to DagModel

* Fixing timezone
@ashb
Copy link
Member

ashb commented Mar 5, 2019

We reverted this from master? Did it ever make it back in in any form?

@ffinfo
Copy link
Contributor Author

ffinfo commented Mar 7, 2019

@ashb Seems to be still in there:
https://github.com/apache/airflow/blame/master/airflow/www/views.py#L299

Because of the switch to the rbac interface it's was moved I think

ashb pushed a commit to ashb/airflow that referenced this pull request Mar 20, 2019
* improve queries

* Adding field to the database

* Set length of field

* remove dagbag use in xcom call

* Fixing typo

* Adding test

* Remove default_view

* fixing test

* rename var

* Fixing rbac dag_stats

* Fixing rbac task_stats

* Fixing rbac code

* Fixing rbac xcom

* Fixing template

* Fixing default view call

* Added timezone to DagModel

* Fixing timezone
ashb pushed a commit that referenced this pull request Apr 17, 2019
#5111)

We used to key on `safe_dag_id` but in that got changed in #4368 to use a 
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
ashb pushed a commit that referenced this pull request Apr 17, 2019
#5111)

We used to key on `safe_dag_id` but in that got changed in #4368 to use a
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
amichai07 pushed a commit to amichai07/incubator-airflow that referenced this pull request Apr 22, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.

(cherry picked from commit c696dd4)
amichai07 pushed a commit to bluevine-dev/airflow that referenced this pull request Apr 22, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a 
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
krzysztof-indyk pushed a commit to FlyrInc/apache-airflow that referenced this pull request Jun 3, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.

(cherry picked from commit c696dd4)
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a 
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
* improve queries

* Adding field to the database

* Set length of field

* remove dagbag use in xcom call

* Fixing typo

* Adding test

* Remove default_view

* fixing test

* rename var

* Fixing rbac dag_stats

* Fixing rbac task_stats

* Fixing rbac code

* Fixing rbac xcom

* Fixing template

* Fixing default view call

* Added timezone to DagModel

* Fixing timezone
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a 
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
kkmueller pushed a commit to postmates/airflow that referenced this pull request Aug 23, 2019
apache#5111)

We used to key on `safe_dag_id` but in that got changed in apache#4368 to use a 
more efficient query, which broke for non-sub-DAGs that contain a `.` in their id.

Arguably this escaping is a front-end concern anyway, so handling the escaping in the front end makes sense anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants