refactor: Ensure Celery leverages the Flask-SQLAlchemy session #26186

john-bodley · 2023-12-05T20:17:06Z

SUMMARY

As part of SIP-99 (specifically SIP-99A and SIP-99B), all operations should be associated with the same Flask-SQLAlchemy session in order to ensure a consistent "unit of work", i.e., a single atomic unit.

The Flask-SQLAlchemy extension provides a scoped session (on a per-request basis) with the necessary oversight, i.e., the session is closed after the request is complete, which aids connection pool management.

Historically, Celery tasks have defined their own scoped session (with the option to use connection pooling), which needed to be managed independently. This added unnecessary code bloat and complexity, and likely violated the "unit of work" construct whenever operations leveraged both the Flask-SQLAlchemy and Celery sessions. Per this post, it seems that Celery can piggyback off the Flask-SQLAlchemy session so long as setup/teardown is handled correctly. There is an example of this in the official Flask documentation: the Celery with Flask document references the use of the Flask-SQLAlchemy session (db.session).

This PR removes the need for a Celery-specific session, which is a step towards having all database operations (outside of the Alembic migrations) handled by the global db.session, a requirement for achieving an atomic unit of work.
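To illustrate why splitting work across two sessions undermines atomicity, here is a minimal sketch using only the standard library's sqlite3 module. It is not Superset code: the `log` table and the helpers `make_db`, `two_sessions`, `one_session`, and `count_rows` are hypothetical names chosen for the example, and each sqlite3 connection stands in for a separate session.

```python
import os
import sqlite3
import tempfile


def make_db() -> str:
    """Create a throwaway SQLite file with a single (hypothetical) log table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE log (msg TEXT)")
    conn.commit()
    conn.close()
    return path


def count_rows(path: str) -> int:
    """Return how many rows were durably written to the log table."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    finally:
        conn.close()


def two_sessions(path: str) -> None:
    """Two independent connections: the first write survives a later failure."""
    first = sqlite3.connect(path)
    second = sqlite3.connect(path)
    try:
        first.execute("INSERT INTO log VALUES ('step 1')")
        first.commit()  # committed on its own, outside the second transaction
        second.execute("INSERT INTO log VALUES ('step 2')")
        raise RuntimeError("simulated failure before the second commit")
    except RuntimeError:
        second.rollback()  # only undoes 'step 2'
    finally:
        first.close()
        second.close()


def one_session(path: str) -> None:
    """One connection: both writes are rolled back together."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("INSERT INTO log VALUES ('step 1')")
        conn.execute("INSERT INTO log VALUES ('step 2')")
        raise RuntimeError("simulated failure before commit")
    except RuntimeError:
        conn.rollback()  # undoes both steps
    finally:
        conn.close()
```

Under these assumptions, `two_sessions` leaves one row behind (a partial result), whereas `one_session` leaves none, which is the all-or-nothing behaviour a single shared session makes possible.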

The one wrinkle with this approach is #10819, which explicitly leveraged a NullPool (as opposed to a QueuePool, if defined via the SQLALCHEMY_ENGINE_OPTIONS configuration) to avoid using the same session across multiple Celery workers. However, we are now tearing down the session on a per-task basis and, per here:

> Flask SQLAlchemy will automatically create new sessions for you from scoped session factory, given that we are maintaining the same app context, this ensures tasks have a fresh session (e.g. session errors won't propagate across tasks).

i.e., we should no longer be experiencing the connection bleeding issue and thus can leverage the efficiency of connection pooling while standardizing the code.
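The reasoning above can be modeled with a small standard-library sketch. This is a toy model only, not the Flask-SQLAlchemy, SQLAlchemy, or Celery API: `ToyPool`, `ToyScopedSession`, and `run_task` are hypothetical names, and the "connection" is just an integer. It shows two things under those assumptions: removing the session at the end of every task means a failure in one task does not leak into the next, and the connection returned at teardown is reused by the pool rather than reopened.

```python
import itertools
import queue
from typing import Optional


class ToyPool:
    """Toy stand-in for a queue-style pool: idle connections are reused."""

    def __init__(self) -> None:
        self._idle = queue.LifoQueue()
        self._ids = itertools.count(1)
        self.opened = 0  # number of "physical" connections ever created

    def checkout(self) -> int:
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            self.opened += 1
            return next(self._ids)  # otherwise open a new one

    def checkin(self, conn: int) -> None:
        self._idle.put(conn)


class ToyScopedSession:
    """Toy stand-in for a scoped session that lazily grabs a connection."""

    def __init__(self, pool: ToyPool) -> None:
        self._pool = pool
        self._conn: Optional[int] = None
        self._failed = False

    def execute(self, fail: bool = False) -> int:
        if self._conn is None:
            self._conn = self._pool.checkout()
        if self._failed:
            raise RuntimeError("session is in a failed state")
        if fail:
            self._failed = True
            raise ValueError("simulated query error")
        return self._conn

    def remove(self) -> None:
        """Teardown: hand the connection back and discard session state."""
        if self._conn is not None:
            self._pool.checkin(self._conn)
        self._conn = None
        self._failed = False


def run_task(session: ToyScopedSession, fail: bool = False) -> Optional[int]:
    """Run one 'task' and always tear the session down afterwards."""
    try:
        return session.execute(fail=fail)
    except ValueError:
        return None
    finally:
        session.remove()
```

For example, calling `run_task(session, fail=True)` followed by `run_task(session)` succeeds on the second call, and `pool.opened` stays at 1 because the same connection is checked back in and reused.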

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI. Additionally, I wasn't able to repro the issue mentioned in #10530.

ADDITIONAL INFORMATION

- Has associated issue: [SIP-99A] Primer on managing SQLAlchemy sessions #25107
- Required feature flags:
- Changes UI
- Includes DB Migration (follow approval process in SIP-59)
  - Migration is atomic, supports rollback & is backwards-compatible
  - Confirm DB migration upgrade and downgrade tested
  - Runtime estimates and downtime expectations provided
- Introduces new feature or API
- Removes existing feature or API

codecov · 2023-12-05T23:27:23Z

Codecov Report

Attention: 38 lines in your changes are missing coverage. Please review.

Comparison is base (aaa4a7b) 69.08% compared to head (d244d32) 66.84%.

| Files | Patch % | Lines |
|---|---|---|
| superset/tasks/celery_app.py | 0.00% | 8 Missing ⚠️ |
| superset/sql_lab.py | 73.91% | 6 Missing ⚠️ |
| superset/commands/report/log_prune.py | 69.23% | 4 Missing ⚠️ |
| superset/db_engine_specs/trino.py | 50.00% | 4 Missing ⚠️ |
| superset/commands/report/execute.py | 84.21% | 3 Missing ⚠️ |
| superset/db_engine_specs/hive.py | 25.00% | 3 Missing ⚠️ |
| superset/db_engine_specs/impala.py | 40.00% | 3 Missing ⚠️ |
| superset/db_engine_specs/presto.py | 40.00% | 3 Missing ⚠️ |
| superset/daos/report.py | 33.33% | 2 Missing ⚠️ |
| superset/db_engine_specs/ocient.py | 50.00% | 1 Missing ⚠️ |

... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #26186      +/-   ##
==========================================
- Coverage   69.08%   66.84%   -2.25%     
==========================================
  Files        1931     1930       -1     
  Lines       75351    75294      -57     
  Branches     8429     8429              
==========================================
- Hits        52056    50330    -1726     
- Misses      21148    22817    +1669     
  Partials     2147     2147

| Flag | Coverage Δ | |
|---|---|---|
| hive | `?` | |
| mysql | `77.88% <68.85%> (-0.04%)` | ⬇️ |
| postgres | `77.98% <68.85%> (-0.04%)` | ⬇️ |
| presto | `?` | |
| python | `78.11% <68.85%> (-4.66%)` | ⬇️ |
| sqlite | `77.56% <68.85%> (-0.04%)` | ⬇️ |
| unit | `?` | |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

john-bodley · 2023-12-06T18:45:59Z

superset/sql_lab.py

-                stats_logger.incr("error_sqllab_unhandled")
-                query = get_query(query_id, session)
-                return handle_query_error(ex, query, session)
+    with override_user(security_manager.find_user(username)):


Pretty much the same code as previously without the outer with block.

john-bodley · 2023-12-06T18:46:31Z

superset/tasks/cache.py

@@ -130,28 +123,24 @@ def __init__(self, top_n: int = 5, since: str = "7 days ago") -> None:
        self.since = parse_human_datetime(since) if since else None

    def get_payloads(self) -> list[dict[str, int]]:
-        payloads = []
-        session = db.create_scoped_session()
+        records = (


Pretty much the same code as previously without the try block.

john-bodley · 2023-12-06T18:46:53Z

superset/tasks/cache.py

-                        TaggedObject.object_type == "dashboard",
-                        TaggedObject.tag_id.in_(tag_ids),
-                    )
+        tags = db.session.query(Tag).filter(Tag.name.in_(self.tags)).all()


Pretty much the same code as previously without the try block.

john-bodley · 2023-12-06T18:47:11Z

superset/tasks/scheduler.py

-        for active_schedule in active_schedules:
-            for schedule in cron_schedule_window(
-                triggered_at, active_schedule.crontab, active_schedule.timezone
+    active_schedules = ReportScheduleDAO.find_active()


Pretty much the same code as previously without the outer with block.

john-bodley · 2024-01-02T20:11:54Z

ping @michael-s-molina @villebro

michael-s-molina · 2024-01-03T12:53:51Z

@john-bodley I think it would be great to label this PR with v4.0 and merge it during the breaking window to reuse the test/stabilization efforts that will occur during that period.

john-bodley · 2024-01-03T21:55:00Z

Per,

> @john-bodley I think it would be great to label this PR with v4.0 and merge it during the breaking window to reuse the test/stabilization efforts that will occur during that period.

as discussed with @michael-s-molina, though this is technically a non-breaking change, it seems prudent (from a safety perspective) to hold off merging this until the v4.0 breaking window.

michael-s-molina

LGTM. I'm assuming that removing unnecessary commit operations is not in the scope of this PR.

superset/tasks/celery_app.py

pull-request-size bot added the size/XL label Dec 5, 2023

john-bodley force-pushed the john-bodley--sip-99-celery-session branch 7 times, most recently from e73ec92 to a9b38d9 on December 5, 2023 23:10

john-bodley changed the title ~~chore: Ensure Celery tasks levergae Flask-SQLAlchemy session~~ chore: Ensure Celery tasks leverage Flask-SQLAlchemy session Dec 6, 2023

john-bodley requested review from betodealmeida, eschutho, villebro and michael-s-molina December 6, 2023 18:34

john-bodley marked this pull request as ready for review December 6, 2023 18:34

john-bodley changed the title ~~chore: Ensure Celery tasks leverage Flask-SQLAlchemy session~~ chore: Ensure Celery leverages Flask-SQLAlchemy session Dec 6, 2023

john-bodley mentioned this pull request Dec 6, 2023

refactor: Ensure Flask framework leverages the Flask-SQLAlchemy session (Phase I) #26200

Merged

9 tasks

michael-s-molina changed the title ~~chore: Ensure Celery leverages Flask-SQLAlchemy session~~ refactor: Ensure Celery leverages Flask-SQLAlchemy session Dec 7, 2023

john-bodley changed the title ~~refactor: Ensure Celery leverages Flask-SQLAlchemy session~~ refactor: Ensure Celery leverages the Flask-SQLAlchemy session Dec 11, 2023

john-bodley mentioned this pull request Dec 11, 2023

refactor: Ensure Flask-Migrate leverages the Flask-SQLAlchemy session #26172

Closed

9 tasks

michael-s-molina added the hold! and v4.0 labels Jan 3, 2024

michael-s-molina approved these changes Jan 4, 2024

superset/tasks/celery_app.py (review thread resolved)

michael-s-molina removed the hold! label Jan 16, 2024

fix: Migration order due to cherry which went astray (apache#26160)

d244d32

john-bodley force-pushed the john-bodley--sip-99-celery-session branch from 21272ca to d244d32 on January 17, 2024 00:40

pull-request-size bot added size/XXL and removed size/XL labels Jan 17, 2024

john-bodley merged commit 7af82ae into apache:master Jan 17, 2024
33 checks passed

john-bodley deleted the john-bodley--sip-99-celery-session branch January 17, 2024 04:07

dpgaspar mentioned this pull request Jan 17, 2024

fix: remove possible unnecessary file 1 #26649

Merged

9 tasks

sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024

refactor: Ensure Celery leverages the Flask-SQLAlchemy session (apache#26186)

0a8c430

mistercrunch added the 🏷️ bot and 🚢 4.0.0 labels Apr 17, 2024

vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024

refactor: Ensure Celery leverages the Flask-SQLAlchemy session (apache#26186)

3a359ad
