Improve sync performance #2129
Comments
Comment Author: @mathjazz We're no longer hitting the problem since we started using the Performance-M worker.
Comment Author: Anand <github@anandthakker.net> I was just trying out a Fluent migration on my Pontoon instance, and even with a Standard-2X dyno for the worker, I got some "R14 - Memory quota exceeded (degraded performance)" errors during the sync. More interestingly, even after the sync completed, I was still seeing those errors every few seconds, which makes me think that there may be a leak somewhere.
When syncing large projects with project configuration files (e.g. Mozilla.org), we sometimes need to upgrade Heroku to Performance-L dynos for the task to complete.
See https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.iterator
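To illustrate why the linked `QuerySet.iterator()` could help here, below is a minimal sketch of the same idea using only the standard library: rows are streamed from the database in fixed-size chunks instead of being loaded all at once. The `translation` table, its `string` column, and the helper names are hypothetical and are not Pontoon's actual schema or code. In Django the analogous call would be `queryset.iterator(chunk_size=...)`; how that interacts with `prefetch_related()` depends on the Django version, so treat this as an assumption to verify against the linked docs.

```python
import sqlite3
from typing import Iterator, Tuple


def stream_rows(cursor: sqlite3.Cursor, chunk_size: int = 2000) -> Iterator[Tuple]:
    """Yield rows one by one, fetching at most chunk_size rows at a time."""
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            return
        yield from chunk


def total_length(conn: sqlite3.Connection, chunk_size: int = 2000) -> int:
    """Sum the length of every string without holding all rows in memory."""
    # Hypothetical table and column names, for illustration only.
    cursor = conn.execute("SELECT string FROM translation")
    return sum(len(row[0]) for row in stream_rows(cursor, chunk_size))


def make_demo_db(n: int) -> sqlite3.Connection:
    """Build an in-memory database with n ten-character strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE translation (id INTEGER PRIMARY KEY, string TEXT)")
    conn.executemany(
        "INSERT INTO translation (string) VALUES (?)",
        (("x" * 10,) for _ in range(n)),
    )
    return conn
```

With this pattern, peak memory is bounded by roughly one chunk of rows rather than the whole result set, at the cost of more round trips to the database.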
This issue was created automatically by a script.
Bug 1460348
Bug Reporter: @mathjazz
CC: @lonnen, @flodolo, github@anandthakker.net
Note:
This is not a new bug. It's been hitting us since we migrated Pontoon to Heroku and has been tracked under bug 1214411 initially.
--
Details:
Heroku dyno used as a worker for the sync process often runs out of memory. We see two types of errors in the logs:
R14 - Memory quota exceeded (degraded performance):
https://devcenter.heroku.com/articles/error-codes#r14-memory-quota-exceeded
R15 - Memory quota vastly exceeded (dyno is killed, sync breaks):
https://devcenter.heroku.com/articles/error-codes#r15-memory-quota-vastly-exceeded
--
Previous attempts at fixing the problem:
To address the problem, we made several optimizations to the sync process in the past, two of which stand out:
We fixed bug 1214411 (which tracked this problem initially) by detecting which files changed in VCS and only syncing those. As a result, the aforementioned error messages no longer appear constantly; they only show up when a bigger changeset is synced.
We fixed bug 1383252 by greatly reducing the costly hg clone operations. That reduced the average sync time from 20 to 2 minutes and allowed us to switch from using 3 Standard-2X dynos to 1 Standard-1X (also reducing the worker dyno cost by a factor of 6).
--
Current status:
We mostly see the error when bigger changesets are processed, e.g. when we run Fluent migrations or when projects that store translations in big bilingual files are synced (e.g. SUMO, AMO, MDN).
To avoid losing the worker (and damaging the sync process), we manually upgrade the sync worker to Performance-M before we run Fluent migrations, but that makes the process more manual than it could be and doesn't scale. We don't know, for example, when new SUMO strings will land.
--
Plan:
We should investigate what the root cause of the problem is and figure out whether we can fix it programmatically. A possible suspect is the reduced number of DB queries (which are now bigger) combined with the extensive use of prefetching: both are needed for performance reasons, but they may be what increases memory consumption.
The other solution is to permanently upgrade the sync worker to a more expensive Performance-M (https://www.heroku.com/pricing), which works reliably. It's also dedicated.
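For the investigation option in the plan, one possible starting point is Python's built-in `tracemalloc` module, which can report the source lines whose allocations grew while a task ran. This is only a sketch under assumptions: `top_allocation_growth` is a hypothetical helper, and `task` stands in for whatever callable runs a sync of a single project; neither exists in Pontoon.

```python
import tracemalloc


def top_allocation_growth(task, limit=5):
    """Run task() and return its result plus the lines that allocated the most.

    `task` is a placeholder for a callable that performs one sync run.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = task()  # keep the result alive so its memory is still counted
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Entries are ordered by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]
```

Each returned entry exposes `size_diff` (bytes gained or lost) and `traceback` (where the allocation happened), so printing the top few entries after a sync of a large bilingual file might show whether prefetched objects dominate. Note that `tracemalloc` only sees allocations made through Python's allocator and adds noticeable overhead, so it is better suited to a staging run than to the production worker.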