Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow dev/build process on large datasets due to CHECK TABLE being enabled by default. #9966

Open
brettt89 opened this issue Jun 4, 2021 · 4 comments

Comments

@brettt89
Copy link
Contributor

brettt89 commented Jun 4, 2021

Affected Version

All

Description

While investigating performance issues with dev/build processes on larger applications with large datasets, it has been identified that performing a CHECK TABLE on every Silverstripe table for each release causes the process to take a large amount of time to complete.

Reference: https://dev.mysql.com/doc/refman/5.7/en/check-table.html

Steps to Reproduce

Most noticeable on tables using silverstripe-versioning with large datasets in those tables.
E.g. ChangeSetItem table from Silverstripe Versioning with 5 Million row records.

CHECK TABLE takes 5+ minutes to complete on MySQL with 2vCPU and 4GB RAM.

mysql> CHECK TABLE `ChangeSetItem`;
+---------------------------------+-------+----------+----------+
| Table                           | Op    | Msg_type | Msg_text |
+---------------------------------+-------+----------+----------+
| SS_mysite.ChangeSetItem         | check | status   | OK       |
+---------------------------------+-------+----------+----------+
1 row in set (312.81 sec)

Workaround

Silverstripe 4.1+

Developers can opt-in to disabling this currently by using the following YML configuraiton.

# Disable CHECK TABLE and REPAIR TABLE commands during dev/build
SilverStripe\ORM\Connect\MySQLSchemaManager:
  check_and_repair_on_build: false
@brettt89
Copy link
Contributor Author

brettt89 commented Jun 4, 2021

Recommend disabling this setting by default and making it opt-in. AND/OR changing logic so that CHECK TABLE / REPAIR TABLE is part of a different flow (e.g. Is not executed on every dev/build call, but only when required / wanted or perhaps through different dev/ call all together.).

Alternatively it could be removed entirely in favor of CHECK TABLE being a Database Administrator task rather than an application task (E.g. Scheduled checks of Tables in database outside of Application). This could work as CHECK TABLE calls to a database do not require any context of PHP Code.

@brettt89
Copy link
Contributor Author

brettt89 commented Jun 4, 2021

Example performance improvements observed on project.

Database Size: 3.1GB
Largest table size: 5+ million rows

Changes

SilverStripe\ORM\Connect\MySQLSchemaManager:
  check_and_repair_on_build: false

dev/build execution time

before changes: ~21 minutes
after changes: ~4 minutes

@brynwhyman
Copy link
Contributor

brynwhyman commented Jun 27, 2021

Talking about this internally, it sounds like there's a few more options here. Instead of changing the default value, we could also:

a. Document this workaround as a valid option for those who are experiencing performance issues during dev/build. Feels like if we don't have one already, an existing dev docs page for sites with very large sites could be worthwhile.
b. Have a URL paramater on dev/build that disable this behaviour, this would allow deployment tools to enable or disable this on request on a more global state.
c. Add an environment variable which if present would disable this check behaviour. Would support project developers or operations teams to add this for projects on a case by case basis.

Before changing the default value in framework we'd want to spend more time researching what the risks of doing so would be, and why it was added in the first place.

@brettt89 do you have any thoughts on the above options?

@brettt89
Copy link
Contributor Author

@brynwhyman Agree that are probably more options here than my initial thoughts.

CHECK TABLE has a purpose within Database Administration and is used to ensure stability of Tables within a database. As such it can make sense for some applications to want to do this on dev/build (E.g. Perhaps they have critical data that needs to have this functionality).

So removing it entirely may be counter productive, and its only a massive performance issue for 'Large Database Tables'. So perhaps a combination of ideas in that it can be "enabled/disabled 'per-table'" + heavier documentation around performance considerations on large datasets / websites.

e.g. Ability for developers to disable CHECK TABLE for all _Versions tables or specific tables ONLY, etc.

  • Better documentation on how developers can use this functionality to help with performance during dev/build and considerations that should be made (e.g. Data integrity).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants