Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: switch unique key and column logic to declarative schemadiff analysis #16229

Open
shlomi-noach opened this issue Jun 19, 2024 · 0 comments
Assignees
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Enhancement Logical improvement (somewhere between a bug and feature)

Comments

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Jun 19, 2024

vitess migrations require some analysis on participating columns (type, nullability, etc.) and of the table's unique keys (fiding an appropriate iteration key for the migration).

That analysis takes place today via information_schema:

sqlSelectColumnTypes = `
select
*,
COLUMN_DEFAULT IS NULL AS is_default_null
from
information_schema.columns
where
table_schema=%a
and table_name=%a
`

sqlSelectUniqueKeys = `
SELECT
COLUMNS.TABLE_SCHEMA as table_schema,
COLUMNS.TABLE_NAME as table_name,
COLUMNS.COLUMN_NAME as column_name,
UNIQUES.INDEX_NAME as index_name,
UNIQUES.COLUMN_NAMES as column_names,
UNIQUES.COUNT_COLUMN_IN_INDEX as count_column_in_index,
COLUMNS.DATA_TYPE as data_type,
COLUMNS.CHARACTER_SET_NAME as character_set_name,
LOCATE('auto_increment', EXTRA) > 0 as is_auto_increment,
(DATA_TYPE='float' OR DATA_TYPE='double') AS is_float,
has_subpart,
has_nullable
FROM INFORMATION_SCHEMA.COLUMNS INNER JOIN (
SELECT
TABLE_SCHEMA,
TABLE_NAME,
INDEX_NAME,
COUNT(*) AS COUNT_COLUMN_IN_INDEX,
GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX ASC) AS COLUMN_NAMES,
SUBSTRING_INDEX(GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX ASC), ',', 1) AS FIRST_COLUMN_NAME,
SUM(SUB_PART IS NOT NULL) > 0 AS has_subpart,
SUM(NULLABLE='YES') > 0 AS has_nullable
FROM INFORMATION_SCHEMA.STATISTICS
WHERE
NON_UNIQUE=0
AND TABLE_SCHEMA=%a
AND TABLE_NAME=%a
GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS UNIQUES
ON (
COLUMNS.COLUMN_NAME = UNIQUES.FIRST_COLUMN_NAME
)
WHERE
COLUMNS.TABLE_SCHEMA=%a
AND COLUMNS.TABLE_NAME=%a
ORDER BY
COLUMNS.TABLE_SCHEMA, COLUMNS.TABLE_NAME,
CASE UNIQUES.INDEX_NAME
WHEN 'PRIMARY' THEN 0
ELSE 1
END,
CASE has_nullable
WHEN 0 THEN 0
ELSE 1
END,
CASE has_subpart
WHEN 0 THEN 0
ELSE 1
END,
CASE IFNULL(CHARACTER_SET_NAME, '')
WHEN '' THEN 0
ELSE 1
END,
CASE DATA_TYPE
WHEN 'tinyint' THEN 0
WHEN 'smallint' THEN 1
WHEN 'int' THEN 2
WHEN 'bigint' THEN 3
ELSE 100
END,
COUNT_COLUMN_IN_INDEX
`

sqlShowColumnsFrom = "SHOW COLUMNS FROM `%a`"

sqlGetAutoIncrement = `
SELECT
AUTO_INCREMENT
FROM INFORMATION_SCHEMA.TABLES
WHERE
TABLES.TABLE_SCHEMA=%a
AND TABLES.TABLE_NAME=%a
AND AUTO_INCREMENT IS NOT NULL
`

We want to move away from information_schema based analysis and into programmatic and declarative schemadiff analysis. We already ask schemadiff for instant-ddl capabilities and we generally want it to own as much of schema analysis as possible.

At the resolution of this issue, schemadiff should be able to tell, given two before and after tables, which unique keys ar ebest to use as iteration keys (if any) and what specific details we should know about the columns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
1 participant