Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration PostgreSQL to MariaDB: Shortening Columns #1166

Closed
bschmalhofer opened this issue Aug 6, 2021 · 5 comments
Closed

Migration PostgreSQL to MariaDB: Shortening Columns #1166

bschmalhofer opened this issue Aug 6, 2021 · 5 comments
Assignees
Milestone

Comments

@bschmalhofer
Copy link
Contributor

Another issue. When migrating from OTRS(Pg) to OTOBO(MySQL) I see many of the following message in the logfile:

[Fri Aug 6 16:53:36 2021] -e: Use of uninitialized value in numeric gt (>) at /opt/otobo/Kernel/System/MigrateFromOTRS/CloneDB/Driver/Base.pm line 505.

This is likely nothing to worry about, as the migrated database seems to work fine. But let's investigate. The line in question is:

            next SOURCE_COLUMN unless $SourceColumnInfos->{LENGTH} > $TargetColumnInfos->{LENGTH};

Dumping the infos about $SourceColumnInfos and $TargetColumnInfos we get:


[Fri Aug  6 16:53:36 2021] -e: $VAR1 = {
[Fri Aug  6 16:53:36 2021] -e:           'IS_NULLABLE' => 'YES',
[Fri Aug  6 16:53:36 2021] -e:           'COLUMN' => 'a_from',
[Fri Aug  6 16:53:36 2021] -e:           'DATA_TYPE' => 'character varying',
[Fri Aug  6 16:53:36 2021] -e:           'LENGTH' => undef
[Fri Aug  6 16:53:36 2021] -e:         };
[Fri Aug  6 16:53:36 2021] -e: $VAR2 = {
[Fri Aug  6 16:53:36 2021] -e:           'LENGTH' => 16777215,
[Fri Aug  6 16:53:36 2021] -e:           'DATA_TYPE' => 'mediumtext',
[Fri Aug  6 16:53:36 2021] -e:           'COLUMN' => 'a_from',
[Fri Aug  6 16:53:36 2021] -e:           'IS_NULLABLE' => 'YES'
[Fri Aug  6 16:53:36 2021] -e:         };

This makes sense. The source is PostgreSQL with the datatype 'character varying' and the target is MySQL with the data type 'mediumtext'. The length on the MySQL side checks out. A mediumtext holds 16777215 bytes, about 16MB. But why is there no length on the PostgreSQL side?

The declaration of article_data_mime.a_from is in otobo-schema.xml:

Checking the functions _TypeTranslation() in mysql.pm and in postgresql.pm we get confirmation that the size 1_800_000 becomes MEDIUMTEXT on MySQL and VARCHAR on PostgreSQL. It just so happens that MEDIUMTEXT (MySQL) has a size restritction while VARCHAR (Pg) has no restriction.

MySQL:

    if ( $Tag->{Type} =~ /^VARCHAR$/i ) {
        if ( $Tag->{Size} > 16777215 ) {
            $Tag->{Type} = 'LONGTEXT';
        }
        elsif ( $Tag->{Size} > 65535 ) {
            $Tag->{Type} = 'MEDIUMTEXT';
        }
        elsif ( $Tag->{Size} > 255 ) {
            $Tag->{Type} = 'TEXT';
        }
        else {
            $Tag->{Type} = 'VARCHAR (' . $Tag->{Size} . ')';
        }
    }

PostgreSQL:

    elsif ( $Tag->{Type} =~ /^VARCHAR$/i ) {
        $Tag->{Type} = 'VARCHAR (' . $Tag->{Size} . ')';
        if ( $Tag->{Size} >= 10000 ) {
            $Tag->{Type} = 'VARCHAR';
        }
    }

This means that we have an interesting mix. There are length differences that happen basically by accident, because of differences in data types. On the other hand, there are columns that actually should be shortened.

I tend to take the most simple approach. Treat an undefined length as infinite. Another question is how this plays with encodings. The LENGTH appears to be in bytes but the shortening is done by calling SUBSTRING(). How can this possibly work?

Originally posted by @bschmalhofer in #1149 (comment)

@bschmalhofer
Copy link
Contributor Author

See also #542 and #836

@bschmalhofer
Copy link
Contributor Author

_TypeTranslation() in mysql.pm is in principle broken. When e.g. VARCHAR(60_000) is declared, then it would be mapped into a TEXT. But 60_000 4-Byte characters do not fit into a TEXT field. In practice it's not really relevant, the largest TEXT fleld requires bytes that accomodate 10_000 characters. For the longest MEDIUMTEXT field its 1_800_000 characters. There is no conflict.

For postgresql.pm everything is fine. All columns declared as VARCHAR in otobo-schema.xml are either mapped into the appropriate VARCHAR(n) type or mapped into VARCHAR. And VARCHAR has no size restriction in PostgreSQL.

So, I propose to:

  • fix _TypeTranslation() in Kernel/System/DB/mysql,pm , causing no changes when the present otobo-schema.xml is used
  • Do not shorted unless the types in source and target are both VARCHAR, or 'character varying' with a size up 255 characters.

The latter bullet point is more pragmatic than correct.

@bschmalhofer
Copy link
Contributor Author

Moved the adaption of TypeTranslation() to a separate issue. This change is not trivial and thus should not be done in a patchlevel release.

bschmalhofer added a commit that referenced this issue Aug 10, 2021
bschmalhofer added a commit that referenced this issue Aug 10, 2021
about which columns are shortened.
bschmalhofer added a commit that referenced this issue Aug 10, 2021
@bschmalhofer
Copy link
Contributor Author

Adapted the check for when columns are shortened. The warnings in the error log are gone now. The messages looked sane. Did a testmigration from OTRS (Pg) to OTBOO (MySQL). Some sanity checks on the migrated database looked find.

@bschmalhofer
Copy link
Contributor Author

Let's close this issue, as the test last week looked fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant