Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MC-21637: Adds proposal for refactoring of product import #324

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jean-bernard-valentaten
Copy link
Contributor

Problem

Product import currently relies on MySQL AUTO_INCREMENT which leads to issues when using MySQL 8

Solution

Either make use of a construct similar to sequences or change primary key to use UUIDs instead of a serial key

Requested Reviewers

@paliarush
@akaplya


Although using UUIDs instead of serial ids would be a great option, migrating existing data could become troublesome and
the overall performance of using a `char` field for primary key needs to be benchmarked before even attempting to use
UUIDs. Additionally this is probably a BIC, thus it is not an option for a patch release.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important to describe expected (if any) BiC changes for each approach.

@buskamuza
Copy link
Contributor

Considering all aspects, I'd suggest switching to generic sequences. In the long run I'd suggest to support sequences in
the declarative schema and let the database abstraction layer decide how these are implemented for arbitrary RDBMS (e.g.
native sequences in Oracle, PostgreSQL and MSSQL, generic sequence table in MySQL / MariaDB etc.). The `identity`
attribute of columns in the declarative schema could then be used to indicate which column is fed by the sequence and
Copy link
Contributor

@tariqjawed83 tariqjawed83 Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For distributed systems, it will not be possible to scale this kind of solution, where for generating ids we are relying on the database itself; we need to control at the program level how we want to generate the IDs. Twitter was using similar sequence for MySQL using AUTO_INCREMENT; but they realized it was not possible to scale this approach.

I would suggest looking into Twitter Snowflake
https://github.com/twitter-archive/snowflake/tree/snowflake-2010

Another article around this subject is this
https://www.callicoder.com/distributed-unique-id-sequence-number-generator/

Although currently many of Magento customers running on single-node database, but in the cloud we would need to support large scale bulk operations in distributed style, in which case we would be able to allow multiple threads pushing the data; so an optimized service that can handle this scale would be better approach; which doesn't rely on hitting database for Ids.

Snowflake uses 8 byte unsigned integers, currently Magento using 4 byte unsigned integer, not-sure whether it will be BIC, it seems like a PATCH? according to https://devdocs.magento.com/guides/v2.3/extension-dev-guide/versioning/codebase-changes.html

But the idea is to centralize the ID generation and take it outside of database itself; the service should be similarly scalable as other services to support the load.

This service can provide Bulk IDs as well, and many implementations I have seen, uses Redis or similar Cache to store that Bunch of IDs, in order to avoid unnecessary calls to ID Generation Service as well. We may still be able to use Magento SequenceInterface, while underlying calling this new Distributed ID Service.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jean-bernard-valentaten I reviewed the proposal in detail, please review my comments above (proposed direction); and let me know your thoughts/feedback?

this is somewhat similar to #323 (@larsroettig) because underlying root issue is same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants