Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlPipeline to DataPipleine migration is not completing as expected at dataall-dev-dbmigration-stage stage #637

Closed
MATTT-P opened this issue Aug 8, 2023 · 6 comments
Labels
priority: high status: in-progress This issue has been picked and is being implemented type: bug Something isn't working

Comments

@MATTT-P
Copy link

MATTT-P commented Aug 8, 2023

Describe the bug

When running through the build process, the sqlpipeline to datapipleine migration, is not completing as expected at the dataall-dev-dbmigration-stage stage.

There two new column added to data pipeline which are devstages and devstrategy which cannot be null, but during creation no default values are set.

  1. In the first script, db migration script is creating two new columns devStrategy and devStage which can be null.

image

  1. In the next script, db migration script is altering the db columns devStratgey and devStage which can not be null.

image

The initial script executed without issues, but upon running the second script, an error occurred indicating that the 'devStage' column cannot have a null value. This error occurred because no default value was assigned to the column.

Error message from cloudwatch logs:

image

How to Reproduce

When pushing the 1.5.5 code and triggering the build process it is failing at the dataall-dev-dbmigration-stage stage.

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

n/a

Python version

3.7

AWS data.all version

1.5.5

Additional context

No response

@dlpzx
Copy link
Contributor

dlpzx commented Aug 9, 2023

Hi @MATTT-P, thanks for raising the issue. Strangely it has not come up in our deployments or in other customer deployments, but I think you are right. I do not think that the default value is going to solve your issue though. The problem is the back-filling of existing rows (which would be unaffected if you set the default value).

Option 1: backfilling
it requires changes in the migration scripts. Something like the following:

image

Option 2: allowing null values
We modify the migration script to allow null values:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column(
        'datapipeline',
        sa.Column('template', sa.String(), nullable=True)
    )
    op.alter_column(
        'datapipeline',
        'devStages',
        existing_type=postgresql.ARRAY(sa.VARCHAR()),
        nullable=True
    )
    op.alter_column(
        'datapipeline',
        'devStrategy',
        existing_type=sa.VARCHAR(),
        nullable=True
    )

We modify the frontend code to be able to handle null values.

I am more inclined towards this second option, what do you think @MATTT-P @noah-paige ?

@dlpzx dlpzx added type: bug Something isn't working status: in-progress This issue has been picked and is being implemented priority: high labels Aug 9, 2023
@MATTT-P
Copy link
Author

MATTT-P commented Aug 9, 2023

Thanks for getting back to me @dlpzx, if we allow null values, are there any dependencies on these values? Were they notnull by design as values from these fields are needed elsewhere?

@dlpzx
Copy link
Contributor

dlpzx commented Aug 10, 2023

Hi @MATTT-P, that is what I am checking at the moment. I am already able to see a couple of places in the frontend where non-null values are expected. I am going to investigate a little more option 1 to see which one is better.

@dlpzx
Copy link
Contributor

dlpzx commented Aug 10, 2023

Hi @MATTT-P, after checking in the code, I am most certain that updating the migration scripts looks like a better solution. I opened a PR for this. I still need to do some more testing on it, but you can have a look and if in a hurry give it a try on your end.

@dlpzx
Copy link
Contributor

dlpzx commented Aug 11, 2023

Hi @MATTT-P. I finalized the PR for the fix in the migration script, it will be reviewed and probably merged by today/Monday. You might need to do some manual steps if your migration scripts failed. Let us know if you need support on this, otherwise we can connect with your team internally

dlpzx added a commit that referenced this issue Aug 14, 2023
### Feature or Bugfix
- Bugfix

### Detail
- migration script for upgrade to V1.2 had a mistake and is affecting
one customer. Basically the `devStrategy` and `devStages` values were
not backfilled which causes nulls in the RDS table that are not allowed
as this column should contain only non-null values.

In this PR we modify that script for customers that have not updated
yet. It is not 100% clear to me whether we should merge it, but I wanted
to raise awareness of this issue here.

### Relates
- #637 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@dlpzx
Copy link
Contributor

dlpzx commented Aug 14, 2023

Completed!

@dlpzx dlpzx closed this as completed Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high status: in-progress This issue has been picked and is being implemented type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants