Enhance Amplify Push Workflow #1406
It's a huge concern. We're deep into the Amplify stack, but I'm currently considering whether there is a way we can extricate ourselves and possibly just use AppSync directly, bypassing Amplify except for the codegen feature. There are certain errors that seem to be completely irrecoverable, which, as you state, is unacceptable in production.
This is key.
We've also started to look for alternatives. We're still hoping to use Amplify for auth, hosting, functions and REST/API-GW, but to remove the GraphQL part and implement that ourselves using API-GW + Lambda + RDS. We're hoping that would remove the majority of error cases.
This is really critical. The amplify push seems very fragile. A couple that I'm running into are:
The schema.graphql is valid, but all these errors seem to occur because it's getting pushed to DynamoDB concurrently. It seems to me that if there were a way for Amplify to do this sequentially, a lot of these push failures could disappear.
Echoing the sentiment above, this is definitely one of the most concerning parts of Amplify. The DX for provisioning resources is great when it works, but once one of the dreaded CFN error messages appears, good luck. Some of the messages make you think you might be able to resolve the problem within the AWS console, at which point your settings drift apart and the least frustrating, least-work option is to blow it all up and provision from scratch. I'm also really not sure how else I would handle this if I were in a production environment at this time. Amplify could offer the coolest transform directives and the best codegen in the world, but if it can't be deployed without hitting these pain points, it's moot. Even some way to revert the local schema to the format which CFN would accept during a deploy would be better than leaving things in an unusable middle state. Related: #1030
A small step forward on this. My theory above was that CloudFormation was pushing everything to DynamoDB at once, and that if you could control that, deploys might be more reliable. I compiled my GraphQL (amplify api gql-compile), then edited the cloudformation-template.json in the build directory. Under the Resources section of this JSON you have GraphQLAPI, GraphQLAPIKey and GraphQLSchema, then you have all your tables, then you have ConnectionStack, SearchableStack and CustomResources. Each, except the first three GraphQL resources, has a DependsOn variable set: for tables it is set to GraphQLSchema, and for the last three it's set to all GraphQL resources and all tables.

I've been experimenting (painful, since each test takes well over an hour), but I'm getting further by telling each table in the cloudformation-template.json that it also DependsOn the previous table, then using amplify push --no-gql-override so that it doesn't compile again and overwrite the cloudformation-template.json I just modified. I think this will work for tables, but all connections, and therefore GSIs, are defined in the ConnectionStack; there doesn't appear to be any way to do them one by one, and there is a limitation that you can only update/delete one GSI per push.

Echoing Dave's sentiments above: the CloudFormation settings being auto-generated by Amplify are undeployable. These sorts of things should be the defaults. Being able to reliably deploy to dev, test and prod is what it's all about; if you can't reliably do that, it rather defeats the entire purpose.
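The manual DependsOn edit described above can be scripted. Below is a minimal, unofficial Python sketch of the idea; the Type filter is an assumption about the generated template (depending on the Amplify version, tables may instead appear as nested AWS::CloudFormation::Stack resources, in which case adjust the filter):

```python
def chain_table_dependencies(template: dict) -> dict:
    """Force CloudFormation to deploy table resources one at a time.

    Mirrors the manual edit described above: each table resource gains a
    DependsOn entry for the previous table, so they are created/updated
    sequentially instead of concurrently. NOTE: the Type filter below is
    an assumption; in some Amplify templates the tables are emitted as
    nested AWS::CloudFormation::Stack resources instead.
    """
    resources = template.get("Resources", {})
    tables = sorted(
        name for name, res in resources.items()
        if res.get("Type") == "AWS::DynamoDB::Table"
    )
    for prev, curr in zip(tables, tables[1:]):
        deps = resources[curr].get("DependsOn", [])
        if isinstance(deps, str):  # DependsOn may be a single string
            deps = [deps]
        if prev not in deps:
            deps.append(prev)
        resources[curr]["DependsOn"] = deps
    return template
```

You would load build/cloudformation-template.json after amplify api gql-compile, run it through this, write the result back, and then push with amplify push --no-gql-override as described above, so the CLI doesn't regenerate the template.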
So far, using environments, we've been able to achieve a level of stability: we do the tricky stuff on a branch, and when everything is stable, "merge" it back in with prod. I have still gotten the occasional error, though. Another trick I'll throw out is to package up amplify in its own separate directory, and then publish each push as a private NPM package to be consumed by other team members. That way, you have one source of truth, and other team members don't need to worry about it.
There are currently three (!) RFCs for enhanced Amplify API features (the @auth directive #1043, custom indexes #1062, and local testing #1433). However, given the current state of things, instead of adding new functionality, would the Amplify team please focus on making the core service reliable so that it can be used confidently in a production environment?
We have now replaced the Amplify AppSync GraphQL autogeneration with a Postgres RDS and Postgraphile running in a Lambda function. The VPC + RDS is set up using manually-installed cloudonaut.io CF stacks (waiting on #1426 to get them deployed using the Amplify CLI), and the rest is using Amplify-controlled API Gateway + Lambda functions. During a week of development I haven't had a single failure in Amplify pushing. It seems that the vast majority of problems were related to the AppSync/GraphQL autogeneration, and the rest of Amplify fulfills its promise.

A major issue (though not the only one) was the DB and API being coupled in the same resource, meaning you couldn't remove and redeploy the API without losing all your data. That's not to say the setup is completely without issues. We have CF stacks exporting values which are used by other stacks, in which case the base stacks often cannot be updated without removing the other stacks first. But now this is manageable and in our control, rather than just hoping the Amplify black magic works. I still wouldn't use Amplify in scenarios requiring zero-downtime deployments, but in non-critical areas it seems to work fine.
Echoing @hew's comment above. As of writing, almost all of the recent deployment failures seem to fall into three categories:
The first point is being solved by the @key directive, which will remove the black box around key structures and give you more control. The @connection directive will then go through some sort of deprecation and re-introduction process, because the CloudFormation limits on DynamoDB GSIs make the current implementation difficult to use in practice. A similar concept to @connection that leverages @key will be introduced that does not suffer the same issues as @connection.

The second point has been solved by a recent PR that always outputs relevant exports that are used by downstream stacks. Thanks @kstro21.

The third point seems to be an issue with the AppSync CFN implementation and is being looked into, but removing failure cases such as the @connection issues will make this much less likely to occur. You can get around this issue today by adding back resolvers to the fields that are expected to have resolvers and then re-running amplify push.

In addition to the above changes, there are further improvements we can make.
That is all great news. Would there be an easy way to decouple the API from the databases, as @plaa suggests? Destroying the API and rebuilding wouldn't be much of an issue if the DB tables weren't deleted, too. This would also, theoretically, make it easy to add a canary (blue/green) API deployment that points to the same backend data (although there are some challenges there, too). You would not be able to destroy databases and keep the API, but I believe you could destroy the API while maintaining databases and use existing databases as data sources when redeploying the API. Of course, some @key changes would necessitate the creation of new tables.
@ajhool This is an interesting idea and is something we can definitely look into. One obvious way to allow this is to enable referencing external tables from within the API category. If we went down this route, you would be able to use:

type User @model(table: "MyTable-${env}") { ... }

# Or alternatively with a new directive that specifies the model should use an existing table.
type User @model @table(name: "MyTable-${env}") { ... }

The CLI can then offer options to automatically import an existing table (deployed through Amplify or not) into an API project. This would allow you to change the API at will without worrying about data integrity.

I'll also mention that the goal is to eventually empower the community to build their own reproducible patterns. We are working through PR #1396 that will allow you to write your own transformers to encapsulate those behaviors. The process is reasonably simple and hopefully will unblock all the good ideas you guys have. For example, writing a transformer that tells a @model to use an existing table is as simple as removing the AWS::DynamoDB::Table record and replacing an ARN in the generated AppSync data source & IAM role. This will require docs, but the goal is to allow you to write transformers like this (https://github.com/aws-amplify/amplify-cli/blob/master/packages/graphql-function-transformer/src/FunctionTransformer.ts) for your own custom workflows.
Looks like a workable concept to me. Amplify's autocreation of resolvers is fantastic, so as long as Amplify can recognize that @table is a DynamoDB table and autocreate resolvers effectively, then it would be very nice to have that option.
Hi there! I am adding myself to the list of users hitting the amplify push error:
I frankly have to say that unfortunately this is becoming a really nasty issue for my company. We are going to production very soon, and no matter what I do, at the moment I cannot fix it; I have to rebuild the API again, and that really scares me in a prod scenario. I will probably opt to follow a workflow like @hew suggested. We started nice and smooth, but when the schema becomes more complex, things become hard to solve. This works fine:

type Recipe @model {
id: ID!
user: User @connection(name: "UserRecipe")
name: String!
countries: [RecipeCountry] @connection(name: "RecipeCountry")
leads: [Lead] @connection(name: "RecipeLeads")
industries: [RecipeIndustry] @connection(name: "RecipeIndustry")
customIndustries: [CustomIndustry] @connection(name: "RecipeCustomIndustry")
createdAt: String
updatedAt: String
}
type RecipeIndustry @model {
id: ID!
recipe: Recipe @connection(name: "RecipeIndustry")
industry: Industry @connection(name: "IndustryRecipe")
createdAt: String
updatedAt: String
}
type Industry @model {
id: ID!
name: String
code: String
recipes: [RecipeIndustry] @connection(name: "IndustryRecipe")
}
type CustomIndustry @model {
name: String
recipe: Recipe @connection(name: "RecipeCustomIndustry")
createdAt: String
updatedAt: String
}
type Lead @model {
id: ID!
leadsType: String!
maxQuota: Int
generatedQuotas: [Quota] @connection(name: "LeadsQuota")
recipe: Recipe @connection(name: "RecipeLeads")
createdAt: String
updatedAt: String
}
type RecipeCountry @model {
id: ID!
recipe: Recipe @connection(name: "RecipeCountry")
country: Country @connection(name: "CountryRecipe")
createdAt: String
updatedAt: String
}
type Country @model {
id: ID!
name: String
code: String
user: [User] @connection(name: "UserCountry")
company: [Company] @connection(name: "CompanyCountry")
recipes: [RecipeCountry] @connection(name: "CountryRecipe")
}
type Company @model {
name: String
industry: String
number_employee: Int
website: String
phoneNumber: Int
email: String
address: String
user: [User] @connection(name: "UserCompany")
country: Country @connection(name: "CompanyCountry")
}

When I tried to modify the Recipe model and add all of these at once:

...
departments: [RecipeDepartment] @connection(name: "RecipeDepartment")
seniorities: [RecipeSeniority] @connection(name: "RecipeSeniority")
customJob: [CustomJob] @connection(name: "RecipeCustomJob")
...

with the related models:

type RecipeSeniority @model {
id: ID!
recipe: Recipe @connection(name: "RecipeSeniority")
seniority: Seniority @connection(name: "SeniorityRecipe")
createdAt: String
updatedAt: String
}
type Seniority @model {
id: ID!
name: String
code: String
recipes: [RecipeSeniority] @connection(name: "SeniorityRecipe")
}
type RecipeDepartment @model {
id: ID!
recipe: Recipe @connection(name: "RecipeDepartment")
department: Department @connection(name: "DepartmentRecipe")
createdAt: String
updatedAt: String
}
type Department @model {
id: ID!
name: String
code: String
recipes: [RecipeDepartment] @connection(name: "DepartmentRecipe")
}
type CustomJob @model {
id: ID!
name: String
recipe: Recipe @connection(name: "RecipeCustomJob")
createdAt: String
updatedAt: String
}
I got a nice error. So I tried to follow @mikeparisstuff's suggestions and deleted everything; I even deleted all new models and the @connection fields related to the old ones, but I always got the same error, even with just:

...
customJob: [CustomJob] @connection(name: "RecipeCustomJob")
...

and the related model:

type CustomJob @model {
id: ID!
name: String
recipe: Recipe @connection(name: "RecipeCustomJob")
createdAt: String
updatedAt: String
}

I got the error. I even tried changing the names of models and connections (instead of "CustomJob" I used "CustomProfession"), but no luck.

I am stuck: I cannot add and connect a single new model to an old model without a push failure. The only solution now is to erase the database and start over again. I will lose some data, which makes all this even more annoying. Please try to fix this issue ASAP, because people like me are quite stuck and we have projects to ship on short timelines. Thanks.
OK, say you get one of these errors you cannot recover from, and you are on an environment you can't simply throw away:
NOTE: if there is a way to rename an env, that would eliminate some of the steps above. It's not pretty, but it's probably faster than trying to debug, make a push, wait, debug, etc. It will somewhat depend on how many other services and permissions you need to set up on the new env.
Thanks @hew, I think it's a very useful workaround for those who, like me, are stuck and need to go to production. Since this is a very sensitive issue, I hope the team will fix it ASAP.
My team has been working on the challenges we've been facing around getting this working, in particular being able to deploy multiple GSIs. I'll put this out there and would welcome feedback on the validity of this approach. Essentially they are saying to abandon Amplify for Serverless (https://serverless.com/): that Serverless is able to do all of this, and as proof of that they deployed 180 GSIs across 30 tables (something we were flat out unable to get CloudFormation to do by any method). They also provide a couple of references for Serverless GraphQL. My team are very keen to move forward with Serverless and put Amplify behind us for now. Any thoughts on the validity of this approach?
Just keep in mind that you will lose all the settings you have for permissions, lambda configs, etc. If you have a project of any decent size, it's still going to take a while to get everything working again. I entertained that flow yesterday and I'm still fixing different things. I think this weekend I'm going to explore Serverless, or literally anything else, as @sacrampton suggested. I honestly cannot take the pain of this anymore.
As an aside, has anyone ever had AppSync simply stop updating, while Amplify pushes succeed?
Still testing the best approach here, but my team has successfully used Amplify to generate compiled AppSync schemas (i.e. amplify api gql-compile for schema/resolvers), then used Serverless to deploy that stack. As I said, a work in progress, but it seems this might be a valid workflow: essentially using Serverless to replace the amplify push / CloudFormation usage. For what it's worth, separate from amplify push, we were unable to get standalone CloudFormation to deploy a stack that had multiple GSIs on a table. The DependsOn attribute will pause for resources, but GSIs are contained within resources, so there is no way to pause between them. We invested weeks trying to get this to work, but in the end Serverless got it working straight away. And given amplify push sends all of this to CloudFormation, there is always going to be a problem unless there is a way for CloudFormation to deploy GSIs successfully. At this time it's looking like a hybrid: amplify transform to generate the schema, then Serverless to replace push. This of course may change as we get further into it.
Hello everyone. We're currently working on this issue. For some clarity: while you may be seeing this issue in the Amplify CLI, we believe the majority of the issues in this thread are related to a CloudFormation issue in AppSync, and not to how Amplify is doing a deployment. It occurs when there is a race condition between removing types and adding/modifying others. We're currently working with the AppSync team on resolving this, and for clarity, you could still see this issue if you were using an alternative method of deployment such as Serverless or a hand-rolled CloudFormation deployment. In the meantime, if you see an error saying "No resolver found", you can work around the problem by attaching an empty resolver to that type in the AppSync console and then attempting a push again.
Hi Richard (@undefobj), thanks for taking the time to respond here, but I'd like to politely point out that a comment like "We're currently working on the issue" doesn't really help anyone. Everyone here is trying to solve issues related to reliably deploying their solutions in dev/test/prod. We have timelines we have to meet; in my case, deploying our AppSync app into test/prod so I can start getting paid for it. This is not something that I can do whenever a solution appears: it has to be done, and if I can't do it this way then we have to find another way. At the moment, if you are using GSIs, then Amplify/CloudFormation is unusable, and there is no interim workaround that I'm aware of (Serverless is looking promising, however). There are also valid concerns raised by others about the wisdom of not separating the app and the underlying databases (i.e. stories of how easy it is to have CloudFormation blow away entire databases). I fully realize that you can't promise specific dates, but you can provide non-committal guidance (i.e. "we are working on this issue and anticipate having a fix in place next week / next month / next year"). That guidance would help us decide whether to hold tight and see what might be coming, or abandon it and go another route. Giving approximate timeline guidance would be helpful, and given this is a "dead in the water" issue, interim workarounds are really critical.
@sacrampton we needed a bit more time to investigate before giving any timelines. My response was to give you clarity on the situation from a technical standpoint, so that you understood you can have this problem with any CloudFormation technology; it's independent of the Amplify CLI. That being said, we were able to dive deeper into the issue today and identify the root cause in the CloudFormation update process, and we are working on an AppSync deployment to resolve this. The ETA is currently the end of this week, but if we can get it out sooner, I will reply back.
All: the fix to AppSync CloudFormation for the "No resolver found" error has now been deployed to all regions. If this was the root cause of your error, then you should be able to run amplify push again.
Here is my cli.json:

{
"features": {
"graphqltransformer": {
"enableiterativegsiupdates": true
}
}
}

and here is the model we are updating and getting the GSI error on:

type UserCampaign
@model
@key(name: "byUser", fields: ["userId", "createdAt"], queryField: "userCampaignByUser")
@key(name: "byCampaign", fields: ["campaignId", "createdAt"], queryField: "userCampaignByCampaign")
@key(name: "byCampaignByStatus", fields: ["campaignId", "campaignStatus"], queryField: "userCampaignByCampaignByStatus")
@key(name: "byStatusByUpdatedAt", fields: ["campaignStatus", "updatedAt"], queryField: "userCampaignByStatusByUpdatedAt")
@key(name: "byStatusByCreatedAt", fields: ["campaignStatus", "createdAt"], queryField: "userCampaignByStatusByCreatedAt")
@key(name: "byCampaignByUser", fields: ["campaignId", "userId"], queryField: "userCampaignByCampaignByUser")
@key(name: "byCardId", fields: ["cardId"], queryField: "userCampaignByCardId")
@key(name: "byScore1", fields: ["campaignId_campaignStatus", "userScore_1"], queryField: "userCampaignByScore1")
@key(name: "byCampaignByCard", fields: ["campaignId", "cardId"], queryField: "userCampaignByCampaignByCard")
@key(name: "byReportingGroupV2", fields: ["reportingGroupV2"], queryField: "userCampaignByReportingGroupV2")
@key(name: "byReportingGroupV2ByCoarseCampaignStatus", fields: ["reportingGroupV2", "coarseCampaignStatus"], queryField: "userCampaignByReportingGroupV2ByCoarseCampaignStatus")
@key(name: "byCoarseCampaignStatus", fields: ["coarseCampaignStatus"], queryField: "userCampaignByCoarseCampaignStatus")
@key(name: "byUserByCoarseCampaignStatus", fields: ["userId", "coarseCampaignStatus"], queryField: "byUserByCoarseCampaignStatus")
@key(name: "byCampaignByCoarseCampaignStatus", fields: ["campaignId", "coarseCampaignStatus"], queryField: "byCampaignByCoarseCampaignStatus")
@key(name: "byNormalLaunchCampaignScore", fields: ["normalLaunchCampaignScore"], queryField: "userCampaignByNormalLaunchCampaignScore")
@key(name: "byBestCandidateAwaitingApproval", fields: ["campaignId_campaignStatus", "normalLaunchCampaignScore"], queryField: "userCampaignByBestCandidateAwaitingApproval")
@auth(rules: [
{allow: groups, groups: ["admin"], operations: [create, read, update, delete]},
{allow: owner, ownerField: "userId", operations: [read]},
{allow: private, provider: iam, operations: [create, read, update, delete]}
])
{
id: ID!
campaignId: ID!
userId: ID!
...[lots of stuff here]
}
and then what I get in the output is basically this:
@renebrandel is the latest version 4.43 or 4.44? Also, I posted the requested logs above.
@tomhirschfeld 4.43 should have the latest error codes, but skimming through it, it just seems like the feature flag isn't detected. Just to over-communicate, can you confirm that this is the path for your cli.json?
@renebrandel yes, this is the contents of our project's cli.json. Not sure if this influences things at all, but we initialized the project in March 2020, using a much earlier version of the CLI.
@renebrandel I'm now on 4.44.0 and I get the same error as previously, without anything new:
I think this one could be connected to the deployment-state.json file in the S3 deployment bucket. Iterative GSI updates (enableiterativegsiupdates) locked it for me in the WAITING_FOR_ROLLBACK state. I manually updated the deployment-state.json file to:

{
"version": "1",
"startedAt": "2021-02-23T09:00:30.085Z",
"status": "DEPLOYED",
"currentStepIndex": 0,
"steps": [
{
"status": "DEPLOYED"
}
]
}

It seems that the problem is gone after that.
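For anyone hitting the same lock, the manual edit above can be expressed as a small helper. This is an unofficial sketch: the field names are taken directly from the deployment-state.json snippet above, and you would still need to download the file from the deployment S3 bucket and upload the patched version yourself.

```python
def reset_deployment_state(state: dict) -> dict:
    """Return a patched copy of deployment-state.json content.

    Forces the overall status and every step back to DEPLOYED, replicating
    the manual fix described above for a deployment stuck in
    WAITING_FOR_ROLLBACK. Field names follow the snippet in the comment;
    they are not an official Amplify schema.
    """
    fixed = dict(state)
    fixed["status"] = "DEPLOYED"
    fixed["currentStepIndex"] = 0
    fixed["steps"] = [
        {**step, "status": "DEPLOYED"} for step in state.get("steps", [])
    ]
    return fixed
```

The function returns a new dict and leaves the original untouched, so you can diff the two before overwriting anything in S3.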
Adding an update here: we were able to find that some deployments were hanging due to not carrying over the previous deployment variables. We have added a PR for this, which is now in the latest release: PR #6486. Following up on a few comments (@sacrampton @mormsbee @dsypniewski): it looks like this might be an issue depending on how many secondary indexes are changed per push.
@SwaySway in my experience, once this error starts showing up, it doesn't matter what changes: even if it's completely unrelated to the GSIs, like adding a new field, it still errors out with the rate limit.
@dsypniewski if it's unrelated to how many model changes you make, can you specify a minimal schema change, in stages, that can repro this for you? @tomhirschfeld are you still running into this with the latest version? If so, I'd like to set up some time to see what the issue is here; you can ping me in the Amplify Discord server.
Hello. Some days ago I pushed some schemas and an error occurred (a syntax error; I did a dumb one there, but... happens). Now I get: "A deployment is already in progress for the project, cannot push resources until it finishes." I do not know how strongly this is related to this issue, but it seems to be at least a little. If someone can provide me any help, that would be nice.
@JosephVasse, I believe this error is related to the status of the CloudFormation stack; I have had similar experiences. In that case, I go to the CloudFormation console and use commands there to roll back the stack so that its status is ROLLBACK_COMPLETE; then it is ready for amplify push. That has worked for me.
I tried this but it doesn't seem to have worked... looks like I'm doomed.
@JosephVasse Have you tried running:

I had an issue once where I named something and either AppSync or CloudFormation didn't like it. I got stuck in a rollback state like you. I had to carefully navigate through the CloudFormation stack using the console, delete the step the rollback was getting hung up on, and then re-execute the rollback. It was a while back and I don't recall the exact steps. Now, understand that this could create new, unforeseen problems, and I take no responsibility for what might happen. This worked for me; no guarantee it will work for you, YMMV and all that. Another route you might consider is seeking help from the CloudFormation team or a CloudFormation forum. Good luck.
I tried your idea, but nothing seems to happen; I still get the same error.

I managed to "fix" my problem by cloning the env, but I had to repopulate my tables in DynamoDB... I feel really unconfident, since I don't even understand why this happened... but let's move on.
Hello @JosephVasse, what kind of syntax error happened? Did the CLI attempt a rollback deployment?
It was a syntax error in the model names (which were OK for the "mock api"; is that normal?). To be honest, I don't remember exactly, since it was 3+ weeks ago, but I don't think any rollback happened.
@JosephVasse Are you still experiencing this issue with iterative deployments?
I've switched over to mobile-app development for the moment, so no, not for now. Thanks for asking.
We've got another batch of fixes and improvements for this on the way. You can track the PR progress here: #6990. We're still working through the exception from the DynamoDB service limits (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-api).
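For context on why that limit forces iterative deployments: DynamoDB allows only one GSI creation or deletion per UpdateTable call, so a schema change touching several @key directives has to be split into a sequence of single-index updates. A rough, unofficial illustration of that splitting (hypothetical index names; this is not Amplify's actual implementation):

```python
def plan_gsi_updates(current: set, desired: set) -> list:
    """Split a GSI diff into steps DynamoDB will accept.

    DynamoDB permits only a single GSI create-or-delete per UpdateTable
    call, so N index changes require N sequential updates; this is
    roughly what the CLI's iterative GSI updates automate. Index names
    here are purely illustrative.
    """
    steps = [("delete", name) for name in sorted(current - desired)]
    steps += [("create", name) for name in sorted(desired - current)]
    return steps
```

For example, replacing one index and adding another on the same table yields three separate deployment steps, which is why a big @key refactor takes several pushes' worth of time.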
I get this error when pushing. Why did it happen?

The above error appears and won't go away; no matter what resource I add, the error still appears. I checked, and the error persists. I have no idea how to solve it; somebody please give me some hints. I submitted an issue here.
@kenchoong We've got an update incoming that'll detect this state inconsistency and give you an option to recover from the failed state. Here's the tracking PR: #6990
@renebrandel Not meaning to pressure you guys, but approximately when will #6990 be live? Because right now I can't do anything with amplify push.
⭐ ⭐ ⭐ LAUNCH ANNOUNCEMENT ⭐ ⭐ ⭐ We've officially launched the ability to make multiple GSI updates in a single deployment for all new Amplify projects, starting from CLI v4.49+. A big THANK YOU to everyone here helping us iron this out during the experimental phase! Learn more about the workflow in our documentation. I'll be closing this issue. If you run into any bugs or errors using iterative deployments, please open a new issue.
This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs. Looking for a help forum? We recommend joining the Amplify Community Discord server.
** Which Category is your question related to? **
Amplify cli
** What AWS Services are you utilizing? **
AppSync, Lambda, Auth
** Provide additional details e.g. code snippets **
During a few weeks of development I've encountered many occasions when changes I've made fail to be updated using amplify push. I don't seem to be alone. Thus far I've resolved these by deleting and recreating the API+DB, but in production that would not be an option. I have no idea how I could perform the updates if we already had a production system. We're now seriously considering whether we can proceed to production using Amplify, or whether we need to switch to other solutions.
Cases I've faced include, for example, errors about ApplicationArea in the latest changes, such as: Export with name bjlounrbanderkj4wu4gnr63my:GetAtt:ApplicationAreaDataSource:Name is already exported by stack xyzzy-20190502163009-apixyzzy-OEWIC881QYZP-ApplicationArea-1T2QUAK2OVDYG, and several others.
One of the issues seems to be that the API+DB are coupled. It wouldn't be a problem for us to delete and recreate the API (with a short outage), but the DB is lost in the same operation.
Every time I've tried to make changes directly in CF (for example, deleting some stack), the local Amplify state has become out-of-sync with the cloud, and the only way I've been able to resolve it has been to completely delete Amplify and start amplify init from scratch. Can you provide any general guidance on how these kinds of situations could be resolved? For example, is there some way to have Amplify delete + create changed stacks instead of updating them? Or is it safe to delete or modify some stack manually in order to resolve such conflicts?
I'm not looking for specific instructions for a particular error case, but more general guidelines on a) what kinds of things can be manually altered in order to make an update go through, and b) what can be tried to resolve cases where the Amplify local and cloud states have become out of sync.