Does your CloudFormation Stack Update require cleaning up resources before the update?
Does your CloudFormation stack update need to clean up existing resources before it can create new ones, while the cleanup cannot happen until the new resources have been created? Circular dependency? Catch-22? Here’s how to solve it.
I admit this is quite rare, but surely a few others have run into it as I did. Here is my scenario:
We have a CloudFormation template with an Auto Scaling Group that launches an EC2 instance from a Launch Template. Besides defining the essential properties, the Launch Template attaches a specific Elastic Network Interface (ENI) to the launched instance. We do this because the ENI is referenced elsewhere, so we want to keep it even when the instance attached to it is updated or replaced. A few days later, we updated the AMI ID in the Launch Template in CloudFormation. That generated a new version of the Launch Template, so CloudFormation now wants to update the Auto Scaling Group to launch a new instance from the new version.
Do you see the problem?
In a normal workflow, CloudFormation creates the new resources and then cleans up the obsolete ones. In this case, however, it cannot launch a new instance because the Elastic Network Interface is still in use by the old instance, and it cannot terminate the old instance (which would free the Elastic Network Interface) because the stack update is not complete yet!
While my problem was with an Auto Scaling Group and EC2, yours could be with other AWS services. The circular dependency will be the same, though, and the good news is that you can solve it the same way.
The Solution:
Use a Lambda-Backed Custom Resource to clean up and delete the old resources, then signal back to CloudFormation to continue the update.
In my case, I wrote the Custom Resource to take the Launch Template’s ID and Version as parameters, look up the old Auto Scaling Group and set its capacity to 0 (to start terminating any old instances), then send a success response to CloudFormation so that it starts updating the Auto Scaling Group.
Confused? Don’t worry, see this YAML template example on GitHub, which has inline comments to explain it.
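To make the lookup-and-scale-down step more concrete, here is a minimal Python sketch of what such a Custom Resource could do. It is not the code from the GitHub template. The function names `find_old_groups` and `drain_old_groups` are made up for illustration. The sketch assumes the ASG references the Launch Template directly (not through a mixed instances policy) and with a numeric version, and it also sets `MinSize` to 0, on the assumption that the desired capacity cannot otherwise drop below the group's minimum. The `autoscaling` argument is expected to be a boto3 Auto Scaling client.

```python
def find_old_groups(autoscaling, launch_template_id, new_version):
    """Return the names of ASGs that use the given Launch Template
    at any version other than new_version."""
    names = []
    paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            # Groups built from a launch configuration have no "LaunchTemplate" key.
            template = group.get("LaunchTemplate") or {}
            same_template = template.get("LaunchTemplateId") == launch_template_id
            # Versions are compared as strings; "$Latest"/"$Default" would also
            # count as "old" here, which is an assumption of this sketch.
            old_version = str(template.get("Version")) != str(new_version)
            if same_template and old_version:
                names.append(group["AutoScalingGroupName"])
    return names


def drain_old_groups(autoscaling, launch_template_id, new_version):
    """Set the capacity of every old ASG to 0 so it starts terminating instances."""
    names = find_old_groups(autoscaling, launch_template_id, new_version)
    for name in names:
        autoscaling.update_auto_scaling_group(
            AutoScalingGroupName=name,
            MinSize=0,          # assumption: lower the minimum along with the desired capacity
            DesiredCapacity=0,
        )
    return names
```

Inside the Lambda function this would be called as something like `drain_old_groups(boto3.client("autoscaling"), template_id, version)`, with the two values read from the Custom Resource's properties.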
In summary, here is what happens now when we push an update:
● The Launch Template is updated, and CloudFormation generates a new Version;
● CloudFormation sends a request to the Custom Resource Lambda containing the Request Type (e.g. Update), a Response URL, the Launch Template ID and the new Version;
● The Custom Resource Lambda looks up the Auto Scaling Group with the old Version of the Launch Template;
● The Custom Resource Lambda sets the capacity of the ASG to 0, and returns a success response to CloudFormation;
● The ASG starts terminating the old instance to meet the desired 0 capacity;
● CloudFormation receives the success response and continues;
● CloudFormation updates the ASG with the new Launch Template Version, resetting the desired capacity to the value defined in the template;
● The ASG starts launching a new instance to meet the desired capacity.
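The request and response exchange in the steps above could look roughly like the following Python sketch. The field names in the event and in the response body follow the CloudFormation custom resource protocol. Everything else is an assumption for illustration: the property names `LaunchTemplateId` and `LaunchTemplateVersion`, the sample values, and the `cleanup` hook (which stands in for the lookup-and-scale-down logic). Running the cleanup only on `Update` requests is also a choice made here, on the reasoning that there is no old instance to remove on `Create`.

```python
import json
import urllib.request

# Hypothetical example of the event CloudFormation sends on a stack update.
SAMPLE_EVENT = {
    "RequestType": "Update",
    "ResponseURL": "https://example.invalid/presigned-url",
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/demo/abc",
    "RequestId": "req-1",
    "LogicalResourceId": "CleanUpOldInstance",
    "PhysicalResourceId": "cleanup-resource",
    "ResourceProperties": {
        "LaunchTemplateId": "lt-0123456789abcdef0",  # hypothetical property name
        "LaunchTemplateVersion": "2",                # hypothetical property name
    },
}


def build_response(event, status, reason=""):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See the function's CloudWatch logs",
        # Create events carry no PhysicalResourceId, so fall back to the logical ID.
        "PhysicalResourceId": event.get("PhysicalResourceId", event["LogicalResourceId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": {},
    }


def send_response(event, body):
    """PUT the response body to the pre-signed Response URL."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"], data=data, method="PUT",
        headers={"Content-Type": ""},  # sent empty so the pre-signed URL accepts the upload
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handler(event, context, cleanup=lambda props: None, send=send_response):
    """Run the cleanup on Update requests, then always report back to CloudFormation."""
    try:
        if event["RequestType"] == "Update":
            cleanup(event["ResourceProperties"])
        body = build_response(event, "SUCCESS")
    except Exception as exc:
        # A FAILED status makes CloudFormation stop and roll the update back.
        body = build_response(event, "FAILED", reason=str(exc))
    send(event, body)
    return body
```

Whatever happens inside the function, it should always send a response. If it does not, CloudFormation keeps waiting for one until the Custom Resource times out.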
Hope that was helpful. Feel free to comment or ask below!