IaC + Practical AWS CloudFormation and CloudFront CDN stuff


2017-02-24, 08:49 Posted by: Aki Karjalainen

Infrastructure as Code.

AWS's take on IaC is mostly CloudFormation and the 3rd party tools built around it. Of course there are other related offerings, like the Chef-based configuration management of OpsWorks, but in my opinion CloudFormation lays the foundation of IaC: it manages creating and updating resources at a lower level across all AWS service offerings, whereas OpsWorks is more of an EC2 instance configuration management service. That is, CloudFormation for provisioning resources and AWS-managed Chef for managing instances and continuous deployments.

But why IaC? IaC is essential to your DevOps practices. Infrastructure as code is the concept of managing your operations environment the same way you manage applications or code. Rather than making configuration changes manually, the operations infrastructure is managed through a configuration engine and rules, which in turn are governed by the same principles as application code: version control, traceability, change management, peer reviews and so on. Too often you find infrastructure which consists of snowflakes - unique servers and environments. Snowflake environments have several issues:

  • Problems are difficult to reproduce. Your software might run fine in one environment but have, say, performance issues in another.
  • Changes are difficult (and expensive) to introduce, as the environments become hard to understand and modify. Naming is not consistent, or resources may not be tagged and named at all.
  • A snowflake environment is hard to audit. Changes are not logged.

Quoting Boyd Hemphill, cloud evangelist from Stackengine (which was acquired by Oracle a while ago...): "The basic principle is that operators (admins, system engineers, etc.) should not log in to a new machine and configure it from documentation. Rather, code should be written to describe the desired state of the new machine. That code should run on the machine to converge it to the desired state. The code should execute on a cadence to ensure the desired state of the machine over time, always bringing it back to convergence. This IaC thinking, more than any other single thing, is what enabled the cloud revolution, because a single ops person can start 100 machines at the press of a button, and also have them properly configured. The elasticity of the cloud paradigm and disposability of cloud machines could truly be leveraged".

But beware: with IaC and automation one can also do an incredible amount of damage in a short amount of time. Just imagine accidentally running a single command that deletes a CloudFormation stack responsible for spinning up 10 EC2 instances and the related firewall rules in production - all instances terminated. Oops. Fortunately, there are ways to protect stacks from accidental deletion, as shown below.
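
One such safeguard is CloudFormation's DeletionPolicy attribute. As a minimal sketch (the logical name, AMI id and instance type below are just placeholders), setting it to Retain tells CloudFormation to leave the resource in place even if the stack itself is deleted:

        "CriticalInstance": {
            "Type": "AWS::EC2::Instance",
            "DeletionPolicy": "Retain",
            "Properties": {
                "ImageId": "<your AMI id>",
                "InstanceType": "t2.micro"
            }
        }

An accidental delete-stack then removes the stack but leaves the retained resource running.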

Another good principle is "avoid written documentation", since the code itself documents the state of the system. This is particularly powerful because it means the infrastructure documentation is always up to date. How often have you seen up-to-date infrastructure documentation? Well, managers do not like to peek into code, but they do keep asking for documentation. To tackle that, keep your infrastructure constants like IP addressing, firewall rules etc. defined in compact blocks of code which anyone can understand and which you can reference easily, and instead of very detailed documentation keep only high-level infrastructure documentation up to date.
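
For example, a CloudFormation Mappings block is one compact, reviewable home for such constants (the map name and CIDR ranges below are purely illustrative), referenced from resources with Fn::FindInMap:

        "Mappings": {
            "NetworkConstants": {
                "Vpc":     { "Cidr": "10.0.0.0/16" },
                "SubnetA": { "Cidr": "10.0.1.0/24" },
                "SubnetB": { "Cidr": "10.0.2.0/24" }
            }
        }

Anyone can read the addressing plan from that one block, and it is version controlled together with the rest of the template.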

While we are at it, one very useful feature that AWS released a while ago for CloudFormation is the ability to export values from one stack and import them into another. No more cross-stack Lambda function workarounds. As an example, consider the following simple stacks and how they are organized:

[Image: Stacks]

Obviously you want to attach your VPC and subnets to the VPN. Next you need to spin up a database using RDS. For that you need the id of the VPC that was created and a list of subnet ids, so you can create a subnet group for the database instance and attach the instance to the VPC. A real-life scenario would be a bit more complicated, but let's stick to this one for the sake of the argument. In real life you would throw in a couple of IAM roles, a NAT gateway, security groups and route tables with some route entries, network ACLs and so on. Of course your setup might be provisioned as one monolithic stack or even as nested stacks; your mileage may vary. Right, let's have a look at our example. The following JSON snippets are the beef of leveraging the export and import functions:

First, you may want to set up your VPN in its own stack. The VPN gateway id is a good candidate for exporting (here assuming the VPN gateway resource's logical name in that stack is VpnGateway):

        "VpnGatewayId": {
            "Description": "The id of the VPN gateway",
            "Value": { "Ref": "VpnGateway" },
            "Export": {
                "Name": {
                    "Fn::Sub": "${AWS::StackName}-VpnGatewayId"
                }
            }
        }
        
Let's attach the VPC to the VPN. In the VPC stack, what you want to do is import the VPN gateway id:

        "VpnGatewayId": {
            "Fn::ImportValue": {
                "Fn::Sub": "<your VPN stack name>-VpnGatewayId"
            }
        }

Let's export the id of the VPC (here assuming the VPC resource's logical name is Vpc):

        "VpcId": {
            "Description": "The id of the VPC",
            "Value": { "Ref": "Vpc" },
            "Export": {
                "Name": {
                    "Fn::Sub": "${AWS::StackName}-VpcId"
                }
            }
        }
            
... which will then be imported when creating subnets, route table entries and so on.
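
For instance, a subnet in another stack could pick up the exported VPC id like this (the subnet's logical name and CIDR block are just placeholders):

        "SubnetA": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {
                    "Fn::ImportValue": {
                        "Fn::Sub": "<your VPC stack name>-VpcId"
                    }
                },
                "CidrBlock": "10.0.1.0/24"
            }
        }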

The function Fn::ImportValue returns the value of an output exported by another stack; you typically use it to create cross-stack references. A few important notes about exports:

  1. Export names must be unique within an account and region.
  2. You can't export and import across regions.
  3. You can't delete a stack if its export is imported by another stack.

Last but not least, there seems to be an undocumented quirk. If you add an export to an existing stack, CloudFormation does not consider that a "change" to the stack and refuses to execute an update. This is unfortunate, and it means you should export an output right away if there is even a slight chance you will need to import it later in another stack, because adding the export afterwards is a bit cumbersome.

CloudFront and WAF, bad guys bypassing your CDN distribution


A CloudFront and WAF related topic you might find interesting is how to block bad guys who bypass your CloudFront distribution and WAF by accessing the origin directly. In some cases that might not be a problem; after all, the origin is there to serve requests, and CloudFront is there to provide a performance boost by serving content from the location closest to the client. There are two scenarios here: either your origin is served from S3, or you are using a custom origin (e.g. serving the content from your own HTTP server located in your on-premises DC). If users access your objects directly in S3, they bypass the controls provided by CloudFront. In addition, if users access objects both through CloudFront and directly via Amazon S3 URLs, the CloudFront access logs are not very useful because they are incomplete - they miss the users who bypass CloudFront altogether.

Let's discuss these two origin types and how you can prevent bad guys from accessing the origin server.

If your CloudFront distribution uses S3 buckets as the origin:

1. Leverage an Origin Access Identity (OAI). Prevent direct access to your bucket by creating an Origin Access Identity (a special CloudFront user) and associating it with the CloudFront distribution.

2. Edit the S3 bucket policy and restrict s3:GetObject access to only that origin access identity:

  {
    ...
    "Principal": { "CanonicalUser": "<your origin access identity which you created with the CDN distribution or attached to the distribution>" },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::mybucket/*"
  }

If you are using a custom origin with your CloudFront distribution:

1. Whitelist only CloudFront at the origin, i.e. the CloudFront public IP ranges.

2. Whitelist a pre-shared secret header: allow CloudFront, deny everyone else.

You can add a custom request header to your distribution and whitelist only the requests containing that header at the origin.
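
A sketch of what that could look like in the distribution's origin configuration (the header name, header value and domain are placeholders you would replace with your own):

    "Origins" : [ {
      "DomainName" : "<your origin domain>",
      "Id" : "myCustomOrigin",
      "OriginCustomHeaders" : [ {
        "HeaderName" : "X-Pre-Shared-Secret",
        "HeaderValue" : "<your secret value>"
      } ],
      "CustomOriginConfig" : {
        "OriginProtocolPolicy" : "https-only"
      }
    } ]

Your origin server then rejects any request that does not carry the expected header value.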

Beware! If you whitelist only the CloudFront IP ranges, you need a mechanism to update the list of whitelisted addresses, as they WILL change. A nice serverless design is to use a Lambda function that periodically updates the whitelisted IP ranges.

And yes, you can spin up your CloudFront distribution with CloudFormation, as well as set up your S3 bucket policy to allow only CloudFront to access your content and thus block the bad guys:

"Type" : "AWS::CloudFront::Distribution",

 "Properties" : {

  "DistributionConfig" : {

   "Origins" : [ {

    "DomainName" : "cloudit.s3.amazonaws.com",

     "Id" : "myS3Origin",

      "S3OriginConfig" : {

       "OriginAccessIdentity" : "origin-access-identity/cloudfront/SDSJJK43423K"

      }

     }],

     "Enabled" : "true",

     "Logging" : {

     },

    "Aliases" : [ "cloudit.aki.com", 

...

"Type" : "AWS::S3::BucketPolicy",

 "Properties" : {

  "Bucket" : String,

   "PolicyDocument": {

...

"Principal":{"CanonicalUser":"<your origin access identity which you created with the CDN distribution or attached to the distribution>"},

    "Action":"s3:GetObject",

There you go, just synchronize your content to S3 and create the stack.

