Building a Windows 2016 AMI on AWS with Packer and Chef

There are a few different ways to set up and configure a new EC2 instance on Amazon. The most basic is launching an instance manually, either through the CLI or the web console, then accessing the server remotely and installing and configuring any necessary software by hand. While this method might work for some use cases, it's far from optimal. It's time consuming, and if you ever need to reproduce the setup, for example after losing the instance or when launching additional similar instances, you'll have to do it all again and hope you took good notes the first time. This is where configuration management tools such as Chef, Puppet and Ansible come in: they automate the installation and configuration through recipes or desired state declarations.

All of these tools let you do the work once and then take a blank instance or server and bring it up to the desired configuration. Even a fairly simple setup, however, still takes some time to bootstrap the configuration management software and complete the installation, especially on Windows instances. If you're just setting up a server once in a while, this might not be a problem, but if your instance is meant to be part of an auto-scaling group that automatically adds resources in response to increased load or visitors to your site, every minute of delay is a minute the group is short on capacity.

Pre-Baked server images

In order to quickly respond to increased resource demands, you can pre-bake an image, that is, configure it ahead of time so that all that is needed is to launch it and wait for it to boot. This is where we'll use Packer, a great tool from HashiCorp that enables you to build your own machine images. Packer supports generating images for many different virtualization technologies, but in this post we'll focus on building an Amazon EC2 AMI and provisioning it with Chef.

Install Packer

Step one is to install Packer by following the official installation guide.
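
Once the install is done, a quick check that the binary is actually on the PATH saves a confusing failure later in a CI job. The helper below is only a sketch; the function name have_command is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named command can be found on the PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}
```

A typical use would be `have_command packer && packer version`, which prints the installed Packer version when the tool is available.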

Create a template

The key components of a template are the builders and provisioners sections. The builder specifies the target of the image, in this case Amazon, and the provisioners specify what to use to set up and configure it, in this case Chef.

Let's start with the end result first and then I'll walk you through the meaning behind each section:

{
  "variables": {
    "aws_access_key":     "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key":     "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "aws_session_token":  "{{env `AWS_SESSION_TOKEN`}}",
    "aws_ami":      "{{env `aws_ami`}}",
    "aws_vpc_id":   "{{env `aws_vpc_id`}}",
    "aws_subnet":   "{{env `aws_subnet`}}",
    "aws_instance_profile":   "{{env `aws_instance_profile`}}",
    "chef_run_list":    "{{env `chef_run_list`}}",
    "chef_server_url":   "{{env `chef_server_url`}}",
    "chef_environment": "{{env `chef_environment`}}",
    "chef_validationpem": "{{env `chef_validationpem`}}"
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key":               "{{user `aws_access_key`}}",
    "secret_key":               "{{user `aws_secret_key`}}",
    "token":                    "{{user `aws_session_token`}}",
    "region": "us-east-1",
    "source_ami": "{{user `aws_ami`}}",
    "vpc_id":   "{{user `aws_vpc_id`}}",
    "subnet_id":  "{{user `aws_subnet`}}",
    "instance_type": "t2.large",
    "iam_instance_profile": "{{user `aws_instance_profile`}}",
    "ami_block_device_mappings": [{
      "volume_type": "gp2",
      "device_name": "sdh",
      "volume_size": "50"
    }],
    "disable_stop_instance": "false",
    "ami_name": "mypacker-ami {{timestamp}}",
    "user_data_file": "{{template_dir}}/setup_winrm.txt",
    "communicator": "winrm",
    "winrm_username": "Administrator",
    "winrm_timeout": "60m"
  }],
  "provisioners": [
    {
      "type": "chef-client",
      "server_url": "{{user `chef_server_url`}}",
      "guest_os_type": "windows",
      "ssl_verify_mode": "verify_none",
      "run_list": [ "{{user `chef_run_list`}}" ],
      "validation_key_path" : "/path/to/{{user `chef_validationpem`}}",
      "validation_client_name": "pivotal",
      "chef_environment": "{{user `chef_environment`}}"
    },
    {
      "type": "windows-restart"
    },
    {
      "type": "powershell",
      "inline": [
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
      ]
    }
  ]
}

If we look at the variables section, you'll see a mapping between environment variables and the variables used in the template. Any environment variable needs to be mapped in this way before you can use it:

"variables": {
    "aws_access_key":     "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key":     "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "aws_session_token":  "{{env `AWS_SESSION_TOKEN`}}",
    "aws_ami":      "{{env `aws_ami`}}",
    "aws_vpc_id":   "{{env `aws_vpc_id`}}",
    "aws_subnet":   "{{env `aws_subnet`}}",
    "aws_instance_profile":   "{{env `aws_instance_profile`}}",
    "chef_run_list":    "{{env `chef_run_list`}}",
    "chef_server_url":   "{{env `chef_server_url`}}",
    "chef_environment": "{{env `chef_environment`}}",
    "chef_validationpem": "{{env `chef_validationpem`}}"
  },
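
Because every template variable is read from the environment, it can help to fail early when one of them is missing instead of finding out halfway through a build. The following is a minimal sketch of such a pre-flight check; the function name require_vars is an assumption of this example and not part of Packer.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail if any named environment variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which are the ones a child process such as Packer can read.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a build script this could be called as `require_vars aws_ami aws_vpc_id aws_subnet chef_run_list || exit 1` before Packer is started.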

I'm using this template in a CI system to build images for several roles and environments, so it takes user input to customize each build; in this case the input is TeamCity parameters passed as environment variables. Of special note are the AWS access key, secret key and session token variables. They are set by this script before execution:

#!/bin/bash
### Assume a role that is allowed to launch instances and prep variables for Packer
aws sts assume-role --role-arn "$assume_role_arn" --role-session-name "$role_session_name" > assume-role-output.txt
# jq -r prints raw strings, so no quote stripping is needed
export AWS_ACCESS_KEY_ID="$(jq -r '.Credentials.AccessKeyId' assume-role-output.txt)"
export AWS_SECRET_ACCESS_KEY="$(jq -r '.Credentials.SecretAccessKey' assume-role-output.txt)"
export AWS_SESSION_TOKEN="$(jq -r '.Credentials.SessionToken' assume-role-output.txt)"
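
As an alternative sketch, the same three values can be extracted with the AWS CLI's own --query and --output text options, which avoids both jq and the temporary file holding the credentials. The function name assume_role_env is hypothetical, and the sketch assumes none of the values contain whitespace.

```shell
#!/bin/sh
# Hypothetical helper: assume a role and export the temporary credentials.
# $1 = role ARN, $2 = role session name
assume_role_env() {
  # --query selects the three fields; --output text prints them tab-separated on one line.
  creds="$(aws sts assume-role \
    --role-arn "$1" \
    --role-session-name "$2" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text)" || return 1
  # Split that single line into the three variables the template maps.
  read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<EOF
$creds
EOF
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}
```

The function has to run in the same shell that later starts Packer (for example by sourcing the file that defines it), otherwise the exported variables are lost when the child process exits.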

The builders section takes these variables and uses them to launch an EC2 instance ("type": "amazon-ebs") with the specified instance size, disk configuration etc.

"builders": [{
    "type": "amazon-ebs",
    "access_key":               "{{user `aws_access_key`}}",
    "secret_key":               "{{user `aws_secret_key`}}",
    "token":                    "{{user `aws_session_token`}}",
    "region": "us-east-1",
    "source_ami": "{{user `aws_ami`}}",
    "vpc_id":   "{{user `aws_vpc_id`}}",
    "subnet_id":  "{{user `aws_subnet`}}",
    "instance_type": "t2.large",
    "iam_instance_profile": "{{user `aws_instance_profile`}}",
    "ami_block_device_mappings": [{
      "volume_type": "gp2",
      "device_name": "sdh",
      "volume_size": "50"
    }],
    "disable_stop_instance": "false",
    "ami_name": "mypacker-ami {{timestamp}}",
    "user_data_file": "{{template_dir}}/setup_winrm.txt",
    "communicator": "winrm",
    "winrm_username": "Administrator",
    "winrm_timeout": "60m"
  }]
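
The source_ami value comes from the aws_ami environment variable, so something has to supply an AMI ID. One possible way, sketched below, is to ask the AWS CLI for the newest Amazon-owned Windows Server 2016 base image. The function name is hypothetical, and the name filter reflects Amazon's naming pattern for these images as far as I know, so verify it against the images available in your region.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the newest Amazon-owned Windows Server 2016 base AMI.
# $1 = region (defaults to us-east-1, the region used in the template)
latest_win2016_ami() {
  # sort_by(...)[-1] picks the image with the most recent CreationDate.
  aws ec2 describe-images \
    --region "${1:-us-east-1}" \
    --owners amazon \
    --filters 'Name=name,Values=Windows_Server-2016-English-Full-Base-*' \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}
```

The result can then feed the template with `export aws_ami="$(latest_win2016_ami)"`.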

The builder also launches the instance with user data (code that is executed on the first launch of an instance), taken from the file setup_winrm.txt. In our example it contains the following PowerShell script, which configures WinRM on the instance so that Packer can communicate with it:

<powershell>
winrm quickconfig -q
winrm set winrm/config/winrs '@{MaxMemoryPerShellMB="2200"}'
winrm set winrm/config '@{MaxTimeoutms="1800000"}'
winrm set winrm/config/client/auth '@{Basic="true"}'
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'

netsh advfirewall firewall add rule name="WinRM 5985" protocol=TCP dir=in localport=5985 action=allow
netsh advfirewall firewall add rule name="WinRM 5986" protocol=TCP dir=in localport=5986 action=allow

net stop winrm
# Call sc.exe explicitly: in Windows PowerShell, plain "sc" is an alias for Set-Content
sc.exe config winrm start= auto
net start winrm

Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope LocalMachine
</powershell>

Remember that your security group also needs to allow WinRM traffic (TCP port 5985, plus 5986 if you use HTTPS) from the machine running Packer.
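
If the security group does not allow this yet, a rule can be added with the AWS CLI. The sketch below opens port 5985 to a single CIDR range; the function name and both arguments are placeholders for your own values.

```shell
#!/bin/sh
# Hypothetical helper: allow inbound WinRM over HTTP (TCP 5985) on a security group.
# $1 = security group ID, $2 = CIDR range of the machine running Packer
allow_winrm() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 5985 \
    --cidr "$2"
}
```

Keeping the CIDR as narrow as possible matters here, since the user data above enables unencrypted basic authentication for the duration of the build.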

The last part of the template is the provisioners section:

"provisioners": [
    {
      "type": "chef-client",
      "server_url": "{{user `chef_server_url`}}",
      "guest_os_type": "windows",
      "ssl_verify_mode": "verify_none",
      "run_list": [ "{{user `chef_run_list`}}" ],
      "validation_key_path" : "/path/to/{{user `chef_validationpem`}}",
      "validation_client_name": "pivotal",
      "chef_environment": "{{user `chef_environment`}}"
    },
    {
      "type": "windows-restart"
    },
    {
      "type": "powershell",
      "inline": [
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
      ]
    }
  ]

Here we actually have three provisioners: chef-client, windows-restart and powershell.

chef-client provisions the instance according to the run_list.

windows-restart restarts the instance and continues the Packer run after it comes up again.

powershell is used to execute the Amazon Sysprep scripts that prepare an instance for AMI creation. Sysprep cleans the event logs, removes the computer name and basically makes the instance look like new. We run it with the -NoShutdown option because Packer itself triggers a shutdown at the end of the build.
On Windows Server 2016 these scripts ship with EC2Launch; before Windows 2016, a service called EC2Config.exe was used for the same task.

That is all. To actually build your AMI, run packer build and watch as it launches a new instance, provisions it according to your Chef run_list, creates an AMI and terminates the temporary instance again. The resulting AMI can now be used in auto-scaling launch configurations or to launch new instances manually.
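
In a CI job it can be worth validating the template before starting a build that launches real instances. The wrapper below is a small sketch of that idea; the function name build_ami and the default file name template.json are assumptions of this example.

```shell
#!/bin/sh
# Hypothetical wrapper: validate the template first and build only if validation passes.
# $1 = path to the Packer template (file name is an assumption)
build_ami() {
  template="${1:-template.json}"
  packer validate "$template" && packer build "$template"
}
```

Called as `build_ami template.json`, it returns a non-zero status if either step fails, which is usually enough for a CI system to mark the build as failed.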

You can find the scripts and templates referenced in this post in the following GitHub repository: https://github.com/brianlund/packer-templates