How to secure thousands of websites with Let's Encrypt certificates

The world is shifting towards HTTPS encryption everywhere, as evidenced by Google's announcement that Chrome would start labeling all HTTP websites as insecure beginning with Chrome 68 in July 2018. This is not a new development: Google has been encouraging the move for years by boosting search engine rankings for HTTPS sites and marking input fields on HTTP websites as insecure.

HTTPS everywhere is a noble idea and a move towards increased security and privacy on the Internet, but it presents an interesting problem for hosting providers, who often have not just one or two domains and subdomains to secure but thousands or even hundreds of thousands of unique domains, registered and configured in an automated workflow.

The solution comes in the form of a little piece of software called lua-resty-auto-ssl, which offers "On the fly (and free) SSL registration and renewal inside OpenResty/nginx with Let's Encrypt."

The concept is straightforward: OpenResty (nginx plus LuaJIT and a handful of useful modules and libraries) functions as a proxy in front of your websites. Whenever a new HTTPS request is received, lua-resty-auto-ssl checks whether you already have a certificate for the domain. If not, it requests a new one from Let's Encrypt and, upon success, serves the client, all in one go. The initial request will take a second or two longer while the domain ownership is validated by Let's Encrypt and the certificate is installed on the proxy.

lua-resty-auto-ssl is easily installed if you follow the instructions in the README, and you can be up and running with a proof of concept in minutes.

Running this in practice for tens of thousands of domains, we discovered a few things you might need to take care of:

  • Let's Encrypt rate limits
  • High availability
  • Dynamic resolution of backend servers

Let's Encrypt rate limits

Let's Encrypt has a variety of rate limits to ensure fair usage by everyone, but most importantly it limits the number of failed requests you can have per hour. This means we need to ensure that we don't request certificates for domains that we don't handle and thus can't validate.

To prevent this, lua-resty-auto-ssl uses an allow_domain function (configured in your nginx config) that is called before a certificate request is made to Let's Encrypt. By default this function returns false for all domains and you need to change this to something more useful for your specific setup before it will work.

There are quite a few options here, such as a whitelist of domains, or calls to an API or database that will OK the domains for you.
The most basic option is to return true for everything, but that opens you up to abuse and to spamming Let's Encrypt with invalid requests:

auto_ssl:set("allow_domain", function (domain)
  return true
end)
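
The whitelist option mentioned above could look roughly like the following. This is a minimal sketch rather than our production setup: the `allowed_domains` table and the domains in it are made up for illustration, and `auto_ssl` is assumed to be the instance created in your `init_by_lua_block`, as in the README.

```lua
-- Hypothetical whitelist: replace with the domains you actually serve.
local allowed_domains = {
  ["example.com"] = true,
  ["www.example.com"] = true,
}

auto_ssl:set("allow_domain", function (domain)
  -- Only request certificates for domains listed in the table above.
  return allowed_domains[domain] == true
end)
```

A static table like this only works for a small, fixed set of domains, which is why we went with a dynamic check instead.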

The setup we've gone with is a DNS check to verify that the domain in question points to our OpenResty servers (replace <Our backend FQDN> with your own):

auto_ssl:set("allow_domain", function (domain)
  local DNS_Cache = require("resty.dns.cache")

  -- "dns_cache" must be declared as a lua_shared_dict in the nginx config.
  local dns, err = DNS_Cache.new({
    dict = "dns_cache",
    negative_ttl = 5,
    max_stale = 300,
    resolver = {
      nameservers = { "8.8.8.8", "8.8.4.4" }
    }
  })
  if not dns then
    ngx.log(ngx.ERR, "failed to create DNS cache: ", err)
    return false
  end

  local answers, err, stale = dns:query(domain)
  if err then
    ngx.log(ngx.ERR, "DNS query for ", domain, " failed: ", err)
    -- Fall back to a stale cached answer if there is one. This function runs
    -- during the TLS handshake, so no HTTP status or header can be set here.
    answers = stale
  end
  if not answers then
    return false
  end

  if answers.errcode then
    ngx.log(ngx.ERR, "checking ", domain, " server returned error code: ",
      answers.errcode, ": ", answers.errstr)
    return false
  end

  for _, ans in ipairs(answers) do
    -- If the result is a CNAME to our backend, request an SSL certificate.
    if ans.cname == "<Our backend FQDN>" then
      ngx.log(ngx.STDERR, "domain ", domain, " verified by dns found ", ans.cname)
      return true
    end
  end
  ngx.log(ngx.STDERR, "domain ", domain, " rejected by dns")
  return false
end)

If the domain is found to be served from your servers, a certificate request is made; otherwise the request is served with a fallback certificate.

High availability

Running a single auto-ssl server is an obvious single point of failure. We wanted to be able to run multiple servers in an AWS autoscaling group based on load, while avoiding provisioning certificates again for each new server, so we needed a shared certificate storage backend. lua-resty-auto-ssl supports file and Redis storage adapters out of the box, with Redis being the choice for shared storage. We're using Multi-AZ Amazon ElastiCache for our implementation.
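
As a rough sketch, switching to the Redis storage adapter takes two settings in the same Lua block where allow_domain is configured. The hostname below is a placeholder, not our actual endpoint, and `auto_ssl` is again assumed to be the instance created in `init_by_lua_block`; see the lua-resty-auto-ssl README for the full list of Redis options.

```lua
-- Store certificates in Redis so every proxy in the group shares them.
auto_ssl:set("storage_adapter", "resty.auto-ssl.storage_adapters.redis")
auto_ssl:set("redis", {
  -- Placeholder endpoint: use your own Redis / ElastiCache hostname.
  host = "redis.example.com",
  port = 6379,
})
```

With this in place, a newly launched proxy can serve certificates that another instance has already registered instead of requesting them again.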

Dynamic resolution of backend servers

The destination that requests are sent to after SSL termination is configured like this in nginx:

location / {
    proxy_pass http://destination-hostname.example.com;
}

nginx resolves this hostname once, when it loads the configuration, which is a problem in an environment where the destination isn't static.

Since we are AWS-based, we proxy the terminated SSL requests to an ELB backed by an autoscaling group. This is great for scalability, but Elastic Load Balancers should always be accessed by their DNS name: the IPs behind the name change quite often, and the DNS answer contains multiple values so that traffic reaches all the availability zones you have backends in.

We resolved this by defining an upstream with the jdomain module, which forces re-resolution at a fixed interval, and then using that upstream as our proxy destination:

upstream backend {
  jdomain ourbackend-elb-45472253.eu-west-1.elb.amazonaws.com interval=15 max_ips=3;
}

location / {
    proxy_pass http://backend;
}

Other things to be aware of:

Transparency records on initial access
Since certificates are provisioned on first access, certificate transparency records might not be available yet for the first visitor to a domain. This will cause Chrome to complain that the website isn't secure. It'll work on the second attempt, but we've taken to adding a GET request for the website to our automation workflow when we first set up a customer's domain.

Certificate renewals
lua-resty-auto-ssl defaulted to attempting to renew all certificates every 86400 seconds (once a day). This caused some interruption to our service when one of our proxies tried to renew all our certificates at once. We developed a fix that stores expiry dates with the certificates and only attempts renewal on those that are about to expire. This has been included in the main repository, but if you have as many domains as we do, it can still cause problems. We've mitigated this by having a separate instance that is dedicated to performing renewals and doesn't receive user traffic.
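
The 86400-second interval above corresponds to lua-resty-auto-ssl's `renew_check_interval` setting. If the daily check is too frequent for your number of domains, it can be changed alongside the other `auto_ssl:set` calls. The value below is only an illustration, not the one we run with.

```lua
-- Check for certificates due for renewal every two days instead of daily.
-- 172800 is an example value: pick one that suits your domain count.
auto_ssl:set("renew_check_interval", 172800)
```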