IAM Roles Anywhere – now for everyone with Let's Encrypt

Exploring Let's Encrypt's interoperability with AWS IAM Roles Anywhere

This post is an accompaniment to my talk at fwd:cloudsec Denver. It contains the SystemD files, the Trust Policy for the IAM Role referred to in the talk, and other miscellaneous scripts. Towards the end, under FAQ, it has answers to questions that were raised in the talk's Slack thread, received via direct messages, or asked in person after the talk.

SystemD Files

lego-new-cert.service

The purpose of this SystemD service is to obtain a certificate on boot. AWS IAM creds are written to /etc/iam_aws_dns/username and /etc/iam_aws_dns/password by the bootstrap script, which in turn gets them from Terraform at runtime.

Note the user this script runs as: lego.

The file /etc/my_stuff holds key-value pairs, formatted as environment variables, that the services need. MY_FQDN is the fully qualified hostname of the server itself; it is stored here because a service file cannot interpolate subshell output.
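A minimal sketch of how a bootstrap script might populate this file, assuming the FQDN can be obtained with `hostname -f`. The function name `write_my_stuff` and its arguments are hypothetical and not taken from the talk.

```shell
#!/bin/bash
# Hypothetical helper: write KEY=value pairs in the format that
# systemd's EnvironmentFile= directive reads.
# $1 = target file (e.g. /etc/my_stuff), $2 = FQDN to record.
write_my_stuff() {
  local target="$1" fqdn="$2"
  # The value is resolved here because a unit file cannot run a subshell.
  printf 'MY_FQDN=%s\n' "$fqdn" > "$target"
}

# Example (needs root for /etc): write_my_stuff /etc/my_stuff "$(hostname -f)"
```

Writing the value once at bootstrap keeps the unit files static, so the same files can be shipped to every server.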

The Let's Encrypt staging endpoint is not used because the certificates also serve internet-facing applications and/or intra-VPC connections over TLS.

Don't forget to parameterise or change the email address embedded in ExecStart.

[Unit]
Description=Obtain new Lets Encrypt certificate using `lego`
After=network.target

[Service]
Type=oneshot
User=lego

EnvironmentFile=/etc/my_stuff

EnvironmentFile=/etc/iam_aws_dns/username
EnvironmentFile=/etc/iam_aws_dns/password
Environment="AWS_DEFAULT_REGION=eu-west-2"

ExecStart=lego --path /etc/lego --accept-tos --email someone@changeme.com --dns route53 --domains ${MY_FQDN} run

[Install]
WantedBy=multi-user.target

lego-renew-cert.service

This service renews certificates before they expire. It is triggered by the next file, a SystemD Timer (a contemporary replacement for cron).

The only difference from the previous file is the use of lego's renew subcommand rather than run.

[Unit]
Description=Renew the Lets Encrypt certificate using `lego`

[Service]
Type=oneshot
User=lego

EnvironmentFile=/etc/my_stuff

EnvironmentFile=/etc/iam_aws_dns/username
EnvironmentFile=/etc/iam_aws_dns/password
Environment="AWS_DEFAULT_REGION=eu-west-2"

ExecStart=lego --path /etc/lego --accept-tos --email someone@changeme.com --dns route53 --domains ${MY_FQDN} renew

[Install]
WantedBy=multi-user.target

lego-renew-cert.timer

This Unit triggers the Service of the same name on a schedule. The schedule is weekly here, which should allow four renewal attempts in the last 30 days of a Let's Encrypt certificate's lifetime. There are up to seven minutes of randomised delay so that our servers do not all hit Let's Encrypt's APIs at the same time.

[Unit]
Description=Renew the Lets Encrypt certificate using `lego`

[Timer]
OnCalendar=weekly
RandomizedDelaySec=420

[Install]
WantedBy=multi-user.target
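The "four attempts" figure is plain integer division of the renewal window by the timer interval. A small sketch of that arithmetic follows; the function name `attempts_in_window` is hypothetical.

```shell
#!/bin/bash
# Number of timer firings that fit in a renewal window (integer division).
# $1 = renewal window in days, $2 = timer interval in days.
attempts_in_window() {
  echo $(( $1 / $2 ))
}

attempts_in_window 30 7   # prints 4
```

On a running system, `systemd-analyze calendar weekly` shows when the expression next elapses, and `systemctl list-timers` shows when the timer last ran and will next run.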

Permissions

Filesystem permissions play a key role in the safety of this scheme.

Files under /etc/iam_aws_dns/ are writable by root and readable by user lego.

Certificates and keys generated under /etc/lego/certificates/ are writable by lego and readable only by the application user, say app.

User app does not, and must not, have read access to anything under /etc/iam_aws_dns/. This prevents an application compromise from exposing static creds that would let the attacker interact with the Zone in Route 53 (and issue certificates).
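One way to spot-check part of this is to look for anything that grants permissions to "other" users. The sketch below assumes GNU find; the function name `exposed_to_others` is hypothetical. It does not verify group membership, so it cannot by itself prove that app is locked out of /etc/iam_aws_dns/.

```shell
#!/bin/bash
# List anything under the given directory that grants any permission
# bit to "other" users. No output means nothing is exposed that way.
# Uses GNU find's "-perm /mode" test (true if any of the bits are set).
# Symlinks are skipped because their own mode bits are always permissive.
exposed_to_others() {
  find "$1" ! -type l -perm /007 -print
}

# Example: exposed_to_others /etc/iam_aws_dns
```

Running it against both /etc/iam_aws_dns and /etc/lego/certificates after bootstrap would catch a world-readable credential or key early.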

IAM Creds Policy

Best sourced from Lego's Amazon Route 53 docs: https://go-acme.github.io/lego/dns/route53/index.html#least-privilege-policy-for-production-purposes

IAM Role Trust Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rolesanywhere.amazonaws.com"
      },
      "Action": [
        "sts:TagSession",
        "sts:SetSourceIdentity"
      ],
      "Condition": {
        "StringLike": {
          "aws:PrincipalTag/x509Subject/CN": "*.not-your-co.com"
        },
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:rolesanywhere:eu-west-2:123456789012:trust-anchor/086719f2-2612-4c3a-8fe3-changeme0064"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rolesanywhere.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringLike": {
          "aws:PrincipalTag/x509Subject/CN": "*.not-your-co.com",
          "sts:RoleSessionName": "??????"
        },
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:rolesanywhere:eu-west-2:123456789012:trust-anchor/086719f2-2612-4c3a-8fe3-changeme0064"
        }
      }
    },
    {
      "Effect": "Deny",
      "Principal": {
        "Service": "rolesanywhere.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "aws:PrincipalTag/x509Subject/CN": "foo-is-revoked.not-your-co.com"
        }
      }
    }
  ]
}
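In the policy above, StringLike treats `*` as a multi-character wildcard and `?` as a single-character wildcard, so `??????` only admits session names that are exactly six characters long, which is the length of a default TOTP. The sketch below approximates that matching with a bash glob purely for illustration; `string_like` is a hypothetical helper, not an AWS tool.

```shell
#!/bin/bash
# Approximate IAM's StringLike with a bash glob: "*" matches any run of
# characters and "?" matches exactly one. Bash globs also understand
# "[...]" classes, which IAM does not, so this is only an illustration.
# $1 = pattern, $2 = value.
string_like() {
  [[ $2 == $1 ]]
}

string_like '??????' '492039' && echo "six-character session name: match"
string_like '??????' 'my-session' || echo "longer session name: no match"
```

Note that the pattern constrains only the length of the session name; checking that the six characters are the right TOTP is left to the detection script later in this post.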

gen-creds.sh

#!/bin/bash
set -euo pipefail

# Derive the TOTP seed from the SHA-1 digest of a file on this host,
# then generate the current one-time password from it.
TOTP=$(oathtool --totp "$(sha1sum /opt/monkey.jpg | cut -d' ' -f1)")

# Exchange the Let's Encrypt certificate for temporary AWS credentials,
# passing the TOTP as the role session name.
/opt/aws_signing_helper credential-process \
  --region eu-west-2 \
  --certificate /etc/lego/certificates/foo.not-your-co.com.crt \
  --private-key /etc/lego/certificates/foo.not-your-co.com.key \
  --role-arn arn:aws:iam::123456789012:role/my_custom_role_01 \
  --trust-anchor-arn arn:aws:rolesanywhere:eu-west-2:123456789012:trust-anchor/9fc0e3fb-c8a5-48dc-b457-f72d1c5f3cc5 \
  --profile-arn arn:aws:rolesanywhere:eu-west-2:123456789012:profile/603ae19d-ec74-405d-85d1-dcd5f1ab5f17 \
  --role-session-name "$TOTP"
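The seed derivation in gen-creds.sh can be isolated into a small helper, sketched below. The function name `seed_from_file` is hypothetical. The digest is 40 hex characters, and oathtool reads keys as hex unless told otherwise.

```shell
#!/bin/bash
# Derive a TOTP seed as the hex SHA-1 digest of a file, the way
# gen-creds.sh does for /opt/monkey.jpg.
# $1 = path to the file used as the environmental factor.
seed_from_file() {
  sha1sum "$1" | cut -d' ' -f1
}

# Example: oathtool --totp "$(seed_from_file /opt/monkey.jpg)"
```

Any change to the file changes the seed, so a workload without the exact same file cannot produce a matching one-time password.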

Mismatch Detection Script

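#!/bin/bash
# Usage: pass the path to a JSON event record (the field names match a
# CloudTrail event) as the first argument.
set -uo pipefail

# Certificate subject, event time, and session name (the claimed TOTP).
who=$(jq -r .responseElements.x509Subject "$1")
when=$(jq -r .eventTime "$1")
with=$(jq -r .requestParameters.roleSessionName "$1")

env_db_file="env-sha1sum-db.json"

# Look up the expected seed for this subject.
seed=$(jq -r --arg who "$who" '.[] | .[$who] // empty' "$env_db_file")

# Start validating one 30-second time step before the event.
when_minus_30s=$(date -R -u -d "$when - 30 seconds")

# On a match, oathtool prints the position of the OTP within the window.
found=$(oathtool --window 1 --now "$when_minus_30s" \
        --totp "$seed" "$with")

# An empty or non-numeric result (no match) fails the test below.
if [ "$found" -le 1 ] 2> /dev/null; then
  echo "legit"
  _ec=0
else
  echo "oh no"
  _ec=1
fi

exit $_ec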

FAQ

  1. What is the benefit of this scheme over simply using IAM Users?
    1. This post is mostly about using a free CA. ACME-enabled PKI enables workloads to bootstrap themselves with minimal provenancial identity.
    2. Since some form of credentials is needed, it is important to follow the advice in the Permissions section above, which safeguards such creds.
    3. This design scales well as asymmetric creds are issued by the CA and do not require changes at the cloud control plane (Credit: Ben Bridts).
  2. Why use Let's Encrypt Staging?
    1. The honest answer is we don't. We use their production API, so the certificates are trusted by other clients without having to insert the LE Staging CA in their trust stores. The servers are running public (and private) TLS ports, and this lets us use the same identity for AWS authn.
    2. If our use case didn't involve any listening sockets, we'd stick to Staging, since a compromised certificate would then not be something that clients worldwide trust by default.
  3. Can one workload easily impersonate another?
    1. Yes. They can claim to be any subdomain under the main domain name and LE would happily issue them a certificate.
    2. If the workload trying to impersonate doesn't have the same crucial factor(s) in its environment (as discussed in the talk), the detection mechanism will detect the anomaly.
    3. If the workload trying to impersonate has the same crucial factor(s), what is there to say it isn't legitimate? The answer to that should become a part of your exact environmental sensing equation.
  4. When should one move to a different CA?
    1. When you need an SLA or custom T&Cs.
    2. Or when you'd prefer the CRL approach for revocation.
    3. Maybe you've outgrown Let's Encrypt, are comfortable with ACME for bootstrapping identity, and you need dashboards, audit logs, etc. from your CA.
  5. What impact could the upcoming short-lived 6-day certificate profile from Let's Encrypt have?
    1. The SystemD Timer above would have to run more frequently, ideally once every day.
    2. The theoretical window of risk is reduced since the creds are now valid for a shorter time.
    3. Monitoring for failures will definitely need to be implemented.
  6. What if there's a renewal failure?
    1. Indeed, a renewal could fail for a number of reasons: network connectivity, IAM creds invalidated, etc.
    2. With the SystemD Timer running weekly, there would be four attempts in the last 30 days of the certificate's lifetime, spread over 28 days, so a single failed run does not mean the certificate expires.
    3. If syslog (or Journal) is monitored, renewal failures would show up as errors from the lego-renew-cert service.
  7. How many IAM Roles per IAMRA Profile to go with?
    1. AWS have tried to answer this question in this blog under "IAM Roles Anywhere profiles and session policies".
    2. Although their view is "We recommend that customers use the one-profile-per-role approach to achieve more operational flexibility.", my general approach is not to create more objects than needed. One can always create a new IAMRA Profile, attach a Session Policy to it, and then move Roles to it to minimise the risk of adverse operational impact.
  8. Why did I do this?
    1. I love interoperability and this is my interop dispatch for June '25!

Further Reading

Sign in with your eID: Using AWS IAM Roles Anywhere with a SmartCard Reader by Ben Bridts

Talk's Slack thread at the time in Cloud Security Forum

Slides are attached to the talks page