Continuous Deployment (part 1)

2024/02/18

What we have

homeslice, the dumb home automation platform, runs on a local Kubernetes cluster. I deploy changes by building the containers on my dev machine, pushing them to a local registry, and using Pulumi to deploy from the local registry to the local Kubernetes cluster.

This is OK for a development workflow but less than ideal for a vital production system like the one that powers that one light switch in my living room. Some of the problems:

The only way to have any idea what’s deployed is to manually inspect the internals of the running containers. There’s no guarantee that the image timestamp matches any particular code change. I could make local code changes, deploy, then revert my changes without comitting - leaving completely untracked code running in production.
Rollbacks are difficult. My local registry is space limited, and since there’s no particular relation between what’s in the registry and what’s in the code (see above), even if I have a previous image, there’s no easy way to tell what’s in it when I need to rollback.
It’s all manual. If there’s a security update affecting one of my apps, I have to hear about it, merge it in, manually rebuild my images, push them to the registry, update my Pulumi code, apply it, and remember to commit and push all those changes back up to GitHub.
Speaking of security updates, how would I even get them?

What we want

I’d like an external container registry to host my images and some automation to update my Kubernetes deployment when new images are pushed to the registry. Continuous Deployment (CD), in other words.

Also, I’d like automated security updates of my application. That’s the easiest part: GitHub’s dependabot will automatically apply security updates to my applications and their Docker base images.

I can build my images in GitHub actions and push them to GitHub’s Docker container registry. If I tag the images with the commit hash that built them, then I can trace deployed images back to specific Git commits. I can roll forward or back to known commits.

CD + dependabot will solve all my problems! My lightswitch and chimes will be more secure, robust, and professional.

What we did

Dependabot security updates

Enabling dependabot security updates is easy and well documented.

I also want regular non-security updates, and while that is well documented, it’s less easy than enabling security updates. I decided to defer that until after Continuous Deployment.

Build and tag images

Now that I have dependabot security updates, it’s time to iterate toward continuous deployment of those security updates.

We need to build our app images. We can use docker/build-push-action to build, tag, and push them to GitHub’s docker registry, from which any authenticated docker client can pull them.

But, there’s a hitch - the homeslice application is a monorepo that builds five docker images, so we can’t just drop in docker/build-push-action.

We can use a GitHub action matrix to run the docker/build-push-action for each of the apps.

  strategy:
      matrix:
        app: [backup-todoist, buttons, chime, clocktime, switches]
  
# other stuff      

      - name: Build and push Docker images
        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
        with:
          context: apps/${{ matrix.app }}

Ideally we’d auto-discover the apps instead of hard-coding them. Or maybe define them clearly as env vars at the top of the file, like this:

env:
  IMAGE_NAME: ${{ github.repository }}
  REGISTRY: ghcr.io
  APPS: [backup-todoist, buttons, chime, clocktime, switches]

Unfortunately, while that part is nice and readable, actually making it work gets messy real fast, adding so much complexity to the pipeline that we’re better off with the status quo.

We’ll leave it as-is for now and loop back to this fine point later, once the main continuous deployment automation is working. The priority here is to remove humans from the update loop, not produce the most beautiful possible GitHub workflow YAML.

We want to encode useful information in the docker image tags. We want to trace back from what is running in production to a specific commit, so we need the git commit hash in the tags. It would also be nice to have a timestamp so we can quickly tell how old the images in our system are.

We’ll use docker/metadata-action to build the tags, it provides all the options we need.

We build tags with the format {{ git branch name }}.{{ timestamp }}.{{ git sha }}. By encoding the branch name in the tag, we can configure Continuous Deployment to only consider main branch images for running in production. In the case of homeslice, tests and code review must pass before commits are merged to main, so we have some quality gate for deployment to production. Great!

Create the tag:

    - name: Extract metadata (tags, labels) for Docker
      id: meta
      uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}/${{ matrix.app }}
        tags: |
          type=raw,enable=true,priority=5000,value={{branch}}.{{date 'YYYYMMDD-HHmmss' tz='Pacific/Los Angeles'}}.{{sha}}

Then, use the tag:

    - name: Build and push Docker images
      uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
      with:
        context: apps/${{ matrix.app }}
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}

See the full change in this PR.

What’s next

This gets us enough to manually deploy from traceable images on ghcr.io, a big improvement over manual deploys from whatever random code I built on my dev machine. The image tagging scheme means we can reason about what’s deployed to production.

The next step is to automate deploys, and now we have everything we need:

We build and tag our containers in a GitHub action.
We push those containers to a secure, external registry.
The tags include a branch name that we can use to filter for production eligibility (we want to deploy main branch builds only).
Thew tags include a timestamp that we can use to install only the latest image.

As a bonus, we also have automated security PRs with dependabot, which we can iterate on until we have routine non-security updates also.

We have lots of options for CD, including great tools like flux. But since we’re already using Pulumi for IaC, why not use Pulumi Operator for CD?

That’s next!

Who is we?

I have a mouse in my pocket.