Upstart

'ipcat' - An auto-generated list of datacenter IP address ranges

Chris Hickman — Tue, 18 Oct 2022 23:00:26 GMT

Looking for an accurate list of datacenter IP address ranges? After 3+ years of inactivity, 'ipcat' has been updated to sync with the published public IP ranges of major cloud providers. The list of datacenter IP ranges is available in CSV format, updated automatically on a daily basis, making it easy to integrate with your applications.

There are many situations where applications need to determine if a request came from an IP address that belongs to a server in a known datacenter, like a public cloud provider such as Amazon Web Services. Applications may want to only process requests from actual users by filtering out those from servers, which likely represent bots, scrapers and other non-human programs.

One such situation is counting podcast downloads using request logs. Both the IAB and Open Downloads podcast measurement specifications dictate that requests from datacenter IP addresses should be discarded.

When Open Downloads (oDL) was originally released, it used the client9/ipcat project for its datacenter IP list. However, like oDL, that project has gone stale. Its datacenter IP list hasn't been updated in 4 years... and the internet has changed a lot since then!

An updated version of 'ipcat'

As part of updating oDL, I've been working on bringing the 'ipcat' list up to date.

This updated version of 'ipcat' is now available as a fork on GitHub:
https://github.com/growlfm/ipcat

With this fork, the datacenter IP list has been brought up to date after 3+ years of inactivity. This includes adding support for syncing with the latest published IP ranges of the most prevalent hosting providers, such as AWS and Azure. To keep the list current going forward, it uses GitHub Actions to regenerate the list automatically once per day.

The datacenter IP list is provided in CSV format. Each row represents an IP address range (start IP address - end IP address). IP ranges are non-overlapping and in sorted order.
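Because the ranges are sorted and non-overlapping, checking an address against the list can be done with a binary search. Here is a minimal sketch in Node.js. It assumes each row starts with the start IP, end IP and a provider name (verify the column layout against the actual file), handles IPv4 only, and uses a naive comma split with no CSV quoting. The sample rows and function names are made up for illustration.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
    return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Parse CSV text into [{ start, end, name }].
// ASSUMPTION: columns are start IP, end IP, provider name.
function parseRanges(csvText) {
    return csvText
        .split('\n')
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .map((line) => {
            const [start, end, name] = line.split(',');
            return { start: ipToInt(start), end: ipToInt(end), name };
        });
}

// Binary search over sorted, non-overlapping ranges.
// Returns the matching range, or null if the IP is not in the list.
function findDatacenter(ranges, ip) {
    const target = ipToInt(ip);
    let lo = 0;
    let hi = ranges.length - 1;
    while (lo <= hi) {
        const mid = (lo + hi) >>> 1;
        if (target < ranges[mid].start) {
            hi = mid - 1;
        } else if (target > ranges[mid].end) {
            lo = mid + 1;
        } else {
            return ranges[mid];
        }
    }
    return null;
}

// Made-up sample rows using documentation-only address blocks.
const sampleCsv = [
    '192.0.2.0,192.0.2.255,Example Cloud A',
    '198.51.100.0,198.51.100.127,Example Cloud B',
    '203.0.113.16,203.0.113.31,Example Cloud C'
].join('\n');

const ranges = parseRanges(sampleCsv);
```

A request would be treated as datacenter traffic when findDatacenter(ranges, requestIp) returns a non-null range.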

There is also a summary of the total number of IP addresses for each of the providers included in the datacenter IP list.

Additional updates in this release include:

Sync with the published public IP address ranges of the following providers:
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud
- Cloudflare
- Fastly
- Akamai
- DigitalOcean
Update to Go 1.19
Use GitHub Actions to auto-generate the latest IP list once per day
When building the IP list, handle proper subset ranges by skipping them (instead of throwing an error)
Support for Docker
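To illustrate the subset-skipping behavior mentioned in the list above, here is a rough sketch in Node.js. It is not the project's Go code. It assumes ranges are objects with numeric start and end bounds, and that a partial overlap (as opposed to a subset) is still treated as an error. The function name is hypothetical.

```javascript
// Given ranges as { start, end } with numeric bounds, return a sorted,
// non-overlapping list. Ranges fully contained in an already-kept range
// (including exact duplicates) are skipped.
// ASSUMPTION: partial overlaps are an error in this sketch.
function buildRangeList(ranges) {
    // Sort by start ascending; for equal starts, put the widest range first
    // so that narrower ranges are seen as subsets of it.
    const sorted = [...ranges].sort((a, b) => a.start - b.start || b.end - a.end);
    const result = [];
    for (const range of sorted) {
        const last = result[result.length - 1];
        if (!last || range.start > last.end) {
            result.push(range);   // no overlap with the previous kept range
        } else if (range.end <= last.end) {
            continue;             // subset of the previous kept range: skip it
        } else {
            throw new Error(`Partial overlap: ${range.start}-${range.end}`);
        }
    }
    return result;
}
```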

Rebooting Open Downloads (oDL)

Chris Hickman — Thu, 29 Sep 2022 21:00:14 GMT

Have you looked at Open Downloads (oDL) and considered using it, but were discouraged that no updates have been made in over 3 years? Now there's a fork of the project that brings fixes and improvements in hopes of greater adoption of this spec to foster open and transparent counting of downloads in the podcast community.

About 3 years ago, the Podsights team open-sourced how they counted podcast downloads. Calling the effort Open Downloads (oDL), they released the source code with the goal of coming together as a podcast community so that we all count podcast downloads the same way, openly and transparently.

Unfortunately, since the original release in August 2019, no updates were ever made and the oDL spec and codebase haven't moved forward.

But... even though it hasn't been updated since its release, oDL does provide a great opportunity upon which to build an open, transparent, and uniform methodology for counting podcast downloads. Even though the IAB defines what should count as a podcast download, every platform has a different method of implementation, with no two providers coming up with exactly the same results. With oDL, there is an opportunity for the industry to remove that variance from the process, fostering trust of the data.

So, when it came to evaluating options for building the analytics pipeline for Growl, the technology platform I'm working on for building podcast products and services, I chose to start with oDL.

oDL Updated

It has now been over 2 years since I started working with the oDL source code, making fixes and improvements as needed to make it production-worthy for the analytics pipeline for Growl.

Recently, I've noticed there have been more inquiries and interest in the original oDL project. Seeing that there might be others in the community interested in adopting oDL, I've been working on merging my updates into a proper fork and making it available for others to see and use.

This fork is now available on GitHub:
https://github.com/growlfm/odl

With this fork, you'll find all the necessary updates to bring oDL up to date after three years of inactivity. This includes updating to Python 3.8, syncing with the latest version of the OPAWG user agent database, and bug fixes.

In addition, you'll find the following new functionality:

Add support for optional 'ip' attribute
- If 'ip' is supplied, then the event is compared against the IP deny list (to prevent counting downloads coming from datacenters, such as AWS and Azure)
Support for Open Podcast Prefix Project (OP3) JSON as source events
Support for Docker, making it easier to run oDL either locally on a laptop or in production

If there is interest from the community, future improvements to be made include:

Updated IP deny list
Output individual download records (instead of summaries)
Additional output fields, such as 'listener_id', 'device', 'os'
Add support for calculating downloads hourly (but look at full 24 hour window to ensure no duplicates counted)

Node.js troubleshooting: Child process spawn output is (sometimes) empty

Chris Hickman — Mon, 12 Sep 2022 23:29:35 GMT

My Node.js application uses child_process.spawn() to invoke ffmpeg for inspecting audio and video files. This code had been running without fail for many months. How come it suddenly started sporadically returning empty results?

Recently, I was preparing for an important demo of Growl, the technology platform I'm working on for building podcast products and services. The demo shows off the platform's ability to perform high-speed import of podcast feeds, taking advantage of parallel processing while handling complicated signaling and locking semantics to bring it all back together.

This high-speed import feature has been in place for many, many months and performed extremely reliably. So, when doing a practice run before the demo to verify that all was working, I was dismayed to discover that there were a small number of errors happening when using ffmpeg to get metadata about the audio files. Running through the process again, I was still getting a small number of errors, albeit this time the errors were happening on a different set of audio files.

The code hadn't been changed recently, and no recent deploys had been made. What was going on?

The code in question

Here's the original version of the code that spawns ffmpeg (actually, ffprobe) to get metadata about audio and video files. It is wrapped in a Promise so the caller can use simple async/await syntax to invoke it.

NOTE: Certain parts of the original code have been omitted for brevity/readability.

Original buggy version: Spawning ffprobe and returning output

const child_process = require('child_process');

const FFPROBE_PATH = 'ffprobe';
const FFPROBE_ARGS = [ '-hide_banner', '-loglevel', 'fatal', '-print_format', 'json' ];

async function probe(filename) {

    const createPromise = new Promise((resolve, reject) => {
        const args = FFPROBE_ARGS.concat(filename);
        const proc = child_process.spawn(FFPROBE_PATH, args);

        const outputBuffers = [];

        proc.stdout.on('data', (data) => { 
            outputBuffers.push(data); 
        });

        proc.on('error', (err) => {
            reject(new Error(err.toString()));
        });

        proc.on('exit', (code) => {
            const output = JSON.parse(outputBuffers.join(''));
            
            if (code !== 0) {
                const msg = `Failed with code = ${code}`;
                return reject(new Error(msg));
            }

            resolve(output);
        });
    });

    const props = await createPromise;

    return props;

}

The troubleshooting process

To start with, all I knew was that, rather suddenly, every once in a while when under load, JSON.parse() was throwing an exception because the input string to be parsed was empty.

My first thought was perhaps this was a memory issue. The system had a high degree of concurrency and was invoking many simultaneous child processes to run ffmpeg against the imported audio files. Everything was running within containers, with memory limits. However, after digging into the metrics, memory utilization wasn't an issue.

Next, I thought that maybe ffmpeg was intermittently exiting with an error. I updated the process's exit handler to check the exit code first and reject the Promise on a non-zero code, before attempting to parse the output.

Updated version: Check for exit code before parsing output

    proc.on('exit', (code) => {
        if (code !== 0) {
            const msg = `Failed with code = ${code}`;
            return reject(new Error(msg));
        }

        const output = JSON.parse(outputBuffers.join(''));
        resolve(output);
    });

After re-testing, I discovered that the exit code from ffmpeg was always successful. So, there were no problems with spawning ffmpeg as a child process or with ffmpeg opening and reading the file.

It was now obvious that the most likely culprit was that stdout was not getting flushed by the time the code was trying to process the results.

RTFM

So, time to turn to the Node.js documentation. In particular I wanted to get the details on the various events emitted by the spawned child process.

Here's what the Node.js API documentation says about the exit event:

When the 'exit' event is triggered, child process stdio streams might still be open.

Whoops. That would explain why stdout was not getting flushed for some invocations.

Reading further, here's what the documentation says about the close event:

The 'close' event is emitted after a process has ended and the stdio streams of a child process have been closed. This is distinct from the 'exit' event since multiple processes might share the same stdio streams. The 'close' event will always emit after 'exit' was already emitted, or 'error' if the child failed to spawn.

So, given that the code was relying on capturing stdout to return results from the child process (ffmpeg), the code was hooking the wrong event. Instead of treating exit as the termination event, it should be listening for the close event to guarantee that the output streams have been closed.
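To see the ordering for yourself, here is a small standalone sketch (not part of the Growl code). It spawns the current Node.js binary as a stand-in for ffprobe, records the order in which 'exit' and 'close' fire, and only reads the collected stdout inside the 'close' handler. The helper name runAndCapture is made up for this example.

```javascript
const { spawn } = require('child_process');

// Run a command, recording the order of the 'exit' and 'close' events
// and collecting everything written to stdout.
function runAndCapture(command, args) {
    return new Promise((resolve, reject) => {
        const proc = spawn(command, args);
        const chunks = [];
        const events = [];

        proc.stdout.on('data', (chunk) => chunks.push(chunk));
        proc.on('error', reject);
        proc.on('exit', () => events.push('exit'));
        proc.on('close', (code) => {
            events.push('close');
            // By now the stdio streams are closed, so chunks is complete.
            resolve({ code, events, stdout: Buffer.concat(chunks).toString('utf8') });
        });
    });
}

// Use the current Node.js binary as a stand-in for ffprobe.
const demo = runAndCapture(process.execPath, ['-e', 'console.log("hello")']);
demo.then(({ events, stdout }) => {
    console.log(events.join(' -> '), JSON.stringify(stdout));
});
```

Concatenating the raw Buffers before decoding also avoids splitting a multi-byte character across two chunks.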

Final correct version: Spawning ffprobe and returning output

const child_process = require('child_process');

const FFPROBE_PATH = 'ffprobe';
const FFPROBE_ARGS = [ '-hide_banner', '-loglevel', 'fatal', '-print_format', 'json' ];

async function probe(filename) {

    const createPromise = new Promise((resolve, reject) => {
        const args = FFPROBE_ARGS.concat(filename);
        const proc = child_process.spawn(FFPROBE_PATH, args);

        const outputBuffers = [];

        proc.stdout.on('data', (data) => { 
            outputBuffers.push(data); 
        });

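        // Without an 'error' listener, a failure to spawn would be thrown
        // as an uncaught exception instead of rejecting the Promise.
        proc.on('error', (err) => {
            reject(err);
        });

        proc.on('close', (code) => {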
            if (code !== 0) {
                const msg = `Failed with code = ${code}`;
                return reject(new Error(msg));
            }

            const output = JSON.parse(outputBuffers.join(''));
            resolve(output);
        });
    });

    const props = await createPromise;

    return props;

}

Lessons learned

What's surprising about this bug is that it was always there, right from the beginning. Just waiting for the perfect time to rear its ugly head, like right before an important demo. I suspect that I ran into this now because concurrency has increased significantly within the system, making timing issues more likely.

When this code was originally written, I was probably looking at sample code for how to wrap child_process.spawn() with a Promise. And that sample code hooked the 'exit' event, instead of the 'close' event.

Of course, the big lesson here is don't make assumptions. Don't blindly trust code snippets or examples. Make sure you understand exactly what the code is doing. Oh... and read the docs!

Updating Container Secrets Using CloudWatch Events + Lambda

Chris Hickman — Tue, 03 Mar 2020 19:15:33 GMT

Using Amazon Elastic Container Service (ECS) secrets management integration, but afraid to rotate credentials because your app will break? Here's a technique for automatically updating your containers when secrets are changed.

In a previous post, I showed how Amazon Elastic Container Service (ECS) makes it easy to inject sensitive data stored as either AWS Secrets Manager secrets or AWS Systems Manager Parameter Store parameters into your containers.

However, one of the problems with this approach is that container startup is the only time when ECS will inject sensitive data into your container. This means that if the sensitive data is updated after the container is started, your container will not automatically receive any updates. It is up to you to ensure that the container is stopped and a new one created in order to read the updated value.

A best practice with secrets management is to periodically rotate credentials. But given that our containers won't receive these updates after the containers are started, how can we safely rotate these credentials without breaking the application?

What we need is a method to automatically update containers when secrets are updated. To accomplish that, we need to have two components in place. First, we need to receive a notification when a secret is updated. Then, we trigger an action to recycle the container(s). In this post, I will show you how to leverage CloudWatch Events and Lambda to perform both of these tasks to automatically update your container secrets.

Using CloudWatch Events to receive notifications when secrets are updated

To receive notifications about changes when secrets are updated, you can leverage CloudWatch Events. CloudWatch Events is a service that delivers a near real-time stream of system events that describe changes in AWS resources. There are three primary components associated with CloudWatch Events: events, rules and targets.

Whenever an action is performed on a Secrets Manager secret or Systems Manager parameter, a CloudWatch event representing the action is emitted. For example, events are emitted whenever a value is created, updated or deleted.

To consume the CloudWatch Event, you create a CloudWatch Events rule that filters for these events. You can then invoke a target, such as a Lambda function, to trigger other actions whenever a filtered event is received.

Determining the event structure

Events in Amazon CloudWatch Events are represented as JSON objects. All CloudWatch events have the same top-level fields, such as source and detail-type. The combination of the source and detail-type fields serves to identify the emitter of the event. All custom data is stored in the detail field of the event.

CloudWatch event emitted by Systems Manager Parameter Store

Keep in mind that the schema of the event will depend on the source that emitted it. For example, the detail field structure for a Systems Manager Parameter Store event will be different than the detail field structure of a Secrets Manager event.

Since we plan to use Lambda to process events, normally we would create our Lambda function first. But the code will need to know the schema of the event structure passed to it.

For AWS resources that emit events directly to CloudWatch Events, you can view sample events when creating rules in the CloudWatch Events console. To view these sample events, just expand the "Show sample event(s)" dropdown under the event pattern textbox. But samples are not available for all types of resources, such as AWS Secrets Manager.

An alternative technique for discovering event schemas is to use CloudWatch Logs as a temporary target. You'll be able to see the exact structure of the events in the CloudWatch Logs, which can then serve as your "specification" when writing the Lambda handler code. Then, after coding the Lambda function, you can update the target to be the Lambda function instead of CloudWatch Logs. Note that this technique works for any AWS resource.

Configure CloudWatch Events for Systems Manager parameters

To consume the CloudWatch events emitted by Systems Manager Parameter Store, you create a CloudWatch Events rule that filters for these specific events. The rule can be created using the AWS Console, using the AWS Command Line Interface (CLI) or by making a direct API call.

Here's how to create a CloudWatch Events rule for Systems Manager parameters using the AWS Console.

Open the CloudWatch console.
From the left-hand navigation pane, choose Events->Rules, and then click the "Create rule" button.
Under Event Source, verify that Event Pattern is selected.
For "Service Name" dropdown, choose "EC2 Simple Systems Manager (SSM)".
For "Event Type" dropdown, choose "Parameter Store".
Enable the "Specific detail type(s)" radio button, and then choose "Parameter Store Change" from the dropdown.
Under Targets, click the "Add target" button.
In the Targets list, choose "CloudWatch log group" as the target type. Specify the name of the log group (e.g. /aws/events/ssm).
Click the "Configure details" button to move to the next screen.
Provide a name and (optional) description for the CloudWatch Events rule. Leave the Enabled box selected to make the rule active immediately.
Finally, click the "Create rule" button.

Creating a CloudWatch Event rule for Parameter Store
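The same rule can also be created through the API instead of the console. Here is a minimal sketch of the parameters, assuming the aws-sdk v2 CloudWatchEvents client (the same SDK generation used for the Lambda code later in this post). The rule name and description are placeholders, and the actual API call is left as a comment because it needs AWS credentials.

```javascript
// Event pattern matching Parameter Store change events.
const ssmPattern = {
    source: ['aws.ssm'],
    'detail-type': ['Parameter Store Change']
};

// Build the parameters for CloudWatchEvents.putRule (aws-sdk v2).
// The name and description passed in are placeholders.
function buildRuleParams(name, description, pattern) {
    return {
        Name: name,
        Description: description,
        EventPattern: JSON.stringify(pattern),   // the API expects a JSON string
        State: 'ENABLED'
    };
}

const ruleParams = buildRuleParams(
    'ssm-parameter-changes',
    'Fires when a Parameter Store value changes',
    ssmPattern
);

// With credentials configured, the rule could then be created with:
//   const AWS = require('aws-sdk');
//   await new AWS.CloudWatchEvents().putRule(ruleParams).promise();
// A target (log group or Lambda function) is attached separately
// with putTargets.
```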

Configure CloudWatch Events for AWS Secrets Manager secrets

Unlike Systems Manager Parameter Store, Secrets Manager does not directly emit events that can be detected by CloudWatch Events. However, you can use AWS CloudTrail to produce CloudWatch Events when secrets are modified within Secrets Manager.

AWS CloudTrail is a service that automatically records AWS API calls. Each time CloudTrail records a Secrets Manager API call, it will emit a CloudWatch Event. We can then create a CloudWatch Events rule to trigger on the information captured by CloudTrail.

Enable CloudTrail logging

In order to use CloudTrail to produce CloudWatch Events, you need to enable at least one trail for your account. There is no charge for creating a trail that delivers a single copy of management events (the default setting when creating a trail). You only pay for S3 charges associated with storing the CloudTrail logs.

Here's how to create a trail for your account using the AWS Console.

Open the CloudTrail console.
Click the "Create Trail" button.
Specify a trail name.
By default, management events will be enabled, and insights and data events will be disabled. These settings are sufficient for triggering CloudWatch events when secrets are updated in Secrets Manager.
Under "Storage Location", specify the S3 bucket where the CloudTrail logs should be delivered.
Click the "Create" button.

Create a CloudWatch Events rule for Secrets Manager

Now that CloudTrail logging is enabled, you can create a CloudWatch Events rule that filters for events emitted by CloudTrail specific to Secrets Manager operations.

Here's how to create a CloudWatch Events rule for Secrets Manager secrets using the AWS Console.

Open the CloudWatch console.
From the left-hand navigation pane, choose Events->Rules, and then click the "Create rule" button.
Under Event Source, verify that Event Pattern is selected.
For "Service Name" dropdown, choose "Secrets Manager".
For "Event Type" dropdown, choose "AWS API Call via CloudTrail".
Leave the "Any operation" radio button selected.
Under Targets, click the "Add target" button.
In the Targets list, choose "CloudWatch log group" as the target type. Specify the name of the log group (e.g. /aws/events/secrets-mgr).
Click the "Configure details" button to move to the next screen.
Provide a name and (optional) description for the CloudWatch Events rule. Leave the "Enabled" checkbox selected to make the rule active immediately.
Finally, click the "Create rule" button.

Creating a CloudWatch Event rule for Secrets Manager
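For reference, the rule built by the console steps above boils down to an event pattern. The sketch below shows one possible pattern, narrowed to the PutSecretValue operation (the console steps leave "Any operation" selected, so this narrowing is optional), together with a deliberately simplified local matcher for trying patterns against sample events. The matcher is an approximation written for this example. It is not the service's matching logic and ignores advanced operators such as prefix or numeric matching.

```javascript
// Event pattern for Secrets Manager API calls recorded by CloudTrail.
const secretsPattern = {
    source: ['aws.secretsmanager'],
    'detail-type': ['AWS API Call via CloudTrail'],
    detail: {
        eventName: ['PutSecretValue']
    }
};

// Simplified local matcher: every pattern key must be present in the event;
// arrays mean "any of these exact values", nested objects recurse.
function matchesPattern(pattern, event) {
    return Object.entries(pattern).every(([key, expected]) => {
        const actual = event ? event[key] : undefined;
        if (Array.isArray(expected)) {
            return expected.includes(actual);
        }
        return typeof actual === 'object' && actual !== null &&
            matchesPattern(expected, actual);
    });
}
```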

Testing the CloudWatch Events rule

Now that we have created rules that capture events emitted when values change in either Systems Manager Parameter Store or AWS Secrets Manager, we can test each rule by updating a secret value and observing the output sent to the CloudWatch Logs log group.

To do this, go to the AWS console for the secrets management service you are using (either Systems Manager Parameter Store or AWS Secrets Manager). From the listing of parameters/secrets, choose an existing item that will get updated (if you don't have any yet, create one first). On the value detail page, select "Edit", provide an updated value and then save your change.

To view the event emitted when you updated the item, open the CloudWatch console, and select Logs->Log groups from the left-hand navigation pane. Choose the log group you specified when creating your rule to view the captured event. You should see an event similar to one of the following (depending on which service hosts the secret you updated):

Example of AWS Secrets Manager update event

{
    "version": "0",
    "id": "6e6b200b-f2b2-95c4-42ac-c26e912d2738",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.secretsmanager",
    "account": "1234567890",
    "time": "2020-02-05T19:15:10Z",
    "region": "us-west-2",
    "resources": [],
    "detail": {
        "eventVersion": "1.05",
        "eventName": "PutSecretValue",
        "requestParameters": {
            "secretId": "/development/credentials/test.json"
        }
    }
}

NOTE: For AWS Secrets Manager events, detail-type will be "AWS API Call via CloudTrail" and source will be "aws.secretsmanager". The operation that was performed can be found in detail.eventName.

TIP: The detail.requestParameters.secretId property can be in either short name format (e.g. /development/credentials/test.json) or a full ARN (e.g. arn:aws:secretsmanager:us-west-2:1234567890:secret:/development/credentials/test.json-fWJsLX). The particular format that will be used depends on how the request was made and by which client. For example, if you update the secret via the AWS Console, the short name format will be used. But if the update was done via the built-in credential rotation (Lambda function), the full ARN will be used. If you need to test against specific secret names, you should perform substring matching instead of exact matching.
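If you would rather compare names exactly than rely on substring matching, one option is to normalize the secretId first. The sketch below assumes a full ARN always ends with the hyphen and six random characters that Secrets Manager appends to the secret name. A partial ARN without that suffix, whose name happens to end in a similar pattern, would be trimmed incorrectly. The function name is made up for this example.

```javascript
// Reduce a secretId (short name or full ARN) to the short secret name.
// ASSUMPTION: full ARNs end with "-" plus six random characters.
function secretNameFromId(secretId) {
    const marker = ':secret:';
    const idx = secretId.indexOf(marker);
    if (!secretId.startsWith('arn:') || idx === -1) {
        return secretId;   // already a short name
    }
    // Drop the ARN prefix and the "-XXXXXX" suffix.
    return secretId.slice(idx + marker.length).replace(/-[A-Za-z0-9]{6}$/, '');
}
```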

Example of Systems Manager Parameter Store update event

{
    "version": "0",
    "id": "60794edf-9ea4-a349-1f9e-451156ae5a8c",
    "detail-type": "Parameter Store Change",
    "source": "aws.ssm",
    "account": "1234567890",
    "time": "2020-02-21T21:57:33Z",
    "region": "us-west-2",
    "resources": [
        "arn:aws:ssm:us-west-2:1234567890:parameter/development/credentials/test.json"
    ],
    "detail": {
        "name": "/development/credentials/test.json",
        "type": "SecureString",
        "operation": "Update"
    }
}

NOTE: For Systems Manager Parameter Store events, detail-type will be "Parameter Store Change" and source will be "aws.ssm". The operation that was performed can be found in detail.operation.

Recycling containers in response to CloudWatch Event

Now that we know the format of the event, we can create a Lambda function to process the CloudWatch events.

The Lambda function will need to be able to process events from both Parameter Store and AWS Secrets Manager. It will look for changes made to a specific item that represents the database credentials, and when it detects a change to this item, it will then reboot the containers associated with the application service.

TIP: You can use the "Force new deployment" option for ECS services to recycle all containers without creating a new task definition file.

First, we start with the primary handler function. This entry point is essentially a router, sending events to the appropriate function based on whether this is a Parameter Store or AWS Secrets Manager update.

Lambda handler function (Node.js)

const AWS = require('aws-sdk');
const ecs = new AWS.ECS({ apiVersion: '2014-11-13' });

exports.handler = async function(event, context) {
    if ('aws.ssm' === event.source &&
            'Parameter Store Change' === event['detail-type']) {
        await handleSsmChange(event.detail);
    }
    else if ('aws.secretsmanager' === event.source &&
            'AWS API Call via CloudTrail' === event['detail-type']) {
        await handleSecretsManagerChange(event.detail);
    }
};

Since the event schema used by each service is different, we break up processing into two helper functions, each specific to their respective secrets service. The helper function verifies that the event represents an "update" of the secret value representing the database credentials. If so, it then calls a helper function for rebooting the containers.

Handling Parameter Store events

const handleSsmChange = async detail => {
    if ('Update' === detail.operation) {
        if (detail.name.includes(DB_CONFIG_SECRET_NAME)) {
            // DB credentials have been updated -
            // restart containers with ECS
            await updateEcsService(CLUSTER_NAME, SERVICE_NAME);
        }
    }
};

Handling Secrets Manager events

const handleSecretsManagerChange = async detail => {
    if (detail.errorCode && typeof detail.errorCode === 'string') {
        //  This is a failure event - we can ignore
        return;
    }

    if ('PutSecretValue' === detail.eventName) {
        const secretId = detail.requestParameters.secretId;
        if (secretId.includes(DB_CONFIG_SECRET_NAME)) {
            // DB credentials have been updated -
            // restart containers with ECS
            await updateEcsService(CLUSTER_NAME, SERVICE_NAME);
        }
    }
};

To reboot the containers, we add a helper function that makes an "ecs.updateService" API call with the forceNewDeployment flag set.

Reboot ECS containers

const updateEcsService = async (clusterName, serviceName) => {
    const params = {
        service: serviceName,
        cluster: clusterName,
        forceNewDeployment: true
    };

    try {
        await ecs.updateService(params).promise();
    } catch (error) {
        console.log(`ecs.UpdateService failed. Reason: ${error}`);
    }
};

REMEMBER: After you have created the Lambda function, make sure to update the CloudWatch Event rules to specify the Lambda function as the target (instead of the CloudWatch Logs group).

Wrapping it all up

Let's consider a real-world use case. Suppose we have a containerized application running on ECS. The application uses a MySQL RDS database to store state. The application retrieves database credentials from AWS Secrets Manager. Within Secrets Manager, automatic rotation has been configured for the MySQL RDS database credentials.

Now, with our system in place for detecting and responding to changes to secrets, we have the following automated workflow:

Secrets Manager updates the secret (credential rotation).
CloudWatch Event(s) are emitted to the default event bus.
The CloudWatch Events rule for AWS Secrets Manager fires on the event and invokes the Lambda handler.
The Lambda function processes the event, detects that the database credentials have been updated, and then makes the ECS API call to force a new deployment.
Containers associated with the ECS service are stopped and restarted per ECS service rules. As the containers restart, they receive the new database credentials.

TIP: By setting appropriate minimum, desired and maximum task counts, you can ensure zero downtime during the container reboot cycle.
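As a rough illustration, assuming the TIP above refers to the ECS service's deployment configuration (minimumHealthyPercent and maximumPercent relative to the desired count), the bounds ECS works within during a rolling deployment can be computed as below. ECS rounds the minimum up and the maximum down. The cluster and service names are placeholders.

```javascript
// Compute the task-count bounds that apply during a rolling deployment.
// minimumHealthyPercent is rounded up, maximumPercent is rounded down.
function deploymentBounds(desiredCount, minimumHealthyPercent, maximumPercent) {
    return {
        minRunning: Math.ceil(desiredCount * minimumHealthyPercent / 100),
        maxRunning: Math.floor(desiredCount * maximumPercent / 100)
    };
}

// Parameters for ecs.updateService that keep full capacity while recycling.
// 'my-cluster' and 'my-service' are placeholder names.
const zeroDowntimeParams = {
    cluster: 'my-cluster',
    service: 'my-service',
    forceNewDeployment: true,
    deploymentConfiguration: {
        minimumHealthyPercent: 100,
        maximumPercent: 200
    }
};
```

With a desired count of 4 and these settings, the old tasks keep running until replacements are healthy, at the cost of briefly running up to 8 tasks.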

One final note

It may seem overkill to force a new deployment when a secret is updated. However, when leveraging a "no-code" solution to secrets management (such as using ECS secrets injection via task definition files), this is likely one of the most appropriate techniques. Especially considering that credential rotation will happen sporadically (say, once per month) and redeployment can happen with zero downtime.

If, on the other hand, you have developed secrets management for your application by direct use of the APIs, then you can be much finer grained in your response to secrets being updated. For example, your application could have a listener for events when secrets are updated, and then simply update its connection string dynamically without any restart required.
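A minimal sketch of that finer-grained approach might look like the following. Everything here is hypothetical: the class name, the injected fetchSecret function (which in a real application might call the Secrets Manager API), and the idea that a listener rebuilds the database connection when notified.

```javascript
const { EventEmitter } = require('events');

// Holds the current DB credentials and swaps them when asked to refresh.
// fetchSecret is injected: any async function that returns credentials.
class CredentialHolder extends EventEmitter {
    constructor(fetchSecret) {
        super();
        this.fetchSecret = fetchSecret;
        this.current = null;
    }

    // Load (or reload) the credentials and notify listeners.
    async refresh() {
        this.current = await this.fetchSecret();
        this.emit('updated', this.current);
        return this.current;
    }
}

// Usage idea (hypothetical): call holder.refresh() at startup and again
// whenever the application learns that the secret has changed, and
// rebuild the connection in an 'updated' listener:
//   holder.on('updated', (creds) => reconnectDatabase(creds));
```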

The Future of Containers - What's Next?

Chris Hickman — Thu, 30 Jan 2020 18:04:56 GMT

Maybe you've heard the buzzwords everyone seems to be talking about when discussing the future of containers. Strange names like "microVMs"... "unikernels"... "sandboxes".

Have you wondered what these things are and how you can use them? Or, for that matter, should you use them?

In this post, I'll tell you about some of the most promising technologies on the horizon for running cloud-native applications. If you are using (or considering using) containers, it's important that you understand these new technologies so you can decide if and when to start adopting them.

But before we dive into the future, let's first understand the present state of containers and some of the problems we now face.

Virtual machines

Cloud computing would not be possible if not for virtual machines. They are the fundamental computing resource for cloud-native applications.

Virtual machines allow us to virtualize an entire server, known as full virtualization. The virtual machine runs a full copy of the operating system as well as a virtual copy of the hardware. Enough hardware is simulated such that a "guest" operating system can run unmodified within the virtual machine. The guest operating system runs without being aware that it is executing virtually.

Virtual machines are enabled by the hypervisor, specialized software that runs directly on the host computer. The hypervisor is responsible for managing the physical resources of the underlying server. The hypervisor ensures that each virtual machine is allocated its own exclusive set of resources, such as CPU and memory.

Thanks to the hypervisor, each virtual machine is isolated from all other virtual machines. They each have their own guest operating system and kernel. Because of these strong boundaries, virtual machines offer great security and strong workload isolation.

However, there are downsides with traditional virtual machines. Because they are designed to run any operating system without modification, they must provide broad functionality and a robust set of simulated hardware. Consequently, virtual machines are "heavyweight" solutions that require significant computing resources, which leads to poor resource utilization. Virtual machines also typically have long boot times, limiting how quickly they can be created.

Containers

Then containers came along.

With containers, we can virtualize just our applications, rather than the entire server. This makes containers an ideal abstraction for our cloud-native applications.

To work their virtualization magic, container implementations rely on operating system kernel functionality, such as Linux namespaces and cgroups. Functionally, containers are simply processes running under the host operating system. The kernel partitions resources among these processes and isolates them by placing them in separate namespaces.

Because containers are not virtualizing an entire server and all its hardware, containers are faster and less resource intensive than virtual machines.

But we pay a price for the performance and resource efficiency gains of containers. By relying on kernel functionality to enable virtualization, containers must share a single operating system kernel between the host and all other containers running on that host. This sharing of the kernel makes containers less secure than virtual machines.

The best of both worlds

What if we could have the performance and resource efficiency of containers coupled with the enhanced security and isolation of virtual machines?

Let's explore three of the most promising technologies aiming to combine the best of both virtual machines and containers: microVMs, unikernels and container sandboxes.

MicroVMs

MicroVMs are a new way of looking at virtual machines. Rather than being general purpose and providing all the potential functionality a guest operating system may require, microVMs seek to address the problems of performance and resource efficiency by specializing for specific use cases.

For example, a cloud-native application only needs a few hardware devices, such as for networking and storage. There's no need for devices such as full keyboards and video displays. Why run the application in a virtual machine that provides a bunch of unnecessary functionality?

By implementing a minimal set of features and emulated devices, microVM hypervisors can be extremely fast with low overhead. Boot times can be measured in milliseconds (as opposed to minutes for traditional virtual machines). Memory overhead can be as little as 5MB of RAM, making it possible to run thousands of microVMs on a single bare metal server.

Perhaps one of the most talked about microVMs is Firecracker. AWS created Firecracker to specifically address the need to run serverless applications quickly, efficiently and with utmost security. By specializing on a very specific use case, AWS was able to build a virtualization environment that perfectly suits the needs of cloud-native applications.

But remember, a big advantage of containers is that they virtualize at the application level, not the server level like virtual machines. This is a natural fit with our development lifecycle - after all, we build, deploy and operate applications, not servers.

Containers are a mature technology, supported by a rich ecosystem of tooling and services that provide end-to-end coverage for the entire application lifecycle. Build tools, packaging formats, runtimes and orchestration systems allow us to work much more efficiently than with virtual machines.

A better virtual machine by itself doesn't help us much if we have to go back to deploying servers and give up our rich container ecosystem. The goal is to keep working with containers but run them inside their own virtual machine to address the security and isolation problem.

Most microVM projects provide a mechanism to integrate with the existing container runtimes. Instead of directly launching a container, the microVM-based runtime first launches a microVM, and then creates the container inside that microVM. Containers are encapsulated within a virtual machine barrier, with minimal impact on performance or overhead.

It's like having our cake and eating it too. MicroVMs give us the enhanced security and workload isolation of virtual machines, while preserving the speed, resource efficiency and rich ecosystem of containers.

Unikernels

Unikernels aim to solve some of the same problems as microVMs. Like microVMs, unikernels allow us to run cloud-native applications with high performance and low overhead, while providing a strong security posture.

Although unikernels address the same issues as microVMs, they do so in a radically different way.

A unikernel is a lightweight, immutable OS compiled specifically to run a single application. During compilation, the application source code is combined with the minimal device drivers and OS libraries necessary to support the application. The result is a machine image that can run without the need for a host operating system.

Unikernels achieve their performance and security benefits by placing severe restrictions on execution. Unikernels can only have a single process. When packaged as a unikernel, your application cannot spawn sub-processes. With no other processes running, there is less surface area for security vulnerabilities.

In addition, unikernels have a single address space model, with no distinction between application and operating system memory spaces. This increases performance by removing the need to "context switch" between user and kernel address spaces. Note that with a single address space, there is no protection of the kernel from application errors. But, given that the unikernel can only run a single process - the application - there is not much use for this type of protection. If the application dies, there is no use keeping around the OS. It's better to simply restart the unikernel.

However, one of the big drawbacks with unikernels is that they are implemented entirely differently than containers. The rich container ecosystem is not interchangeable with unikernels. In order to adopt unikernels, you will need to pick an entirely new stack, starting with choosing a unikernel implementation. There are many disparate unikernel platforms to choose from, each with their own requirements and constraints. For example, to build unikernels with MirageOS, you'll need to develop your applications in the OCaml programming language.

It is interesting to note that Docker acquired Unikernel Systems back in January 2016. The expectation was that this would combine the familiar tooling and portability of Docker with the efficiency and specialization of unikernels. Unfortunately, it didn't exactly work out that way. Docker abandoned the concept of unikernels and remains focused on containers.

Container sandboxes

Container sandboxes are designed to address the security issues of a shared operating system kernel. Sandboxes provide a "kernel proxy" for each container. Rather than each container directly addressing the host operating system, it gets assigned its own kernel proxy. The kernel proxy implements all the kernel features expected by the container, such that the container can run in the sandbox without any modifications.

One prominent project that implements container sandboxes is Google's gVisor project. gVisor provides a kernel proxy module, written in the Go language, that acts as an intermediary between the container and the host operating system.

Container sandboxes specifically address the security issues posed by sharing the OS kernel between containers by introducing a new layer of separation. So, while sandboxes may provide additional isolation, they do come with an additional performance penalty incurred with translations between the proxy and kernel.

So, What's Next?

Container sandboxes are an interesting approach to solving workload isolation, but don't really offer enough benefits to warrant a switch. Looking ahead to the future, I think that microVMs and unikernels deserve the most attention.

If you are using containers, microVMs should definitely be on your roadmap. MicroVMs integrate with existing container tooling, making adoption rather painless. As microVMs mature, they will become a natural addition to the runtime environment, making containers much more secure.

Unikernels, on the other hand, require an entirely new way of packaging your application. For very specific use cases, unikernels may be worth the investment of converting your workflow. But for most applications, containers delivered within a microVM will provide the best option.

Your Most Important Skill

Chris Hickman — Wed, 15 Jan 2020 19:56:10 GMT

The rapid pace of technology innovation

"Life moves pretty fast. If you don't stop and look around once in a while, you could miss it." - Ferris Bueller

Ferris was right. It's amazing how fast everything is changing, with technology leading the way. New programming languages, new frameworks, new methodologies, new databases. Cloud-native software. DevOps. DevSecOps. AI, ML, analytics. It can be hard to keep up.

I want to share a personal story that reinforced the importance of continual learning and, in particular, taking ownership of your own growth.

A familiar problem with a familiar solution

Recently, I was preparing for an episode of Mobycast, a weekly podcast I co-host with Jon Christensen where we discuss topics related to modern cloud-native software development.

With this new episode, I walk the listener through the steps of separating cloud-based resources into public and private subnets. Public-facing resources get placed on the public subnets. All other resources get placed on private subnets. This is a best practice that reduces the surface area you must protect against Internet-based attacks.

One hurdle of incorporating private subnets into your network design is that you need a way to securely access those resources, which no longer have public Internet access. You can no longer simply SSH into your servers. Other methods must be used.

My typical "go to" solution for accessing private subnets has been to use a Virtual Private Network (VPN) connection. There are various types of VPNs, some of which require hardware, such as AWS Managed VPN, and others which are software-only.

The hardware-assisted solutions are more suited when connecting on-premise locations to the public cloud. A software-only VPN is ideal when you have remote access users and want simplicity at the lowest cost. For the podcast episode, I opted to use a software-only VPN.

With AWS, implementing a software-only VPN has always meant deploying third-party software on an EC2 instance in your VPC. The only question has been which third-party software to use. There are numerous options to choose from, ranging from commercial, paid software to free, open source packages.

Wanting to demonstrate an option with the least cost, I evaluated two open source packages: SoftEther and OpenVPN. After some research, it was obvious that OpenVPN has the best support on AWS, making it an easy choice. I took notes as I installed and configured OpenVPN Access Server on an EC2 instance in my VPC. After a couple of hours, I had a secure VPN connection to my private subnets, along with detailed notes for the podcast episode.

That feeling of "Ohhhh nooooo"

While doing some final fact checking in preparation for recording the episode, I was reviewing documentation on the VPN choices available from AWS. Curiously, two choices stood out which I hadn't heard of before: AWS Site-to-Site VPN and AWS Client VPN.

After some reading, I discovered that AWS Managed VPN has become AWS Site-to-Site VPN. It's hard to tell when this change went into effect. Some of the official AWS documentation still refers to Managed VPN as a valid option. Ok, name change and perhaps some additional features. I can deal with that.

But wait... what is "AWS Client VPN"? After a few minutes skimming the docs, I learned that AWS Client VPN is a fully managed service for a software-only VPN solution. Additionally, it has native support for OpenVPN clients. What?!? To say I was surprised by this discovery is an understatement. I had just spent hours standing up my own VPN solution without knowing that AWS offers the equivalent as a fully managed service. How had I missed this?

Turns out, AWS Client VPN is relatively new, launching in December 2018. A mistake on my part was not being vigilant about reading updates on the AWS What's New page. But the biggest mistake by far was my reliance on a familiar pattern in solving a familiar problem.

Kill your darlings

"Kill your darlings, kill your darlings, even when it breaks your egocentric little scribbler’s heart, kill your darlings." ― Stephen King

Even though Mr. King was referring to ruthlessly editing one's prose, I think this quote equally applies to the patterns and practices we hold on to.

Technology is changing so rapidly and it can be quite an investment to learn a new skill or technique. Our knowledge gained is so hard fought that it is only natural to rely on it dearly. It becomes part of our core set of practices and patterns.

But this is the catch-22 of being in the tech industry. We must work hard to keep abreast of changes and advancements, so that we can then use technology effectively. But we must also be willing to "kill our darlings" with the realization that the way we have been doing things may no longer be the best solution.

The most important skill

This is why your most important skill is the ability to learn new things quickly. Gone are the days when learning took place only during your formal education (i.e. attaining a college degree), after which you left it behind and became a practitioner.

Instead, embrace the new reality that learning is a continual process, with which you will never be done. Be flexible with your opinions, be open to new ideas, stay curious and commit to a growth mindset. In short, own your personal development. No one else will do it for you.

If you don't, after a few years you may find yourself left behind, mired in your own technical debt and out of touch with the rapidly evolving state of technology.

Secrets Handling for Containerized Applications Running on ECS

Chris Hickman — Tue, 07 Jan 2020 02:47:17 GMT

Recap so far

In a previous post, I discussed why we need secrets management for our applications and some of the possible solutions available to us.

Now that we know the "theory", it's time to put that knowledge into practice.

In this follow up post, I'll show how you can easily implement secrets management for a containerized application running on Amazon Elastic Container Service (ECS). Let's get started.

Amazon Elastic Container Service (ECS) and secrets management

Amazon ECS enables you to inject sensitive data into your containers by storing it in either AWS Secrets Manager secrets or AWS Systems Manager Parameter Store parameters and then referencing them in your container definition. This feature is supported by tasks using both the EC2 and Fargate launch types.

Hmmm... but what about EKS?

Kubernetes does have its native Secret objects, which are used for storing and managing sensitive information. Storing secrets in a Secret object is much safer than putting them verbatim in a Pod definition or baking them into a container image.

However, there is currently no direct integration between Amazon Elastic Kubernetes Service (EKS) and Parameter Store or Secrets Manager. If your containers are leveraging Kubernetes on AWS, you'll need alternative methods for secrets handling.

Fortunately, others have developed solutions to extend Kubernetes Secrets to include the concept of ExternalSecrets. For example, there is GoDaddy's open source project, which injects sensitive data managed by an external system, such as Parameter Store or Secrets Manager, into Kubernetes secrets.

Three types of sensitive data injection for ECS

With ECS, secrets can be exposed to a container in the following three ways.

1. Container secrets as environment variables

This is the method you will most likely use. Using this type of injection, sensitive information will be exposed as environment variables that are isolated to the target container.

To inject the secrets, you specify parameters in the task definition file as name/value pairs. The name portion specifies the environment variable name and the value portion references the Amazon Resource Name (ARN) of the secret (either a Secrets Manager ARN or a Parameter Store ARN).

The ARN must be in the same account as the running container (but can be in a different region). With Parameter Store secrets, you don't have to use the full ARN if it is hosted in the same region - you can simply use the parameter name. But, you should consider always using the full ARN so there is consistency across all your task definition files, regardless of where the secrets are stored. This will reduce copy/paste errors as you add or update secrets for your containers.

Task definition example for container secrets as environment variables

After this container is started, there will be two environment variables named MY_SECRET and ANOTHER_SECRET, which contain the specified values from Secrets Manager and Parameter Store.

{
  "containerDefinitions": [{
    "secrets": [{
      "name": "MY_SECRET",
      "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
    },{
      "name": "ANOTHER_SECRET",
      "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
    }]
  }]
}
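To keep valueFrom references consistent, one option is to generate the "secrets" list rather than typing ARNs by hand. The sketch below is illustrative only: the helper names `parameter_arn` and `secrets_block` are made up, and it assumes the standard `aws` partition (other partitions use a different ARN prefix).

```python
def parameter_arn(name: str, region: str, account_id: str) -> str:
    """Expand a Parameter Store name into a full ARN; pass full ARNs through.

    Assumes the standard "aws" partition.
    """
    if name.startswith("arn:"):
        return name  # already a full ARN (Parameter Store or Secrets Manager)
    # Hierarchical names start with "/", which must not be doubled in the ARN.
    return f"arn:aws:ssm:{region}:{account_id}:parameter/{name.lstrip('/')}"


def secrets_block(mapping: dict, region: str, account_id: str) -> list:
    """Build the "secrets" list of a container definition.

    mapping: environment variable name -> parameter name or full ARN.
    """
    return [
        {"name": env_name, "valueFrom": parameter_arn(ref, region, account_id)}
        for env_name, ref in sorted(mapping.items())
    ]
```

For example, `secrets_block({"ANOTHER_SECRET": "/prod/api_key"}, "us-west-2", "123456789012")` yields a single entry whose valueFrom is the full Parameter Store ARN, so every task definition uses the same form regardless of region.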

2. Sensitive data for log configuration

Use this method when you need to specify secret information as part of log configuration, such as when using a third-party logging service like Splunk.

The secret information for the log configuration gets specified using the secretOptions parameter. The sensitive data then gets passed to the logging driver as part of the options data.

Task definition example for log configuration

{
  "containerDefinitions": [{
    "logConfiguration": {
      "logDriver": "splunk",
      "options": {
        "splunk-url": "https://cloud.splunk.com:8080"
      },
      "secretOptions": [{
        "name": "splunk-token",
        "valueFrom": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name-AbCdEf"
      }]
    }
  }]
}

3. Private registry credentials

Use this type of sensitive data injection when you need to access private repositories that require credentials, such as Docker Hub and JFrog Artifactory.

Note that this does not apply when accessing private repositories hosted in Elastic Container Registry (ECR). ECR relies on IAM roles for secure access to private repositories.

To use this feature, you first create a secret in Secrets Manager that contains your private registry credentials in the following format:

{
  "username" : "privateRegistryUsername",
  "password" : "privateRegistryPassword"
}

The private registry credentials then get specified using the Secrets Manager ARN as the credentialsParameter in the repositoryCredentials section of the task definition file.

Task definition example for private registry credentials

"containerDefinitions": [
    {
        "image": "private-repo/private-image",
        "repositoryCredentials": {
            "credentialsParameter": "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name"
        }
    }
]

How it works

Injection at container startup only

When starting your container, ECS will process any secrets directives found in the task definition file and make calls on your behalf to Systems Manager Parameter Store and/or Secrets Manager. Container startup is the only time when ECS will inject sensitive data into your container.

This means that your container will not automatically receive any subsequent updates to sensitive data, such as when credentials are rotated. In order to receive the updated sensitive data, you must launch a new container.

TIP: You can use the "Force new deployment" option for ECS services to recycle all containers without creating a new task definition file.

Note that AWS recently launched AWS AppConfig, which is a new feature to deploy configurations across applications in a validated, controlled and monitored way. Configuration can be stored either as a Systems Manager Document or a single Parameter Store parameter.

At first glance, it may appear that AppConfig would help solve the problem of updating containers when secrets are changed. Unfortunately, applications must poll for changes when using AppConfig. This requires custom code in the application to poll for changes and then reload the configuration.
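To give a sense of what that custom code involves, here is a minimal polling sketch. It is not tied to the AppConfig API: `fetch` is a hypothetical callable you would supply that returns a (version, config) pair, and `on_change` is whatever reload hook your application exposes.

```python
import time
from typing import Callable, Optional, Tuple


class ConfigPoller:
    """Poll a configuration source and invoke a callback when it changes."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, dict]],  # hypothetical: returns (version, config)
        on_change: Callable[[dict], None],      # hypothetical: application reload hook
    ):
        self._fetch = fetch
        self._on_change = on_change
        self._version: Optional[str] = None

    def poll_once(self) -> bool:
        """Fetch once; return True if a new version was seen and applied."""
        version, config = self._fetch()
        if version == self._version:
            return False
        self._version = version
        self._on_change(config)
        return True

    def run(self, interval_seconds: float, max_polls: Optional[int] = None) -> None:
        """Poll repeatedly, sleeping between attempts."""
        polls = 0
        while max_polls is None or polls < max_polls:
            self.poll_once()
            polls += 1
            time.sleep(interval_seconds)
```

Note that the first poll always reports a change, because no version has been seen yet. A real implementation would also need error handling around `fetch` and a sensible polling interval.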

Required configuration and IAM permissions

Note that during this injection of sensitive data, ECS is making calls to Systems Manager Parameter Store and Secrets Manager on your behalf. In order to make those calls, the ECS agent uses the ECS Task Execution IAM Role (ecsTaskExecutionRole).

Therefore, when specifying secrets in your task definition file, you must also set the executionRoleArn parameter to the ARN of a role (typically named ecsTaskExecutionRole) that has the proper permissions to make calls to Parameter Store and/or Secrets Manager.

Also, if you are using sensitive data for log configuration and the EC2 launch type, you will need to update the ECS agent configuration (the "/etc/ecs/ecs.config" file) to specify the following flag:
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true

Putting it all together - implementation steps

Now that we know how ECS injects sensitive data into our containers, let's walk through the implementation steps to make this all work.

1. Store secret

First we need to store the sensitive data in either Systems Manager Parameter Store or Secrets Manager. You can do this by using the AWS Console, using the AWS Command Line Interface (CLI) or by making a direct API call.

Creating a secret using the Systems Manager Parameter Store console

2. Configure the ECS Task Execution role

Next, we need to ensure that the ECS Task Execution role has permissions to make calls to Systems Manager Parameter Store or Secrets Manager.

Warning: If the ECS Task Execution role doesn't have the correct permissions, the container will fail to start (i.e. "hard fail") and you'll see an error message similar to the following:

Stopped reason Fetching secret data from AWS Secrets Manager in 
region us-west-2: secret arn:aws:secretsmanager:secret:/my-secret: 
AccessDeniedException: User: arn:aws:sts:assumed-role/ecsTaskExecutionRole 
is not authorized to perform: secretsmanager:GetSecretValue on 
resource: arn:aws:secretsmanager:secret:/my-secret

The ECS Task Execution role will need permission to the following actions:

ssm:GetParameters - if using Systems Manager Parameter Store
secretsmanager:GetSecretValue - if using Secrets Manager
kms:Decrypt - if your secret uses a custom KMS key (i.e. not using the default encryption key)

In order to give the ECS Task Execution role these permissions, create a new IAM policy and then attach this policy to the ECS Task Execution role. As a best practice, you should also consider explicitly specifying the resources (parameters, secrets, CMKs) that can be accessed.

Example Task Execution Role Inline Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameters",
        "secretsmanager:GetSecretValue",
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws:ssm:region:aws_account_id:parameter/parameter_name",
        "arn:aws:secretsmanager:region:aws_account_id:secret:secret_name",
        "arn:aws:kms:region:aws_account_id:key/key_id"
      ]
    }
  ]
}

TIP: Parameter Store supports hierarchies directly and you can specify wildcards in resource ARNs, such as arn:aws:ssm:us-west-2:123456789012:parameter/prod-*.
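If you manage many services, it can help to generate this policy instead of editing JSON by hand. The following is a rough sketch under a few assumptions: the function name `execution_role_policy` is made up, and it emits one statement per service so that each action is only paired with resources of its own type, which is a slightly tighter variant of the single-statement example above.

```python
import json


def execution_role_policy(parameter_arns=(), secret_arns=(), kms_key_arns=()):
    """Build an IAM policy document granting read access to specific secrets.

    Only actions with at least one resource are included.
    """
    statements = []
    if parameter_arns:
        statements.append({
            "Effect": "Allow",
            "Action": ["ssm:GetParameters"],
            "Resource": list(parameter_arns),
        })
    if secret_arns:
        statements.append({
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": list(secret_arns),
        })
    if kms_key_arns:
        statements.append({
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": list(kms_key_arns),
        })
    if not statements:
        raise ValueError("at least one resource ARN is required")
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Example ARN taken from the wildcard tip above.
    policy = execution_role_policy(
        parameter_arns=["arn:aws:ssm:us-west-2:123456789012:parameter/prod-*"],
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or saved to a file and attached with your usual tooling.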

Attaching the IAM policy to the ECS Task Execution role

3. Update task definition file

The last step is to update the task definition file for our container. After specifying the secrets to be injected (using one or more of the three available options described above), we then set the executionRoleArn parameter to the ARN of the ECS Task Execution role we configured.

Note: If you specify secrets injection in your task definition but leave executionRoleArn unspecified, you will get an error when trying to save the task definition.

Updating the task definition file

After updating the task definition, deploy it as a new task revision. After deployment is complete, your containers will now have the specified sensitive data injected into them, securely, using Parameter Store and/or Secrets Manager. To verify, you can SSH into a running container and view its environment variables (env command).
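As a complement to checking by hand, the application itself can verify at startup that the expected variables were injected. This is a minimal sketch: the helper names are made up, and MY_SECRET and ANOTHER_SECRET are just the example names used earlier in this post.

```python
import os


def missing_secrets(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_secrets(names, environ=None):
    """Raise if any expected secret is missing.

    Only variable names are reported, never their values.
    """
    missing = missing_secrets(names, environ)
    if missing:
        raise RuntimeError("missing required secrets: " + ", ".join(missing))


if __name__ == "__main__":
    # Reports which of the example variables are absent in this environment.
    print(missing_secrets(["MY_SECRET", "ANOTHER_SECRET"]))
```

Calling `require_secrets(["MY_SECRET", "ANOTHER_SECRET"])` early in startup makes a misconfigured task fail fast with a clear message instead of failing later on first use.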

Secrets Management for Cloud-Native Applications

Chris Hickman — Tue, 31 Dec 2019 02:15:07 GMT

Psst... Can you keep a secret?

Applications frequently need access to sensitive data, such as database credentials, API keys, passwords and tokens.

Of course, we can't just store these secrets in plain text or hard code them into our applications. Rather, we need to securely protect this sensitive information to ensure that only those with a "need to know" can access it.

In this post, I outline various approaches to secrets management and explore some of the most popular solutions out there. We'll finish with some guidance on how you can choose the right solution for your particular application.

Before discussing secrets management, let's frame the problem with some guidelines to follow when dealing with sensitive data.

Secrets management guidelines

Secrets should be encrypted at rest and use multiple encryption keys to limit blast radius.
Secrets should be rotated often.
You should automate the creation of secrets using strong algorithms.
There should be support for various access patterns across container environments such as development, test, and production.
There should be isolated access to secrets on a container/application level rather than at the host level.

There are many approaches to secrets management, ranging from basic roll-your-own techniques to full-blown secrets management solutions. Let's take a closer look at some of the options.

A simple, roll-your-own technique for secrets management

A simple technique for securely storing and accessing sensitive data uses a strategy of encryption at rest coupled with a secure object store to persist the cipher text.

When using AWS, we can leverage the Key Management Service (KMS) for encryption and decryption and Amazon Simple Storage Service (Amazon S3) for the object store. To implement our secrets handling, we first need to create a KMS customer master key (CMK).

Note that when using this technique, we are using two different resources (KMS CMK and S3 bucket/key), each of which can be locked down using IAM permissions. This gives us very fine-grained access control (and auditing!) of secrets, with very little effort.

Storing a secret

To store a secret:

Use KMS::Encrypt with our CMK to convert the secret to cipher text.
Store the cipher text in S3 using S3API::PutObject.

Example code (bash using AWS CLI)

#!/bin/bash

# Encrypt local file
# NOTE:  When you use the AWS CLI, the output, `CiphertextBlob`, is
#        Base64-encoded. So decode before storing (so subsequent
#        `decrypt` call will be valid).
aws kms encrypt --key-id "$KMS_KEY_ALIAS" --plaintext "fileb://$PUT_FILE" \
    --output text --query CiphertextBlob | base64 --decode > tmp.encrypt

# Copy encrypted file to S3
aws s3api put-object --region "$AWS_REGION" --bucket "$BUCKET" --key "$KEY" --body tmp.encrypt

Retrieving a secret

To retrieve a secret:

Read the cipher text from S3 using S3API::GetObject.
Decrypt the cipher text by calling KMS::Decrypt.

Example code (bash using AWS CLI)

#!/bin/bash
# Retrieve encrypted file from S3
aws s3api get-object --region "$AWS_REGION" --bucket "$BUCKET" \
    --key "$KEY" tmp.encrypt

# Decrypt
# NOTE:  When you use the AWS CLI, the decrypted output is
#        Base64-encoded. So we must Base64 decode after
#        decryption.
aws kms decrypt --ciphertext-blob fileb://tmp.encrypt --query Plaintext \
    --output text | base64 --decode

Secrets management solutions

While you can implement your own secrets management solution, in general, you probably shouldn't. Instead, you should consider using a purpose-built secrets management solution. When you use a secrets management solution, you can expect a robust set of capabilities, such as:

Securely store and tightly control access to sensitive data
Protect data at rest and in transit
Centralized management capabilities
Secure authorization and authentication mechanisms
Integration with key management and encryption providers
Auditing
Secrets rotation and revocation

Vault

One popular solution is HashiCorp's Vault. Vault is vendor-agnostic and works well for on-premise, hybrid and enterprise environments. It also offers a multi-cloud approach. However, Vault is a self-hosted solution. You'll need to install the software on server(s) you manage and you'll be responsible for scalability and availability. When considering cost, you'll need to account for Vault licensing fees, along with the infrastructure cost to host the software.

AWS options

On the other hand, AWS offers not one, but two, managed services for secrets management.

Both Systems Manager Parameter Store and AWS Secrets Manager are managed services that allow you to securely store key/value pairs. Since they have similar functionality, it can sometimes be confusing to decide which to choose for your secrets management. Let's compare the two services.

Similarities
- Both are managed key/value store services
  - Allow you to store values under a name or key
  - Keys can have prefixes
- Both use KMS for encryption and decryption
- Both are referenceable in CloudFormation and integrated with other AWS services such as ECS

Key differences
- Types of values
  - Parameter Store allows for both encrypted and unencrypted values, whereas Secrets Manager only supports encrypted values.
- Cost
  - Parameter Store is free when using standard tier parameters (less than 4KB). Secrets Manager charges $0.40/secret per month, as well as API usage charges.
- Features
  - Secrets Manager is purpose-built for secrets only, and has several additional key features over Parameter Store, including:
    - Auto rotation of credentials, including seamless integration with RDS database services
    - Password generation (generate random secrets)
    - Secrets can be shared across accounts

Choosing a secrets management solution

Given the availability of robust, mature purpose-built secrets management solutions, it's hard to justify a do-it-yourself approach. Why spend any time or resources on "undifferentiated heavy lifting"?

So, which secrets management solution to choose?

If you want a self-hosted solution or need multi-cloud capabilities, Vault is the right choice. Otherwise, as long as you are using AWS, you should choose either Systems Manager Parameter Store or Secrets Manager or a combination of both.

Personally, because of the flexibility and cost model, I prefer to use Parameter Store for most of the sensitive data used by my applications. But for RDS database instances, I like using Secrets Manager because of its direct support for credential rotation.

Next steps

In a future post, I'll dive into the details of how to implement secrets management using Systems Manager Parameter Store for a containerized application running on Amazon Elastic Container Service (ECS).

Add Encryption to an Unencrypted RDS DB Instance

Chris Hickman — Fri, 18 Oct 2019 17:29:07 GMT

When using encryption to protect data for our cloud workloads, we have two primary scenarios to consider. Either the data is being transmitted to another recipient ("in transit"), or the data is stored ("at rest"). The protocols and algorithms used to encrypt the data will depend on which of these situations applies.

For example, consider a microservice that exposes a RESTful API for accessing data stored in a relational database. To protect the data while in transit, we use Transport Layer Security (TLS) for the API calls between clients and the microservice. But what about protecting the data when it is stored by the database?

With Amazon Relational Database Service (RDS), you can encrypt your data at rest by literally "checking a box". By enabling the encryption option for the database instance, RDS handles encryption and decryption of the data transparently, with minimal impact on performance.

However, you can only enable encryption when you create the database instance. Another limitation is that you cannot restore an unencrypted backup or snapshot to an encrypted database instance. What if you have an existing RDS instance that was created without the encryption option enabled? How can you update your database to have encryption at rest?

Luckily, there is an easy workaround to accomplish this. We can take advantage of the RDS feature that allows you to add encryption when making a copy of a snapshot.

To add encryption to an unencrypted RDS instance, perform the following 3 steps.

Step 1: Take a snapshot of the existing unencrypted database instance.

From the RDS Console, navigate to the database instance, and then choose "Actions->Take snapshot".

Step 2: Create a copy of the snapshot, enabling the encryption option.

Navigate to the list of snapshots, select the snapshot you just created, then choose "Actions->Copy snapshot". In the Encryption section, choose "Enable encryption" and then select the master key to be used. You can use either the default encryption key for Amazon RDS for your AWS account or you can opt for a specific KMS customer master key (CMK).

Step 3: Restore the encrypted snapshot to a new database instance.

From the list of snapshots, select the new encrypted snapshot, then choose "Actions->Restore Snapshot". Specify the details for the new instance, then click the "Restore DB Instance" button.

After the new database instance is available, you can then update your microservice to use the new RDS endpoint and then delete the original RDS instance.
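The same three steps can also be scripted instead of clicked through in the console. Here is a minimal sketch, assuming a boto3 RDS client is passed in. The function name, the snapshot naming scheme and the default key alias (alias/aws/rds, the AWS managed key for RDS) are illustrative assumptions, and the restore call leaves settings such as instance class, subnet group and security groups at their defaults, which you would likely want to set explicitly.

```python
def encrypt_rds_instance(rds, source_instance_id, new_instance_id,
                         kms_key_id="alias/aws/rds"):
    """Copy an unencrypted RDS instance into a new, encrypted one.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    All identifiers are hypothetical placeholders chosen by the caller.
    """
    snapshot_id = f"{source_instance_id}-unencrypted"
    encrypted_snapshot_id = f"{source_instance_id}-encrypted"
    snapshot_ready = rds.get_waiter("db_snapshot_available")

    # Step 1: take a snapshot of the existing unencrypted instance.
    rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=source_instance_id,
    )
    snapshot_ready.wait(DBSnapshotIdentifier=snapshot_id)

    # Step 2: copy the snapshot; passing KmsKeyId makes the copy encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=snapshot_id,
        TargetDBSnapshotIdentifier=encrypted_snapshot_id,
        KmsKeyId=kms_key_id,
    )
    snapshot_ready.wait(DBSnapshotIdentifier=encrypted_snapshot_id)

    # Step 3: restore the encrypted snapshot to a new instance.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=encrypted_snapshot_id,
    )
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=new_instance_id
    )
    return new_instance_id
```

With boto3 installed and credentials configured, this could be called as encrypt_rds_instance(boto3.client("rds"), "mydb", "mydb-encrypted"), where both instance names are placeholders.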

Voilà! Your microservice now has full encryption, supporting both in transit and at rest encryption.

Introducing Mobycast

Chris Hickman — Thu, 23 Aug 2018 04:31:38 GMT

Over the past several months, I have been busy helping to create a new podcast, called Mobycast, with my colleagues Jon Christensen and Rich Staats. The podcast covers modern distributed systems software development, with a focus on containerization, cloud and the CI/CD pipeline. It's based on our own real-world experience developing, deploying and maintaining cloud-first software.

A new episode is posted weekly, and we now have almost 25 episodes available to download.

If you get the chance, I would appreciate you listening to one or more episodes and letting me know what you think. I've never done a podcast before, so this is a journey of experimentation and learning.

The first few episodes are longer (about 1 hour) and a bit rougher around the edges as we work to find our voice. But with each new episode, I think we are improving and becoming more successful at providing relevant, useful information in an entertaining format. Also, we have transitioned to a shorter format (20-30 minutes) with more recent episodes. I would love to hear your thoughts on which length you prefer.

You can download and listen to Mobycast from wherever you get your podcasts, such as Apple iTunes, Google Play, SoundCloud and Stitcher.

p.s. Why the name, "Mobycast", you ask? Well, the original intent for the podcast was to help make containerization topics easier for folks to digest ("we make the hard things easier"). And since Docker is the de facto king of containers, we thought it only fitting to acknowledge the official Docker mascot, Moby the Whale, in the name of the podcast.

Moby the Whale flying high above the crowd at DockerCon 2018

Do These 7 Things to Become a Great Software Developer

Chris Hickman — Thu, 04 Jan 2018 23:57:24 GMT

During my career, whether as a founder, manager, interviewer or fellow developer, I have had many opportunities to work with software developers as they begin their careers. Often, these aspiring developers ask me for advice on how to become a better software developer. Here are some of the key principles that I find myself repeating time and time again.

1. Practice, practice, practice

The only way to become a great software developer is by writing code. Lots of it.

While you can certainly learn about software development by reading books and articles and taking classes, there is no substitute for writing code. No classroom or book can provide the learning and growth that comes with writing, testing and debugging your own code.

I see many developers building software mostly by copying and pasting code found online (such as from Stack Overflow). They build software as if it were a patchwork quilt, stitching together patches of code here and there to implement a solution, without actually writing much code themselves. I don't think you can ever become a good (let alone great) developer with this workflow. You simply must write the code yourself, and do it over and over and over.

I encourage you to have your own personal projects to accelerate your programming experience. It doesn't really matter what software you decide to build. Just pick something interesting, and develop the software. When you're done, repeat the process, again and again.

2. Persistence

As with most challenging topics, mastery of software development does not come easy. Repeatedly, you will be faced with problems that you aren't sure how to solve. Have faith in yourself and your problem-solving skills.

Give yourself completely to the craft of software development. It will be hard. You will struggle. You need those experiences where you spend two days straight chasing down a difficult bug, where you become so frustrated that you want to throw your keyboard through the window. Don't give up. Stick with it until you find the answer.

By persevering through these difficult problems, your software development skills will improve by orders of magnitude.

Persistence is key here. Great software developers battle through difficult challenges and "ship" code. If you give up on a project once it gets difficult, you will never become a great software developer. Without question, finishing matters.

3. Value understanding above correctness

Frequently, I see junior developers (and sometimes even senior developers) try to fix bugs by making random tweaks (usually via Stack Overflow snippets) until they get a working piece of software. You may arrive at a solution this way, but not be entirely sure why it works. The approach squanders the opportunity to learn and grow as a developer, because it produces no real understanding of the code.

Don’t let anything about your software be a mystery. Make sure you understand what every line of your code is doing.

This may slow you down in the short-term, but will pay off immensely in the future. Each time you chase down one of these mysteries, you are making a deposit to your software development account. By taking the time to fully understand the code and how it works, you are dramatically improving your software development abilities.

4. Review great code written by others

One of the best ways to learn a craft is by studying the works of experts. Just as artists, architects, and others study the masters of their fields to improve their own work, software developers should read and analyze code written by masterful software developers.

Seek out the work created by developers or projects you admire. For example, identify an open source project written in your preferred language that is well respected by the community. Clone the repo, and study the code. Some aspects you should pay especially close attention to include:
- Is there a consistent coding style?
- How readable is the code? How quickly are you able to understand what the software is doing?
- What documentation is provided with the code?
- What types of tests are included? Are they manual or automated?

When studying the work of great software developers, don't just read the software, but strive to truly understand how it works. Doing so will accelerate your growth as a software developer.

5. Details matter

Imagine a skyscraper built by a team with little attention to detail. The team focuses on simply completing the building, without much care about the building details. They might use multiple types and sizes of brick for the facing. Perhaps windows are of various sizes. And maybe they install doors on only half the rooms. What would you think of such a building? You'd probably immediately judge the building to be of low quality (and perhaps not even safe). Attention to detail and strictly adhering to a building's specification is crucial for delivering a high-quality skyscraper.

Just as building a safe and effective skyscraper requires a keen eye for details, so does writing great software.

Great software is consistent through the enforcement of a coding standard. Each file of the project has the same look and feel. Even though many developers may have contributed to the code, the reader is unaware of this because of the uniform consistency across the project. Given that it is easy to enforce this consistency via automatic linting, there is no excuse for writing inconsistent code.

Great software is organized. Just from glancing at the directory structure, the reader should be able to intuit exactly where to find the various components of the software project, such as configuration, tests, database models, and endpoint implementations.

Great software doesn't have any unused code or files. Including unused code or files in a project increases the contextual load for anyone reviewing the project. Don't make others spend time trying to understand code only to discover that the code isn't actually used. Unused code or files are weeds in your software garden. Tend to your software by ruthlessly yanking out unused code and files.

6. Optimize for clarity

Earlier in my software development career, I took great pride in writing highly performant code. I used techniques such as unrolling loops, lookup tables and memory mapping to build the fastest software possible. While the software may have been performant, it was also very complicated, making it difficult to understand, troubleshoot and maintain.

Great software prioritizes readability and maintainability. Others will need to update and maintain your code. Keep the software as simple as possible to help reduce the contextual load associated with understanding your software. Perhaps a good measure of simplicity is how many comments are necessary to document the code. I believe that the best software requires no inline comments. Because the software is clear, concise and simple, it is self-documenting.

Be careful about premature optimization. Optimization that compromises simplicity should be carefully considered. Performance is most greatly affected by the software’s architecture, design and algorithms, and not by individual lines of code.

Great software avoids using idioms, especially those which are complicated or obscure. It is tempting to include tricky one-liners when writing code, but that's not great software. Great software doesn't force the reader to spend an inordinate amount of time trying to understand it. Keep it simple.

When you write clear and concise software and prioritize readability and maintainability, your teammates will thank you. Those who inherit your codebase will thank you. And you yourself will be grateful when you need to make a change to software you wrote six months ago (or even three weeks ago).

7. Testing

The best software developers take complete accountability for testing their code. They don't expect others to find their bugs. In fact, when bugs escape their detection and are found by others, they are disappointed.

Be relentless with your testing. Great software includes a collection of both unit and integration tests, with significant code coverage.

While the primary reason for developing test cases is to validate new code additions, test cases also serve as an investment, which over time, will produce substantial dividends. As a software codebase grows, having a robust suite of test cases gives confidence when making changes that there are no unexpected side effects.

By emphasizing testing as an essential component of software development, you will be forced to understand your code more fully and anticipate edge cases. Good software works under normal conditions, but great software works under unexpected conditions as well.

Roll up your sleeves and get to work

The path to becoming a great software developer is not easy, but it is also not a mystery.

Be diligent and persistent. Seek understanding and clarity. Study great code written by others. Take pride in the code you write. Pay attention to details. Test your code.

As with any goal worth attaining, you must put in the effort. If you commit yourself to these principles, you will become a better software developer.

Now go write some code!

Signature errors when uploading files to S3

Chris Hickman — Mon, 05 Dec 2016 02:07:16 GMT

I have an application that is written in Python and uses the Boto library for making calls to the AWS S3 API in support of a file upload feature. Since being deployed, the code has handled many, many file uploads flawlessly. However, I recently encountered a bizarre bug where a single file would simply not upload to S3 correctly.

When trying to upload this file using the S3 put_object operation, AWS returned the following error:

An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: The request signature we calculated does not match the signature you provided. Check your key and signing method.

When uploading to S3, my code sets request header values for both Content-Type and Content-Disposition. The value set for Content-Disposition contains, among other information, the original filename for the file being uploaded.

headers = {
    'ContentDisposition': 'attachment;filename="{}"'.format(filename)
}

A clue as to why this upload was failing was that this particular file had multiple consecutive space characters in the filename.

When making a request to S3, the client first signs the request. Among other things, the signature is based on a calculation that includes the request header data.

When S3 receives the request, it calculates its own signature from the request data and compares it to the signature sent with the request. If the values do not match, we get a SignatureDoesNotMatch error.

It turns out that AWS and Boto perform their signature calculations differently. When Boto makes its calculation, it does so without any manipulation of the request header data. AWS, however, folds consecutive spaces in the header data into a single space before making its calculation. Since the two calculations use different input, the result is a SignatureDoesNotMatch error for our file with multiple consecutive spaces in the filename.
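To see why this produces a mismatch, here is a deliberately simplified sketch. It is not the real AWS signing algorithm: it just computes an HMAC over a single header value, once as-is (the client side) and once with runs of spaces folded (the server side). The key, function names and filenames are all made up for illustration.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"example-secret-key"  # hypothetical key, for illustration only


def sign(header_value, fold_spaces):
    """Return a hex HMAC-SHA256 over a header value (simplified; not real AWS signing)."""
    if fold_spaces:
        # Mimic the server side: collapse runs of spaces before signing.
        header_value = re.sub(r" +", " ", header_value)
    return hmac.new(SECRET_KEY, header_value.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def signatures_match(filename):
    """Compare a client signature (raw header) with a server one (folded)."""
    header = 'attachment;filename="{}"'.format(filename)
    client_signature = sign(header, fold_spaces=False)
    server_signature = sign(header, fold_spaces=True)
    return hmac.compare_digest(client_signature, server_signature)


print(signatures_match("quarterly report.pdf"))   # single spaces only: True
print(signatures_match("quarterly  report.pdf"))  # two consecutive spaces: False
```

Any difference in the signed input, even a single extra space, yields a completely different signature, which is why one oddly named file was enough to trigger the error.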

To fix this problem, we just need to make sure that signature calculations are consistent between Boto and AWS. When setting the Content-Disposition request header using the filename, a simple regular expression is used to fold runs of multiple spaces into a single space character.

import re

# Replace runs of consecutive whitespace with a single space
filename = re.sub(r'\s+', ' ', filename).strip()

Once this change was made, the S3 put_object operation succeeded for this troublesome file.

Docker clock drift on MacBooks

Chris Hickman — Sat, 26 Nov 2016 23:47:11 GMT

Like many other developer teams, we have standardized on using MacBook Pros for doing our development work. And we use Docker to containerize our (micro)services, making it easy for any team member to run our services locally with the same setup as production.

However, using Docker on MacBook Pros is not without its issues. One such hiccup is dealing with host VM clock drift. This is where the Docker host VM internal clock gets out of sync with the actual system time.

It appears that the primary cause of clock drift with the Docker host VM is hibernate cycles. When your MacBook comes out of sleep mode, if the system does not have access to an NTP server, the Docker host VM clock may get considerably out of sync.

This clock skew can cause various problems. For us, the tell-tale sign of clock drift is when AWS API calls start failing due to expired signatures (if the clock drift is greater than 5 minutes from actual time, the AWS signature will be deemed invalid). Here's an example error we might see in our logs (generated by Python code using Boto for making AWS calls):

botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the Decrypt operation: Signature expired: 20160406T191109Z is now earlier than 20160406T191613Z (20160406T192113Z - 5 min.)
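The timestamps in this message tell you how far off the clock is. Here is a small sketch that extracts them, assuming the message layout shown above: the first timestamp is when the request was signed and the one in parentheses is the server's current time. The function and constant names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=5)  # tolerance shown in the error message above
TIMESTAMP = re.compile(r"\d{8}T\d{6}Z")


def clock_drift(error_message):
    """Estimate how far the local clock lags the server time in the error text.

    Assumes the message layout shown above: the signing time comes first and
    the server's current time is the last timestamp (inside the parentheses).
    """
    stamps = [datetime.strptime(s, "%Y%m%dT%H%M%SZ")
              for s in TIMESTAMP.findall(error_message)]
    signed_at, server_now = stamps[0], stamps[-1]
    return server_now - signed_at


message = ("Signature expired: 20160406T191109Z is now earlier than "
           "20160406T191613Z (20160406T192113Z - 5 min.)")
drift = clock_drift(message)
print(drift, drift > MAX_SKEW)  # prints: 0:10:04 True
```

In this example the host VM clock was running about 10 minutes behind, well past the 5 minute tolerance.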

When you discover that you have clock drift with your host VM, you can fix this problem by simply forcing a clock sync for your host VM. Setting the clock in any container sets it for the underlying VM. So once you force the clock reset, all containers will see the new time and you should be back to regular operation.

How to force the clock sync depends on whether you are using Docker for Mac or docker-toolbox for your host VM.

Docker for Mac

With Docker for Mac, the Docker engine is running in an Alpine Linux distribution on top of an xhyve Virtual Machine. The VM clock can be manually reset by running the following command:

$ docker run --rm --privileged alpine hwclock -s

docker-toolbox

With docker-toolbox, the Docker host VM is managed by docker-machine, and you can force a resync by restarting the VM. However, this can be painfully slow.

A much quicker and easier technique is to shell into the Docker host VM and use ntpclient to force a clock sync. Just run the following command from an OS X terminal, where dev is the name of your Docker machine:

$ docker-machine ssh dev \
  "date; sudo ntpclient -s -h time.nist.gov; date"

Simple, quick and easy!

Easily switch between multiple versions of Node.js

Chris Hickman — Mon, 10 Oct 2016 00:44:11 GMT

For a while, Node.js was relatively static (and serialized) with its versioning. For example, Node.js remained on version 0.10.X for about 18 months. There wasn't much need to have multiple versions of Node.js installed on the same machine, so installing and maintaining Node.js with your favorite package manager - like homebrew - worked quite well.

However, once Node.js moved beyond 0.10 to 0.12, things got quite a bit more complicated. With version 0.12 of Node.js, there were several breaking changes that made upgrading from 0.10 to 0.12 non-trivial.

Then the io.js group formed, spurring even more rapid updates and innovation. Soon new versions of Node.js came faster than ever before. Now, we find that the latest stable version of Node has progressed through several major versions to 6.7.0.

If you are like me and have been using Node.js for more than a couple of years, you likely have a significant amount of code written for older versions of Node.js. I find myself needing to run older versions of Node.js for legacy compatibility, while using newer versions of Node.js for new development projects. So I now have a very real need for being able to quickly switch between multiple versions of Node.js. And that's where package managers like homebrew don't quite fit the bill.

Enter nvm (Node Version Manager). nvm was built expressly for this use case. It allows you to quickly and easily switch among multiple versions of Node.js. Installation and configuration are very simple, and it just works. I highly recommend you give it a try.

Initial commit

Chris Hickman — Mon, 26 Sep 2016 04:16:49 GMT

Just as every repo must begin with an initial commit, I suppose so too must a blog. Consider this post my "initial commit".

My intent for this blog is to cover all issues related to building not just software, but building companies as well (I've built or helped to build 3 companies so far). I've learned quite a few lessons along the way, and hopefully you'll find what I have to share both interesting and useful. (You can find out more about me here).

Initially, most of my posts will be focused around the work I've been doing most recently - such as helping to build a platform that provides natural language access to non-natural services and devices. Among other things, I'll be talking about some of my favorite technologies and principles, such as: microservices, event driven design, cloud technologies, dev ops, RESTful APIs, Node.js and JavaScript.

Periodically, I'll also write about some of my experiences gained while building two of my previous companies: one funded with $24 million in venture capital, the other bootstrapped with a $75k SBA loan. As different as those two companies were, there was also a lot they shared in common. Both taught me some incredibly valuable lessons, and I look forward to sharing and reflecting upon them.

Thank you for stopping by and taking the time to read my "initial commit". Onward!