Configuring cloud infrastructure can be painful. Especially if you’re doing it by hand. Especially if you’re doing it frequently and at scale.

It can take a long time to set up the required resources in the cloud, even for a medium-sized project. And you need to be really careful; otherwise, you risk setting everything up only to realize that you’ve defined an incorrect CIDR block for a virtual network or made a typo in the name of a resource. And now you have to start all over again because settings like these often cannot be changed after the resource is created. Ouch.

There should be a better way than doing it manually, and this is where automation and code come into play. We can create scripts to set up our cloud infrastructure for us. This approach is called Infrastructure as Code, or IaC.

Benefits of IaC

Programming our infrastructure provisioning sounds complex. Why bother? For one, deploying your infrastructure automatically is much faster. This means that you can quickly make changes to your configuration or even recreate it completely when you need to. And you can use it to quickly provision multiple environments, e.g. development, test, and production.

But it doesn’t end there. When our infrastructure is code, we can treat it as code. This means we can check it into version control, perform code reviews, run static code analysis, and set up CI/CD pipelines. This opens up new capabilities in our infrastructure setup.
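As a rough illustration of what static analysis on infrastructure code can look like, here is a minimal sketch that catches a misplaced CIDR block before anything is deployed. The `definition` dictionary and its keys are made up for this example and do not follow any real tool’s schema; only Python’s standard `ipaddress` module is used.

```python
import ipaddress

# Hypothetical infrastructure definition; the keys and values are
# illustrative and not tied to any real tool's schema.
definition = {
    "virtual_network": {"name": "vnet-prod", "cidr": "10.0.0.0/16"},
    "subnets": [
        {"name": "snet-web", "cidr": "10.0.1.0/24"},
        {"name": "snet-db", "cidr": "10.1.0.0/24"},  # outside the network range
    ],
}


def check_definition(definition):
    """Return a list of human-readable problems found in the definition."""
    problems = []
    vnet = definition["virtual_network"]
    try:
        network = ipaddress.ip_network(vnet["cidr"])
    except ValueError as error:
        # Without a valid network there is nothing to compare subnets against.
        return [f"{vnet['name']}: invalid CIDR block ({error})"]

    for subnet in definition["subnets"]:
        try:
            block = ipaddress.ip_network(subnet["cidr"])
        except ValueError as error:
            problems.append(f"{subnet['name']}: invalid CIDR block ({error})")
            continue
        # Every subnet must fall inside the virtual network's address range.
        if not block.subnet_of(network):
            problems.append(f"{subnet['name']}: {block} is not inside {network}")
    return problems


if __name__ == "__main__":
    for problem in check_definition(definition):
        print(problem)
```

A check like this could run in a pull request or a pipeline step, so the mistake is caught during review rather than after the network has been created.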

Approaches to Automation

There are multiple ways you can do IaC and different tools available, but you can roughly classify them using two criteria.

  1. A tool can be either specific to a particular cloud provider or cloud-agnostic. Each major cloud provider comes with its own set of tools to deploy infrastructure; however, some third-party products offer a single solution that spans multiple cloud providers. While cloud-agnostic tools give you the obvious benefit of using a single tool for multiple clouds, they may fall behind feature-wise compared to the native tools, since they are implemented by third parties.
  2. A tool can have either a declarative or an imperative API. Just as with programming languages, some tools implement the imperative paradigm, which is convenient for scripting, and others the declarative one, which lets you define the desired state and leaves the rest to the tool. Declarative solutions are often easier to use since you don’t have to worry about things like provisioning order, dependencies, error handling, and other usual scripting concerns. However, it can be tricky to add special logic to declarative solutions, for example creating a resource based on a condition or in a loop.

|             | Cloud-specific | Cloud-agnostic |
|-------------|----------------|----------------|
| Declarative | Azure ARM Templates, AWS CloudFormation, Google Cloud Deployment Manager | Terraform, Pulumi |
| Imperative  | Azure: CLI, PowerShell, SDK; AWS: CLI, CDK; Google Cloud: Console, Client Libraries | ? |

For most cases I’ve seen so far, declarative tools work best. When using the imperative approach, you always need to consider the order in which your resources must be provisioned, handle potential race conditions and eventual consistency, handle errors when your script fails halfway, etc. And once you get the initial setup working, you need to think about how to make changes to already existing infrastructure, since most of these tools have separate APIs for creating and updating resources. With declarative tools, it’s simpler. You just list the resources you need, and it’s up to the tool to make it happen. And when you want to make a change, you just update your definition, and the tool decides how to reconcile the actual and desired state.
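To make the idea of reconciling actual and desired state more concrete, here is a deliberately simplified sketch. It is not how any particular tool is implemented: the `plan` function and the resource names are hypothetical, and real tools also deal with dependencies, ordering, and settings that force a resource to be recreated.

```python
def plan(actual, desired):
    """Compare actual and desired state and return the changes to apply.

    Both arguments map a resource name to its settings (plain dicts here).
    """
    changes = []
    for name, settings in desired.items():
        if name not in actual:
            changes.append(("create", name))
        elif actual[name] != settings:
            changes.append(("update", name))
    for name in actual:
        if name not in desired:
            changes.append(("delete", name))
    return changes


# Hypothetical state: what exists in the cloud vs. what the definition says.
actual = {
    "storage": {"tier": "standard"},
    "old-queue": {"size": 10},
}
desired = {
    "storage": {"tier": "premium"},
    "web-app": {"instances": 2},
}

if __name__ == "__main__":
    for action, name in plan(actual, desired):
        print(action, name)
```

The person writing the definition only ever edits `desired`. Whether a change becomes a create, an update, or a delete is worked out by the tool, which is exactly the bookkeeping an imperative script would have to do by hand.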

The choice between a native or a cloud-agnostic tool can be more difficult. If you have a multi-cloud or polycloud strategy, it’s a no-brainer: go for a tool that will work everywhere. But if you focus on a single cloud provider, teams might be hesitant to add another third-party tool to their arsenal. In such a case, you’ll need to do your own feature comparison and see if it’s worth it to you. I’ve personally come to the conclusion that Terraform’s plan feature and great developer experience trump having to learn a new tool. But from time to time you encounter a situation where a particular feature is not implemented, and you need to look for a workaround.

If you prefer to use your favorite programming language, then take a look at Pulumi. It’s a newer kid on the block but allows you to code in five different languages. And although Pulumi uses general-purpose languages for infrastructure definition, it still creates a declarative desired state model when you run it, similar to how an ORM creates an SQL statement.
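The following toy sketch illustrates the general idea of a program that builds a desired state model instead of provisioning anything directly. It is not Pulumi’s API: the `Stack` class, its `declare` method, and the resource kinds are all invented for this example.

```python
class Stack:
    """Hypothetical stand-in for a tool's engine: it only records declarations."""

    def __init__(self):
        self.resources = []

    def declare(self, kind, name, **settings):
        # Nothing is provisioned here; the declaration is added to the model.
        self.resources.append({"kind": kind, "name": name, "settings": settings})


def define_infrastructure(stack, environment):
    # Ordinary language features (loops, conditions) shape the model.
    for index in range(2):
        stack.declare("vm", f"{environment}-web-{index}", size="small")
    if environment == "production":
        stack.declare("database-replica", f"{environment}-db-replica")


if __name__ == "__main__":
    stack = Stack()
    define_infrastructure(stack, "production")
    for resource in stack.resources:
        print(resource["kind"], resource["name"])
```

Running the program only yields a list of declared resources. A real tool would then compare that model with what already exists and decide what to create, update, or delete, which is also why loops and conditions stop being awkward in this style.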

Look around your environment, see what is most valuable to you, and choose the tool that contributes to this value.

Challenges

Doing IaC is quite the paradigm shift and comes with its own set of challenges. You’ll need to learn a new tool and adopt a new change management process for your infrastructure. Once you start your journey, there will be several tricky questions that you’ll need to answer at some point.

Taking the First Steps

To emphasize once again, IaC is a radically different approach to managing infrastructure, so start slow and safe. Look at the tools available on the market and identify the features that you need most, whether this is safety, complete resource coverage, support for your favorite language, or something else. With a tool in mind, find a smaller project you can experiment on and give it a go there. Don’t push it onto your business-critical application just yet. Once you start to feel comfortable with the tool, shift your focus to the deployment process and start working on an automated pipeline and quality gates. And don’t forget to share your progress with the rest of the organization to get them on board as well.