Terraform: Skipping Buggy Provider Version

PROBLEM

Given the following required_providers block…

terraform {
  required_providers {
    google = "~> 3.8"
  }
}

… it will allow any Google provider version in the range >= 3.8, < 4.0.

As of today (May 10), the latest Google provider is 3.20.0. A quick terraform init confirms that.

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.20.0...

However, sometimes there’s a need to skip a buggy version. For example, 3.20.0 breaks google_compute_firewall.

SOLUTION

To achieve that, we can do the following…

terraform {
  required_providers {
    google = "~> 3.8, != 3.20.0"
  }
}

To confirm this works, delete the .terraform/ directory and rerun terraform init, which now shows the following result…

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "google" (hashicorp/google) 3.19.0...
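On Terraform 0.13 and later, the same constraint can also be written using the object form of required_providers, which pins the provider source as well (a sketch; the version string is unchanged):

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      # Same range as before, with the buggy release excluded.
      version = "~> 3.8, != 3.20.0"
    }
  }
}
```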

GCP + Terraform: Running Terraform Commands with a Service Account

PROBLEM

When running these commands…

gcloud auth login
gcloud auth application-default login

… it allows terraform apply to provision the infrastructure using your user credentials.

However, sometimes there’s a need to run Terraform using a service account.

SOLUTION

First, identify the service account you want to use… for example: my-service-account@my-project.iam.gserviceaccount.com.

Then, create and download the private key for the service account.

Command:

gcloud iam service-accounts keys create --iam-account my-service-account@my-project.iam.gserviceaccount.com  key.json              

Output:

created key [xxxxxxxx] of type [json] as [key.json] for [my-service-account@my-project.iam.gserviceaccount.com]

With this service account’s private key, we can now authorize its access to GCP.

Command:

gcloud auth activate-service-account --key-file key.json  

Output:

Activated service account credentials for: [my-service-account@my-project.iam.gserviceaccount.com]

You can verify that the correct account is active.

Command:

gcloud auth list

Output:

                      Credentialed Accounts
ACTIVE  ACCOUNT
*       my-service-account@my-project.iam.gserviceaccount.com
        user@myshittycode.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Here, the * marks the currently active account.

Now, you can run terraform apply to provision the infrastructure using the selected service account.
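Alternatively, instead of switching the active gcloud account, the Google provider can read the service account key directly from the Terraform code. A minimal sketch (the project and region values are placeholders, not taken from the post):

```hcl
provider "google" {
  # Path to the service account key created earlier.
  credentials = file("key.json")

  # Placeholder values for illustration.
  project = "my-project"
  region  = "us-central1"
}
```

Another common option is to export GOOGLE_APPLICATION_CREDENTIALS pointing at key.json, so that any Google SDK client picks up the key automatically.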

GCP + Kitchen Terraform: Local Development Workflow

INTRODUCTION

Here’s a typical workflow for implementing and running Kitchen Terraform tests outside of the GCP environment, for example, from an IDE on a Mac laptop.

Enable “gcloud” Access

Command:

gcloud auth login

The first step is to ensure we can interact with GCP through the gcloud command using our user credentials. This is needed because the tests run gcloud commands to retrieve GCP resource information for their assertions.

Enable SDK Access

Command:

gcloud auth application-default login

This ensures our Terraform code can call the GCP APIs without a service account; it will use our user credentials instead.

Without this command, we may get the following error when running the Terraform code:

Response: {
 "error": "invalid_grant",
 "error_description": "reauth related error (invalid_rapt)",
 "error_subtype": "invalid_rapt"
}

Display All Kitchen Test Suites

Command:

bundle exec kitchen list    

This command displays a list of Kitchen test suites defined in kitchen.yml.

The output looks something like this:

Instance                            Driver     Provisioner  Verifier   Transport  Last Action    Last Error
router-all-subnets-ip-ranges-local  Terraform  Terraform    Terraform  Ssh          
router-interface-local              Terraform  Terraform    Terraform  Ssh          
router-no-bgp-no-nat-local          Terraform  Terraform    Terraform  Ssh          
router-with-bgp-local               Terraform  Terraform    Terraform  Ssh          
router-with-nat-local               Terraform  Terraform    Terraform  Ssh          
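The instances above come from kitchen.yml. A minimal sketch of what such a file might look like for Kitchen Terraform (suite names, paths, and values are illustrative, not taken from the project):

```yaml
driver:
  name: terraform
  # Illustrative path to the Terraform fixture under test.
  root_module_directory: test/fixtures/router-with-nat

provisioner:
  name: terraform

verifier:
  name: terraform
  systems:
    - name: local
      backend: local

platforms:
  - name: local

suites:
  - name: router-with-nat
```

Kitchen derives each instance name by combining the suite name with the platform name, e.g. router-with-nat-local.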

Run a Specific Test Suite

Command:

bundle exec kitchen test [INSTANCE_NAME]    

# For example:-
bundle exec kitchen test router-with-nat-local

This command runs a specific test suite, handling the entire Terraform lifecycle… ie: setting up the infrastructure, running the tests and destroying the infrastructure.

This is especially helpful when we need to run just the test suite currently under development. It runs faster because we don’t have to provision/deprovision the cloud infrastructure for the other test suites, and it also reduces the incurred cost.

Run a Specific Test Suite with Finer Controls

There are times when running bundle exec kitchen test [INSTANCE_NAME] is still time-consuming and expensive, especially when debugging failed assertions or adding a few assertions at a time.

To provision the infrastructure once, run the following command:

bundle exec kitchen converge [INSTANCE_NAME]    

# For example:-
bundle exec kitchen converge router-with-nat-local

To run the assertions, run the following command as many times as needed until all the assertions pass:

bundle exec kitchen verify [INSTANCE_NAME]    

# For example:-
bundle exec kitchen verify router-with-nat-local

Finally, once the test suite is implemented properly, we can now deprovision the infrastructure:

bundle exec kitchen destroy [INSTANCE_NAME]    

# For example:-
bundle exec kitchen destroy router-with-nat-local

Terragrunt: “plan-all” while Passing Outputs between Modules

PROBLEM

Terragrunt has a feature that allows one module to pass outputs to another module.

For example, if “project-prod” module wants to consume “subfolders” output from “folder” module, it can be done like this in “project-prod” module’s terragrunt.hcl:-

include {
    path = find_in_parent_folders()
}

dependency "folder" {
    config_path = "../folder"
}

inputs = {
    env_folders = dependency.folder.outputs.subfolders
}
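For this to work, the “folder” module must actually declare the output being consumed. A hypothetical sketch of the producing side (resource names, environments, and the organization ID are all illustrative), shaped to match the mock outputs used later:

```hcl
# Illustrative "folder" module: one google_folder per environment,
# exposed as a map keyed by environment name.
resource "google_folder" "env" {
  for_each     = toset(["dev", "uat", "prod"])
  display_name = each.key
  parent       = "organizations/1234567890" # placeholder org ID
}

output "subfolders" {
  value = { for env, folder in google_folder.env : env => { "id" = folder.id } }
}
```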

The challenge is when running commands such as plan-all, it will fail with the following error:-

Cannot process module Module [...] because one of its 
dependencies, [...], finished with an error: /my/path/folder/terragrunt.hcl 
is a dependency of /my/path/project-prod/terragrunt.hcl 
but detected no outputs. Either the target module has not 
been applied yet, or the module has no outputs. If this 
is expected, set the skip_outputs flag to true on the 
dependency block.

SOLUTION

This error occurs because the generated plan for “folder” module has not been applied yet (ie: the infrastructure does not exist), hence there are no outputs to pass to “project-prod” module to satisfy plan-all.

To fix this, mock outputs can be supplied:-

include {
    path = find_in_parent_folders()
}

dependency "folder" {
    config_path = "../folder"

    mock_outputs = {
        subfolders = {
            "dev" = {
                "id" = "temp-folder-id"
            }
            "prod" = {
                "id" = "temp-folder-id"
            }
            "uat" = {
                "id" = "temp-folder-id"
            }
        }
    }
}

inputs = {
    env_folders = dependency.folder.outputs.subfolders
}

Finally, when running apply-all, Terragrunt uses the actual outputs from the applied “folder” module instead of the mock outputs to build the rest of the infrastructure.
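As the error message notes, Terragrunt also offers a skip_outputs escape hatch. It would not help in this case (we do consume subfolders), but for dependencies whose outputs are never referenced, it looks like this:

```hcl
dependency "folder" {
  config_path = "../folder"

  # Tell Terragrunt not to resolve this dependency's outputs at all.
  # Only appropriate when the outputs are not referenced in inputs.
  skip_outputs = true
}
```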

GCP + Terraform: “google: could not find default credentials” Error

PROBLEM

When running any Terraform command (init, plan, etc.) from a different server, the following error is thrown:-

Error: google: could not find default credentials. 
See https://developers.google.com/accounts/docs/application-default-credentials 
for more information.

  on  line 0:
  (source code not available)

SOLUTION

One recommended way is to set up a service account by following the instructions from the above link.

Another way, for development purposes, is to install the Google Cloud SDK and run the following gcloud command, which generates an Application Default Credentials (ADC) JSON file based on your user account and stores it in a location where the SDK can find it automatically:-

gcloud auth application-default login

Terraform: “Error acquiring the state lock” Error

PROBLEM

When running terraform plan, the following error is thrown:-

Acquiring state lock. This may take a few moments...

Error: Error locking state: Error acquiring the state lock: writing "gs://my/bucket/terraform.tfstate/default.tflock" failed: googleapi: Error 412: Precondition Failed, conditionNotMet
Lock Info:
  ID:        1234567890
  Path:      gs://my/bucket/folder/terraform.tfstate/default.tflock
  Operation: migration destination state
  Who:       mike@machine
  Version:   0.12.12
  Created:   2019-10-30 12:44:36.410366 +0000 UTC
  Info:      


Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again. For most commands, you can disable locking with the "-lock=false"
flag, but this is not recommended.

SOLUTION

One way is to disable locking by passing the -lock=false flag, but this is not recommended.

However, if you are sure the lock was not properly released (for example, a previous Terraform run was interrupted), you can perform a force unlock:

terraform force-unlock [LOCK_ID]

In this case…

terraform force-unlock 1234567890