Several months ago, I took on a captivating project: investigating how Terraform templates can be parsed, with a specific focus on parsing Terraform variable types.
My goal was to build a solution that could parse any Terraform template, variable types included, and produce output that a frontend app could use to generate forms dynamically.
Several challenges arose as soon as I started. First, I needed a way to generate frontend forms dynamically. I settled on jsonform, a frontend library that generates forms from a JSON schema.
Since the frontend required a JSON schema, the second challenge was converting Terraform templates into JSON schema. In my case, I only needed to parse the variables.tf file.
While numerous tools can parse Terraform templates, all of the ones I found leave the variable types untouched, returning each type expression as a plain string.
To illustrate the pain point I faced, here's the output of a variable block parsed using python-hcl2:
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import hcl2
In [2]: input = """
   ...: variable "project_metadata" {
   ...:   description = "Project metadata"
   ...:   type = object({
   ...:     name = string,
   ...:     id = string,
   ...:     tags = list(string)
   ...:   })
   ...:
   ...:   validation {
   ...:     condition = substr(var.project_metadata.name, 0, 2) == "p-"
   ...:     error_message = "Project name must be prefixed with 'p-'"
   ...:   }
   ...: }"""
In [3]: output = hcl2.loads(input)
In [4]: output
Out[4]:
{'variable': [{'project_metadata': {'description': 'Project metadata',
'type': "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}",
'validation': [{'condition': '${substr(var.project_metadata.name, 0, 2) == "p-"}',
'error_message': "Project name must be prefixed with 'p-'"}]}}]}
In [5]: output['variable'][0]["project_metadata"]["type"]
Out[5]: "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}" # Remains a string
As the last output shows, the variable type is still just a string. I therefore needed a way to process it further and convert it into a JSON schema.
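In the output above, python-hcl2 wraps each expression in `${...}` and renders the object's attributes as quoted keys and values. One possible way to process it further, sketched here with a hypothetical `normalize_hcl2_type` helper that assumes the string looks exactly like the `Out[5]` value, is to strip those markers and recover a plain type expression. A stack records whether each `{` came from a `${` wrapper or from `object({...})`, so only the wrapper's closing braces are dropped.

```python
def normalize_hcl2_type(raw):
    """Hypothetical helper: unwrap python-hcl2's ${...} markers and quotes.

    Assumes input shaped like the Out[5] value above, e.g.
    "${object({'name': '${string}', 'tags': '${list(string)}'})}".
    """
    out = []
    stack = []  # one entry per open brace: "interp" for "${", "brace" for "{"
    i = 0
    while i < len(raw):
        if raw.startswith("${", i):
            stack.append("interp")  # drop the "${" opener itself
            i += 2
            continue
        ch = raw[i]
        if ch == "{":
            stack.append("brace")
            out.append(ch)
        elif ch == "}" and stack:
            # Keep braces that close object({...}); drop those that close "${".
            if stack.pop() == "brace":
                out.append(ch)
        elif ch != "'":
            out.append(ch)  # copy everything except the quotes around keys/values
        i += 1
    return "".join(out)


raw_type = "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}"
print(normalize_hcl2_type(raw_type))
```

For the `Out[5]` value this prints `object({name: string, id: string, tags: list(string)})`. HCL's object syntax accepts `:` as well as `=` between an attribute name and its value, so the result is still a valid type expression.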
Despite the small setback, I was excited by the challenge. It reminded me of the courses I took in school on the formal specification of programming languages, as well as a book on the same subject that I had read recently. My first idea was to write a custom parser.
To build a custom parser, I first had to define the grammar of the language. A quick search led me to lark-parser, a library for building custom parsers in Python and other languages. Its getting-started guide is well written and easy to follow.
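The fun part was writing the grammar. Lark's grammar syntax is based on EBNF, with several enhancements that make grammars easier to write: a "?" prefix inlines rules that have a single child, "->" gives a branch of the tree its own name, and the %import and %ignore directives pull in common terminals and skip whitespace.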
Fortunately, the syntax of Terraform variable types is relatively simple. After several iterations, I came up with the following grammar, which parses all the complex variable types I had to deal with:
from lark import Lark

# Parses a Terraform type constraint such as object({ name = string, tags = list(string) })
type_parser = Lark(r"""
    ?type: "any"                                      -> any
         | "string"                                   -> string
         | "number"                                   -> number
         | "bool"                                     -> bool
         | "object" "(" "{" keyval* "}" ")"           -> object
         | "list" "(" [type] ")"                      -> list
         | "set" "(" [type] ")"                       -> set
         | "map" "(" [type] ")"                       -> map
         | "tuple" "(" "[" [type ("," type)*] "]" ")" -> tuple

    // Attributes may be separated by commas or newlines and may carry a trailing # comment
    keyval: CNAME _keyval_separator type [","] [comment]
    _keyval_separator: "=" | ":"
    ?comment: SH_COMMENT

    %import common.SH_COMMENT
    %import common.CNAME
    %import common.WS
    %ignore WS
""", start='type')
Defining a grammar for variable types was only the first part. The grammar turns each variable type into a parse tree, and a further step was needed to convert that parse tree into a JSON schema.
With Lark, we can define transformers that convert a parse tree into another form, such as text. My first experiment was converting the parse tree back to its original text, and it worked well. Next, I had to convert it into a JSON schema. Before I could do that, I had to map Terraform variable types to JSON schema types. Primitive types such as string, bool, and number are straightforward, since JSON schema has direct counterparts (string, boolean, and number). However, some complex types, including maps and objects, require more thought.
Eventually, I created the following mapping, where on the left is the Terraform variable type, and on the right is its corresponding JSON schema type:
Let's revisit the Terraform variable block that I was trying to parse:
variable "project_metadata" {
  description = "Project metadata"
  type = object({
    name = string,
    id = string,
    tags = list(string)
  })

  validation {
    condition = substr(var.project_metadata.name, 0, 2) == "p-"
    error_message = "Project name must be prefixed with 'p-'"
  }
}
With the custom parser and transformer in place, I could reliably generate the JSON schema the frontend needed to create forms dynamically. For the variable above, the output was:
{
  "title": "",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "project_metadata": {
      "description": "Project metadata",
      "validation": [
        {
          "condition": "${substr(var.project_metadata.name, 0, 2) == \"p-\"}",
          "error_message": "Project name must be prefixed with 'p-'"
        }
      ],
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "id": {
          "type": "string"
        },
        "tags": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      }
    }
  }
}
The parser was eventually packaged as a Python library and distributed internally. Backend APIs were also built to interact with the frontend.
In conclusion, developing this custom parser and transformer has been a personally rewarding journey. If you have any feedback, questions, or thoughts, please feel free to reach out. Thank you for making it this far! =)
Also published here.