Terraform your physical network with YANG

Whenever I get bored with my day job, I tend to look for a small, interesting project that gives me an instant sense of accomplishment and, as a result, lifts my spirits and improves my motivation. This time I remembered someone once asking me if they could use Terraform to control their physical network devices, and I had to explain why this is the wrong tool for the job. Somehow the question got stuck in my head, and now it has come to fruition in the form of terraform-yang.

This is a small Terraform plugin (provider) that allows users to manipulate interface-level settings of a network device. And I'm not talking about a VM in the cloud running your favourite vendor's network OS; that is trivial and doesn't require anything special from Terraform. I'm talking about Terraform controlling individual physical network devices over OpenConfig's gNMI interface, with standard Create/Read/Update/Delete operations exposed all the way up to Terraform's configuration (.tf) files. Network infrastructure-as-code nirvana…

Writing a custom Terraform provider for a network device

Although this may look scary at first, creating your own TF provider is fairly easy. In fact, a provider is nothing but a pointer to a remote API, which from the client's point of view is just a URL (or a session to that URL) along with the necessary authentication credentials. A TF provider simply combines all that information in a struct, which is later made available to the various resource-specific API calls. For a network device with a gNMI interface, this is all the work needed to initialise the provider:

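cfg := &gnmi.Config{
	Addr:     d.Get("address").(string),
	TLS:      d.Get("tls").(bool),
	Username: d.Get("username").(string),
	Password: d.Get("password").(string),
}
// Dial the device; the resulting client is what every resource call receives
client, err := gnmi.Dial(cfg)
if err != nil {
	return nil, err
}
return client, nil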
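To underline how little a provider really is, here is a stdlib-only sketch of the same step. `Config` and `newConfig` are hypothetical names that only mirror the fields above (they are not the gNMI library's types), and the plain `args` map stands in for Terraform's `schema.ResourceData`. Comma-ok type assertions and a `net.SplitHostPort` check turn a malformed provider block into an error rather than a panic or a confusing dial failure.

```go
package main

import (
	"fmt"
	"net"
)

// Config is a hypothetical stand-in for the gNMI client config:
// an address plus credentials, nothing more.
type Config struct {
	Addr     string
	TLS      bool
	Username string
	Password string
}

// newConfig builds a Config from provider arguments. args stands in for
// Terraform's schema.ResourceData (an assumption for this sketch).
func newConfig(args map[string]interface{}) (*Config, error) {
	addr, ok := args["address"].(string)
	if !ok {
		return nil, fmt.Errorf("address must be a string")
	}
	// gNMI targets are dialled as host:port, e.g. 192.0.2.1:6030.
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return nil, fmt.Errorf("invalid address %q: %w", addr, err)
	}
	tls, _ := args["tls"].(bool)         // optional, defaults to false
	user, _ := args["username"].(string) // optional
	pass, _ := args["password"].(string) // optional
	return &Config{Addr: addr, TLS: tls, Username: user, Password: pass}, nil
}

func main() {
	cfg, err := newConfig(map[string]interface{}{
		"address":  "192.0.2.1:6030",
		"username": "admin",
		"password": "admin",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *cfg)
}
```

The resulting struct is the only state the provider has to carry; everything device-specific happens in the resources.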

The only problem with this approach is that we have multiple devices, and obviously it wouldn't make sense to write a dedicated provider for each one. This is where Terraform aliases come to the rescue. With aliases we can define multiple configurations of the same provider, all of which use the same custom gNMI provider logic. This is what a provider.tf file may look like:

provider "gnmi" {
  alias    = "SW1"
  address  = "192.0.2.0:6030"
  username = "admin"
  password = "admin"
}

provider "gnmi" {
  alias    = "SW2"
  address  = "192.0.2.1:6030"
  username = "admin"
  password = "admin"
}
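Conceptually, each alias is just another configuration served by the same provider code, and a resource's `provider` argument selects one of them. Terraform does this lookup internally; the toy resolver below only illustrates the idea, and `Device`, `providers` and `resolve` are hypothetical names.

```go
package main

import "fmt"

// Device holds the per-alias settings from provider.tf.
type Device struct {
	Address  string
	Username string
}

// providers maps "<provider>.<alias>" to the configuration of one device.
// Every entry is served by the same gNMI provider logic.
var providers = map[string]Device{
	"gnmi.SW1": {Address: "192.0.2.0:6030", Username: "admin"},
	"gnmi.SW2": {Address: "192.0.2.1:6030", Username: "admin"},
}

// resolve returns the device selected by a resource's provider argument.
func resolve(ref string) (Device, error) {
	dev, ok := providers[ref]
	if !ok {
		return Device{}, fmt.Errorf("unknown provider alias %q", ref)
	}
	return dev, nil
}

func main() {
	for _, ref := range []string{"gnmi.SW1", "gnmi.SW2", "gnmi.SW3"} {
		dev, err := resolve(ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ref, "->", dev.Address)
	}
}
```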

Writing a resource for an interface

Most of the work and logic goes into resources. Each resource represents an object hosted by a provider that can be manipulated, i.e. created, updated and deleted. For public clouds, this could be a VM, a disk or a security group. For my little experiment, I've picked the simplest (and most common) configuration object that exists on a network device: an interface. I didn't have time to boil the ocean, so I decided to expose only a subset of interface-level settings:

  • description
  • switchport flag
  • IPv4 address
  • access VLAN
  • trunk VLANs
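Most of these settings have a natural home in the OpenConfig interfaces, IP and VLAN models. The sketch below builds the gNMI path for each one, based on my reading of openconfig-interfaces, openconfig-if-ip and openconfig-vlan. The argument names follow the resource example later in this post, except `access_vlan`, which is a hypothetical name. I've left out the switchport flag because I'm not aware of a single standard leaf for it; treat that, and the paths themselves, as assumptions to check against your device's models.

```go
package main

import (
	"fmt"
	"sort"
)

// leafPaths returns the OpenConfig path each exposed resource argument maps
// to for the given interface name. "access_vlan" is a hypothetical argument
// name and "switchport" is deliberately omitted.
func leafPaths(name string) map[string]string {
	base := fmt.Sprintf("/interfaces/interface[name=%s]", name)
	return map[string]string{
		"description":  base + "/config/description",
		"ipv4_address": base + "/subinterfaces/subinterface[index=0]/ipv4/addresses",
		"access_vlan":  base + "/ethernet/switched-vlan/config/access-vlan",
		"trunk_vlans":  base + "/ethernet/switched-vlan/config/trunk-vlans",
	}
}

func main() {
	paths := leafPaths("Ethernet1")
	// Sort keys for stable output; Go map iteration order is randomised.
	keys := make([]string, 0, len(paths))
	for k := range paths {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%-13s %s\n", k, paths[k])
	}
}
```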

To build the structured configuration data, I'm using Go structs generated by ygot from OpenConfig's YANG models. A little hint for those of you who've read my Ansible & YANG series and know what pyangbind or YDK are: ygot is to gNMI what pyangbind/YDK are to ncclient. So to configure a new interface, I first build an empty struct skeleton with ygot, populate it with values inside resourceInterfaceCreate() and then call gnmi.Set() to send it off to the device. The update logic is slightly more complicated, since it has to take into account mutually exclusive modes (e.g. switchport) and what should happen when multiple conflicting arguments are defined. Ultimately you decide how far you want to go, and for the simple use case I've chosen it only took me a few hours to codify the logic I wanted.
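To give a feel for the "skeleton, populate, send" flow, here is a stdlib-only approximation. The structs are hand-written stand-ins for what ygot generates: like ygot output they use pointer fields, so an unset leaf can be told apart from a zero value. `strPtr` plays the role of ygot's `ygot.String` helper, and `buildInterface` and `marshalPayload` are hypothetical. Real code would serialise with `ygot.EmitJSON` in RFC 7951 format, which may add module prefixes to the keys, before handing the result to gnmi.Set().

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InterfaceConfig is a hypothetical stand-in for the ygot-generated struct
// of /interfaces/interface/config. nil fields are omitted from the JSON.
type InterfaceConfig struct {
	Name        *string `json:"name,omitempty"`
	Description *string `json:"description,omitempty"`
	Enabled     *bool   `json:"enabled,omitempty"`
}

// Interface mirrors one entry of the /interfaces/interface list.
type Interface struct {
	Name   *string          `json:"name,omitempty"`
	Config *InterfaceConfig `json:"config,omitempty"`
}

// strPtr returns a pointer to s, similar in spirit to ygot.String().
func strPtr(s string) *string { return &s }

// buildInterface populates the skeleton with values from the resource.
func buildInterface(name, description string) *Interface {
	return &Interface{
		Name: strPtr(name),
		Config: &InterfaceConfig{
			Name:        strPtr(name),
			Description: strPtr(description),
		},
	}
}

// marshalPayload serialises the struct into JSON, i.e. the bytes that would
// travel inside a gNMI TypedValue (json_ietf_val) in a SetRequest.
func marshalPayload(i *Interface) (string, error) {
	b, err := json.Marshal(i)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	payload, err := marshalPayload(buildInterface("Ethernet1", "TF_INT_ETH1"))
	if err != nil {
		panic(err)
	}
	fmt.Println(payload)
}
```

Because `Enabled` is never set, it stays nil and is left out of the payload entirely, which is exactly why generated code favours pointers over plain values.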

Using a gNMI interface resource

With all of the provider/resource work done, making interface changes becomes really easy. Here's an example of two different interfaces being configured on two different devices. The provider argument points TF to one of the predefined aliases (i.e. network devices), and the name argument tells it which interface to configure. The rest of the arguments should be fairly self-explanatory.

resource "gnmi_interface" "SW1_Eth1" {
  provider     = "gnmi.SW1"
  name         = "Ethernet1"
  description  = "TF_INT_ETH1"
  switchport   = false
  ipv4_address = "12.12.12.1/24"
}

resource "gnmi_interface" "SW2_Eth1" {
  provider    = "gnmi.SW2"
  name        = "Ethernet1"
  description = "TF_INT_ETH1"
  switchport  = true
  trunk_vlans = [100, 200]
}
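Much of the update logic boils down to rejecting argument combinations that can't coexist before anything is sent to the device. Below is a minimal sketch of such checks, exercised with the two resources above plus one deliberately broken one. The rules (routed ports take an IPv4 address; switched ports take either an access VLAN or trunk VLANs, not both) and the `access_vlan` argument are my assumptions for illustration, not necessarily what terraform-yang does.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// InterfaceArgs mirrors the arguments of a gnmi_interface resource.
// AccessVLAN is a hypothetical argument; zero means "unset".
type InterfaceArgs struct {
	Name        string
	Description string
	Switchport  bool
	IPv4Address string
	AccessVLAN  int
	TrunkVLANs  []int
}

// validate rejects combinations that cannot coexist under the assumed rules.
func validate(a InterfaceArgs) error {
	if a.Switchport {
		if a.IPv4Address != "" {
			return errors.New("ipv4_address requires switchport = false")
		}
		if a.AccessVLAN != 0 && len(a.TrunkVLANs) > 0 {
			return errors.New("access_vlan and trunk_vlans are mutually exclusive")
		}
	} else {
		if a.AccessVLAN != 0 || len(a.TrunkVLANs) > 0 {
			return errors.New("VLAN arguments require switchport = true")
		}
		if a.IPv4Address != "" {
			ip, _, err := net.ParseCIDR(a.IPv4Address)
			if err != nil {
				return fmt.Errorf("ipv4_address: %w", err)
			}
			if ip.To4() == nil {
				return fmt.Errorf("ipv4_address %q is not IPv4", a.IPv4Address)
			}
		}
	}
	// Usable 802.1Q VLAN IDs are 1-4094; 0 and 4095 are reserved.
	vlans := append([]int{}, a.TrunkVLANs...)
	if a.AccessVLAN != 0 {
		vlans = append(vlans, a.AccessVLAN)
	}
	for _, v := range vlans {
		if v < 1 || v > 4094 {
			return fmt.Errorf("VLAN %d out of range 1-4094", v)
		}
	}
	return nil
}

func main() {
	sw1 := InterfaceArgs{Name: "Ethernet1", Description: "TF_INT_ETH1", IPv4Address: "12.12.12.1/24"}
	sw2 := InterfaceArgs{Name: "Ethernet1", Description: "TF_INT_ETH1", Switchport: true, TrunkVLANs: []int{100, 200}}
	bad := InterfaceArgs{Name: "Ethernet2", IPv4Address: "10.0.0.1/24", TrunkVLANs: []int{100}}
	for _, a := range []InterfaceArgs{sw1, sw2, bad} {
		fmt.Printf("%s: %v\n", a.Name, validate(a))
	}
}
```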

Surprises and Gotchas

While writing this plugin, I stumbled across several interesting, and to me surprising, issues with gNMI and OpenConfig models in general.

First, because the gNMI spec is in a constant state of flux, the official tools may not work with your device out of the box. Implementations of gNMI/gRPC clients can differ slightly, which obviously makes it difficult to operate in a multivendor environment.

Second, I was surprised to discover that a lot of structured data is still encoded as JSON. This JSON is serialised into a string and carried as bytes inside a protobuf message (the json_val or json_ietf_val field of a gNMI TypedValue) when it gets sent to the device, but still, my naive assumption was that protobuf would be used for everything.

Third, there are still a lot of vendor augments to standard OpenConfig models, which results in vendor-specific ygot code. This feels almost like we've gone back to automating vendor-specific CLIs with all their quirks and corner cases.

Fourth, there’s still a lot of YANG<->CLI translation going on under the hood, especially for the configuration part (less for telemetry), so always expect the unexpected.

Finally, I was initially bemused by the gNMI message format. I didn't understand why a single notification message can carry multiple updates, or what the purpose of duplicates is, until I realised that one of the primary use cases for gNMI is streaming telemetry and the message format was designed to work for both that and configuration updates. Some of these and other protocol-specific details still don't make a lot of sense to me, and the gNMI specification doesn't do a very good job of explaining why (not sure it's even supposed to).
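One way to make sense of the format is to think of a notification as a batch of telemetry samples taken at one moment: a single timestamp and path prefix shared by many updates. The sketch below uses simplified, hypothetical stand-ins for the gNMI `Notification` and `Update` messages (the real ones are protobufs with structured paths and typed values) and folds a notification into a flat path/value cache. My understanding is that `duplicates` counts updates for the same path that the target coalesced, for example when a client can't keep up, so the sketch simply adds them up.

```go
package main

import (
	"fmt"
	"sort"
)

// Update is a simplified, hypothetical stand-in for gNMI's Update message.
type Update struct {
	Path       string
	Val        string
	Duplicates uint32 // updates for Path that the target coalesced
}

// Notification is a simplified stand-in for gNMI's Notification message.
type Notification struct {
	Timestamp int64 // nanoseconds since the Unix epoch
	Prefix    string
	Updates   []Update
}

// apply folds a notification into a path->value cache, joining the prefix
// with each update's path, and returns the total number of coalesced
// duplicates. Later updates for the same path overwrite earlier ones.
func apply(cache map[string]string, n Notification) (coalesced uint32) {
	for _, u := range n.Updates {
		cache[n.Prefix+u.Path] = u.Val
		coalesced += u.Duplicates
	}
	return coalesced
}

func main() {
	cache := map[string]string{}
	n := Notification{
		Timestamp: 1700000000000000000,
		Prefix:    "/interfaces/interface[name=Ethernet1]/state/counters",
		Updates: []Update{
			{Path: "/in-octets", Val: "1024", Duplicates: 2},
			{Path: "/out-octets", Val: "2048"},
		},
	}
	dups := apply(cache, n)
	paths := make([]string, 0, len(cache))
	for p := range cache {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Println(p, "=", cache[p])
	}
	fmt.Println("coalesced duplicates:", dups)
}
```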

But as I've said multiple times before, having the gNMI support we have today is way, way better than not having it and having to rely on vendor-specific CLIs.

Outro

I've always liked writing plugins. They may look like serious pieces of software, but in reality they're just a bunch of for loops and conditionals, so writing them is really easy. Not only do you get all the boilerplate code that exposes every bell and whistle you might need, but there are also tons of production-grade examples of this kind of code available on GitHub. So don't treat terraform-yang as a serious project; it was just a proof of concept and a learning exercise. I'm not convinced this is the right way to configure your network, although I feel the same way about most of the other popular automation tools out there.
