In this post I’ll show how to build a dockerized OpenStack and OpenContrail lab, integrate it with Juniper MX80 DC-GW and demonstrate one of Contrail’s most interesting and unique features called BGP-as-a-Service.
Continuing the trend started in my previous post about OpenDaylight, I’ll move on to the next open-source product that uses BGP VPNs for optimal North-South traffic forwarding. OpenContrail is one of the most popular SDN solutions for OpenStack. It was one of the first hybrid SDN solutions, offering both pure overlay and overlay/underlay integration. It is the default SDN platform for Mirantis Cloud Platform and has multiple large-scale deployments in companies like Workday and AT&T. I don’t have any production experience with OpenContrail myself. However, my impression, based on what I’ve heard and seen in the 2-3 years I’ve been following the Telco SDN space, is that OpenContrail is the most mature SDN platform for Telco NFVs, not least because of its unique feature set.
During its production deployment at AT&T, Contrail added a lot of features required by Telco NFVs, like QoS, VLAN trunking and BGP-as-a-Service. My first acquaintance with BGPaaS took place when I started working on Telco DCs, and I remember being genuinely shocked when I first saw the requirement for dynamic routing exchange with VNFs. To me this seemed to break one of the main rules of cloud networking: a VM is not to have any knowledge of, or interaction with, the underlay. I gradually went through all the stages of grief, all the way to acceptance, and although it still feels “wrong”, I can at least understand why it’s needed and what the pros and cons of different BGPaaS solutions are.
There’s a certain range of VNFs that may need to advertise a set of IP addresses into existing VPNs inside a Telco network. The most notable example is the PGW inside the EPC. I won’t pretend to be an expert in this field, but based on my limited understanding, the PGW needs to advertise IP networks into various customer VPNs, for example to connect private APNs to existing customer L3VPNs. When this kind of network function gets virtualised, it still retains this requirement, which now needs to be fulfilled by the DC SDN.
This requirement catches a lot of big SDN vendors off guard, and the best they come up with is connecting those VNFs, through VLANs, directly to underlay TOR switches. Although this solution is easy to implement, it has serious drawbacks, since a single VNF can now affect the stability of the whole POD or even the whole DC network. Some VNF vendors also require BFD to monitor the liveness of those BGP sessions which, if the L3 boundary is higher than the TOR, may create an even bigger number of issues on a POD spine.
There’s a small range of SDN platforms that run a full routing stack on each compute node (e.g. Cumulus, Calico). These solutions are the best fit for this kind of scenario, since they allow BGP sessions to be established over a single hop (VNF <-> virtual switch). However, they represent a small fraction of the total SDN solution space, with the majority of vendors implementing a much simpler OpenFlow or XMPP-based flow push model.
OpenContrail, as far as I know, is the only SDN controller that doesn’t run a full routing stack on compute nodes but still fulfills this requirement in a very elegant way. When BGPaaS is enabled for a particular VM’s interface, the controller programs the vRouter to proxy BGP TCP connections coming to the virtual network’s default gateway IP and forward them to the controller. This way the VNF thinks it peers with its next-hop IP, while all BGP state and path computations still happen on the controller.
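As a rough mental model of the proxy behaviour described above, the forwarding decision could be pictured as follows. This is an illustrative toy, not vRouter code: the `Interface` class, its fields and the returned strings are all made up for this sketch.

```python
from dataclasses import dataclass

BGP_PORT = 179  # well-known TCP port for BGP


@dataclass(frozen=True)
class Interface:
    """Hypothetical view of a VM interface as seen by the vRouter."""
    gateway_ip: str       # default gateway of the virtual network
    bgpaas_enabled: bool  # whether BGPaaS is attached to this interface


def handle_tcp_connection(iface: Interface, dst_ip: str, dst_port: int) -> str:
    """Return what this toy vRouter would do with a new TCP connection."""
    is_bgp_to_gateway = dst_ip == iface.gateway_ip and dst_port == BGP_PORT
    if iface.bgpaas_enabled and is_bgp_to_gateway:
        # The VNF believes it peers with its gateway; the session is
        # actually relayed to the controller, which holds the BGP state.
        return "proxy-to-controller"
    return "forward-normally"
```

The point of the sketch is that only BGP connections aimed at the gateway IP of a BGPaaS-enabled interface are treated specially; everything else takes the normal forwarding path.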
The diagram below depicts a sample implementation of BGPaaS using OpenContrail. The VNF is connected to a vRouter using a dot1Q trunk interface (to allow multiple VRFs over a single vEth link). Each VRF has its own BGPaaS session set up to advertise network ranges (NET1-3) into customer VPNs. These BGP sessions get proxied to the controller, which injects those prefixes into their respective VPNs. These updates are then sent to DC gateways using either VPNv4/6 or EVPN, and the traffic is forwarded through the DC underlay with VPN segregation preserved by either an MPLS label (for MPLSoGRE or MPLSoUDP encapsulation) or a VXLAN VNI.
Now let me briefly go over the lab that I’ve built to showcase the BGPaaS and DC-GW integration features.
Lab setup overview
OpenContrail follows a familiar pattern of DC SDN architecture, with a central controller orchestrating the work of multiple virtual switches. In the case of OpenContrail, these switches are called vRouters and they communicate with the controller using an XMPP-based extension of BGP as described in this RFC draft. A very detailed description of its internal architecture is available on OpenContrail’s website, so it would be pointless to repeat all of this information here. That’s why I’ll concentrate on how to get things done rather than on the architectural aspects. However, to get things started, I always like to have a clear picture of what I’m trying to achieve. The diagram below depicts a high-level architecture of my lab setup. Although OpenContrail supports BGP VPNv4/6 with multiple dataplane encapsulations, in this post I’ll use EVPN as the only control plane protocol to communicate with the MX80 and use VXLAN encapsulation in the dataplane.
EVPN as a DC-GW integration protocol is relatively new to OpenContrail and comes with a few limitations. One of them is the absence of EVPN type-5 routes, which means I can’t use it in the same way I did in OpenDaylight’s case. Instead I’ll demonstrate a DC-GW IRB scenario, which extends the existing virtual network to a DC-GW and makes the IRB/SVI interface on that DC-GW act as the default gateway for this network. This is a very common scenario for L2 DCI and active-active DC deployment models. To demonstrate this scenario I’m going to set up a single OpenStack virtual network with a couple of VMs whose gateway will reside on the MX80. Since I only have a single OpenStack instance and a single MX80, I’ll set up one half of an L2 DCI and configure mutual redistribution to make our overlay network reachable from the MX80’s global routing table.
All-in-one VM setup
OpenContrail’s kolla GitHub page contains a set of instructions to set up the environment. As usual, I have automated all of these steps, which can be run from a hypervisor with the following commands:
Once installation is complete and all docker containers are up and running, we can set up the OpenStack side of our test environment. The script below will do the following:
- Download cirros and CumulusVX images and upload them to Glance
- Create a virtual network
- Update security rules to allow inbound ICMP and SSH connections
- Create a pair of VMs - one based on cirros and one based on CumulusVX image
The only thing worth noting in the above script is that the default gateway 10.0.100.161 gets overridden by a default host route pointing to 10.0.100.190. Normally, to demonstrate the DC-GW IRB scenario, I would have set up a gateway-less, L2-only subnet. However, in that case I wouldn’t have been able to demonstrate BGPaaS on the same network, since this feature relies on having a gateway IP set up (which later acts as a BGP session termination endpoint). So instead of setting up two separate networks I’ve decided to implement this hack to minimise the required configuration.
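One way to picture the override: assuming the host route reaches the VM as a DHCP classless static route (option 121, RFC 3442), a client that receives this option must ignore the plain Router option, so the route via 10.0.100.190 wins over the 10.0.100.161 gateway. Whether Contrail's DHCP service delivers the route in exactly this way is an assumption here, and the function name is hypothetical. The sketch only shows how such a route is encoded on the wire.

```python
import ipaddress


def encode_classless_route(destination: str, router: str) -> bytes:
    """Encode one RFC 3442 route: prefix length, significant octets, router."""
    net = ipaddress.ip_network(destination)
    significant = (net.prefixlen + 7) // 8  # octets needed to hold the prefix
    return (
        bytes([net.prefixlen])
        + net.network_address.packed[:significant]
        + ipaddress.ip_address(router).packed
    )


# Default route via the IRB address used in this lab.
payload = encode_classless_route("0.0.0.0/0", "10.0.100.190")
print(payload.hex())  # 000a0064be
```

A default route has a prefix length of zero, so no destination octets are carried at all: the payload is a single zero byte followed by the four bytes of the router address.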
EVPN integration with MX80
The DC-GW integration procedure requires only a few simple steps:
- Make sure VXLAN VNI is matched on both ends
- Configure import/export route targets
- Set up BGP peering with DC-GW
All of these steps can be done very easily through OpenContrail’s GUI. However, as I’ve mentioned before, I always prefer to use an API when I have a chance, and in this case there is even a Python library for OpenContrail’s REST API available on Juniper’s GitHub page, which I’m going to use below to implement the above three steps.
Before we can begin working with OpenContrail’s API, we need to authenticate with the controller and get a REST API connection handler.
The first thing I’m going to do is override the default VNI set up by OpenContrail for irb-net with a pre-defined value of 5001. To do that I first need to get a handler for the irb-net object and extract the virtual_network_properties object containing a vxlan_network_identifier property. Once it’s overridden, I just need to update the parent irb-net object to apply the change to the running configuration on the controller.
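For readers who want to see the shape of this change without the Python library, here is a minimal sketch of the JSON body such an update might carry. It assumes the REST API expects the resource wrapped in a `virtual-network` key; the helper name `vni_update_body` is hypothetical, and only the two property names come from the text above.

```python
import json

MAX_VNI = 2**24 - 1  # VXLAN network identifiers are 24 bits wide


def vni_update_body(vni: int) -> str:
    """Build a JSON body that sets vxlan_network_identifier on a network."""
    if not 1 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} is outside 1..{MAX_VNI}")
    body = {
        "virtual-network": {
            "virtual_network_properties": {"vxlan_network_identifier": vni}
        }
    }
    return json.dumps(body)


print(vni_update_body(5001))
```

Sending only this fragment would likely replace the whole virtual_network_properties object, which is one reason to read the existing properties first and modify them in place, as described above.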
The next thing I need to do is explicitly set the import/export route-target properties for the irb-net object. This will require a new RouteTargetList object, which then gets referenced by a route_target_list property of the irb-net object.
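A small sketch of what the route-target list might look like as plain data, assuming route targets are written as `target:<asn>:<number>` strings and serialised under a `route_target` key. The helper name and the example value `target:65000:5001` are hypothetical, and the IP-based form of route targets is deliberately not handled.

```python
import re

# Only the "target:<asn>:<number>" form is accepted in this sketch.
RT_PATTERN = re.compile(r"^target:(\d+):(\d+)$")


def route_target_list(*targets: str) -> dict:
    """Validate route-target strings and wrap them as a list structure."""
    for rt in targets:
        if not RT_PATTERN.match(rt):
            raise ValueError(f"malformed route target: {rt!r}")
    return {"route_target": list(targets)}


print(route_target_list("target:65000:5001"))
```

Whatever value is chosen, the same community has to be configured as the import/export target on the DC-GW side for the routes to be accepted.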
The final step is setting up a peering with the MX80. The main object that needs to be created is BgpRouter, which contains a pointer to a BGP session parameters object with session-specific values like ASN and remote peer IP. The BGP router is defined in a global context (default domain and default project), which makes it available to all configured virtual networks.
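To make the structure of this object more concrete, here is a hedged sketch of a BGP router definition as plain data. The field names follow the author's recollection-free reading of the OpenContrail schema (`bgp_router_parameters`, `autonomous_system`, `address_families`, the `e-vpn` family name and the `ip-fabric`/`__default__` parent) and should be treated as assumptions to verify against your release; the function name, peer name, address and ASN are purely illustrative.

```python
import ipaddress


def bgp_router_body(name: str, address: str, asn: int) -> dict:
    """Build an illustrative bgp-router definition for an external EVPN peer."""
    ipaddress.ip_address(address)  # raises ValueError on a malformed peer IP
    if not 1 <= asn <= 2**32 - 1:
        raise ValueError(f"ASN {asn} is out of range")
    return {
        "bgp-router": {
            # Global context: default domain and default project.
            "fq_name": ["default-domain", "default-project",
                        "ip-fabric", "__default__", name],
            "parent_type": "routing-instance",
            "bgp_router_parameters": {
                "autonomous_system": asn,
                "address": address,      # remote peer IP
                "identifier": address,   # BGP router ID, assumed equal to the IP
                "port": 179,
                "address_families": {"family": ["e-vpn"]},
            },
        }
    }
```

For example, `bgp_router_body("mx80", "192.0.2.1", 65000)` yields a definition for a peer called mx80, where both the address (from the documentation range) and the ASN are placeholders.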
For the sake of brevity, I will not cover the MX80’s configuration in detail and simply include it here for reference with some minor explanatory comments.
The easiest way to verify that BGP peering has been established is to query OpenContrail’s introspection API:
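A sketch of how the introspection output could be checked programmatically. It assumes the control node answers a neighbor request (commonly `Snh_BgpNeighborReq` on the introspect port) with XML in which each `BgpNeighborResp` element carries `peer` and `state` children; those element names and the function name are assumptions, so adjust them to what your controller actually returns.

```python
import xml.etree.ElementTree as ET


def neighbor_states(xml_text: str) -> dict:
    """Map each peer name to its BGP state from an introspect XML response."""
    root = ET.fromstring(xml_text)
    states = {}
    for neighbor in root.iter("BgpNeighborResp"):
        peer = neighbor.findtext("peer")
        state = neighbor.findtext("state")
        if peer is not None:
            states[peer] = state
    return states
```

Feeding the response body into `neighbor_states` and checking that the MX80's entry reports an established session gives a quick pass/fail test for the peering.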
Datapath verification can be done from either side. In this case I’m showing a ping from the MX80’s global VRF towards one of the OpenStack VMs:
To keep things simple I will not use multiple dot1Q interfaces and will set up a BGP peering with CumulusVX over a normal, non-trunk interface. From CumulusVX I will inject a loopback IP 220.127.116.11/32 into the irb-net network. Since the REST API Python library I’ve used above is two major releases behind the current version of OpenContrail, it cannot be used to set up the BGPaaS feature. Instead I will demonstrate how to use the REST API directly from the command line of the all-in-one VM using cURL.
In order to start working with OpenContrail’s API, I first need to obtain an authentication token from OpenStack’s Keystone. With that token I can query the list of IPs assigned to all OpenStack instances and pick the one assigned to CumulusVX. I need the UUID of that particular IP address in order to extract the ID of the VM interface this IP is assigned to.
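The same two steps can be sketched in Python with only the standard library. This assumes a Keystone v3 endpoint with the `default` domain, and that each instance-ip object exposes `instance_ip_address` and `virtual_machine_interface_refs` fields; the function names, and any URL or credentials passed in, are hypothetical. The first function only builds the request and does not send it.

```python
import json
import urllib.request


def keystone_token_request(auth_url: str, username: str,
                           password: str, project: str) -> urllib.request.Request:
    """Build (but do not send) a Keystone v3 password-auth token request."""
    domain = {"id": "default"}  # assumption: default domain
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"name": username, "domain": domain,
                                      "password": password}},
            },
            "scope": {"project": {"name": project, "domain": domain}},
        }
    }
    return urllib.request.Request(
        f"{auth_url}/v3/auth/tokens",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vmi_id_for_ip(instance_ips: list, address: str) -> str:
    """Return the UUID of the VM interface that owns the given IP address."""
    for entry in instance_ips:
        if entry.get("instance_ip_address") == address:
            return entry["virtual_machine_interface_refs"][0]["uuid"]
    raise LookupError(f"no instance-ip found for {address}")
```

With Keystone v3 the token comes back in the `X-Subject-Token` response header and is then passed to the OpenContrail API as `X-Auth-Token` on subsequent requests.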
With the VM interface ID saved in a VMI_ID variable I can create a BGPaaS service and link it to that particular VM interface.
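As a rough illustration of what such a request might contain, the sketch below builds a BGPaaS definition as plain data. The resource key `bgp-as-a-service` and the fields `autonomous_system`, `bgpaas_session_attributes` and `virtual_machine_interface_refs` are assumptions based on the OpenContrail schema and may differ between releases; the function name, service name and ASN are hypothetical.

```python
def bgpaas_body(name: str, project_fq_name: list, asn: int, vmi_id: str) -> dict:
    """Build an illustrative BGPaaS definition linked to one VM interface."""
    if not vmi_id:
        raise ValueError("a VM interface UUID is required")
    return {
        "bgp-as-a-service": {
            "fq_name": [*project_fq_name, name],
            "parent_type": "project",
            "autonomous_system": asn,  # ASN the VNF will use for this session
            "bgpaas_session_attributes": {
                "address_families": {"family": ["inet"]},
            },
            # Link to the interface found earlier; some releases may also
            # expect the interface's fq_name under a "to" key.
            "virtual_machine_interface_refs": [{"uuid": vmi_id}],
        }
    }
```

The essential part is the reference to the VM interface: it is what tells the controller on which interface the vRouter should start proxying BGP connections.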
The final step is setting up a BGP peering on the CumulusVX side. The CumulusVX configuration is very simple and self-explanatory. The BGP neighbor IP is the IP of the virtual network’s default gateway, located on the local vRouter.
Here’s where we come across another limitation of EVPN. The loopback prefix 18.104.22.168/32 does not get injected into the EVPN address family. However, it does show up automatically in the VPNv4 address family, which can be verified from the MX80:
The route is hidden since I haven’t configured MPLSoUDP dynamic tunnels on the MX80. However, this proves that the prefix does get injected into customer VPNs and becomes available on all devices with matching import route-target communities.
This post concludes Series 2 of my OpenStack SDN saga. I’ve covered quite an extensive range of topics in my two-part series. However, the OpenStack networking landscape is so big that it’s simply impossible to cover everything I find interesting. I started writing about OpenStack SDN when I first learned I got a job with Nokia. Back then I knew little about VMware NSX and even less about OpenStack. That’s why I started researching topics that I found interesting and branching out into adjacent areas as I went along. Almost 2 years later, looking back, I can say I’ve learned a lot about the internals of SDN in general, and hopefully so have my readers. Now I’m leaving Nokia to rediscover my networking roots at Arista. I’ll dive into DC networking from a different perspective now and it may be a while before I accumulate a critical mass of interesting material to start spilling it out in my blog again. I still may come back to OpenStack some day, but for now I’m gonna take a little break, maybe do some housekeeping (e.g. move my blog from Jekyll to Hugo, add TLS support) and enjoy my time being a father.