Now that Ansible has done all the information gathering for us it’s time to finally make use of it. In this post I will show how to use Ansible to run traceroutes from and to the hosts defined in a test scenario and perform verification of the results of those tests. Should any of those tests fail, Ansible will provide a meaningful description of what exactly failed and why. While doing all this I’ll introduce a couple of new Ansible features like conditional looping and interactive prompts.
TDD Playbook
In order to run and verify tests I will create a separate playbook. It makes sense to separate it from the previous playbook simply because this time it will be used multiple times, while the information gathering playbook can only be run once. The new playbook will have to accomplish the following tasks:
- Select which scenario to test
- Run tests as specified in that scenario
- Parse test results
- Verify that test results conform to the specification
Selecting test scenario
Our scenarios/all.txt
file contains multiple test scenarios each defined by a name. Each test scenario represent a certain state in the network, e.g. scenario #1 tests how the network behaves in a normal state with no outages or link failures, scenario #2 tests how traffic should be rerouted in the event of primary link failure. Inside each scenario there are one or more test steps each testing a behaviour of a particular traffic flow, e.g. traffic from router R1 to router R4 should traverse R2 followed by R3. Each steps contains keywords From
, To
and Via
which identify source, destination and transit routers. This is how a typical scenario file looks like.
1. Testing of Primary Link
1.1 From R1 to R3 via R2
1.2 From R1 to R4 via R2, R3
1.3 From R2 to R4 via R3
1.4 From R1 to R2 via R2
2. Testing of Backup Link
2.1 From R1 to R3 via R4
2.2 From R1 to R2 via R4,R3
In the previous post I showed how to parse and store these scenarios in YAML dictionary in group_vars/all.yml
file, which makes this information automatically available to any future playbooks. So in the new playbook ~/tdd_ansible/cisco_tdd.yml
all we need to do is let the user decide which scenario to test:
- name: Run traceroute commands
hosts: cisco-devices
gather_facts: false
remote_user: cisco
vars_prompt:
- name: scenario_num
prompt: "Enter scenario number"
default: "1"
private: no
tasks:
- name: extracting scenario name and steps
set_fact:
scenario_steps: "{{ scenarios[scenario_num][1] }}"
scenario_name: "{{ scenarios[scenario_num][0] }}"
This playbook contains a standard header followed by a vars_prompt
section which prompts user to select a particular scenario number and stores the selection in scenario_num
variable. The first task in the playbook extracts scenario name and steps from scenarios
dictionary stored in group_vars/all.yml
file and stores them in respective variables. Of course this task is optional and it’s possible to reference the same data using full notation, however I prefer things to be more readable even if it leads to some inefficient memory use.
Run test specified in scenario steps
Now it’s time to run traceroutes to see how the packets flow in the network. As we did in one of the previous posts we’ll use the raw
module to run traceroutes. However this time, instead of running a full-mesh any-to-any traceroutes we’ll only run them if they were defined in one of the test steps. Indeed, why would we run a traceroute between devices if we’re not going to verify it? Ansible’s conditionals will help us with that. For each of the hosts in cisco-devices
group we’ll look into scenario_steps dictionary and see if there were any tests defined and if there were, we’ll run a traceroute to each of the destination hosts.
- name: run traceroutes as per the defined scenario steps
raw: traceroute {{ hostvars[item.key]['ansible_ssh_host'] }} source Loopback0 probe 1 numeric
when: scenario_steps[inventory_hostname] is defined
with_dict: scenario_steps[inventory_hostname]|default({})
register: trace_result
When both a loop (with_dict
) and a conditional (when
) are defined in a task, Ansible does the looping first. That’s why if a test scenario is not defined for a particular host (e.g. R3
) the conditional check will fail and stop execution of the playbook. To overcome that we can use Ansible (Jinja) templates inside the with_dict
loop. Appending |default({})
will instruct Ansible create an empty dictionary in case scenario_steps[inventory_hostname]
does not exist which will make conditional return False
and skip this host altogether.
Parse test results
There’s no silver bullet when it comes to parsing of the outcome of traceroute command. We’ll have to use Python to traverse the textual output line by line looking for msec
and storing all found IPs in a list.
class TraceParse(object):
def __init__(self, module):
self.std_out = module.params['std_out']
self.dest_host = module.params['dest_host']
def parse(self):
result = dict()
path = list()
for line in self.std_out.split("\n"):
if 'msec' in line:
path.append(line.split()[1])
result[self.dest_host] = path
return result
def main():
module = AnsibleModule(
argument_spec=dict(
std_out=dict(required=True, type='str'),
dest_host=dict(required=True, type='str')
)
)
traceParser = TraceParse(module)
result = traceParser.parse()
module.exit_json(changed=False, ansible_facts=result)
# import module snippets
from ansible.module_utils.basic import *
main()
The playbook task will run through each hosts’ trace_results variable and pass it to the trace parse module.
- name: parse traceroute ouput
cisco_trace_parse:
dest_host: "{{ item.item.key }}"
std_out: "{{ item.stdout }}"
connection: local
when: item.stdout is defined
with_items: trace_result.results
Test verification
Finally we need to compare the captured output with the scenario steps. This time all the information collected by Ansible in the previous tasks needs to be passed to a module.
- name: verify traceroutes against pre-defined scenarios
cisco_tdd_verify:
dest_host: "{{ item.key }}"
src_host: "{{ inventory_hostname }}"
scenario: "{{ scenario_steps }}"
ip2host: "{{ ip2host }}"
path: "{{ hostvars[inventory_hostname][item.key] }}"
scenario_name: "{{ scenario_name }}"
when: scenario_steps[inventory_hostname] is defined
with_dict: scenario_steps[inventory_hostname]|default({})
connection: local
Ansible module contains a class with a single public method compare
. The first thing it does is converts the list of IP addresses of transit devices into a list of hostnames. That’s where the IP-to-Hostname dictionary created in the previous playbook is first used. IP address is used as a lookup key and the Hostname is extracted from the first element of the returned list (second element, the interface name, is currently unused). The private method __validatepath
is used to confirm that devices listed after Via
in a test scenario are present in the traceroute path in the specified order. If this verification fails, the whole module fails and the error message is passed back to Ansible playbook.
class ResultCompare(object):
def __init__(self, module):
self.dest_host = module.params['dest_host']
self.src_host = module.params['src_host']
self.trace_path = module.params['path']
self.ref_scenario = module.params['scenario']
self.ip2host = module.params['ip2host']
self.scenario_name = module.params['scenario_name']
def compare(self):
trace_path_new = list()
for dev in self.trace_path:
if dev in self.ip2host:
trace_path_new.append(self.ip2host[dev][0])
else:
trace_path_new.append(dev)
if self.src_host in self.ref_scenario:
if self.dest_host in self.ref_scenario[self.src_host]:
ref_path = self.ref_scenario[self.src_host][self.dest_host]
if not self. __validatepath(trace_path_new):
msg = "Failed scenario " + self.scenario_name + ".\r\nTraceroute from " + self.src_host + " to " + self.dest_host + " has not traversed " + str(ref_path)
msg += "\r\n Actual path taken: " + ' -> '.join([self.src_host] + trace_path_new) + "\r\n"
return 1, msg
return 0, 'no error'
def __validatepath(self, path):
index = 0
for device in path:
if device == self.ref_scenario[self.src_host][self.dest_host][index]:
index += 1
if index == len(self.ref_scenario[self.src_host][self.dest_host]):
return True
return False
def main():
module = AnsibleModule(
argument_spec=dict(
dest_host=dict(required=True, type='str'),
src_host=dict(required=True, type='str'),
scenario=dict(required=True, type='dict'),
ip2host=dict(required=True, type='dict'),
path=dict(required=True, type='list'),
scenario_name=dict(required=True, type='str')
)
)
comparator = ResultCompare(module)
rc, error = comparator.compare()
if rc != 0:
module.fail_json(msg=error)
else:
module.exit_json(changed=False)
from ansible.module_utils.basic import *
main()
TDD in action
So let’s finally see the whole thing action. First let’s modify a 4-router topology so that traffic from R1 to R4 is routed via R2 and R3 (a simple delay 9999
on Ethernet0/1 will do). Now let’s run the first scenario and verify that no errors are displayed.
~/tdd_ansible# ansible-playbook cisco_tdd.yml
Enter scenario number [1]:
PLAY [Run traceroute commands] ************************************************
...
PLAY RECAP ********************************************************************
R1 : ok=4 changed=0 unreachable=0 failed=0
R2 : ok=4 changed=0 unreachable=0 failed=0
R3 : ok=2 changed=0 unreachable=0 failed=0
R4 : ok=2 changed=0 unreachable=0 failed=0
Nothing much really, which is good, that means all scenarios were verified successfully. Now let’s see how it fails. The easiest way is to run the tests from a second scenario, the one that assumes that the link between R1 and R2 failed and all the traffic is routed via R4.
~/tdd_ansible# ansible-playbook cisco_tdd.yml
Enter scenario number [1]: 2
PLAY [Run traceroute commands] ************************************************
...
TASK: [verify traceroutes against pre-defined scenarios] **********************
skipping: [R2]
skipping: [R4]
failed: [R1] => (item={'key': 'R2', 'value': ['R4', 'R3']}) => {"failed": true, "item": {"key": "R2", "value": ["R4", "R3"]}}
msg: Failed scenario Testing of Backup Link.
Traceroute from R1 to R2 has not traversed ['R4', 'R3']
Actual path taken: R1 -> R2
failed: [R3] => (item={'key': 'R1', 'value': ['R4']}) => {"failed": true, "item": {"key": "R1", "value": ["R4"]}}
msg: Failed scenario Testing of Backup Link.
Traceroute from R3 to R1 has not traversed ['R4']
Actual path taken: R3 -> R2 -> R1
failed: [R1] => (item={'key': 'R3', 'value': ['R4']}) => {"failed": true, "item": {"key": "R3", "value": ["R4"]}}
msg: Failed scenario Testing of Backup Link.
Traceroute from R1 to R3 has not traversed ['R4']
Actual path taken: R1 -> R2 -> R3
PLAY RECAP ********************************************************************
to retry, use: --limit @/root/cisco_tdd.retry
R1 : ok=3 changed=0 unreachable=0 failed=1
R2 : ok=2 changed=0 unreachable=0 failed=0
R3 : ok=3 changed=0 unreachable=0 failed=1
R4 : ok=2 changed=0 unreachable=0 failed=0
Here all 3 test steps within a scenario failed. Ansible displayed error messages passed down by our module, specifying the expected and the actual path.
Now if we simply shutdown Ethernet0/0 of R1 to simulate a link failure and re-run the same scenario all tests will succeed again.
~/tdd_ansible# ansible-playbook cisco_tdd.yml
Enter scenario number [1]: 2
PLAY [Run traceroute commands] ************************************************
...
PLAY RECAP ********************************************************************
R1 : ok=4 changed=0 unreachable=0 failed=0
R2 : ok=2 changed=0 unreachable=0 failed=0
R3 : ok=4 changed=0 unreachable=0 failed=0
R4 : ok=2 changed=0 unreachable=0 failed=0
So there it is, a working network TDD framework in action. I still haven’t covered a lot of corner cases (e.g. when traceroute times out) and deployment scenarios (device with VRFs) but it should still work for a lot of scenarios and can be easily extended to cover those corner cases.