DiffPlus is now used by the NAPALM Huawei VRP driver! See PR.
A lightweight module to help in the comparison of config files. In particular, it computes an incremental diff between two indented config files whilst respecting the scope of the indented blocks (aka contextual diff).
The module leverages the n-ary tree data structure to achieve such a diff. Taking about a hundred lines of code, it only relies on Python builtins and has no extra dependencies.
The whole point of DiffPlus is to compare config files before running them in production.
Basically, we'd like to merge a config file A (the candidate config, partial or full) into another config file B (the running config). The next section states the problem and gives an idea of what we expect from the module.
Say we have two indented configs:
Config A (to merge into B) | Config B |
---|---|
|
|
The example shows Cisco configs but it applies to any indented config (NOT necessarily network-related).
Some items of A are missing in B. How to find them?
- We are only looking for new items to be added in B: this is called an incremental diff
- The comparison must respect the scope of the indented blocks: this is called a contextual diff
A line-by-line diff will not help here since A is typically a partial config to merge into B, which is a full one.
We humans are able to compute that diff with ease because we visually identify blocks and items (though we make mistakes, sometimes). After some effort, we'd end up with the following result in mind:
# items of A to be added in B (to be computed by diffplus)
+no ip domain lookup
interface FastEthernet0/0
+ no shutdown
+interface FastEthernet0/0.10
+ description LAN
+ encapsulation dot1Q 10
+ ip address 192.168.1.254 255.255.255.0
router bgp 64512
address-family ipv4
+ neighbor 172.16.0.1 allowas-in 1
+ network 192.168.1.0 mask 255.255.255.0
As stated above, this is an incremental diff: there are only additions (hence the module name), no deletions.
As simple as it seems, such a diff is not so trivial for an algorithm.
- Convert each config to an n-ary tree
- Do a deep comparison of the n-ary trees
The n-ary tree data structure helps in two ways: it represents how deeply the items are nested, and it makes it possible to match blocks between the two configs.
Because we deal with config files, each line is unique per indented block. Therefore, no need for a list of nodes (allowing for duplicates). We can directly use raw nested dicts as n-ary trees, keys being the lines.
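To make the idea concrete, here is a minimal sketch of such a conversion using a stack of open blocks. The function name indented_to_tree is hypothetical and this is not the actual DiffPlus code; it also assumes the text is already correctly indented with a single indent character.

```python
def indented_to_tree(text, indent_char=" "):
    """Convert indented text to nested dicts, keys being the stripped lines."""
    root = {}
    # Stack of (indent_level, children_dict); the root sits at level -1.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        level = len(line) - len(line.lstrip(indent_char))
        # Close every block that is not a strict ancestor of this line.
        while stack[-1][0] >= level:
            stack.pop()
        children = {}
        # Plain assignment: a repeated line in the same block overwrites the first.
        stack[-1][1][line.strip()] = children
        stack.append((level, children))
    return root


sample = (
    "interface FastEthernet0/0\n"
    " no shutdown\n"
    "router bgp 64512\n"
    " address-family ipv4\n"
    "  network 192.168.1.0 mask 255.255.255.0\n"
)
tree = indented_to_tree(sample)
```

Each line becomes a key of its parent's dict, so the nesting of the dicts mirrors the nesting of the indented blocks.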
For comparison, this gist illustrates another approach based on a list of nodes.
The deep comparison can then be achieved with either a dedicated third-party module or the lightweight IncrementalDiff helper embedded in this module.
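As an illustration, a deep comparison restricted to additions can be sketched in a few lines of recursion over the nested dicts. The function incremental_diff below is a hypothetical name for this sketch, not the embedded IncrementalDiff helper itself; the "+" prefix on new keys is taken from the dict output shown later in this document.

```python
def incremental_diff(a, b):
    """Return the items of `a` that are missing from `b`, keeping parent context."""
    diff = {}
    for key, children in a.items():
        if key not in b:
            # The whole block is new: mark its top item, keep its children as-is.
            diff["+" + key] = children
        else:
            # The item exists on both sides: only keep it if something below differs.
            sub = incremental_diff(children, b[key])
            if sub:
                diff[key] = sub
    return diff


a = {
    "no ip domain lookup": {},
    "interface FastEthernet0/0": {"no shutdown": {}},
    "router bgp 64512": {"neighbor 172.16.0.1 remote-as 100": {}},
}
b = {
    "interface FastEthernet0/0": {"shutdown": {}},
    "router bgp 64512": {"neighbor 172.16.0.1 remote-as 100": {}},
}
result = incremental_diff(a, b)
```

Blocks that are identical on both sides (here router bgp 64512) drop out of the result, while unchanged parents of new items are kept to provide the context.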
The config files used below are configA.txt and configB.txt.
DiffPlus is available on PyPI:
pip install diffplus
The to_dict() method converts an indented config to an n-ary tree:
from diffplus import IndentedConfig
from json import dumps
configA = open('configA.txt').read() # or it may be a string
configB = open('configB.txt').read() # or it may be a string
configA = IndentedConfig(configA, comment_char='!', sanitize=True)
configB = IndentedConfig(configB, comment_char='!', sanitize=True)
print(dumps(configA.to_dict(), indent=2))
print(dumps(configB.to_dict(), indent=2))
Output:
Config A (to merge into B) | Config B |
---|---|
{
"no ip domain lookup": {},
"interface FastEthernet0/0": {
"no shutdown": {}
},
"interface FastEthernet0/0.10": {
"description LAN": {},
"encapsulation dot1Q 10": {},
"ip address 192.168.1.254 255.255.255.0": {}
},
"router bgp 64512": {
"neighbor 172.16.0.1 remote-as 100": {},
"address-family ipv4": {
"neighbor 172.16.0.1 activate": {},
"neighbor 172.16.0.1 allowas-in 1": {},
"neighbor 172.16.0.1 prefix-list IN in": {},
"neighbor 172.16.0.1 prefix-list OUT out": {},
"network 192.168.1.0 mask 255.255.255.0": {}
}
}
} |
{
"hostname R1": {},
"interface FastEthernet0/0": {
"description LAN": {},
"no ip address": {},
"shutdown": {},
"duplex auto": {},
"speed auto": {}
},
"router bgp 64512": {
"neighbor 172.16.0.1 remote-as 100": {},
"address-family ipv4": {
"neighbor 172.16.0.1 activate": {},
"neighbor 172.16.0.1 prefix-list IN in": {},
"neighbor 172.16.0.1 prefix-list OUT out": {}
}
},
"ip prefix-list IN seq 5 permit 192.168.2.0/24": {},
"ip prefix-list OUT seq 5 permit 192.168.1.0/24": {}
} |
There is no list but only dicts. Each item may have child items. The nesting level is NOT limited.
At this point, you are free to compare the dicts the way you want. To that end, you may be interested in existing modules like DeepDiff or in the lightweight IncrementalDiff helper embedded in this module.
ℹ Depending on your config format, both indent_char and comment_char can be set at init:
IndentedConfig(config, comment_char='#', indent_char=' ') # default values (e.g., for Huawei)
IndentedConfig(config, comment_char='!') # '!' as comment_char (e.g., for Cisco)
IndentedConfig(config, indent_char='\t') # tab as indent_char
IndentedConfig(config, sanitize=True) # see next section
Your config may:
- have trailing spaces
- have blank lines
- contain comments
- not be correctly indented
The sanitize() method is designed to address these issues. The last point in particular would break the tree conversion. The other ones would make some parts of the diff irrelevant.
from diffplus import IndentedConfig
config = """
# a global command badly indented
sysname PE
#
interface GigabitEthernet0/0/1
# badly indented as well
description my-super-description
ip address 1.1.1.1 255.255.255.0
#
"""
config = IndentedConfig(config, comment_char='#')
print(config)
config.sanitize()
print(config)
Print before sanitizing ❌ | Print after sanitizing ✔ |
---|---|
# a global command badly indented
sysname PE
#
interface GigabitEthernet0/0/1
# badly indented as well
description my-super-description
ip address 1.1.1.1 255.255.255.0
# |
sysname PE
interface GigabitEthernet0/0/1
description my-super-description
ip address 1.1.1.1 255.255.255.0 |
ℹ It is recommended to always sanitize at init:
config = IndentedConfig(config, sanitize=True)
To compute the diff just as introduced in the #problem section:
from diffplus import IndentedConfig, IncrementalDiff
configA = open('configA.txt').read()
configB = open('configB.txt').read()
configA = IndentedConfig(configA, comment_char='!', sanitize=True)
configB = IndentedConfig(configB, comment_char='!', sanitize=True)
diff = IncrementalDiff(configA, configB)
print(diff)
Output:
# items of A to be added in B (computed by IncrementalDiff)
+no ip domain lookup
interface FastEthernet0/0
+ no shutdown
+interface FastEthernet0/0.10
+ description LAN
+ encapsulation dot1Q 10
+ ip address 192.168.1.254 255.255.255.0
router bgp 64512
address-family ipv4
+ neighbor 172.16.0.1 allowas-in 1
+ network 192.168.1.0 mask 255.255.255.0
Alternatively, we can merge A into B. It is useful to get a preview of the full config before applying it:
from diffplus import IndentedConfig, IncrementalDiff
configA = open('configA.txt').read()
configB = open('configB.txt').read()
configA = IndentedConfig(configA, comment_char='!', sanitize=True)
configB = IndentedConfig(configB, comment_char='!', sanitize=True)
diff = IncrementalDiff(configA, configB, merge=True)
print(diff)
Output:
# items of A merged into B (computed by IncrementalDiff)
hostname R1
interface FastEthernet0/0
description LAN
no ip address
shutdown
duplex auto
speed auto
+ no shutdown
router bgp 64512
neighbor 172.16.0.1 remote-as 100
address-family ipv4
neighbor 172.16.0.1 activate
neighbor 172.16.0.1 prefix-list IN in
neighbor 172.16.0.1 prefix-list OUT out
+ neighbor 172.16.0.1 allowas-in 1
+ network 192.168.1.0 mask 255.255.255.0
ip prefix-list IN seq 5 permit 192.168.2.0/24
ip prefix-list OUT seq 5 permit 192.168.1.0/24
+no ip domain lookup
+interface FastEthernet0/0.10
+ description LAN
+ encapsulation dot1Q 10
+ ip address 192.168.1.254 255.255.255.0
The colored option makes the changes in the diff easier to spot:
from diffplus import IndentedConfig, IncrementalDiff
configA = open('configA.txt').read()
configB = open('configB.txt').read()
configA = IndentedConfig(configA, comment_char='!', sanitize=True)
configB = IndentedConfig(configB, comment_char='!', sanitize=True)
diff = IncrementalDiff(configA, configB, merge=True, colored=True)
print(diff)
Output:
(Screenshots: not colored vs. colored rendering of the merged diff)
Colorization is done through ANSI escape sequences: \033[32m for green color and \033[m for reset color.
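For illustration, wrapping a line in these two sequences could look like the sketch below. The colorize helper is a hypothetical name, and coloring only the lines that start with "+" is an assumption about which lines get highlighted.

```python
GREEN = "\033[32m"  # switch the foreground color to green
RESET = "\033[m"    # reset all attributes


def colorize(line):
    """Wrap added lines (those starting with '+') in green, leave others untouched."""
    return f"{GREEN}{line}{RESET}" if line.startswith("+") else line


for line in ["interface FastEthernet0/0", "+ no shutdown"]:
    print(colorize(line))
```

The sequences are only interpreted by terminals that support ANSI colors; elsewhere they show up as raw characters.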
The incremental diff is computed recursively as a dict you can access (if needed) through the to_dict() method:
diff = IncrementalDiff(configA, configB, merge=False) # or merge=True
print(dumps(diff.to_dict(), indent=2))
Output:
{
"+no ip domain lookup": {},
"interface FastEthernet0/0": {
"+no shutdown": {}
},
"+interface FastEthernet0/0.10": {
"description LAN": {},
"encapsulation dot1Q 10": {},
"ip address 192.168.1.254 255.255.255.0": {}
},
"router bgp 64512": {
"address-family ipv4": {
"+neighbor 172.16.0.1 allowas-in 1": {},
"+network 192.168.1.0 mask 255.255.255.0": {}
}
}
}
New items are marked with a + so that they can be pretty-printed recursively via the __str__() special method:
print(diff)
print(str(diff)) # equivalent
print(diff.__str__()) # equivalent
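A recursive rendering of such a dict can be sketched as follows. The render function is hypothetical, and the exact layout (one marker column followed by one indent character per nesting level) is an assumption rather than the module's guaranteed output format.

```python
def render(diff, depth=0, added=False, indent=" "):
    """Render a diff dict as a list of lines, propagating '+' to the children of new items."""
    lines = []
    for key, children in diff.items():
        is_added = added or key.startswith("+")
        text = key[1:] if key.startswith("+") else key
        marker = "+" if is_added else " "
        lines.append(marker + indent * depth + text)
        # Children of a new item are new as well, so the flag is passed down.
        lines.extend(render(children, depth + 1, is_added, indent))
    return lines


diff_dict = {
    "interface FastEthernet0/0": {"+no shutdown": {}},
    "+interface FastEthernet0/0.10": {"description LAN": {}},
}
print("\n".join(render(diff_dict)))
```

Note that in the dict only the top item of a new block carries the "+", so the renderer has to remember that it is inside an added block while descending.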
Converting IndentedConfig to dict allows for deep comparison using existing modules like DeepDiff. This way, you are not stuck with the rather simplistic IncrementalDiff helper.
from diffplus import IndentedConfig
from deepdiff import DeepDiff
configA = open('configA.txt').read()
configB = open('configB.txt').read()
configA = IndentedConfig(configA, comment_char='!', sanitize=True)
configB = IndentedConfig(configB, comment_char='!', sanitize=True)
diff = DeepDiff(configB.to_dict(), configA.to_dict())
for item_added in diff['dictionary_item_added']:
print(item_added)
Output:
root['no ip domain lookup']
root['interface FastEthernet0/0.10']
root['interface FastEthernet0/0']['no shutdown']
root['router bgp 64512']['address-family ipv4']['neighbor 172.16.0.1 allowas-in 1']
root['router bgp 64512']['address-family ipv4']['network 192.168.1.0 mask 255.255.255.0']
But third-party modules are generally heavier (since they offer more features) and may not do exactly what you want. For example, the above output does not include (on purpose) the children of missing items: interface FastEthernet0/0.10 and its children are all missing from B, yet only the parent is displayed:
interface FastEthernet0/0.10 # displayed in above output as root['interface FastEthernet0/0.10']
description LAN # not displayed in above output
encapsulation dot1Q 10 # not displayed in above output
ip address 192.168.1.254 255.255.255.0 # not displayed in above output
Because DiffPlus focuses on merging a config into another one, and not just doing a line-by-line diff, deletions aren't as easy to compute as additions. How do we know that an item of A will affect another one in B?
Let's take an example in a network context:
# computed by diffplus (only additions)
interface FastEthernet0/0
description Some interface
+ no description
ip address 10.0.0.1 255.255.255.0
ip address 10.0.0.2 255.255.255.0 secondary
ip address 10.0.0.3 255.255.255.0 secondary
+ no ip address 10.0.0.3 255.255.255.0 secondary
+ ip address 10.0.0.4 255.255.255.0 secondary
speed 10
+ speed 100
Some of the new items will negate or change existing ones. So we'd like a smarter diff:
# NOT computed by diffplus (additions and deletions)
interface FastEthernet0/0
- description Some interface
ip address 10.0.0.1 255.255.255.0
ip address 10.0.0.2 255.255.255.0 secondary
- ip address 10.0.0.3 255.255.255.0 secondary
+ ip address 10.0.0.4 255.255.255.0 secondary
- speed 10
+ speed 100
We humans are able to compute that diff because we visually identify items and are familiar with the config logic. From an algorithmic point of view, however, it is challenging. It depends not only on the config grammar and syntax (i.e., what are considered keywords and values) but also on the semantics (e.g., adding an item won't necessarily replace a similar one, as is the case for secondary addresses).
The closest string match approach, using a helper like difflib.get_close_matches(), is an interesting lead, yet it is not 100% accurate and would have overcomplicated the module.
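The sketch below shows both sides of that approach on the interface example above: it finds the line that speed 100 replaces, but it also pairs a new secondary address with an existing one that it does not replace. The variable names are illustrative only.

```python
from difflib import get_close_matches

# Existing lines of the interface block in B.
existing = [
    "description Some interface",
    "ip address 10.0.0.1 255.255.255.0",
    "ip address 10.0.0.3 255.255.255.0 secondary",
    "speed 10",
]

# Helpful match: "speed 100" does replace "speed 10".
replaced = get_close_matches("speed 100", existing, n=1, cutoff=0.6)

# Misleading match: a new secondary address does NOT replace an existing one,
# yet it is by far the closest string.
false_positive = get_close_matches(
    "ip address 10.0.0.4 255.255.255.0 secondary", existing, n=1, cutoff=0.6
)

print(replaced)
print(false_positive)
```

String similarity alone cannot tell these two cases apart, which is why the semantics of the config would have to be taken into account.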
By nature, DiffPlus is not suited for code diffs. Each line is assumed to be unique per indented block. Therefore, the tree conversion won't work well with source code, where the same statements are often repeated.
Let's take an example with pseudocode:
# algo.txt
if some_expression then
first_if_content
else
first_else_content
fi
if anoter_expression then
second_if_content
else
second_else_content
fi
Tree conversion:
from diffplus import IndentedConfig
from json import dumps
algo = open('algo.txt').read()
algo = IndentedConfig(algo)
print(dumps(algo.to_dict(), indent=4))
Output:
{
"if some_expression then": {
"first_if_content": {}
},
"else": {
"second_else_content": {}
},
"fi": {},
"if anoter_expression then": {
"second_if_content": {}
}
}
Inconsistencies:
- There is only one fi instead of two.
- Likewise, the first else has been overwritten by the second one at the same indentation level.
This is in accordance with how the module works: it has been designed for config diff, not code diff.
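The behavior follows directly from dict semantics, assuming the tree is built by assigning each line as a key of its parent's dict. A minimal illustration with a plain dict (not DiffPlus code):

```python
tree = {}
tree["else"] = {"first_else_content": {}}
tree["fi"] = {}
# Same key at the same level: the value is replaced, the key keeps its original position.
tree["else"] = {"second_else_content": {}}
tree["fi"] = {}

print(tree)
```

A key can only exist once per dict, so a repeated line in the same block cannot be represented twice.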
For now, DiffPlus does not support multiple characters as indentation symbol:
IndentedConfig(config, indent_char='  ') # 2 spaces (NOT supported)
IndentedConfig(config, indent_char='    ') # 4 spaces (NOT supported)
IndentedConfig(config, indent_char='\t') # tab char (supported)
The first two lines will raise an error:
ValueError: "indent_char" must be a char, not an str
The same limitation applies to comment_char.
A future version may support it if the module gains interest in the community.
I didn't find a suitable module providing such an incremental and contextual diff simply based on an indented config (NOT necessarily Cisco-based or network-based for the sake of genericity).
- difflib (Python builtin) does a line-by-line diff
- DeepDiff compares dicts but does not convert an indented text to dict
- ConfigTree only supports YAML and JSON formats
- anytree drops the support of the indented text format
- diffios only supports Cisco format and it is not clear what it does exactly (variable parsing or diff?)
- shconfparser (probably the closest one) only supports Cisco format and does not offer diff features
- conf_diff does a line-by-line diff
Update: netutils, which is partially used in the NAPALM project, provides a similar diff mechanism. Although it is network-oriented, the proposed implementation is interesting.