A common culprit behind some of the biggest outages on the Internet is misconfigured BGP route policies. For example:
- BGP Leak Causing Internet Outages in Japan and Beyond
- How a Tiny Error Shut Off the Internet for Parts of the US
- Telia engineer error to blame for massive net outage
Such outages typically occur when a change to route policy configuration accidentally leaks routes or accepts routes that it should not.
That these events occur regularly and across a wide swath of networks reinforces our belief that getting network configuration right is a tools problem, not an experience or training problem. Validating network configuration in general, and BGP route policies in particular, is a highly complex task for which engineers need better tools. While network engineers often know what their route policies should or should not do (e.g., see MANRS guidelines), ensuring that the policy implementation matches their intent is notoriously hard.
Batfish solves this problem by providing two ways to analyze routing policies:
- Test the policy against a specific set of input routes with testRoutePolicies
- Find (search for) a specific set of input routes that trigger a specific action by the policy with searchRoutePolicies (new!).
These questions bring to route policies similar capabilities that you may already know and love from Batfish for analyzing ACLs and firewall rules. They make it easy to find bugs in routing policies and get strong correctness guarantees, all before you deploy changes to the network.
Here are just a few examples of the kinds of intents that you can validate with these analyses:
- Deny all incoming routes with private addresses
- Only permit incoming routes if they have the correct origin AS
- Tag incoming routes from a neighbor with a specific community
- Set the local preference for all customer routes to 300
- Advertise only prefixes that we (and our customers) own
Testing Route Policy Behavior
The testRoutePolicies question enables you to test the behavior of a route policy for specific routes of interest. You can find out:
- whether the route will be permitted or denied by the policy.
- if permitted, how attributes such as communities are transformed.
For example, to test the “deny all incoming routes with private addresses” intent you would run testRoutePolicies on routes with prefixes in the private address space and check that all of them are denied.
Let’s take a look at an example route-policy from_customer and evaluate its behavior with testRoutePolicies.
route-map from_customer deny 100
 match ip address prefix-list private-ips
!
route-map from_customer permit 200
 match ip address prefix-list from44
 match as-path origin44
 set community 20:30
 set local-preference 300
!
route-map from_customer deny 300
 match ip address prefix-list from44
!
route-map from_customer permit 400
 set community 20:30
 set local-preference 300

ip prefix-list private-ips seq 5 permit 10.0.0.0/8 ge 8
ip prefix-list private-ips seq 10 permit 172.16.0.0/28 ge 28
ip prefix-list private-ips seq 15 permit 192.168.0.0/16
!
ip prefix-list from44 seq 10 permit 184.108.40.206/24 ge 24
!
ip as-path access-list origin44 permit _44$
# Assumes a Batfish session with an initialized snapshot and loaded questions
from pybatfish.datamodel.route import BgpRoute
from pybatfish.question import bfq

# A route with a private prefix, which the policy is expected to deny
inRoute1 = BgpRoute(network="10.0.0.0/24", originatorIp="220.127.116.11", originType="egp", protocol="bgp")
result = bfq.testRoutePolicies(policies="from_customer", direction="in", inputRoutes=[inRoute1]).answer().frame()
print(result)

   Node     Policy_Name    Input_Route  Action  Output_Route  Difference
0  border1  from_customer  BgpRoute(network='10.0.0.0/24', originatorIp='18.104.22.168', originType='egp', protocol='bgp', asPath=[], communities=[], localPreference=0, metric=0, sourceProtocol=None)  DENY  None  None
1  border2  from_customer  BgpRoute(network='10.0.0.0/24', originatorIp='22.214.171.124', originType='egp', protocol='bgp', asPath=[], communities=[], localPreference=0, metric=0, sourceProtocol=None)  DENY  None  None
As you can see, Batfish correctly determines that the 10.0.0.0/24 route advertisement will get denied by the policy.
This capability is extremely useful when designing (or changing) your routing policy. For a concrete set of routes you can determine the specific behavior of the routing policy. The testRoutePolicies question achieves this by simulating the behavior of the route policy on input routes.
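One way to turn a testRoutePolicies run into a pass/fail check is to scan the answer rows for any tested route that was not denied. The sketch below is a minimal illustration, not part of Batfish: routes_not_denied is a hypothetical helper, and the rows are hand-written stand-ins that reuse the Action and Input_Route column names from the output above, with prefixes as plain strings in place of BgpRoute objects.

```python
def routes_not_denied(rows):
    """Return the input routes whose Action is anything other than DENY."""
    return [row["Input_Route"] for row in rows if row["Action"] != "DENY"]


# Hand-written stand-in rows (not real Batfish output), one per node and route.
sample_rows = [
    {"Node": "border1", "Policy_Name": "from_customer",
     "Input_Route": "10.0.0.0/24", "Action": "DENY"},
    {"Node": "border2", "Policy_Name": "from_customer",
     "Input_Route": "192.168.1.0/24", "Action": "PERMIT"},
]

# Any route listed here violates the "deny all private addresses" intent.
violations = routes_not_denied(sample_rows)
print(violations)
```

With a real answer, the rows could be obtained from the pandas frame with result.to_dict("records"). An empty list would mean that every tested route was denied, which says nothing about routes that were not tested.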
Searching for Route Policy Misbehaviors (Verification)
Testing is extremely useful for debugging route policies, but it can only guarantee that the policy behaves correctly on the specific routes that are tested. The space of potential input routes is so large that testing each one individually is infeasible. This is where searchRoutePolicies comes into play. It allows you to verify the policy against your intent across all possible routes. The searchRoutePolicies question was recently added to Batfish and can be used to analyze a host of common route policy behaviors.
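To get a feel for the size of that space, the following back-of-the-envelope sketch counts IPv4 prefixes alone, ignoring every other route attribute (AS path, communities, local preference, and so on). The helper name prefixes_within is made up for this illustration.

```python
def prefixes_within(base_length, max_length=32):
    """Count IPv4 prefixes with lengths base_length..max_length under one base prefix."""
    return sum(2 ** (length - base_length)
               for length in range(base_length, max_length + 1))


all_ipv4_prefixes = prefixes_within(0)   # every prefix from /0 through /32
under_10_slash_8 = prefixes_within(8)    # every prefix inside 10.0.0.0/8

print(all_ipv4_prefixes)   # 8589934591 (2**33 - 1)
print(under_10_slash_8)    # 33554431 (2**25 - 1)
```

Even the single 10.0.0.0/8 block contains tens of millions of distinct prefixes, and each of them can be combined with many different attribute values, so enumerating test routes by hand does not scale.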
The searchRoutePolicies question provides comprehensive guarantees by searching for routes that cause a route policy to behave in a particular way. You start by describing a space of potential input routes—using any combination of prefix ranges, a list of allowed communities, an AS-path regular expression, etc.—along with an action (permit or deny). Batfish will search this space of potential input routes and identify a route, if one exists, for which the route policy you are evaluating takes the specified action.
For example, to verify the “deny all incoming routes with private addresses” intent, you would specify the space of interest as all routes with private addresses and search if anything in that space is permitted. If Batfish returns any route, that indicates that the routing policy violates your intent. Conversely, if there are no results then you can be sure that the intent is satisfied, and that all routes with private addresses are indeed denied.
# Assumes a Batfish session with an initialized snapshot and loaded questions
from pybatfish.datamodel.route import BgpRouteConstraints
from pybatfish.question import bfq

# Define the space of private addresses and route announcements
privateIps = ["10.0.0.0/8:8-32", "172.16.0.0/28:28-32", "192.168.0.0/16:16-32"]
inRoutes1 = BgpRouteConstraints(prefix=privateIps)

# Verify that no such announcement is permitted by our policy
result = bfq.searchRoutePolicies(policies="from_customer", inputConstraints=inRoutes1, action="permit").answer().frame()

# Show the first (and only) counterexample row
print(result.loc[0])

Node          border2
Policy_Name   from_customer
Input_Route   BgpRoute(network='192.168.0.0/32', originatorIp='0.0.0.0', originType='igp', protocol='bgp', asPath=[], communities=[], localPreference=0, metric=0, sourceProtocol=None)
Action        PERMIT
Output_Route  BgpRoute(network='192.168.0.0/32', originatorIp='0.0.0.0', originType='igp', protocol='bgp', asPath=[], communities=['20:30'], localPreference=300, metric=0, sourceProtocol=None)
Difference    BgpRouteDiffs(diffs=[BgpRouteDiff(fieldName='communities', oldValue='', newValue='[20:30]'), BgpRouteDiff(fieldName='localPreference', oldValue='0', newValue='300')])
Batfish has found a route advertisement, 192.168.0.0/32, that will be allowed by the routing policy even though our intent is for it to be denied. There may be multiple route advertisements that violate our intent; Batfish picks one as an example to highlight the error. Looking closely at the routing policy, the route-map from_customer denies routes that match the prefix-list private-ips, but the last entry in that prefix-list is incorrect: it is missing the “ge 16” option. As defined, that entry matches only the exact route 192.168.0.0/16, so any longer prefix within the 192.168.0.0/16 space is not matched and therefore not denied by the route-map.
route-map from_customer deny 100
 match ip address prefix-list private-ips

ip prefix-list private-ips seq 5 permit 10.0.0.0/8 ge 8
ip prefix-list private-ips seq 10 permit 172.16.0.0/28 ge 28
ip prefix-list private-ips seq 15 permit 192.168.0.0/16
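The effect of the missing “ge 16” can be reproduced with a few lines of Python. This is a simplified sketch of how a single prefix-list entry is commonly interpreted (exact match when neither ge nor le is given, otherwise a range of prefix lengths inside the base prefix). It is not Batfish's implementation, and the function name prefix_list_entry_matches is invented for the example.

```python
import ipaddress


def prefix_list_entry_matches(entry_prefix, route_prefix, ge=None, le=None):
    """Check one prefix-list entry against a route prefix (simplified model)."""
    entry = ipaddress.ip_network(entry_prefix)
    route = ipaddress.ip_network(route_prefix)
    if ge is None and le is None:
        # No length range given: only the exact prefix length matches.
        min_len = max_len = entry.prefixlen
    else:
        min_len = ge if ge is not None else entry.prefixlen
        max_len = le if le is not None else 32
    return route.subnet_of(entry) and min_len <= route.prefixlen <= max_len


# Entry as written in the policy: matches only the exact /16.
print(prefix_list_entry_matches("192.168.0.0/16", "192.168.0.0/16"))         # True
print(prefix_list_entry_matches("192.168.0.0/16", "192.168.0.0/32"))         # False

# Entry with the missing option restored: the /32 is now matched.
print(prefix_list_entry_matches("192.168.0.0/16", "192.168.0.0/32", ge=16))  # True
```

Under this model, the route 192.168.0.0/32 that Batfish reported falls through the deny clause exactly because the entry carries no length range.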
You can also use searchRoutePolicies to ensure that your routing policy correctly transforms the routes it accepts. To do this, you specify a space of output routes (using a combination of prefix ranges, a list of communities, AS-path regular expressions, etc.) along with the space of input routes. Batfish will return an input route, if one exists, that falls in the space of output routes after being transformed by the routing policy. This capability can be used to validate an intent like “set the local preference for all customer routes to 300” by searching for input customer routes whose output does not land in the space of routes with a local preference of 300.
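The shape of that check (a policy maps each input route to an output route or denies it, and a violation is a permitted input whose output falls outside the desired space) can be illustrated with a toy model. Everything in the sketch below is hypothetical: Route, toy_policy, and find_violation are invented for this example, the policy has a deliberate bug, and the search simply walks a short list of candidates, whereas Batfish reasons about the whole input space rather than enumerating routes.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    local_preference: int = 0


def toy_policy(route: Route) -> Optional[Route]:
    """Hypothetical policy: return the transformed route, or None if denied."""
    net = ipaddress.ip_network(route.prefix)
    if net.subnet_of(ipaddress.ip_network("10.0.0.0/8")):
        return None  # deny this private block
    if net.prefixlen <= 24:
        return replace(route, local_preference=300)
    # Deliberate bug: prefixes longer than /24 are permitted unchanged.
    return route


def find_violation(policy, inputs, output_ok):
    """Return the first (input, output) pair that is permitted but fails output_ok."""
    for route in inputs:
        out = policy(route)
        if out is not None and not output_ok(out):
            return route, out
    return None


candidates = [Route("10.1.0.0/16"), Route("203.0.113.0/24"), Route("203.0.113.128/25")]
violation = find_violation(toy_policy, candidates, lambda r: r.local_preference == 300)
print(violation)
```

Here the /25 is permitted without having its local preference set, so it is returned as a counterexample to the "local preference 300" intent. A result of None would mean only that none of the listed candidates violates it.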
You may be curious how this magic works under the hood. After all, the space of routes can be huge, representing billions of potential routes. This is where the power of Batfish comes in. Batfish encodes the route policy, which is essentially a function that maps input routes to output routes, as a mathematical equation with a series of constraints. Batfish then solves this equation using an algorithm similar to the one it uses to search for packets that meet specific criteria.
If you have any questions, or have complex routing policies that you need help analyzing, get in touch via GitHub or Slack.
Examples and More Information
To learn more, check out these resources:
- Our new Jupyter notebook, which provides examples of using both testRoutePolicies and searchRoutePolicies to validate route policies.
- A NANOG talk on using testRoutePolicies for pre-deployment validation.
- Documentation for the questions.