There is a huge emphasis in the networking community on automation and validation. Network automation builds on the work done for server automation. The solutions are more mature and the terminology describing the solutions and tasks is well defined. Terms like “idempotent,” “task-based,” “state-based,” “agentless,” etc. are well understood.

Network validation, however, does not have a nuanced vocabulary. The general term “network validation” gets used to refer to a number of disparate activities, and specific terms get used by different engineers to mean different things. This lack of nuance hinders the communication and collaboration required to advance network validation technology. That, in turn, harms the adoption of network automation. It is too risky to use automation without effective validation; a single typo can bring down the entire network within seconds.

In this post, we outline different dimensions of network validation and hope to start a conversation about developing a precise vocabulary. We will discuss the what, when and how of network validation.

A few decades ago, car odometers were designed to roll over to zero after 99,999 miles because it was rare for cars to last that long. But today cars come with a warranty for 100,000 miles because it is rare for cars to not last that long. This massive reliability improvement has come about despite the significantly higher complexity of modern cars. Cars have followed the arc of many engineering artifacts, where human ingenuity brought them to their initial working form and then robust engineering techniques made them work well.

The computer hardware and software domains have also invested heavily in robust engineering techniques to improve reliability. One domain where reliability improvements have lagged is computer networking, where outages and security breaches that disrupt millions of users and critical services are all too common. While there are many underlying causes for these incidents, studies have consistently shown that the vast majority are caused by errors in the configuration of network devices. Yet engineers continue to manually reason about the correctness of network configurations. While the original Internet was an academic curiosity, today’s networks are too critical for businesses and society, and also too complex (they span the globe and connect billions of endpoints), for their correctness to be left solely to human reasoning.

When you compare software and network engineering trends at a high level, the contrast is striking. Application development has become remarkably agile, robust and responsive, while the networks that carry those apps have not. They continue to be slow to evolve and prone to error. The difference is tools.

Software engineers have leveraged a suite of tools to rapidly respond to changing business needs, accelerate development and improve reliability. Network engineers need to follow suit. The tools they need are now available.

We are excited to announce the release of pybatfish, an open-source Python SDK for Batfish. Batfish is an open-source, multi-vendor network validation framework that enables network engineers, architects and operators to proactively test and validate network design and configuration. It is being used in some of the world’s largest networks to prevent deployment of incorrect configurations that can lead to outages or security breaches.

Batfish simulates network behavior and builds a model from device configurations alone, predicting how the network will forward packets and how it will react to failures. Because the model is built from just the device configurations, Batfish can evaluate network changes and guarantee correctness proactively, without requiring configuration changes to be pushed to the network first.
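To make the idea of predicting forwarding from configuration data concrete, here is a toy sketch in Python. It is not Batfish's implementation and does not use the pybatfish API; the device names, the ROUTES table and the helper functions are all hypothetical, and the routes are written by hand rather than parsed from real configurations. It only illustrates the general principle: given per-device routing entries, longest-prefix matching can predict where a packet would go without touching a live network.

```python
import ipaddress

# Hypothetical routing entries, standing in for what a real tool would
# derive from device configurations. Each entry is (prefix, next_hop).
# The special next hop "deliver" means the destination is directly attached.
ROUTES = {
    "r1": [("10.0.0.0/8", "r2"), ("10.1.0.0/16", "r3"), ("0.0.0.0/0", "isp")],
    "r2": [("10.0.0.0/8", "deliver")],
    "r3": [("10.1.0.0/16", "deliver")],
}


def next_hop(device, dst_ip):
    """Return the next hop chosen by longest-prefix match, or None if no route matches."""
    dst = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, hop in ROUTES.get(device, []):
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


def trace(start, dst_ip, max_hops=16):
    """Follow next hops from start until delivery, a drop, an exit, or a loop."""
    path = [start]
    device = start
    for _ in range(max_hops):
        hop = next_hop(device, dst_ip)
        if hop is None:
            return path, "dropped"
        if hop == "deliver":
            return path, "delivered"
        path.append(hop)
        if hop not in ROUTES:
            # The next hop is outside the modeled network.
            return path, "exited"
        if hop in path[:-1]:
            # The packet would revisit a device it has already traversed.
            return path, "loop"
        device = hop
    return path, "max hops exceeded"
```

Calling trace("r1", "10.1.2.3") walks the modeled path for that destination and reports its outcome. A real validation tool does far more (vendor-specific parsing, routing protocol simulation, ACLs, failure analysis), but the offline, model-based nature of the check is the same.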

The inherent complexity of today's networks means humans are simply incapable of reasoning about their correctness. Yet network engineers are asked to do so on a daily basis. It is no surprise then that we consistently see headlines such as “Comcast Suffers Outage Due to Significant Level 3 BGP Route Leak” or “Google accidentally broke Japan's Internet”. Fortunately, recent advances in network validation, specifically control plane validation, can provide strong guarantees on the correctness of network configuration and completely prevent such errors.

Using network validation tools like Batfish, network engineers can make configuration changes without taking down the Internet, making headlines like those above a thing of the past.

Intentionet © 2019