Data Formats and Models

Introduction to Data Formats and Data Models:

If you’re reading this, you have most likely seen some compelling examples of network automation online or in a presentation and want to learn more. You may have heard of JSON and YAML, but what are these things, and what is their relevance to network programmability and automation?

The idea of an encoded data format like XML or JSON is to represent data in a form that machines can understand. For example, a machine reads the text output of a command such as show ospf neighbor as one long string, without understanding the relationship between the different pieces of data like the neighbor ID, state, dead time, etc. That’s where the data format comes into play.

The data format is going to allow us to structure the data in a manner that the machine would ingest and be able to understand the relationships between the various data pieces. Once a machine understands the data, then it can make use of it.
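To make the contrast concrete, here is a minimal Python sketch. The command output is hypothetical (the addresses, states and column layout are invented for illustration), and `parse_neighbors` is an assumed helper name, not part of any library. It shows the same information first as one opaque string and then as a structure whose relationships a program can follow.

```python
# Hypothetical "show ospf neighbor" output; all values are invented for illustration.
raw_output = """\
Neighbor ID     Pri   State      Dead Time   Address      Interface
10.0.0.2        1     FULL/DR    00:00:35    192.0.2.2    GigabitEthernet0/1
10.0.0.3        1     FULL/BDR   00:00:31    192.0.2.3    GigabitEthernet0/2
"""

# Column names are supplied by hand because the header itself contains spaces.
KEYS = ["neighbor_id", "priority", "state", "dead_time", "address", "interface"]


def parse_neighbors(text):
    """Turn each whitespace-separated data row into a dictionary (header row skipped)."""
    rows = text.strip().splitlines()[1:]
    return [dict(zip(KEYS, row.split())) for row in rows]


neighbors = parse_neighbors(raw_output)

# To the program, raw_output is just characters; neighbors has named fields.
print(type(raw_output).__name__, len(raw_output))
print(neighbors[0]["neighbor_id"], neighbors[0]["state"])
```

Hand-written parsing like this is fragile, which is exactly why a device that can return the data already encoded in a standard format is easier to automate against.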

To drive the point home, in the same way that routers and switches require standardized protocols such as OSPF, BGP and TCP/IP in order to communicate, software applications need to be able to agree on syntax in order to exchange data between them. For this, applications can use standard data formats like JSON and XML (among others). Not only do applications need to agree on how the data is formatted, but also on how the data is structured. Data models like YANG define how the data stored in a particular data format is structured.

Having said that, let us talk about the first data format of interest, which is XML:

    1. XML stands for eXtensible Markup Language. It was designed to store and transport data, and to be both consumable by machines and friendly to humans. XML enjoys wide support in a variety of tools and languages, such as the lxml library in Python. In fact, the XML definition itself is accompanied by a variety of related definitions for things like schema enforcement, transformations, and advanced queries. XML is hierarchical, with parent and child constructs as seen below. This is an XML format of a show ospf neighbor command:

      [Figure: XML representation of the show ospf neighbor output]

      In the above, the <router> element is the root (parent), which nests the <ospf> element within it (a child). This child element in turn has a nested child element called <neighbor>. XML elements can have attributes; for instance, the neighbor element has a hostname attribute, which can be used to uniquely identify and distinguish the neighbor elements within the ospf element. Every XML element must be closed, either with a matching closing tag such as </neighbor> or as a self-closing tag such as <neighbor/>. Also, unlike HTML, XML does not have predefined tags; we can be as creative as we want to be. XML also allows comments.
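The hierarchy described above can be explored with Python's standard `xml.etree.ElementTree` module. This is a small sketch: the XML document is a hypothetical reconstruction that follows the element names mentioned in the text (router, ospf, neighbor, hostname), and the child elements and values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; element names follow the text, values are invented.
xml_text = """\
<router>
  <!-- Comments are allowed in XML -->
  <ospf>
    <neighbor hostname="R2">
      <id>10.0.0.2</id>
      <state>FULL/DR</state>
      <dead_time>00:00:35</dead_time>
    </neighbor>
    <neighbor hostname="R3">
      <id>10.0.0.3</id>
      <state>FULL/BDR</state>
      <dead_time>00:00:31</dead_time>
    </neighbor>
  </ospf>
</router>
"""

root = ET.fromstring(xml_text)   # the root (parent) element
ospf = root.find("ospf")         # child of the root

# The hostname attribute distinguishes the sibling <neighbor> elements.
states = {
    neighbor.get("hostname"): neighbor.findtext("state")
    for neighbor in ospf.findall("neighbor")
}

print(root.tag)
print(states)
```

Here the attribute is read with `get()` and the nested text with `findtext()`, which mirrors the difference between an element's attributes and its children.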

    2. The second widely used data format is JSON, which stands for JavaScript Object Notation. Its syntax is derived from JavaScript object literals. JSON can seem like the new kid on the block when it comes to exchanging network information, but it is not new: it has been around for a long time and is a popular format used by many sites, including Twitter. Here’s an example file in JSON:

      [Figure: example JSON file]

      The whole file is wrapped in curly braces {}, indicating that the top-level value is a JSON object. Objects can be thought of as collections of key-value pairs, or dictionaries, same as in YAML. The keys in a JSON object are always strings. For example, “ospf” is a key whose value is a list containing multiple key-value pairs. Key-value pairs and list items are separated by commas, and square brackets denote lists. There’s no native concept of attributes or comments in JSON. In my opinion, JSON offers the same type of hierarchical structure as XML, but it is more lightweight and therefore a better fit for bubbling up messages from routers and switches that are too complex to express in syslog.
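As a rough illustration of these rules, the sketch below loads a hypothetical JSON document with Python's standard `json` module. The key names and values are invented for the example; only the “ospf” key comes from the text.

```python
import json

# Hypothetical JSON document; only the "ospf" key is taken from the text.
json_text = """
{
  "ospf": [
    {"neighbor_id": "10.0.0.2", "priority": 1, "state": "FULL/DR", "dead_time": "00:00:35"},
    {"neighbor_id": "10.0.0.3", "priority": 1, "state": "FULL/BDR", "dead_time": "00:00:31"}
  ]
}
"""

data = json.loads(json_text)

# Curly braces become a dict, square brackets become a list.
print(type(data).__name__, type(data["ospf"]).__name__)

# Keys are always strings; values keep their JSON type (string, number, ...).
first = data["ospf"][0]
print(first["neighbor_id"], first["priority"])
```

Once loaded, the data is ordinary Python dictionaries and lists, so no format-specific handling is needed beyond the single `json.loads` call.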

    3. The third is YAML (a recursive acronym for “YAML Ain’t Markup Language”), which is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax. Here’s an example:

      [Figure: example YAML file]

      Three hyphens (---) may appear at the top of a YAML file to mark the start of a document; they are optional when the file holds a single document. YAML files use either the ‘.yaml’ or ‘.yml’ extension. YAML closely mimics the flexibility of Python’s data structures: you can freely combine data types such as lists, strings, integers, Booleans and dictionaries. The example above makes use of a dictionary with multiple key-value pairs. The keys are to the left of the colon, with the corresponding values to the right. YAML is also sensitive to whitespace (indentation defines nesting) and allows comments. An increasing number of tools, e.g. Ansible, Nornir and Kubernetes, use YAML to define an automation workflow or to provide a data set to work with (like a list of VLANs). It’s very easy to use YAML to get from zero to a functional automation workflow, or to define the data you wish to push to a device.
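The mapping between YAML and Python data structures can be sketched as follows. The YAML document and its keys are hypothetical. Parsing YAML is not part of the Python standard library, so the sketch assumes the third-party PyYAML package (`import yaml`) and simply skips the parsing step if it is not installed.

```python
# Hypothetical YAML document; keys and values are invented for illustration.
yaml_text = """\
---
# Comments start with a hash
hostname: R1
ospf_enabled: true
vlans:
  - id: 10
    name: users
  - id: 20
    name: voice
"""

# The Python structure this document is expected to map to.
expected = {
    "hostname": "R1",
    "ospf_enabled": True,
    "vlans": [
        {"id": 10, "name": "users"},
        {"id": 20, "name": "voice"},
    ],
}

try:
    import yaml  # PyYAML, a third-party package (assumed, may not be installed)
except ImportError:
    yaml = None

# Parse only when PyYAML is available; otherwise leave parsed as None.
parsed = yaml.safe_load(yaml_text) if yaml is not None else None

if parsed is not None:
    print(parsed == expected)
```

`safe_load` is used rather than `load` because it only builds plain data types, which is the safer choice for data files from outside sources.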

    4. YANG is used to model configuration data, operational state data and general RPC data. General RPC data allows us to model generic tasks, such as upgrading a device. The YANG data model provides the ability to define syntax and semantics, so that data can be defined more easily using built-in and customizable types. You can enforce semantics such that VLAN IDs must be between 1 and 4094. You can enforce the operational state of an interface in that it must be “up” or “down.” The model defines these types of constructs and ultimately becomes the source of truth on what’s permitted on a network device. Here’s an example:

      [Figure: example YANG model]

The YANG syntax includes the leaf statement, which allows you to define an object that is a single instance, has a single value, and has no children. The YANG list statement defines a sequence of list entries, each identified by its key and able to contain child nodes such as leafs and leaf-lists. Finally, the YANG container statement denotes a container node, which is used to group related nodes in a subtree. It has only child nodes and no value, and may contain any number of child nodes of any type (including leafs, lists, containers, and leaf-lists).
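YANG has its own syntax, so the following is not YANG. It is only a loose Python analogy for the ideas above, assuming hypothetical class names (`Vlan`, `Interface`, `Device`) and an assumed helper `is_valid`. Plain fields play the role of leafs, Python lists of objects play the role of YANG lists, the outer class plays the role of a container, and the checks mirror the VLAN range and interface state constraints mentioned earlier.

```python
from dataclasses import dataclass, field

VALID_STATES = {"up", "down"}


@dataclass
class Vlan:
    """One entry of a list; each field acts like a leaf (single value, no children)."""
    vlan_id: int
    name: str

    def __post_init__(self):
        # Mirrors a range restriction such as "VLAN IDs must be between 1 and 4094".
        if not 1 <= self.vlan_id <= 4094:
            raise ValueError(f"VLAN ID {self.vlan_id} is outside 1..4094")


@dataclass
class Interface:
    name: str
    oper_state: str

    def __post_init__(self):
        # Mirrors an enumeration: the state must be "up" or "down".
        if self.oper_state not in VALID_STATES:
            raise ValueError(f"invalid state {self.oper_state!r}")


@dataclass
class Device:
    """Acts like a container: it has no value of its own, only child nodes."""
    vlans: list = field(default_factory=list)
    interfaces: list = field(default_factory=list)


def is_valid(node_type, *args):
    """Return True if the values satisfy the constraints, False otherwise."""
    try:
        node_type(*args)
    except ValueError:
        return False
    return True


device = Device(
    vlans=[Vlan(10, "users")],
    interfaces=[Interface("GigabitEthernet0/1", "up")],
)
print(is_valid(Vlan, 4094, "edge"), is_valid(Vlan, 5000, "bad"))
print(is_valid(Interface, "GigabitEthernet0/2", "flapping"))
```

The point of the analogy is that the rules live in the model rather than in each script that touches the data, which is what makes the model a source of truth.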

A model can be written in YANG, and data that conforms to it can then be represented as either JSON or XML; this is the capability that the RESTCONF protocol offers. RESTCONF is an HTTP-based, REST-style API that uses XML- or JSON-encoded data to represent data defined by YANG models. We’ll talk about RESTCONF in a future article.
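The idea that one piece of modeled data can be carried in either encoding can be sketched with Python's standard library. The `vlan` record and its field names are hypothetical, and the sketch deliberately omits the YANG module names and XML namespaces that real RESTCONF payloads carry.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data that a model might describe: one VLAN entry.
vlan = {"id": 10, "name": "users"}

# JSON encoding of the record.
json_encoded = json.dumps({"vlan": vlan})

# XML encoding of the same record: one child element per field.
vlan_element = ET.Element("vlan")
for key, value in vlan.items():
    ET.SubElement(vlan_element, key).text = str(value)
xml_encoded = ET.tostring(vlan_element, encoding="unicode")

print(json_encoded)
print(xml_encoded)
```

Both strings describe the same structure; only the encoding differs, which is why a data model can stay neutral about the format used on the wire.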

The main difference between data formats such as JSON, YAML and XML and data models such as YANG is that data formats specify how data is encoded, but don’t necessarily define the structure of the data. Data models, on the other hand, define the structure of data encoded in a data format. YANG is neutral to the encoding type. However, there are also schema languages tied to a single format, such as XSD, which is specific to XML, and JSON Schema, which is specific to JSON.

Data Formats in Relation to APIs:

As we conclude, let’s highlight how these data formats map to the APIs that are used to interface with network devices. We can summarize this in the table below:

API        Data Format
NETCONF    XML
RESTCONF   XML or JSON
REST       XML or JSON
gRPC       Protocol Buffers

References:

  1. Network Programmability and Automation by Jason Edelman
  2. Data Formats: XML, JSON and YAML by Calvin Remsburg
  3. networktocode.com