Internet-Draft knowledge-graph-yang May 2024
Martinez-Casanueva Expires 8 November 2024
Workgroup:
Network Management Operations
Internet-Draft:
draft-marcas-nmop-knowledge-graph-yang-01
Published:
Intended Status:
Informational
Expires:
8 November 2024
Author:
I. D. Martinez-Casanueva
Telefonica

Knowledge Graphs for YANG-based Network Management

Abstract

The success of the YANG language and YANG-based protocols for managing the network has unlocked new opportunities in network analytics. However, the wide heterogeneity of YANG models hinders the consumption and analysis of network data. Moreover, data encoding formats and transport protocols differ depending on the network management protocol supported by each network device. These challenges call for new data management paradigms that facilitate the discovery, understanding, integration, and access to silos of heterogeneous YANG data, abstracting away the complexities of the network devices.

This document introduces the knowledge graph paradigm as a solution to this data management problem, with a focus on YANG-based network management. The document provides background on related topics such as ontologies and graph standards, and shares guidelines for implementing knowledge graphs from YANG data.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://idomingu.github.io/knowledge-graph-yang/draft-marcas-knowledge-graph-yang.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-marcas-nmop-knowledge-graph-yang/.

Discussion of this document takes place on the Network Management Operations Working Group mailing list (mailto:nmop@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/. Subscribe at https://www.ietf.org/mailman/listinfo/nmop/.

Source for this draft and an issue tracker can be found at https://github.com/idomingu/knowledge-graph-yang.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 8 November 2024.


1. Introduction

The size and complexity of networks keep increasing; thus, the path towards autonomous networks requires combining multiple network telemetry mechanisms [RFC9232]. These mechanisms range from legacy protocols like SNMP to the more recent model-driven telemetry (MDT) based on the YANG language [RFC7950] and network management protocols such as NETCONF [RFC6241] or gNMI [gnmi].

MDT in particular has drawn the attention of the network industry due to the benefits of modeling the configuration and state data of the network with a formal data modeling language like YANG. However, since the inception of YANG, the network industry has seen a proliferation of YANG data models developed by vendors, standards developing organizations (e.g., IETF), and consortia (e.g., OpenConfig). These data models, in turn, target different abstraction layers of the network, namely, the network element and network service layers [RFC8199]. Additionally, YANG data models may augment other models to define new features, or deviate from them to remove or modify existing ones, depending on the device implementation. In summary, this trend has resulted in a wide variety of independent YANG data models and, hence, in the creation of data silos in the network.

This volume and heterogeneity of YANG data models have hindered the collection and combination of network data for advanced network analytics. On the one hand, different YANG models reference the same concepts in different ways. For example, ietf-interfaces from the IETF and openconfig-interfaces from OpenConfig follow different structures and syntax, but both reference the same “interface” concept. On the other hand, YANG models convey semantic relationships with other concepts only implicitly, via identifiers. This is shown in [RFC9418], where the leaf “device” hints at a relationship between the “subservice” concept and the “device” concept:

module: ietf-service-assurance-device

  augment /sain:subservices/sain:subservice/sain:parameter:
    +--rw parameters
      +--rw device    string

The extraction of this hidden knowledge from YANG models would enable the integration of YANG data silos at a conceptual level, regardless of the physical implementation (i.e., the YANG schema, syntax, and encoding format). In this regard, knowledge graphs are gaining traction as a promising technology that can link data silos based on common concepts, like “device”, that are captured in ontologies. Besides, by transforming YANG data into a graph structure, the relationships between data silos are represented as first-class citizens in the graph instead of as “foreign keys” that leave the relationship implicit. The remainder of this document provides guidelines for building a knowledge graph from data sources based on the YANG language.
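As a minimal sketch of this idea, the Python snippet below turns the string-valued “device” leaf of the [RFC9418] example into an explicit edge between a subservice node and a device node. The namespace http://example.com/ns#, the predicate onDevice, and the simplified input dictionaries are hypothetical and chosen only for illustration; they are not defined by [RFC9418].

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/ns#"  # hypothetical namespace, illustration only

Triple = tuple[str, str, str]


def link_subservice_to_device(subservice: dict) -> list[Triple]:
    """Promote the string-valued 'device' leaf to a node plus an explicit edge."""
    sub_iri = f"{EX}subservice/{subservice['type']}/{subservice['id']}"
    # The plain string acting as a "foreign key" becomes the identity of a node.
    dev_iri = f"{EX}device/{subservice['parameters']['device']}"
    return [
        (sub_iri, RDF_TYPE, f"{EX}Subservice"),
        (dev_iri, RDF_TYPE, f"{EX}Device"),
        (sub_iri, f"{EX}onDevice", dev_iri),  # hypothetical predicate
    ]


# Two subservices that reference the same device (illustrative values).
graph: set[Triple] = set()
for sub in (
    {"type": "device-health", "id": "PE1", "parameters": {"device": "PE1"}},
    {"type": "interface-health", "id": "PE1-eth0", "parameters": {"device": "PE1"}},
):
    graph.update(link_subservice_to_device(sub))

# Both subservices now point to one shared device node.
devices = {s for s, p, o in graph if p == RDF_TYPE and o == f"{EX}Device"}
print(devices)
```

Because the device becomes a node, every data silo that mentions “PE1” can link to the same node instead of repeating an opaque string.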

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Background

3.1. Knowledge Graphs

A knowledge graph is a collection of facts, together with what we know about them, represented as a graph structure. Knowledge graphs enable a contextualized understanding of data because the data (i.e., the individuals or instances) travel together with their meaning (i.e., the concepts or knowledge). For example, a knowledge graph can contain data about an interface “eth0”, but also the knowledge that an interface can be physical or virtual, belongs to a network device, and has a name, a description, and an MTU.
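The following minimal Python sketch illustrates how the knowledge about interfaces and the data about “eth0” can live in the same graph, and how a machine can derive facts that were not stated explicitly. It stores triples as plain tuples, uses hypothetical ex: names, writes literals with double quotes, and implements only the RDFS subclass rule (rdfs9, applied transitively), not a full reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical "ex:" vocabulary; literals are written with double quotes.
GRAPH = {
    # Knowledge: what we know about interfaces.
    ("ex:PhysicalInterface", SUBCLASS_OF, "ex:Interface"),
    ("ex:VirtualInterface", SUBCLASS_OF, "ex:Interface"),
    # Data: facts about a particular interface and device.
    ("ex:eth0", RDF_TYPE, "ex:PhysicalInterface"),
    ("ex:eth0", "ex:name", '"eth0"'),
    ("ex:eth0", "ex:mtu", '"1500"'),
    ("ex:eth0", "ex:belongsTo", "ex:router1"),
    ("ex:router1", RDF_TYPE, "ex:Device"),
}


def superclasses(cls: str, graph: set) -> set:
    """Return cls and every class reachable from it via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def types_of(individual: str, graph: set) -> set:
    """Asserted rdf:type values plus those entailed by the subclass hierarchy."""
    inferred = set()
    for s, p, o in graph:
        if s == individual and p == RDF_TYPE:
            inferred |= superclasses(o, graph)
    return inferred


print(sorted(types_of("ex:eth0", GRAPH)))  # ['ex:Interface', 'ex:PhysicalInterface']
```

Nothing states that “eth0” is an interface, yet the answer follows from the knowledge stored alongside the data.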

To this end, knowledge graphs build upon ontologies, which are explicit representations of conceptualizations in specific domains. In other words, ontologies can be seen as representations of conceptual models following a formal logic that allows machines to understand and reason over them. In this regard, a conceptual model, also known as an information model, may translate into different data models depending on the data source technology [RFC3444].

By mapping the data models (i.e., physical level) to the concepts represented in ontologies (i.e., conceptual level), we can find heterogeneous datasets scattered across the network that reference common concepts such as “interface” or “device”. Building on this semantic mapping and on the flexibility of the graph structure, knowledge graphs enable the integration of heterogeneous data based on their semantics.

3.2. Graph standards

The RDF data model from the W3C Semantic Web has been considered the standard graph data model given its maturity. For this reason, most knowledge graph implementations have relied upon RDF and other Semantic Web standards such as RDFS, OWL, SHACL, and SPARQL.

However, the recent success of graph databases like Neo4j has established the Labeled Property Graph (LPG) data model as an alternative for implementing knowledge graphs. Aiming to bridge the gap between these two graph data models, the W3C RDF-star Working Group is evolving RDF to facilitate the representation of statements about statements.

Similarly, the ETSI ISG CIM defined the NGSI-LD standard, which builds upon two novelties: i) the NGSI-LD information model, which derives from the LPG model and relies on RDF for the semantic annotation of the data in the graph; and ii) the NGSI-LD API, which defines a REST API for building and interacting with the graph.
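To illustrate how the NGSI-LD information model combines LPG and RDF ideas, the Python sketch below builds an NGSI-LD entity whose attributes are typed either as a Property (a value attached to the node) or as a Relationship (an edge pointing to another entity), while the JSON-LD @context maps short attribute names to IRIs. The entity identifiers, attribute names, and user context URL are hypothetical; the core context URL is the one commonly used for NGSI-LD, but it should be checked against the NGSI-LD version in use.

```python
import json

# Core context published by ETSI; verify it against the NGSI-LD version in use.
NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def to_ngsi_ld_entity(entity_id: str, entity_type: str, properties: dict,
                      relationships: dict, user_context: str) -> dict:
    """Build an NGSI-LD entity: Properties hold values, Relationships point to entities."""
    entity = {"id": entity_id, "type": entity_type}
    for name, value in properties.items():
        entity[name] = {"type": "Property", "value": value}
    for name, target_id in relationships.items():
        entity[name] = {"type": "Relationship", "object": target_id}
    # The @context maps short names (e.g., "mtu") to IRIs: the RDF side of NGSI-LD.
    entity["@context"] = [user_context, NGSI_LD_CORE_CONTEXT]
    return entity


eth0 = to_ngsi_ld_entity(
    "urn:ngsi-ld:Interface:eth0",          # hypothetical identifiers
    "Interface",
    {"name": "eth0", "mtu": 1500},
    {"isPartOf": "urn:ngsi-ld:Device:router1"},
    "https://example.com/context.jsonld",  # hypothetical user context
)
print(json.dumps(eth0, indent=2))
```

A payload that embeds its @context like this one is sent to an NGSI-LD Context Broker with the application/ld+json content type.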

4. Knowledge Graph Construction

The construction of a knowledge graph can be divided into two main activities: ontology development and the knowledge graph construction pipeline.

4.1. Ontology Development

Ontologies provide the formal representation of the conceptual models that capture the semantics of data and, building on this, enable the integration of data in the knowledge graph. Ontologies can be developed following different techniques, ranging from manual to fully automated, depending on the characteristics of the data to be integrated in the knowledge graph (e.g., format, schema).

4.1.1. Automatic knowledge extraction from YANG models

Part of the knowledge in YANG models can be extracted automatically, in particular by analyzing YANG identities to generate controlled vocabularies and taxonomies.

RFC 7950 defines a YANG identity as a “globally unique, abstract, and untyped identity”; therefore, mapping a YANG identity to a concept is straightforward. Additionally, a YANG identity can be derived from other YANG identities via the “base” statement. These ideas align with the notion of a taxonomy, where concepts are hierarchically linked with other concepts.
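A minimal Python sketch of the first extraction step is shown below: it collects every identity defined in a YANG module together with its “base” statements. It relies on regular expressions and assumes a well-formed module without identity-like text inside description strings; a production tool would rather use a real YANG parser such as pyang or libyang. The module example-if-types is hypothetical. Since YANG 1.1 allows several “base” statements, each identity maps to a list.

```python
import re

# Simplified patterns: identity bodies never contain nested braces, so a
# non-greedy match up to the first "}" captures the whole body.
IDENTITY_RE = re.compile(r"\bidentity\s+([\w.-]+)\s*(?:\{(.*?)\}|;)", re.S)
BASE_RE = re.compile(r"\bbase\s+([\w.:-]+)\s*;")


def extract_identities(yang_text: str) -> dict[str, list[str]]:
    """Map each identity name to the list of its (possibly prefixed) bases."""
    identities = {}
    for match in IDENTITY_RE.finditer(yang_text):
        name, body = match.group(1), match.group(2) or ""
        identities[name] = BASE_RE.findall(body)
    return identities


# Hypothetical module used only for illustration.
SAMPLE = """
module example-if-types {
  yang-version 1.1;
  namespace "urn:example:if-types";
  prefix ex;

  identity interface-type {
    description "Root of the interface type hierarchy.";
  }
  identity ethernet {
    base interface-type;
  }
  identity gigabit-ethernet {
    base ex:ethernet;
  }
  identity loopback;
}
"""

print(extract_identities(SAMPLE))
```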

To support the creation of knowledge structures like taxonomies or thesauri, the W3C standardized the Simple Knowledge Organization System (SKOS). In SKOS, a concept scheme comprises a set of concepts that can be linked with other concepts via hierarchical and associative relations. In the case of YANG, a YANG model containing YANG identities can be represented as an instance of the skos:ConceptScheme class. Next, each YANG identity included in the YANG model can be represented as a skos:Concept instance that belongs to the concept scheme (skos:inScheme). Lastly, for each YANG identity that includes a “base” statement, the corresponding SKOS concept includes a skos:broader relation whose object is the SKOS concept representing the parent (base) YANG identity.
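The mapping just described can be sketched in Python as a function that serializes identities as SKOS in Turtle. The sketch assumes the identities were already extracted into a dictionary from identity name to base identities, that all bases are defined in the same module (prefixes are simply stripped), and that the hypothetical namespace https://example.com/yang/example-if-types# is used for the generated concepts.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def identities_to_skos(module: str, namespace: str,
                       identities: dict[str, list[str]]) -> str:
    """Serialize YANG identities as a SKOS concept scheme in Turtle."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix ns: <{namespace}> .",
        "",
        # The YANG module becomes the concept scheme.
        f"ns:{module} a skos:ConceptScheme .",
    ]
    for name in sorted(identities):
        # Each identity becomes a concept that belongs to the scheme.
        lines.append(f"ns:{name} a skos:Concept ;")
        lines.append(f"    skos:inScheme ns:{module}")
        for base in identities[name]:
            # Assumption: the base lives in the same module, so its prefix is dropped.
            parent = base.split(":")[-1]
            lines[-1] += " ;"
            lines.append(f"    skos:broader ns:{parent}")
        lines[-1] += " ."
    return "\n".join(lines)


turtle = identities_to_skos(
    "example-if-types",
    "https://example.com/yang/example-if-types#",  # hypothetical namespace
    {"interface-type": [], "ethernet": ["interface-type"],
     "gigabit-ethernet": ["ex:ethernet"]},
)
print(turtle)
```

Resolving prefixes against the module's import statements would be needed to link concepts across modules.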

TBD: Include an example here or in the annex

4.1.2. Standard development methodologies

Not all the knowledge in YANG models can be extracted automatically; therefore, manual intervention from domain experts is required. To ease this process, a recommended practice is to develop the ontology following a standard methodology like Linked Open Terms (LOT).

LOT is an ontology development methodology that adopts best practices from agile software development. The methodology has been widely used in European projects as well as in the creation of the ETSI SAREF ontology and its extensions. Notably, with the SAREF ontology, ETSI tackled a similar problem in the IoT domain, which features a heterogeneous variety of standard data models and protocols. The methodology iterates over a workflow of four activities: i) ontology requirements specification; ii) ontology implementation; iii) ontology publication; and iv) ontology maintenance.

The workflow starts with the specification of the requirements that the ontology must fulfill. For this, the methodology proposes collecting knowledge from domain experts, as well as analyzing the data sources (e.g., network devices) and the schemas of the data (e.g., YANG models) to be ingested and integrated in the knowledge graph. To express these requirements, LOT recommends several approaches such as competency questions (CQs), natural language statements, or tabular information inspired by METHONTOLOGY.

TBD: Include sample requirements of network topology YANG model (RFC 8345).

4.2. Construction Pipeline

The construction of a knowledge graph is supported by a data pipeline that follows the archetypal Extract-Transform-Load (ETL) pattern, wherein the raw data is collected from the source, transformed, and finally stored for consumption. In this sense, knowledge graph creation can be split into multiple steps as depicted in Fig X.

+-----------+       +---------+       +-----------------+
|           |       |         |       |                 |
| Ingestion +------>| Mapping +------>| Materialization |
|           | Raw   |         | RDF   |                 |
+-----------+ data  +---------+ data  +--------+--------+
      ^      (YANG)                            |
 Raw  |                                        | RDF
 data |                                        | data
(YANG)|                                        |
      |                                        v
+-----+----+                             +-----------+
|   Data   |                             | Knowledge |
|  Source  |                             |   Graph   |
| (device) |                             +-----------+
+----------+

These steps are the following: ingestion, mapping, and materialization.
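These three steps can be viewed as composable stages. The following Python sketch wires them together with plain callables so that each stage can be swapped independently (e.g., a NETCONF collector instead of a RESTCONF one, or a graph database instead of a triple store). The stage functions and the toy record are hypothetical stand-ins; the generator lets records flow one at a time, which suits streaming sources as well as batch ones.

```python
from typing import Callable, Iterable, Iterator

Triple = tuple[str, str, str]


def run_pipeline(
    ingest: Callable[[], Iterable[dict]],
    map_to_rdf: Callable[[dict], Iterable[Triple]],
    materialize: Callable[[Iterable[Triple]], None],
) -> None:
    """Chain ingestion, mapping, and materialization lazily, record by record."""

    def mapped() -> Iterator[Triple]:
        for record in ingest():            # 1. raw YANG data from the source
            yield from map_to_rdf(record)  # 2. raw data -> RDF triples

    materialize(mapped())                  # 3. RDF triples -> knowledge graph


# Hypothetical stand-ins for each stage.
store: list[Triple] = []
run_pipeline(
    ingest=lambda: [{"name": "eth0", "enabled": True}],
    map_to_rdf=lambda r: [(f"ex:{r['name']}", "ex:enabled", str(r["enabled"]).lower())],
    materialize=store.extend,
)
print(store)  # [('ex:eth0', 'ex:enabled', 'true')]
```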

4.2.1. Ingestion

Ingestion is the first step in the creation of the knowledge graph. This step is realized by means of collectors that ingest raw data from the selected data source. These collectors implement data access protocols that are specific to the technology and type of the data source. For YANG-based network management, these protocols can be NETCONF [RFC6241], RESTCONF [RFC8040], and gNMI [gnmi].

Two main types of data sources are identified based on the techniques used to ingest the data, namely, batch and streaming. In the case of batch data sources, data are pulled (once or periodically) from the data source, for example, by sending queries to a YANG server such as an SDN controller to fetch the network topology [RFC8345].
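For instance, a batch collector could periodically fetch the [RFC8345] topology from a RESTCONF server. The Python sketch below builds the request (per [RFC8040], datastore resources sit under the {+restconf}/data path, and JSON is requested with the application/yang-data+json media type) and polls a fetch function. The controller URL is hypothetical, and authentication, TLS settings, and error handling are deliberately omitted.

```python
import json
import time
import urllib.request
from typing import Callable


def build_topology_request(restconf_root: str) -> urllib.request.Request:
    """GET the RFC 8345 'networks' container from the RESTCONF datastore resource."""
    return urllib.request.Request(
        f"{restconf_root}/data/ietf-network:networks",
        headers={"Accept": "application/yang-data+json"},
        method="GET",
    )


def fetch_topology(restconf_root: str) -> dict:
    """Send the request; authentication and error handling are omitted."""
    with urllib.request.urlopen(build_topology_request(restconf_root), timeout=10) as resp:
        return json.load(resp)


def poll(fetch: Callable[[], dict], interval_s: float, rounds: int) -> list[dict]:
    """Pull a snapshot 'rounds' times, waiting interval_s seconds in between."""
    snapshots = []
    for i in range(rounds):
        snapshots.append(fetch())
        if i < rounds - 1:
            time.sleep(interval_s)
    return snapshots


# Example against a hypothetical controller (not executed here):
# snapshots = poll(lambda: fetch_topology("https://controller.example.com/restconf"), 60, 3)
```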

Regarding streaming data sources, the collector subscribes to the YANG server to receive notifications of YANG data, either periodically or upon changes in the data source (e.g., when an interface of a network device goes down). These subscriptions can be either configured or dynamic, using mechanisms like YANG-Push [RFC8641]. Another common scenario is the use of message brokers such as Apache Kafka to decouple the ingestion of YANG data streams [I-D.netana-nmop-yang-message-broker-integration]. Hence, knowledge graph collectors could also support the ingestion of YANG data from such message brokers, as shown in Fig X.

   +------------------------------------------------------------+
   |                  Knowledge Graph Database                  |
   +------------------------------------------------------------+
                                  ^
                                  | (11) RDF data
                                  |
   +------------------------------------------------------------+
   |            Knowledge Graph Construction Pipeline           |
   +------------------------------------------------------------+
(9) Get  |  ^                                   ^ (8) Validate serialized Message
 Schema  |  |                                   | Against Schema on Consumer
         |  |                                   |
         |  |                                   |
         |  | (10) Issue                        | (7) Serialize YANG-Push Message
         v  | Schema             (5) Post       | annotated Schema ID
   +--------------------+          Schema  +--------------------+
   |       YANG         | <--------------  |  Data Collection   |
   |  Schema Registry   | -------------->  | YANG-Push Receiver |
   +--------------------+ (6) Issue        +--------------------+
                          Schema ID     (3) Get |  ^ (2) Receive YANG-Push
                                         Schema |  | Subscription Start Message
                                                |  |   ^
                                                |  |   |
                                                |  |   | (4) Publish YANG-Push
                                                v  |   | Message with Subscription ID
   +--------------------+                  +--------------------+
   |      Network       | (1) Subscribe    |   Network Node     |
   |   Orchestration    | ---------------> | YANG-Push Publisher|
   +--------------------+                  +--------------------+

TBD: Fig X (Integration of KG construction pipeline with YANG-Kafka pipeline)
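Whichever transport delivers them (a direct YANG-Push subscription or a message broker topic), the collector ultimately has to unpack YANG-Push notifications. The Python sketch below assumes the JSON notification envelope of [RFC8040] and the push-update and push-change-update notifications of [RFC8641]. The broker consumer itself is not shown, and the envelope used in a given message broker deployment may differ (e.g., it may carry additional metadata such as a schema ID).

```python
import json


def parse_yang_push(message: bytes) -> tuple[int, str, str, object]:
    """Extract (subscription id, event time, kind, payload) from a JSON YANG-Push
    notification wrapped in the RESTCONF notification envelope (RFC 8040)."""
    notification = json.loads(message)["ietf-restconf:notification"]
    event_time = notification["eventTime"]
    # Periodic updates (and the initial state of on-change subscriptions)
    # carry a full snapshot of the subscribed data.
    if "ietf-yang-push:push-update" in notification:
        body = notification["ietf-yang-push:push-update"]
        return body["id"], event_time, "push-update", body.get("datastore-contents")
    # On-change updates carry only the changes, encoded as a YANG Patch.
    if "ietf-yang-push:push-change-update" in notification:
        body = notification["ietf-yang-push:push-change-update"]
        return body["id"], event_time, "push-change-update", body.get("datastore-changes")
    raise ValueError("not a YANG-Push update notification")


# Illustrative message, e.g., as read from a message broker topic.
sample = json.dumps({
    "ietf-restconf:notification": {
        "eventTime": "2024-05-06T10:00:00Z",
        "ietf-yang-push:push-update": {
            "id": 1011,
            "datastore-contents": {
                "ietf-interfaces:interfaces": {
                    "interface": [{"name": "eth0", "enabled": True}]
                }
            },
        },
    }
}).encode()

print(parse_yang_push(sample)[:3])  # (1011, '2024-05-06T10:00:00Z', 'push-update')
```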

4.2.2. Mapping

The second step receives the raw data from the Ingestion step and maps it to the concepts captured in one or more ontologies. By applying these mapping rules, the raw data is semantically annotated and transformed into RDF data. These mappings can be defined using declarative languages like the RDF Mapping Language (RML).

RML is a declarative language, currently being standardized within the W3C Knowledge Graph Construction Community Group (KGC), that allows defining mapping rules for raw data encoded in semi-structured formats like XML or JSON. The benefits of using a declarative language like RML are twofold: i) the engine that executes the RML rules is generic, so the mapping rules are decoupled from the code; and ii) the explicit representation of mapping and transformation rules as part of the knowledge graph provides data lineage insights that can greatly improve data quality and the troubleshooting of data pipelines. While RML is progressing towards becoming a standard, support for additional YANG encoding formats like CBOR [RFC8949] or Protobuf remains a challenge.
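To illustrate the benefit of a generic engine, the Python sketch below applies a mapping that is expressed as data rather than code. The mapping format is a hypothetical, RML-inspired simplification (an iterator, a subject template, a class, and predicate-object maps that reference JSON keys) and is not RML syntax; the example.com IRIs are placeholders, and literal datatypes are not modeled. The input follows the JSON encoding of the ietf-interfaces module.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/ns#"  # placeholder vocabulary

# Hypothetical RML-inspired mapping expressed as data (NOT actual RML syntax).
INTERFACE_MAPPING = {
    "iterator": ["ietf-interfaces:interfaces", "interface"],  # path to the records
    "subject_template": "http://example.com/interface/{name}",
    "class": EX + "Interface",
    "predicate_object_maps": [
        {"predicate": EX + "name", "reference": "name"},
        {"predicate": EX + "description", "reference": "description"},
        {"predicate": EX + "interfaceType", "reference": "type"},
        {"predicate": EX + "enabled", "reference": "enabled"},
    ],
}


def apply_mapping(mapping: dict, document: dict) -> list[tuple]:
    """Generic engine: the same code runs any mapping written in this format."""
    records = document
    for key in mapping["iterator"]:
        records = records[key]
    triples = []
    for record in records:
        # Percent-encode values used in IRIs (e.g., "/" in interface names).
        safe = {k: quote(str(v), safe="") for k, v in record.items()}
        subject = mapping["subject_template"].format(**safe)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for pom in mapping["predicate_object_maps"]:
            if pom["reference"] in record:  # absent leaves produce no triple
                triples.append((subject, pom["predicate"], record[pom["reference"]]))
    return triples


doc = {"ietf-interfaces:interfaces": {"interface": [
    {"name": "eth0", "type": "iana-if-type:ethernetCsmacd", "enabled": True},
    {"name": "ge-0/0/1", "enabled": False},
]}}
for triple in apply_mapping(INTERFACE_MAPPING, doc):
    print(triple)
```

Supporting openconfig-interfaces data would only require a second mapping that targets the same EX:Interface class, with no change to apply_mapping.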

4.2.3. Materialization

Materialization is the final step of knowledge graph creation. It receives as input the RDF data generated in the Mapping step. At this point, the RDF data can be sent to an RDF triple store like Apache Jena Fuseki [fuseki] for consumption via SPARQL. Alternatively, this step may transform the RDF data into an LPG structure and store the resulting data in a graph database like Neo4j [neo4j]. Similarly, the RDF data could also be transformed into the ETSI NGSI-LD format and stored in an NGSI-LD Context Broker.
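One common convention for transforming RDF data into an LPG is sketched below in Python: rdf:type objects become node labels, literal objects become node properties, and IRI objects become edges. This is a simplification under stated assumptions: local names are derived by cutting the IRI at the last “#” or “/”, multi-valued properties keep only the last value, the example IRIs are placeholders, and loading the result into a specific graph database is not shown.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Marks an RDF literal; plain strings are treated as IRIs."""
    value: object


def local_name(iri: str) -> str:
    """Cut an IRI at its last '#' or '/' (a simplification)."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def rdf_to_lpg(triples):
    """Return (nodes, edges): nodes keyed by IRI, edges as (src, type, dst)."""
    nodes, edges = {}, []

    def node(iri):
        return nodes.setdefault(iri, {"labels": set(), "props": {"uri": iri}})

    for s, p, o in triples:
        subject = node(s)
        if p == RDF_TYPE:
            subject["labels"].add(local_name(o))       # class -> node label
        elif isinstance(o, Literal):
            subject["props"][local_name(p)] = o.value  # literal -> property (last wins)
        else:
            node(o)
            edges.append((s, local_name(p), o))        # IRI -> relationship
    return nodes, edges


ETH0, R1 = "http://example.com/interface/eth0", "http://example.com/device/r1"
nodes, edges = rdf_to_lpg([
    (ETH0, RDF_TYPE, "http://example.com/ns#Interface"),
    (ETH0, "http://example.com/ns#mtu", Literal(1500)),
    (ETH0, "http://example.com/ns#partOf", R1),
])
print(nodes[ETH0]["labels"], nodes[ETH0]["props"]["mtu"], edges)
```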

5. Knowledge Graph Applications

6. Challenges

7. Security Considerations

8. IANA Considerations

This document has no IANA actions.

9. Open Issues

10. References

10.1. Normative References

[I-D.havel-nmop-digital-map]
Havel, O., Claise, B., de Dios, O. G., Elhassany, A., Graf, T., and M. Boucadair, "Modeling the Digital Map based on RFC 8345: Sharing Experience and Perspectives", Work in Progress, Internet-Draft, draft-havel-nmop-digital-map-00, <https://datatracker.ietf.org/doc/html/draft-havel-nmop-digital-map-00>.
[I-D.ietf-ivy-network-inventory-yang]
Yu, C., Belotti, S., Bouquier, J., Peruzzini, F., and P. Bedard, "A YANG Data Model for Network Inventory", Work in Progress, Internet-Draft, draft-ietf-ivy-network-inventory-yang-01, <https://datatracker.ietf.org/doc/html/draft-ietf-ivy-network-inventory-yang-01>.
[I-D.ietf-opsawg-collected-data-manifest]
Claise, B., Quilbeuf, J., Lopez, D., Martinez-Casanueva, I. D., and T. Graf, "A Data Manifest for Contextualized Telemetry Data", Work in Progress, Internet-Draft, draft-ietf-opsawg-collected-data-manifest-03, <https://datatracker.ietf.org/doc/html/draft-ietf-opsawg-collected-data-manifest-03>.
[I-D.irtf-nmrg-network-digital-twin-arch]
Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu, Q., Boucadair, M., and C. Jacquenet, "Network Digital Twin: Concepts and Reference Architecture", Work in Progress, Internet-Draft, draft-irtf-nmrg-network-digital-twin-arch-05, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-network-digital-twin-arch-05>.
[I-D.lincla-netconf-yang-library-augmentation]
Lin, Z., Claise, B., and I. D. Martinez-Casanueva, "Augmented-by Addition into the IETF-YANG-Library", Work in Progress, Internet-Draft, draft-lincla-netconf-yang-library-augmentation-01, <https://datatracker.ietf.org/doc/html/draft-lincla-netconf-yang-library-augmentation-01>.
[I-D.lopez-opsawg-yang-provenance]
Lopez, D., Pastor, A., Feng, A. H., and H. Birkholz, "Applying COSE Signatures for YANG Data Provenance", Work in Progress, Internet-Draft, draft-lopez-opsawg-yang-provenance-02, <https://datatracker.ietf.org/doc/html/draft-lopez-opsawg-yang-provenance-02>.
[I-D.netana-nmop-yang-message-broker-integration]
Graf, T. and A. Elhassany, "An Architecture for YANG-Push to Message Broker Integration", Work in Progress, Internet-Draft, draft-netana-nmop-yang-message-broker-integration-00, <https://datatracker.ietf.org/doc/html/draft-netana-nmop-yang-message-broker-integration-00>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC3444]
Pras, A. and J. Schoenwaelder, "On the Difference between Information Models and Data Models", RFC 3444, DOI 10.17487/RFC3444, <https://www.rfc-editor.org/rfc/rfc3444>.
[RFC6241]
Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, <https://www.rfc-editor.org/rfc/rfc6241>.
[RFC7950]
Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", RFC 7950, DOI 10.17487/RFC7950, <https://www.rfc-editor.org/rfc/rfc7950>.
[RFC7952]
Lhotka, L., "Defining and Using Metadata with YANG", RFC 7952, DOI 10.17487/RFC7952, <https://www.rfc-editor.org/rfc/rfc7952>.
[RFC8040]
Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF Protocol", RFC 8040, DOI 10.17487/RFC8040, <https://www.rfc-editor.org/rfc/rfc8040>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC8199]
Bogdanovic, D., Claise, B., and C. Moberg, "YANG Module Classification", RFC 8199, DOI 10.17487/RFC8199, <https://www.rfc-editor.org/rfc/rfc8199>.
[RFC8345]
Clemm, A., Medved, J., Varga, R., Bahadur, N., Ananthakrishnan, H., and X. Liu, "A YANG Data Model for Network Topologies", RFC 8345, DOI 10.17487/RFC8345, <https://www.rfc-editor.org/rfc/rfc8345>.
[RFC8641]
Clemm, A. and E. Voit, "Subscription to YANG Notifications for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641, <https://www.rfc-editor.org/rfc/rfc8641>.
[RFC8949]
Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, <https://www.rfc-editor.org/rfc/rfc8949>.
[RFC9232]
Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, <https://www.rfc-editor.org/rfc/rfc9232>.
[RFC9417]
Claise, B., Quilbeuf, J., Lopez, D., Voyer, D., and T. Arumugam, "Service Assurance for Intent-Based Networking Architecture", RFC 9417, DOI 10.17487/RFC9417, <https://www.rfc-editor.org/rfc/rfc9417>.
[RFC9418]
Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. Arumugam, "A YANG Data Model for Service Assurance", RFC 9418, DOI 10.17487/RFC9418, <https://www.rfc-editor.org/rfc/rfc9418>.

10.2. Informative References

[csvw]
"CSVW - CSV on the Web", n.d., <https://csvw.org>.
[fuseki]
"Apache Jena Fuseki", n.d., <https://jena.apache.org/documentation/fuseki2/>.
[gnmi]
"gRPC Network Management Interface (gNMI)", n.d., <https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-specification.md>.
[jsonld]
"JSON-LD - JSON for Linking Data", n.d., <https://json-ld.org>.
[neo4j]
"rdflib-neo4j - RDFLib Store backed by neo4j!", n.d., <https://github.com/neo4j-labs/rdflib-neo4j>.

Acknowledgments

This document is based on work partially funded by the EU Horizon Europe projects aerOS (grant 101069732) and ROBUST-6G (grant 101139068).

Author's Address

Ignacio Dominguez Martinez-Casanueva
Telefonica