This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

Document: draft-ietf-mboned-multicast-telemetry
Reviewer: Bernard Aboba
Result: Ready with Nits

This document explains the deficiencies in existing OAM techniques and lays out proposals to address them. However, it doesn't provide much detail on some of the reasoning. Also in places, the wording could use improvement.

Comments
----

1. Introduction

Multicast has many use cases. For example, it can be used by residential broadband customers across operator networks, private MPLS customers, and internal customers within corporate intranet.

[BA] s/intranet/intranets/

Multicast provides real time interactive online meetings or podcasts,

[BA] Use of multicast for conferencing is rare nowadays. Do you really want to include this?

IPTV, and financial markets real-time data, which all have a reliance on UDP's unreliable transport. End-to-end QOS, therefore, should be a critical component of multicast deployment in order to provide a good end user experience.

In multicast real-time media streaming, loss of a single packet containing a reference frame can result in the inability of thousands of receivers to decode a whole sequence of packets called Group-of-Picture, introducing black picture for periods of a few seconds. Unexpected long delay in propagation of a packet in such real-time media streaming may equally result in the packet not being received and create the same results.
Multicast packet drops and delay can therefore severely affect the application performance and user experience.

[BA] Suggest: "In multicast real-time media streaming, if a single packet is lost within a keyframe and cannot be recovered using forward error correction, this can result in many receivers being unable to decode subsequent frames within the Group of Pictures (GoP), resulting in video freezes or black pictures until another keyframe is delivered. Unexpectedly long delays in delivery of packets can result in timeouts with similar results. Multicast packet loss and delays can therefore affect application performance and the user experience."

It is important to monitor the performance of the multicast traffic. New on-path telemetry techniques such as In-situ OAM (IOAM) [RFC9197], IOAM Direct Export (DEX) [RFC9326] IOAM Marking-based Postcard (PBT-M) [I-D.song-ippm-postcard-based-telemetry], and Hybrid Two-Step (HTS) [I-D.ietf-ippm-hybrid-two-step] are useful and complementary to the existing active OAM performance monitoring methods (e.g., ICMP ping [RFC0792]), provide promising means to directly monitor the network experience of multicast traffic. However, multicast traffic has some unique characteristics which pose some challenges on applying such techniques in an efficient way.

[BA] Suggest: "providing a way to monitor multicast performance. However, multicast has unique characteristics that make the efficient application of these techniques challenging."

2. Requirements for Multicast Traffic Telemetry

Multicast traffic is forwarded through a multicast tree. With PIM and P2MP, the forwarding tree is established and maintained by the multicast routing protocol. With BIER, no state is created in the network to establish a forwarding tree; instead, a bier header provides the necessary information for each packet to know the egress points. Multicast packets are only replicated at each tree branch fork node for efficiency.
There are several requirements for multicast traffic telemetry, a few of which are:

* Reconstruct and visualize the multicast tree through data plane monitoring.
* Gather the multicast packet delay and jitter performance on each path.
* Find the multicast packet drop location and reason.
* Gather the VPN state and tunnel information in case of P2MP multicast.

In order to meet these requirements, we need the ability to directly monitor the multicast traffic and derive data from the multicast packets. The conventional OAM mechanisms, such as multicast ping [RFC6450] and trace [RFC8487], are not sufficient to meet these requirements.

[BA] Can you provide more detail on why existing mechanisms are not sufficient? When conventional mechanisms are combined with RTCP, it seems like the first three requirements are covered.

3. Issues of Existing Techniques

On-path Telemetry techniques that directly retrieve data from multicast traffic's live network experience are ideal for addressing the aforementioned requirements. The representative techniques include In-situ OAM (IOAM) Trace option [RFC9197], IOAM Direct Export (DEX) option [RFC9326], and PBT-M [I-D.song-ippm-postcard-based-telemetry]. However, unlike unicast, multicast poses some unique challenges to applying these techniques.

Multicast packets are replicated at each branch fork node in the corresponding multicast tree. Therefore, there are multiple copies of the original multicast packet in the network. If the IOAM trace option is used for on-path data collection, the partial trace data will also be replicated into the packet copy for each branch. The end result is that, at the multicast tree leaves, each copy of the multicast packet has a complete trace. Most of the data (except data from the last leaf branch) appear in multiple copies while only one copy is sufficient. Data redundancy introduces unnecessary header overhead, wastes network bandwidth, and complicates the data processing.
The larger the multicast tree, or the longer the multicast path, the more severe the redundancy problem becomes.

The postcard-based solutions (e.g., IOAM DEX), can be used to eliminate such data redundancy, because each node on the tree only sends a postcard covering local data. However, they cannot track and correlate the tree branches properly due to the lack of branching information, so they can bring confusion about the multicast tree topology. For example, in a multicast tree, Node A has two branches, one to Node B and the other to node C; further, Node B leads to Node D and Node C leads to Node E. When applying postcard-based methods, one cannot tell whether or not Node D(E) is the next hop of Node B(C) from the received postcards alone, unless one correlates the exporting nodes with knowledge about the tree collected by other means (e.g., mtrace). Such correlation is undesirable because it introduces extra work and complexity. The fundamental reason for this problem is that there is not an identifier (either implicit or explicit) to correlate the data on each branch.

[BA] Can't the IP address be used as an identifier? Does the proposed solution address this issue in a simpler way? Note that "extra work" (e.g., new software) that is implemented outside network devices has an advantage over new protocols, which can conceivably impact network device footprint and reliability (due to bugs). So the where and how matter with respect to "extra work and complexity".
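
To make the quoted Node A through Node E example concrete, here is a minimal Python sketch of the ambiguity. It shows two different trees producing identical postcards when each postcard carries only the exporting node's ID. The dictionary tree representation, the function names, and the "upstream" field are all assumptions made for illustration. In particular, the upstream field is a hypothetical branch identifier, not the mechanism proposed in the draft.

```python
def postcards_node_only(tree, root):
    """Postcards that carry only the exporting node's ID (no branch info)."""
    cards = []
    stack = [root]
    while stack:
        node = stack.pop()
        cards.append(node)
        stack.extend(tree.get(node, []))
    return sorted(cards)


def postcards_with_upstream(tree, root):
    """Postcards that also carry a hypothetical 'upstream node' identifier."""
    cards = []
    stack = [(root, None)]
    while stack:
        node, upstream = stack.pop()
        cards.append((node, upstream))
        stack.extend((child, node) for child in tree.get(node, []))
    # Sort on the node ID only, so the None upstream of the root is never compared.
    return sorted(cards, key=lambda card: card[0])


# The tree from the quoted example: A -> {B, C}, B -> D, C -> E.
actual = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
# A different tree over the same nodes: B -> E, C -> D.
swapped = {"A": ["B", "C"], "B": ["E"], "C": ["D"]}

# Without branch information, the collector cannot tell the two trees apart.
same_without_id = postcards_node_only(actual, "A") == postcards_node_only(swapped, "A")
# With some per-branch identifier, the two trees become distinguishable.
same_with_id = postcards_with_upstream(actual, "A") == postcards_with_upstream(swapped, "A")

print(same_without_id, same_with_id)
```

Whether such an identifier is best carried in the postcard itself or recovered by off-box correlation (for example, from addresses or mtrace output) is the open question raised in the comment above; the sketch only shows that some correlating key is needed.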