This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

Document: draft-ietf-mboned-multicast-telemetry
Reviewer: Bernard Aboba
Result: Ready with Nits

This document explains the deficiencies in existing OAM techniques and lays out proposals to address them. However, it provides little detail on some of the reasoning, and the wording could be improved in places.

Comments
----

1.  Introduction

   Multicast has many use cases.  For example, it can be used by
   residential broadband customers across operator networks, private
   MPLS customers, and internal customers within corporate intranet.

[BA] s/intranet/intranets/

   Multicast provides real time interactive online meetings or podcasts,

[BA] Use of multicast for conferencing is rare nowadays. Do you really
   want to include this?

   IPTV, and financial markets real-time data, which all have a reliance
   on UDP's unreliable transport.  End-to-end QOS, therefore, should be
   a critical component of multicast deployment in order to provide a
   good end user experience.  In multicast real-time media streaming,
   loss of a single packet containing a reference frame can result in
   the inability of thousands of receivers to decode a whole sequence of
   packets called Group-of-Picture, introducing black picture for
   periods of a few seconds.  

   Unexpected long delay in propagation of a
   packet in such real-time media streaming may equally result in the
   packet not being received and create the same results.  Multicast
   packet drops and delay can therefore severely affect the application
   performance and user experience.

[BA] Suggest: 

   "In multicast real-time media streaming, if a single packet is lost
   within a keyframe and cannot be recovered using forward
   error correction, this can result in many receivers being unable
   to decode subsequent frames within the Group of Pictures (GoP), causing
   video freezes or black pictures until another keyframe is
   delivered.

   Unexpectedly long delays in delivery of packets can result in
   timeouts with similar results. Multicast packet loss and
   delays can therefore affect application performance and the
   user experience."
   

   It is important to monitor the performance of the multicast traffic.
   New on-path telemetry techniques such as In-situ OAM (IOAM)
   [RFC9197], IOAM Direct Export (DEX) [RFC9326] IOAM Marking-based
   Postcard (PBT-M) [I-D.song-ippm-postcard-based-telemetry], and Hybrid
   Two-Step (HTS) [I-D.ietf-ippm-hybrid-two-step] are useful and
   complementary to the existing active OAM performance monitoring
   methods (e.g., ICMP ping [RFC0792]), 

   provide promising means to
   directly monitor the network experience of multicast traffic.
   However, multicast traffic has some unique characteristics which pose
   some challenges on applying such techniques in an efficient way.

[BA] Suggest:
   "providing a way to monitor multicast performance. However, multicast
   has unique characteristics that make the efficient application
   of these techniques challenging." 


2.  Requirements for Multicast Traffic Telemetry

   Multicast traffic is forwarded through a multicast tree.  With PIM
   and P2MP, the forwarding tree is established and maintained by the
   multicast routing protocol.  With BIER, no state is created in the
   network to establish a forwarding tree; instead, a bier header
   provides the necessary information for each packet to know the egress
   points.  Multicast packets are only replicated at each tree branch
   fork node for efficiency.

   There are several requirements for multicast traffic telemetry, a few
   of which are:

   *  Reconstruct and visualize the multicast tree through data plane
      monitoring.

   *  Gather the multicast packet delay and jitter performance on each
      path.

   *  Find the multicast packet drop location and reason.

   *  Gather the VPN state and tunnel information in case of P2MP
      multicast.

   In order to meet these requirements, we need the ability to directly
   monitor the multicast traffic and derive data from the multicast
   packets.  The conventional OAM mechanisms, such as multicast ping
   [RFC6450] and trace [RFC8487], are not sufficient to meet these
   requirements.

[BA] Can you provide more detail on why existing mechanisms are not sufficient?
When conventional mechanisms are combined with RTCP, it seems like the first 
three requirements are covered.

3.  Issues of Existing Techniques

   On-path Telemetry techniques that directly retrieve data from
   multicast traffic's live network experience are ideal for addressing
   the aforementioned requirements.  The representative techniques
   include In-situ OAM (IOAM) Trace option [RFC9197], IOAM Direct Export
   (DEX) option [RFC9326], and PBT-M
   [I-D.song-ippm-postcard-based-telemetry].  However, unlike unicast,
   multicast poses some unique challenges to applying these techniques.

   Multicast packets are replicated at each branch fork node in the
   corresponding multicast tree.  Therefore, there are multiple copies
   of the original multicast packet in the network.

   If the IOAM trace option is used for on-path data collection, the
   partial trace data will also be replicated into the packet copy for
   each branch.  The end result is that, at the multicast tree leaves,
   each copy of the multicast packet has a complete trace.  Most of the
   data (except data from the last leaf branch) appear in multiple
   copies while only one copy is sufficient.  Data redundancy introduces
   unnecessary header overhead, wastes network bandwidth, and
   complicates the data processing.  The larger the multicast tree, or
   the longer the multicast path, the more severe the redundancy problem
   becomes.
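
[BA] A rough illustration of the redundancy described above might help
readers. The following is only a minimal sketch under stated assumptions:
the five-node tree and the function names are hypothetical, and each
"record" stands for one node's per-hop trace data rather than any
actual IOAM field layout.

```python
# Hypothetical multicast tree (parent -> children), for illustration only.
tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}


def leaf_paths(tree, root):
    """Return the root-to-leaf node list for every leaf in the tree."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = tree[path[-1]]
        if not kids:
            paths.append(path)
        for kid in kids:
            stack.append(path + [kid])
    return paths


def trace_redundancy(tree, root):
    """Compare records carried to the leaves with records actually needed.

    Assumes every node in `tree` is reachable from `root`.
    """
    paths = leaf_paths(tree, root)
    carried = sum(len(p) for p in paths)  # each leaf copy holds its full path
    needed = len(tree)                    # one record per node would suffice
    return carried, needed


print(trace_redundancy(tree, "A"))
```

In this small tree only Node A's record is duplicated; the gap between the
two counts widens as the number of leaves and the path lengths grow.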

   The postcard-based solutions (e.g., IOAM DEX), can be used to
   eliminate such data redundancy, because each node on the tree only
   sends a postcard covering local data.  However, they cannot track and
   correlate the tree branches properly due to the lack of branching
   information, so they can bring confusion about the multicast tree
   topology.  For example, in a multicast tree, Node A has two branches,
   one to Node B and the other to node C; further, Node B leads to Node
   D and Node C leads to Node E.  When applying postcard-based methods,
   one cannot tell whether or not Node D(E) is the next hop of Node B(C)
   from the received postcards alone, unless one correlates the
   exporting nodes with knowledge about the tree collected by other
   means (e.g., mtrace).  Such correlation is undesirable because it
   introduces extra work and complexity.

   The fundamental reason for this problem is that there is not an
   identifier (either implicit or explicit) to correlate the data on
   each branch.

[BA] Can't the IP address be used as an identifier?  Does the proposed
solution address this issue in a simpler way? Note that "extra work"
(e.g., new software) that is implemented outside network devices has
an advantage over new protocols, which can conceivably impact network
device footprint and reliability (due to bugs). So where and how the
"extra work and complexity" is incurred matters.
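
To make the question concrete, here is a minimal sketch of collector-side
correlation for the A/B/C/D/E example quoted above. It rests on
assumptions: the "upstream" value is a hypothetical piece of information
(for instance, an address learned by some other means), not a field
defined by RFC 9326, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical postcard records: (exporting_node, upstream_node).
# The upstream_node value is an assumption made for illustration.
postcards = [
    ("A", None),
    ("B", "A"),
    ("C", "A"),
    ("D", "B"),
    ("E", "C"),
]


def nodes_only(cards):
    """What a collector learns when postcards identify only the exporter."""
    return sorted(node for node, _ in cards)


def rebuild_tree(cards):
    """Rebuild parent -> children edges when each postcard names its upstream."""
    children = defaultdict(list)
    for node, upstream in cards:
        if upstream is not None:
            children[upstream].append(node)
    return {parent: sorted(kids) for parent, kids in children.items()}


print(nodes_only(postcards))    # a flat set of nodes, no branch structure
print(rebuild_tree(postcards))  # edges recovered from the upstream value
```

Without the upstream value the collector sees only a flat list of nodes;
with it, the branches can be recovered in software outside the network
devices. It would help if the draft explained why this kind of off-box
correlation is considered insufficient.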