Network Working Group                                      M. Westerlund
Request for Comments: 5404                                  I. Johansson
Category: Standards Track                                    Ericsson AB
                                                            January 2009

                      RTP Payload Format for G.719

Status of This Memo

 This document specifies an Internet standards track protocol for the
 Internet community, and requests discussion and suggestions for
 improvements.  Please refer to the current edition of the "Internet
 Official Protocol Standards" (STD 1) for the standardization state
 and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

 Copyright (c) 2008 IETF Trust and the persons identified as the
 document authors.  All rights reserved.

 This document is subject to BCP 78 and the IETF Trust's Legal
 Provisions Relating to IETF Documents (http://trustee.ietf.org/
 license-info) in effect on the date of publication of this document.
 Please review these documents carefully, as they describe your rights
 and restrictions with respect to this document.

Abstract

 This document specifies the payload format for packetization of the
 G.719 full-band codec encoded audio signals into the Real-time
 Transport Protocol (RTP).  The payload format supports transmission
 of multiple channels, multiple frames per payload, and interleaving.

Westerlund & Johansson Standards Track [Page 1] RFC 5404 RTP Payload Format for G.719 January 2009

Table of Contents

 1.  Introduction
 2.  Definitions and Conventions
 3.  G.719 Description
 4.  Payload Format Capabilities
   4.1.  Multi-Rate Encoding and Rate Adaptation
   4.2.  Support for Multi-Channel Sessions
   4.3.  Robustness against Packet Loss
     4.3.1.  Use of Forward Error Correction (FEC)
     4.3.2.  Use of Frame Interleaving
 5.  Payload Format
   5.1.  RTP Header Usage
   5.2.  Payload Structure
     5.2.1.  Basic ToC Element
   5.3.  Basic Mode
   5.4.  Interleaved Mode
   5.5.  Audio Data
   5.6.  Implementation Considerations
     5.6.1.  Receiving Redundant Frames
     5.6.2.  Interleaving
     5.6.3.  Decoding Validation
 6.  Payload Examples
   6.1.  3 Mono Frames with 2 Different Bitrates
   6.2.  2 Stereo Frame-Blocks of the Same Bitrate
   6.3.  4 Mono Frames Interleaved
 7.  Payload Format Parameters
   7.1.  Media Type Definition
   7.2.  Mapping to SDP
     7.2.1.  Offer/Answer Considerations
     7.2.2.  Declarative SDP Considerations
 8.  IANA Considerations
 9.  Congestion Control
 10. Security Considerations
 11. Acknowledgements
 12. References
   12.1. Normative References
   12.2. Informative References

1. Introduction

 This document specifies the payload format for packetization of the
 G.719 full-band (FB) codec encoded audio signals into the Real-time
 Transport Protocol (RTP) [RFC3550].  The payload format supports
 transmission of multiple channels, multiple frames per payload, and
 packet loss robustness methods using redundancy or interleaving.
 This document starts with conventions, a brief description of the
 codec, and the payload format's capabilities.  The payload format is
 specified in Section 5.  Examples can be found in Section 6.  The
 media type and its mappings to the Session Description Protocol (SDP)
 and usage in SDP offer/answer are then specified.  The document ends
 with considerations regarding congestion control and security.

2. Definitions and Conventions

 The term "frame-block" is used in this document to describe the time-
 synchronized set of audio frames in a multi-channel audio session.
 In particular, in an N-channel session, a frame-block will contain N
 audio frames, one from each of the channels, and all N audio frames
 represent exactly the same time period.

 This document contains depictions of bit fields.  The most
 significant bit is always leftmost in the figure on each row and has
 the lowest enumeration.  For fields that are depicted over multiple
 rows, the upper row is more significant than the next.

 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
 document are to be interpreted as described in RFC 2119 [RFC2119].

3. G.719 Description

 The ITU-T G.719 full-band codec is a transform coder based on the
 Modulated Lapped Transform (MLT).  G.719 is a low-complexity full-
 bandwidth codec for conversational speech and audio coding.  The
 encoder input and decoder output are sampled at 48 kHz.  The codec
 enables full-bandwidth (20 Hz to 20 kHz) encoding of speech, music,
 and general audio content at rates from 32 kbit/s up to 128 kbit/s.
 The codec operates on 20-ms frames and has an algorithmic delay of
 40 ms.

 The codec provides excellent quality for speech, music, and other
 types of audio.  Some of the applications for which this coder is
 suitable are:

 o  Real-time communications such as video conferencing and telephony

 o  Streaming audio

 o  Archival and messaging

 The encoding and decoding algorithm can change the bitrate at any
 20-ms frame boundary.  The encoder receives the audio sampled at 48
 kHz.  The support of other sampling rates is possible by re-sampling
 the input signal to the codec's sampling rate, i.e., 48 kHz; however,
 this functionality is not part of the standard.

 The encoding is performed on equally sized frames.  For each frame,
 the encoder decides between two encoding modes, a transient mode and
 a stationary mode.  The decision is based on statistics derived from
 the input signal.  The stationary mode uses a long MLT that leads to
 a spectrum of 960 coefficients, while the transient encoding mode
 uses a short MLT (higher time resolution transform) that results in 4
 spectra (4 x 240 = 960 coefficients).  The encoding of the spectrum
 is done in two steps.  First, the spectral envelope is computed,
 quantized, and Huffman encoded.  The envelope is computed on a non-
 uniform frequency subdivision.  From the coded spectral envelope, a
 weighted spectral envelope is derived and is used for bit allocation;
 this process is also repeated at the decoder.  Thus, only the
 spectral envelope is transmitted.  The output of the bit allocation
 is used to quantize the spectra.  In addition, for stationary
 frames, the encoder estimates the noise level.
 The decoder applies the reverse operation upon reception of the bit
 stream.  The non-coded coefficients (i.e., no bits allocated) are
 replaced by entries of a noise codebook that is built based on the
 decoded coefficients.

4. Payload Format Capabilities

 This payload format has a number of capabilities, and this section
 discusses them in some detail.

4.1. Multi-Rate Encoding and Rate Adaptation

 G.719 supports multi-rate encoding, which allows the encoding rate
 to vary on a per-frame basis.  This enables support for bitrate
 adaptation and congestion control.  The possibility to
 aggregate multiple audio frames into a single RTP payload is another
 dimension of adaptation.  The RTP and payload format overhead can
 thus be reduced by the aggregation at the cost of increased delay and
 reduced packet-loss robustness.
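
 The trade-off between overhead and delay can be illustrated with a
 little arithmetic.  The sketch below is not part of this
 specification: it assumes IPv4 without options (20 bytes), UDP
 (8 bytes), an RTP header without CSRC entries or extensions
 (12 bytes), and a single 2-octet basic-mode ToC entry per payload as
 defined in Section 5.3.  The function names are hypothetical.

```python
# Assumed header sizes (not taken from this memo): IPv4 without options,
# UDP, and RTP without CSRC entries or header extensions.
IPV4_UDP_RTP_BYTES = 20 + 8 + 12
TOC_ENTRY_BYTES = 2   # basic mode: ToC element plus the #frames octet
FRAME_MS = 20         # G.719 frame duration in milliseconds


def overhead_ratio(frame_bytes: int, frames_per_packet: int) -> float:
    """Fraction of each packet spent on headers rather than audio."""
    headers = IPV4_UDP_RTP_BYTES + TOC_ENTRY_BYTES
    audio = frame_bytes * frames_per_packet
    return headers / (headers + audio)


def packetization_delay_ms(frames_per_packet: int) -> int:
    """Audio time the sender must collect before a packet can leave."""
    return frames_per_packet * FRAME_MS


# 32 kbit/s mono frames (80 bytes each), aggregated 1, 2, and 4 at a time.
for n in (1, 2, 4):
    print(n, round(overhead_ratio(80, n), 3), packetization_delay_ms(n))
```

 Under these assumptions, aggregating more frames lowers the header
 share of each packet while the packetization delay grows by 20 ms
 per additional frame.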

4.2. Support for Multi-Channel Sessions

 The RTP payload format defined in this document supports multi-
 channel audio content (e.g., stereophonic or surround audio
 sessions).  Although the G.719 codec itself does not support encoding
 of multi-channel audio content into a single bit stream, it can be
 used to separately encode and decode each of the individual channels.
 To transport (or store) the separately encoded multi-channel content,
 the audio frames for all channels that are framed and encoded for the
 same 20-ms period are logically collected in a "frame-block".
 At the session setup, out-of-band signaling must be used to indicate
 the number of channels in the payload type.  The order of the audio
 frames within the frame-block depends on the number of the channels
 and follows the definition in Section 4.1 of the RTP/AVP profile
 [RFC3551].  When using SDP for signaling, the number of channels is
 specified in the rtpmap attribute.

4.3. Robustness against Packet Loss

 The payload format supports several means, including forward error
 correction (FEC) and frame interleaving, to increase robustness
 against packet loss.

4.3.1. Use of Forward Error Correction (FEC)

 Generic forward error correction within RTP is defined, for example,
 in RFC 5109 [RFC5109].  Audio redundancy coding is defined in RFC
 2198 [RFC2198].  Either scheme can be used to add redundant
 information to the RTP packet stream and make it more resilient to
 packet losses, at the expense of a higher bitrate.  Please see either
 of the RFCs for a discussion of the implications of the higher
 bitrate to network congestion.
 In addition to these media-unaware mechanisms, this memo specifies a
 G.719-specific form of audio redundancy coding, which may be
 beneficial in terms of packetization overhead.  Conceptually,
 previously transmitted transport frames are aggregated together with
 new ones.  A sliding window can be used to group the frames to be
 sent in each payload.  However, irregular or non-consecutive patterns
 are also possible by inserting NO_DATA frames between primary and
 redundant transmissions.  Figure 1 below shows an example.

  --+--------+--------+--------+--------+--------+--------+--------+--
    | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
  --+--------+--------+--------+--------+--------+--------+--------+--

    <---- p(n-1) ---->
             <----- p(n) ----->
                      <---- p(n+1) ---->
                               <---- p(n+2) ---->
                                        <---- p(n+3) ---->
                                                 <---- p(n+4) ---->

            Figure 1: An example of redundant transmission

 Here, each frame is retransmitted once in the following RTP payload
 packet. f(n-2)...f(n+4) denote a sequence of audio frames, and
 p(n-1)...p(n+4) a sequence of payload packets.

 The mechanism described does not strictly require signaling at the
 session setup.  However, signaling has been defined to allow the
 sender to voluntarily bound the buffering and delay requirements.  If
 nothing is signaled, the use of this mechanism is allowed and
 unbounded.  For a certain timestamp, the receiver may receive
 multiple copies of a frame containing encoded audio data, even at
 different encoding rates.  The cost of this scheme is bandwidth and
 the receiver delay necessary to allow the redundant copy to arrive.

 This redundancy scheme provides a functionality similar to the one
 described in RFC 2198, but it works only if both original frames and
 redundant representations are G.719 frames.  When the use of other
 media coding schemes is desirable, one has to resort to RFC 2198.

 The sender is responsible for selecting an appropriate amount of
 redundancy based on feedback about the channel conditions, e.g., in
 the RTP Control Protocol (RTCP) [RFC3550] receiver reports.  The
 sender is also responsible for avoiding congestion, which may be
 exacerbated by redundancy (see Section 9 for more details).
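
 The sliding-window grouping shown in Figure 1 can be sketched as
 follows.  This is only an illustration under simple assumptions:
 frames are represented by labels rather than encoded audio, the
 function name and the "copies" parameter are hypothetical, and no
 NO_DATA gaps are inserted.

```python
from typing import List, Sequence


def redundant_payloads(frames: Sequence[str], copies: int = 2) -> List[List[str]]:
    """Group frames so that each payload carries the newest frame plus
    the (copies - 1) frames that preceded it, oldest first."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    payloads = []
    for i in range(len(frames)):
        # Near the start of the stream fewer old frames exist.
        start = max(0, i - (copies - 1))
        payloads.append(list(frames[start:i + 1]))
    return payloads


frames = ["f(n-2)", "f(n-1)", "f(n)", "f(n+1)"]
for payload in redundant_payloads(frames):
    print(payload)
```

 With copies=2, every frame except the last one sent appears in two
 consecutive payloads, which matches the pattern of Figure 1.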

4.3.2. Use of Frame Interleaving

 To decrease protocol overhead, the payload design allows several
 audio transport frames to be encapsulated into a single RTP packet.
 One of the drawbacks of such an approach is that in the case of
 packet loss, several consecutive frames are lost.  Consecutive frame
 loss normally renders error concealment less efficient and usually
 causes clearly audible and annoying distortions in the reconstructed
 audio.  Interleaving of transport frames can improve the audio
 quality in such cases by distributing the consecutive losses into a
 number of isolated frame losses, which are easier to conceal.

 However, interleaving and bundling several frames per payload also
 increase end-to-end delay and set higher buffering requirements.
 Therefore, interleaving is not appropriate for all use cases or
 devices.  Streaming applications should most likely be able to
 exploit interleaving to improve audio quality in lossy transmission
 conditions.

 Note that this payload design supports the use of frame interleaving
 as an option.  The usage of this feature needs to be negotiated in
 the session setup.

 The interleaving supported by this format is rather flexible.  For
 example, a continuous pattern can be defined, as depicted in
 Figure 2.

 --+--------+--------+--------+--------+--------+--------+--------+--
   | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
 --+--------+--------+--------+--------+--------+--------+--------+--

            [ p(n)   ]
   [ p(n+1) ]                 [ p(n+1) ]
                     [ p(n+2) ]                 [ p(n+2) ]
                                       [ p(n+3) ]
                                                         [ p(n+4) ]

 Figure 2: An example of interleaving pattern that has constant delay

 In Figure 2, the consecutive frames, denoted f(n-2) to f(n+4), are
 aggregated into packets p(n) to p(n+4), each packet carrying two
 frames.  This approach provides an interleaving pattern that allows
 for constant delay in both the interleaving and de-interleaving
 processes.  The de-interleaving buffer needs to have room for at
 least three frames, including the one that is ready to be consumed.
 The storage space for three frames is needed, for example, when f(n)
 is the next frame to be decoded: since frame f(n) was received in
 packet p(n+2), which also carried frame f(n+3), both these frames are
 stored in the buffer.  Furthermore, frame f(n+1) received in the
 previous packet, p(n+1), is also in the de-interleaving buffer.  Note
 also that in this example the buffer occupancy varies: when frame
 f(n+1) is the next one to be decoded, there are only two frames,
 f(n+1) and f(n+3), in the buffer.

5. Payload Format

 The main purpose of the payload design for G.719 is to exploit the
 full potential of the codec with as little overhead as possible.  In
 the design, both basic and interleaved modes have been included, as
 the codec is suitable both for conversational and other low-delay
 applications and for streaming, where more delay is acceptable.

 The main structural difference between the basic and interleaved
 modes is the extension of the table of contents entries with frame
 displacement fields in the interleaved mode.  The basic mode supports
 aggregation of multiple consecutive frames in a payload.  The
 interleaved mode supports aggregation of multiple frames that are
 non-consecutive in time.  In both modes, it is possible to have
 frames encoded with different frame types in the same payload.
 The payload format also supports the usage of G.719 for carrying
 multi-channel content using one discrete encoder per channel all
 using the same bitrate.  In this case, a complete frame-block with
 data from all channels is included in the RTP payload.  The data is
 the concatenation of all the encoded audio frames in the order
 specified for that number of included channels.  Also, interleaving
 is done on complete frame-blocks rather than on individual audio
 frames.

5.1. RTP Header Usage

 The RTP timestamp corresponds to the sampling instant of the first
 sample encoded for the first frame-block in the packet.  The
 timestamp clock frequency SHALL be 48000 Hz.  The timestamp is also
 used to recover the correct decoding order of the frame-blocks.

 The RTP header marker bit (M) SHALL be set to 1 whenever the first
 frame-block carried in the packet is the first frame-block in a
 talkspurt (see definition of the talkspurt in Section 4.1 of
 [RFC3551]).  For all other packets, the marker bit SHALL be set to
 zero (M=0).

 The assignment of an RTP payload type for the format defined in this
 memo is outside the scope of this document.  The RTP profiles in use
 currently mandate binding the payload type dynamically for this
 payload format.  This is necessary because the payload type
 expresses the configuration of the payload itself, i.e., basic or
 interleaved mode, and the number of channels carried.

 The remaining RTP header fields are used as specified in [RFC3550].

5.2. Payload Structure

 The payload consists of one or more table of contents (ToC) entries
 followed by the audio data corresponding to the ToC entries.  The
 following sections describe both the basic mode and the interleaved
 mode.  Each ToC entry MUST be padded to a byte boundary to ensure
 octet alignment.  The rules regarding maximum payload size given in
 Section 3.2 of [RFC5405] SHOULD be followed.

5.2.1. Basic ToC Element

 All the different formats and modes in this document use a common
 basic ToC that may be extended in the different options described
 below.

  0 1 2 3 4 5 6 7
 +-+-+-+-+-+-+-+-+
 |F|    L    |R|R|
 +-+-+-+-+-+-+-+-+

                      Figure 3: Basic ToC element

 F (1 bit):  If set to 1, indicates that this ToC entry is followed by
    another ToC entry; if set to zero, indicates that this ToC entry
    is the last one in the ToC.

 L (5 bits):  A field that gives the frame length of each individual
    frame within the frame-block.

      L          length(bytes)
     ============================
      0           0 NO_DATA
      1-7         N/A (reserved)
      8-22        80+10*(L-8)
     23-27        240+20*(L-23)
     28-31        N/A (reserved)

              Figure 4: How to map L values to frame lengths

    L=0 (NO_DATA) is used to indicate an empty frame, which is useful
    if frames are missing (e.g., at re-packetization), or to insert
    gaps when sending redundant frames together with primary frames in
    the same payload.

    The value ranges [1..7] and [28..31] (inclusive) are reserved for
    future use in this document version; if any of these values occurs
    in a ToC, the entire packet SHOULD be treated as invalid and
    discarded.

    A few examples are given below where the frame size and the
    corresponding codec bitrate are computed based on the value L.

       L    Bytes    Codec Bitrate(kbps)
     ===================================
       8      80        32
       9      90        36
      10     100        40
      12     120        48
      16     160        64
      22     220        88
      23     240        96
      25     280       112
      27     320       128
      Figure 5: Examples of L values and corresponding frame lengths

    This encoding yields a granularity of 4 kbps between 32 and 88
    kbps and a granularity of 8 kbps between 88 and 128 kbps with a
    defined range of 32-128 kbps for the codec data.

 R (2 bits):  Reserved bits.  SHALL be set to zero on sending and
    SHALL be ignored on reception.
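
 The mapping in Figure 4 can be written down directly.  The following
 is a minimal sketch rather than normative text; the function names
 are hypothetical, and reserved values are reported as errors so that
 a caller can discard the packet.

```python
def frame_length_bytes(l_value: int) -> int:
    """Map the 5-bit L field to a frame length in bytes (Figure 4)."""
    if not 0 <= l_value <= 31:
        raise ValueError("L is a 5-bit field")
    if l_value == 0:
        return 0                          # NO_DATA
    if 8 <= l_value <= 22:
        return 80 + 10 * (l_value - 8)    # 32..88 kbit/s in 4 kbit/s steps
    if 23 <= l_value <= 27:
        return 240 + 20 * (l_value - 23)  # 96..128 kbit/s in 8 kbit/s steps
    raise ValueError(f"reserved L value {l_value}")


def bitrate_kbps(l_value: int) -> float:
    """Codec bitrate for one 20-ms frame of the given L value."""
    # bytes * 8 bits / 20 ms gives bits per millisecond, i.e., kbit/s.
    return frame_length_bytes(l_value) * 8 / 20


print(frame_length_bytes(12), bitrate_kbps(12))
```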

5.3. Basic Mode

 The basic ToC element shown in Figure 3 is followed by a 1-octet
 field for the number of frame-blocks (#frames) to form the ToC entry.
 The frame-blocks field tells how many frame-blocks of the same length
 the ToC entry relates to.

  0 1 2 3 4 5 6 7
 +-+-+-+-+-+-+-+-+
 |    #frames    |
 +-+-+-+-+-+-+-+-+

                Figure 6: Number of frame-blocks field
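
 A receiver can walk the basic-mode ToC two octets at a time until it
 finds an entry with F=0.  The sketch below illustrates this; the
 function name and the returned tuple layout are assumptions made for
 the example, and the reserved R bits are simply ignored.

```python
from typing import List, Tuple


def parse_basic_toc(payload: bytes) -> Tuple[List[Tuple[int, int]], int]:
    """Return ([(L, number_of_frame_blocks), ...], offset_of_audio_data)."""
    entries = []
    pos = 0
    while True:
        if pos + 2 > len(payload):
            raise ValueError("truncated ToC")
        element, n_blocks = payload[pos], payload[pos + 1]
        follow = element >> 7             # F bit (most significant bit)
        l_value = (element >> 2) & 0x1F   # 5-bit L; the low 2 bits are R
        entries.append((l_value, n_blocks))
        pos += 2
        if not follow:
            return entries, pos


# Two ToC entries: 2 frame-blocks with L=8, then 1 frame-block with L=12.
toc, audio_offset = parse_basic_toc(bytes([0xA0, 0x02, 0x30, 0x01]))
print(toc, audio_offset)
```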

5.4. Interleaved Mode

 The basic ToC is followed by a 1-octet field for the number of frame-
 blocks (#frames) and then the DIS fields to form a ToC entry in
 interleaved mode.  The frame-blocks field tells how many frame-blocks
 of the same length the ToC relates to.  The DIS fields, one for each
 frame-block indicated by the #frames field, express the interleaving
 distance between audio frames carried in the payload.  If necessary
 to achieve octet alignment, a 4-bit padding is added.

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    #frames    | DIS1  |  ...  | DISi  |  ...  | DISn  | Padd  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          Figure 7: Number of frame-block + interleave fields

 DIS1...DISn (4 bits):  A list of n (n=#frames) displacement fields
    indicating the displacement of the i:th (i=1..n) audio frame-block
    relative to the preceding frame-block in the payload (in units of
    20-ms long audio frame-blocks).  The 4-bit unsigned integer
    displacement values may be between zero and 15 indicating the
    number of audio frame-blocks in decoding order between the
    (i-1):th and the i:th frame in the payload.  Note that for the
    first ToC entry of the payload, the value of DIS1 is meaningless.
    It SHALL be set to zero by a sender and SHALL be ignored by a
    receiver.  This frame-block's location in the decoding order is
    uniquely defined by the RTP timestamp.  Note that for subsequent
    ToC entries DIS1 indicates the number of frames between the last
    frame of the previous group and the first frame of this group.

 Padd (4 bits):  To ensure octet alignment, 4 padding bits SHALL be
    included at the end of the ToC entry in case there is an odd
    number of frame-blocks in the group referenced by this ToC entry.
    These bits SHALL be set to zero and SHALL be ignored by the
    receiver.  If a group containing an even number of frames is
    referenced by this ToC entry, these padding bits SHALL NOT be
    included in the payload.
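
 One interleaved-mode ToC entry can be parsed as sketched below.  The
 sketch assumes that DIS fields are packed two per octet with the
 earlier field in the high-order nibble, which follows from the
 most-significant-bit-first convention of Section 2; the function name
 and return layout are hypothetical.

```python
from typing import List, Tuple


def parse_interleaved_entry(buf: bytes, pos: int = 0) -> Tuple[bool, int, List[int], int]:
    """Return (follow, L, [DIS1..DISn], next_pos) for one ToC entry."""
    if pos + 2 > len(buf):
        raise ValueError("truncated ToC entry")
    element, n_blocks = buf[pos], buf[pos + 1]
    follow = bool(element >> 7)
    l_value = (element >> 2) & 0x1F
    # Two 4-bit DIS fields per octet; an odd count ends with 4 padding bits.
    dis_octets = (n_blocks + 1) // 2
    end = pos + 2 + dis_octets
    if end > len(buf):
        raise ValueError("truncated DIS fields")
    dis: List[int] = []
    for octet in buf[pos + 2:end]:
        dis.extend((octet >> 4, octet & 0x0F))
    # Slicing drops the padding nibble when n_blocks is odd.
    return follow, l_value, dis[:n_blocks], end


print(parse_interleaved_entry(bytes([0x20, 0x04, 0x04, 0x44])))
```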

5.5. Audio Data

 The audio data part follows the table of contents.  All the octets
 comprising an audio frame SHALL be appended to the payload as a unit.
 For each frame-block, the audio frames are concatenated in the order
 indicated by the table in Section 4.1 of [RFC3551] for the number of
 channels configured for the payload type in use.  So the first
 channel (leftmost) indicated comes first followed by the next
 channel.  The audio frame-blocks are packetized in increasing
 timestamp order within each group of frame-blocks (per ToC entry),
 i.e., oldest frame-block first.  The groups of frame-blocks are
 packetized in the same order as their corresponding ToC entries.
 The audio frames are specified in ITU recommendation [ITU-T-G719].
 The G.719 bit stream is split into a sequence of octets and
 transmitted in order from the leftmost (most significant (MSB)) bit
 to the rightmost (least significant (LSB)) bit.

5.6. Implementation Considerations

 An application implementing this payload format MUST understand all
 the payload parameters specified in this specification.  Any mapping
 of the parameters to a signaling protocol MUST support all
 parameters.  So an implementation of this payload format in an
 application using SDP is required to understand all the payload
 parameters in their SDP-mapped form.  This requirement ensures that
 an implementation always can decide whether it is capable of
 communicating when the communicating entities support this version of
 the specification.
 Basic mode SHALL be implemented and the interleaved mode SHOULD be
 implemented.  The implementation burden of both is rather small, and
 supporting both ensures interoperability.  However, interleaving is
 not mandated as it has limited applicability for conversational
 applications that require tight delay boundaries.

5.6.1. Receiving Redundant Frames

 The reception of redundant audio frames, i.e., more than one audio
 frame from the same source for the same time slot, MUST be supported
 by the implementation.  In the case that the receiver gets multiple
 audio frames in different bitrates for the same time slot, it is
 RECOMMENDED that the receiver keeps the one with the highest bitrate.
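
 The recommendation above can be sketched as a small selection step.
 Because every G.719 frame covers 20 ms, the longer of two frames for
 the same timestamp is the one with the higher bitrate.  The function
 name and the (timestamp, frame) input format are assumptions made for
 this illustration.

```python
from typing import Dict, Iterable, Tuple


def select_frames(received: Iterable[Tuple[int, bytes]]) -> Dict[int, bytes]:
    """Map each RTP timestamp to the largest frame received for it."""
    best: Dict[int, bytes] = {}
    for timestamp, frame in received:
        current = best.get(timestamp)
        # A longer frame means a higher bitrate for the same 20-ms slot;
        # an empty NO_DATA frame never replaces real audio data.
        if current is None or len(frame) > len(current):
            best[timestamp] = frame
    return best


copies = [(0, bytes(80)), (0, bytes(120)), (960, bytes(80))]
print({ts: len(frame) for ts, frame in select_frames(copies).items()})
```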

5.6.2. Interleaving

 The use of interleaving requires further considerations.  As
 presented in the example in Section 4.3.2, a given interleaving
 pattern requires a certain amount of the de-interleaving buffer.
 This buffer space, expressed in a number of transport frame slots, is
 indicated by the "interleaving" media type parameter.  The number of
 frame slots needed can be converted into actual memory requirements
 by considering the 320 bytes per frame used by the highest bitrate of
 G.719.
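
 As a worked example of this conversion, the sketch below multiplies
 the number of slots by the 320-byte maximum frame size.  Scaling by
 the number of channels is an assumption of this example (one slot is
 taken to hold a complete frame-block); the function name is
 hypothetical.

```python
MAX_FRAME_BYTES = 320  # one 20-ms frame at 128 kbit/s


def deinterleave_buffer_bytes(interleaving_slots: int, channels: int = 1) -> int:
    """Worst-case audio storage for the de-interleaving buffer."""
    if interleaving_slots < 0 or channels < 1:
        raise ValueError("invalid slot or channel count")
    # Assumption: each slot stores one frame per channel at the top rate.
    return interleaving_slots * channels * MAX_FRAME_BYTES


print(deinterleave_buffer_bytes(3))      # three mono slots
print(deinterleave_buffer_bytes(3, 2))   # three stereo slots
```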
 The information about the frame buffer size is not always sufficient
 to determine when it is appropriate to start consuming frames from
 the interleaving buffer.  Additional information is needed when the
 interleaving pattern changes.  The "int-delay" media type parameter
 is defined to convey this information.  It allows a sender to
 indicate the minimal media time that needs to be present in the
 buffer before the decoder can start consuming frames from the buffer.
 Because the sender has full control over the interleaving pattern, it
 can calculate this value.  In certain cases (for example, if joining
 a multicast session with interleaving mid-session), a receiver may
 initially receive only part of the packets in the interleaving
 pattern.  This initial partial reception (in frame sequence order) of
 frames can yield too few frames for acceptable quality from the audio
 decoding.  This problem also arises when using encryption for access
 control, and the receiver does not have the previous key.  Although
 the G.719 is robust and thus tolerant to a high random frame erasure
 rate, it would have difficulties handling consecutive frame losses at
 startup.  Thus, some special implementation considerations are
 described.

 In order to handle this type of startup efficiently, decoding can
 start provided that:

 1.  There are at least two consecutive frames available.

 2.  At least half of the frames are available in the time period
     between the point where decoding was planned to start and the
     most forward frame received so far.

 After receiving a number of packets, in the worst case as many
 packets as the interleaving pattern covers, the previously described
 effects disappear and normal decoding is resumed.  Similar issues
 arise when a receiver leaves a session or has lost access to the
 stream.  If the receiver leaves the session, this would be a minor
 issue since playout is normally stopped.  The sender can avoid this
 type of problem in many sessions by starting and ending interleaving
 patterns correctly when risks of losses occur.  One such example is a
 key-change done for access control to encrypted streams.  If only
 some keys are provided to clients and there is a risk they will
 receive content for which they do not have the key, it is recommended
 that interleaving patterns do not overlap key changes.

5.6.3. Decoding Validation

 If the receiver finds a mismatch between the size of a received
 payload and the size indicated by the ToC of the payload, the
 receiver SHOULD discard the packet.  This is recommended because
 decoding a frame parsed from a payload based on erroneous ToC data
 could severely degrade the audio quality.
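
 For basic mode, this check amounts to summing the ToC octets and the
 frame lengths they announce and comparing the result with the
 received payload length.  The following is a sketch only; the
 function names are hypothetical, and a packet with reserved L values
 or a truncated ToC is also reported as inconsistent.

```python
def expected_payload_size(payload: bytes, channels: int = 1) -> int:
    """Size implied by a basic-mode ToC: ToC octets plus all audio frames."""
    pos = 0
    audio = 0
    while True:
        if pos + 2 > len(payload):
            raise ValueError("truncated ToC")
        element, n_blocks = payload[pos], payload[pos + 1]
        l_value = (element >> 2) & 0x1F
        if l_value == 0:
            frame_len = 0                         # NO_DATA
        elif 8 <= l_value <= 22:
            frame_len = 80 + 10 * (l_value - 8)
        elif 23 <= l_value <= 27:
            frame_len = 240 + 20 * (l_value - 23)
        else:
            raise ValueError("reserved L value")
        # Each frame-block holds one frame per channel.
        audio += frame_len * n_blocks * channels
        pos += 2
        if not element & 0x80:                    # F=0: last ToC entry
            return pos + audio


def is_consistent(payload: bytes, channels: int = 1) -> bool:
    """True if the payload length matches what its ToC announces."""
    try:
        return expected_payload_size(payload, channels) == len(payload)
    except ValueError:
        return False


good = bytes([0xA0, 0x02, 0x30, 0x01]) + bytes(2 * 80 + 120)
print(is_consistent(good), is_consistent(good[:-1]))
```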

6. Payload Examples

 A few examples to highlight the payload format follow.

6.1. 3 Mono Frames with 2 Different Bitrates

 The first example is a payload consisting of 3 mono frames where the
 first 2 frames correspond to a bitrate of 32 kbps (80 bytes/frame)
 and the last is 48 kbps (120 bytes/frame).

    The first 32 bits are ToC fields.

    Bit 0 is '1' as another ToC field follows.
    Bits 1..5 are '01000' = 80 bytes/frame.
    Bits 8..15 are '00000010' = 2 frame-blocks with 80 bytes/frame.
    Bit 16 is '0', no more ToC follows.
    Bits 17..21 are '01100' = 120 bytes/frame.
    Bits 24..31 are '00000001' = 1 frame-block with 120 bytes/frame.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0|0|0 1 1 0 0|0 0|0 0 0 0 0 0 0 1|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |d(0)   frame 1                                                 |
    .                                                               .
    |                                                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |d(0)   frame 2                                                 |
    .                                                               .
    |                                                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |d(0)   frame 3                                                 |
    .                                                               .
    |                                                         d(959)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
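
 The four ToC octets of this example can be reproduced with a few
 lines of code.  This is an illustrative sketch; the helper name
 toc_entry is hypothetical, and the reserved bits are set to zero as
 required on sending.

```python
def toc_entry(l_value: int, n_blocks: int, follow: bool) -> bytes:
    """Basic-mode ToC entry: F (1 bit), L (5 bits), R=00, then #frames."""
    if not 0 <= l_value <= 31 or not 0 <= n_blocks <= 255:
        raise ValueError("field out of range")
    return bytes([(int(follow) << 7) | (l_value << 2), n_blocks])


# Two 80-byte frames (L=8) followed by one 120-byte frame (L=12).
toc = toc_entry(8, 2, follow=True) + toc_entry(12, 1, follow=False)
payload_size = len(toc) + 2 * 80 + 1 * 120

print(toc.hex())       # a0023001
print(payload_size)    # 284
```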

6.2. 2 Stereo Frame-Blocks of the Same Bitrate

 The second example is a payload consisting of 2 stereo frame-blocks
 that correspond to a bitrate of 32 kbps (80 bytes/frame) per channel.
 The receiver calculates the number of audio frames in the payload by
 multiplying the value of the "channels" parameter (2) by the #frames
 field value (2), which yields 4 audio frames.

    The first 16 bits are the ToC field.

    Bit 0 is '0' as no ToC field follows.
    Bits 1..5 are '01000' = 80 bytes/frame.
    Bits 8..15 are '00000010' = 2 frame-blocks with 80 bytes/frame.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0|0 1 0 0 0|0 0|0 0 0 0 0 0 1 0| d(0) frame 1 left ch.         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    .                                                               .
    |                         d(639)| d(0) frame 1 right ch.        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    .                                                               .
    |                         d(639)| d(0) frame 2 left ch.         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    .                                                               .
    |                         d(639)| d(0) frame 2 right ch.        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
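 The receiver-side arithmetic for this example can be sketched in
 Python.  This is an illustrative sketch rather than part of the
 specification: the function names are hypothetical, the bit
 positions follow the field list above, and the length table holds
 only the two codes that appear in the examples of this section
 ('01000' = 80 bytes and '01100' = 120 bytes).

```python
# Illustrative only: length codes confirmed by the examples in Section 6.
# The complete code-to-length mapping is defined with the ToC field format.
FRAME_BYTES = {0b01000: 80, 0b01100: 120}


def parse_toc_entry(data: bytes) -> dict:
    """Split one 16-bit ToC entry into its fields (bit 0 is the MSB)."""
    if len(data) < 2:
        raise ValueError("a ToC entry is 16 bits long")
    word = int.from_bytes(data[:2], "big")
    return {
        "more_toc_follows": bool(word >> 15),  # bit 0
        "length_code": (word >> 10) & 0x1F,    # bits 1..5
        "frame_blocks": word & 0xFF,           # bits 8..15
    }


def payload_size(toc: dict, channels: int) -> int:
    """Bytes in a payload that carries a single ToC entry."""
    frames = channels * toc["frame_blocks"]
    return 2 + frames * FRAME_BYTES[toc["length_code"]]


# ToC from the diagram above: 0|01000|00|00000010
toc = parse_toc_entry(bytes([0b00100000, 0b00000010]))
audio_frames = 2 * toc["frame_blocks"]  # "channels" (2) times #frames (2)
print(audio_frames, payload_size(toc, channels=2))  # 4 322
```

 The 322 bytes are the 2-byte ToC plus four 80-byte frames, which
 matches the layout drawn in the diagram.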

6.3. 4 Mono Frames Interleaved

 The third example is a payload consisting of 4 mono frames that
 correspond to a bitrate of 32 kbps (80 bytes/frame) interleaved.  A
 pattern of interleaving for constant delay when aggregating 4 frames
 is used in the example below.  The actual packet illustrated is
 packet n, while the previous and following packets' frame-block
 content is shown to illustrate the pattern.
    Packet n-3:  1,  6, 11, 16
    Packet n-2:  5, 10, 15, 20
    Packet n-1:  9, 14, 19, 24
    Packet   n: 13, 18, 23, 28
    Packet n+1: 17, 22, 27, 32
    Packet n+2: 21, 26, 31, 36
    The first 32 bits are the ToC field.
    Bit 0 is '0' as there is no ToC field following.
    Bits 1..5 are '01000' = 80 bytes/frame.
    Bits 8..15 are '00000100' = 4 frame-blocks with 80 bytes/frame.
    Bits 16..19 are '0000' = DIS1 (0).
    Bits 20..23 are '0100' = DIS2 (4).
    Bits 24..27 are '0100' = DIS3 (4).
    Bits 28..31 are '0100' = DIS4 (4).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |0|0 1 0 0 0|0 0|0 0 0 0 0 1 0 0|0 0 0 0|0 1 0 0|0 1 0 0|0 1 0 0|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | d(0) frame 13                                                 |
    .                                                               .
    |                                                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | d(0) frame 18                                                 |
    .                                                               .
    |                                                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | d(0) frame 23                                                 |
    .                                                               .
    |                                                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | d(0) frame 28                                                 |
    .                                                               .
    |                                                         d(639)|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
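 The interleaving pattern and the DIS fields of this example can be
 reproduced with a short Python sketch.  The function names are
 hypothetical.  The reading of the DIS fields is an assumption drawn
 from the example itself: DIS1 offsets the first frame-block, and
 each later DIS value counts the frame-blocks skipped since the
 previous frame-block in the payload.

```python
def parse_dis(data: bytes, count: int) -> list:
    """Unpack `count` 4-bit DIS fields, most significant nibble first."""
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0x0F]
    return nibbles[:count]


def frames_from_dis(first_frame: int, dis: list) -> list:
    """Rebuild frame-block numbers from a start frame and DIS values.

    Assumed reading: frame[0] = first_frame + DIS1, and every later
    frame is the previous one plus the skipped count plus one.
    """
    frames = [first_frame + dis[0]]
    for skipped in dis[1:]:
        frames.append(frames[-1] + skipped + 1)
    return frames


def packet_frames(k: int, per_packet: int = 4) -> list:
    """Frame-blocks of the k-th packet, with k = 0 for packet n-3."""
    first = 1 + per_packet * k
    return frames_from_dis(first, [0] + [per_packet] * (per_packet - 1))


# Bits 16..31 of the ToC in the diagram: 0000 0100 0100 0100
dis = parse_dis(bytes([0x04, 0x44]), count=4)
print(dis)                       # [0, 4, 4, 4]
print(frames_from_dis(13, dis))  # [13, 18, 23, 28]
for k in range(6):
    print(packet_frames(k))      # rows of the pattern, n-3 through n+2
```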

7. Payload Format Parameters

 This RTP payload format is identified using the media type audio/
 G719, which is registered in accordance with [RFC4855] and uses the
 template of [RFC4288].

7.1. Media Type Definition

 The media type for the G.719 codec is allocated from the IETF tree
 since G.719 has the potential to become a widely used audio codec in
 general Voice over IP (VoIP), teleconferencing, and streaming
 applications.  This media type registration covers real-time transfer
 via RTP.
 Note, any unspecified parameter MUST be ignored by the receiver to
 ensure that additional parameters can be added in any future revision
 of this specification.
 Type name: audio
 Subtype name: G719
 Required parameters: none
 Optional parameters:

 interleaving:  Indicates that interleaved mode SHALL be used for the
    payload.  The parameter specifies the number of frame-block slots
    available in a de-interleaving buffer (including the frame that is
    ready to be consumed) for each source.  Its value is equal to one
    plus the maximum number of frames that can precede any frame in
    transmission order and follow the frame in RTP timestamp order.
    The value MUST be greater than zero.  If this parameter is not
    present, interleaved mode SHALL NOT be used.
 int-delay:  The minimal media time delay in milliseconds that is
    needed to avoid underrun in the de-interleaving buffer before
    starting decoding, i.e., the difference in RTP timestamp ticks
    between the earliest and latest audio frame present in the de-
    interleaving buffer expressed in milliseconds.  The value is a
    stream property and provided per source.  The allowed values are
    zero to the largest value expressible by an unsigned 16-bit
    integer (65535).  Please note that in practice, the largest value
    that can be used is equal to the declared size of the interleaving
    buffer of the receiver.  If the value for some reason is larger
    than the receiver buffer declared by or for the receiver, this
    value defaults to the size of the receiver buffer.  For sources
    for which this value hasn't been provided, the value defaults to
    the size of the receiver buffer.  The format is a comma-separated
    list of synchronization source (SSRC) ":" delay in ms pairs, which
    in ABNF [RFC5234] is expressed as:
       int-delay = "int-delay:" source-delay *("," source-delay)
       source-delay = SSRC ":" delay-value
       SSRC = 1*8HEXDIG ; The 32-bit SSRC encoded in hex format
       delay-value = 1*5DIGIT ; The delay value in milliseconds
       Example: int-delay=ABCD1234:1000,4321DCB:640
       NOTE: No white space allowed in the parameter before the end of
       all the value pairs
 max-red:  The maximum duration in milliseconds that elapses between
    the primary (first) transmission of a frame and any redundant
    transmission that the sender will use.  This parameter allows a
    receiver to have a bounded delay when redundancy is used.  Allowed
    values are between zero (no redundancy will be used) and 65535.
    If the parameter is omitted, no limitation on the use of
    redundancy is present.
 channels:  The number of audio channels.  The possible values (1-6)
    and their respective channel order are specified in Section 4.1 of
    [RFC3551].  If omitted, it has the default value of 1.
 CBR:  Constant Bitrate (CBR) indicates the exact codec bitrate in
    bits per second (not including the overhead from packetization,
    RTP header, or lower layers) that the codec MUST use.  "CBR" is to
    be used when a dynamic rate cannot be supported (e.g., in a
    gateway to H.320).  "CBR" is mostly used for gateways to
    circuit-switched networks.  Therefore, the "CBR" is the rate not
    including any FEC as specified in Section 4.3.1.  If FEC is to be
    used, the "b=" parameter MUST be used to allow the extra bitrate
    needed to send the redundant information.  It is RECOMMENDED that
    this parameter is only used when necessary to establish a working
    communication.  The usage of this parameter has implications for
    congestion control that need to be considered; see Section 9.
 ptime:  see [RFC4566].
 maxptime:  see [RFC4566].
 Encoding considerations:  This media type is framed and binary; see
    Section 4.8 of [RFC4288].
 Security considerations:  See Section 10 of RFC 5404.
 Interoperability considerations:  The support of the Interleaving
    mode is not mandatory and needs to be negotiated.  See Section 7.2
    for how to do that for SDP-based protocols.
 Published specification:  RFC 5404
 Applications that use this media type:  Real-time audio applications
    like Voice over IP and teleconference, and multi-media streaming.
 Additional information:  none
 Person & email address to contact for further information:
    Ingemar Johansson
    <ingemar.s.johansson@ericsson.com>
 Intended usage:  COMMON
 Restrictions on usage:  This media type depends on RTP framing, and
    hence is only defined for transfer via RTP [RFC3550].  Transport
    within other framing protocols is not defined at this time.
 Author:
    Ingemar Johansson <ingemar.s.johansson@ericsson.com>
    Magnus Westerlund <magnus.westerlund@ericsson.com>
 Change controller:  IETF Audio/Video Transport working group
    delegated from the IESG.
 Additionally, note that file storage of G.719-encoded audio in ISO
 base media file format is specified in Annex A of [ITU-T-G719].
 Thus, media file formats such as MP4 (audio/mp4 or video/mp4)
 [RFC4337] and 3GP (audio/3GPP and video/3GPP) [RFC3839] can contain
 G.719-encoded audio.
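 Two of the optional parameters defined in this section lend
 themselves to a short Python sketch.  The function names are
 hypothetical.  parse_int_delay() takes only the value portion of
 "int-delay" (the text after the parameter name and its separator)
 and applies the source-delay ABNF and the 16-bit limit.
 required_interleaving() applies the definition of "interleaving"
 literally to a list of frame numbers in transmission order; the
 six-packet pattern from Section 6.3 is used as input.

```python
import re

# source-delay = SSRC ":" delay-value, per the ABNF for "int-delay"
_PAIR = re.compile(r"([0-9A-Fa-f]{1,8}):([0-9]{1,5})")


def parse_int_delay(value: str) -> dict:
    """Map each SSRC (as an int) to its declared delay in milliseconds."""
    delays = {}
    for pair in value.split(","):
        match = _PAIR.fullmatch(pair)
        if match is None:
            raise ValueError(f"malformed source-delay pair: {pair!r}")
        delay = int(match.group(2))
        if delay > 65535:
            raise ValueError(f"delay exceeds 16 bits: {delay}")
        delays[int(match.group(1), 16)] = delay
    return delays


def required_interleaving(transmission_order: list) -> int:
    """One plus the largest number of frames that precede a frame in
    transmission order and follow it in RTP timestamp order."""
    worst = 0
    for i, frame in enumerate(transmission_order):
        ahead = sum(1 for earlier in transmission_order[:i] if earlier > frame)
        worst = max(worst, ahead)
    return worst + 1


print(parse_int_delay("ABCD1234:1000,4321DCB:640"))

pattern = [1, 6, 11, 16, 5, 10, 15, 20, 9, 14, 19, 24,
           13, 18, 23, 28, 17, 22, 27, 32, 21, 26, 31, 36]
print(required_interleaving(pattern))  # 7
```

 Under this reading, frame 13 is preceded in transmission order by
 six later frames (14, 15, 16, 19, 20, and 24), so the pattern needs
 seven frame-block slots.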

7.2. Mapping to SDP

 The information carried in the media type specification has a
 specific mapping to fields in the Session Description Protocol (SDP)
 [RFC4566], which is commonly used to describe RTP sessions.  When SDP
 is used to specify sessions employing the G.719 codec, the mapping is
 as follows:
 o  The media type ("audio") goes in SDP "m=" as the media name.
 o  The media subtype (payload format name) goes in SDP "a=rtpmap" as
    the encoding name.  The RTP clock rate in "a=rtpmap" MUST be
    48000, and the encoding parameter "channels" (Section 7.1) MUST
    either be explicitly set to N or omitted, implying a default value
    of 1.  The values of N that are allowed are specified in Section
    4.1 in [RFC3551].
 o  The parameters "ptime" and "maxptime" go in the SDP "a=ptime" and
    "a=maxptime" attributes, respectively.
 o  Any remaining parameters go in the SDP "a=fmtp" attribute by
    copying them directly from the media type parameter string as a
    semicolon-separated list of parameter=value pairs.
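 The mapping rules above can be sketched as a small Python helper
 that emits the corresponding SDP lines.  The function name, the
 port (49170), the dynamic payload type (97), the RTP/AVP profile,
 and the parameter values are all hypothetical choices made for
 illustration.

```python
def g719_sdp_lines(port, payload_type, channels=1, fmtp=None,
                   ptime=None, maxptime=None):
    """Build the SDP lines for one G.719 payload type."""
    lines = [f"m=audio {port} RTP/AVP {payload_type}"]

    # The clock rate is always 48000; "channels" is omitted when it is 1.
    rtpmap = f"a=rtpmap:{payload_type} G719/48000"
    if channels != 1:
        rtpmap += f"/{channels}"
    lines.append(rtpmap)

    # Remaining media type parameters go into a=fmtp as name=value pairs.
    if fmtp:
        params = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {params}")

    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines


for line in g719_sdp_lines(49170, 97, channels=2,
                           fmtp={"interleaving": 7, "max-red": 0},
                           ptime=80):
    print(line)
# m=audio 49170 RTP/AVP 97
# a=rtpmap:97 G719/48000/2
# a=fmtp:97 interleaving=7; max-red=0
# a=ptime:80
```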

7.2.1. Offer/Answer Considerations

 The following considerations apply when using SDP offer/answer
 procedures to negotiate the use of G.719 payload in RTP:
 o  Each combination of the RTP payload transport format configuration
    parameters ("interleaving" and "channels") is unique in its bit
    pattern and not compatible with any other combination.  When
    creating an offer in an application desiring to use the more
    advanced features (interleaving or more than one channel), the
    offerer is RECOMMENDED to also offer a payload type containing
    only the configuration with a single channel.  If multiple
    configurations are of interest to the application, they may all be
    offered; however, care should be taken not to offer too many
    payload types.  An SDP answerer MUST include, in the SDP answer
    for a payload type, the following parameters unmodified from the
    SDP offer (unless it removes the payload type): "interleaving" and
    "channels".  However, the value of the "interleaving" parameter
    MAY be changed.  The SDP offerer and answerer MUST generate G.719
    packets as described by these parameters.
 o  The "interleaving" and "int-delay" parameters' values have a
    specific relationship that needs to be considered.  It also
    depends on the directionality of the streams and their delivery
    method.  The high-level explanation that can be understood from
    the definition is that the value of "interleaving" declares the
    size of the receiver buffer, while "int-delay" is a stream
    property provided by the sender to indicate how much buffer space
    it actually uses for the stream it sends.
    •  For media streams that are sent over multicast, the value of
       "interleaving" SHALL NOT be changed by the answerer.  It shall
       either be accepted or the payload type deleted.  The value of
       the "int-delay" parameter is a stream property and provided by
       the offer/answer agent that intends to send media with this
       payload type, and for each stream coming from that agent (one
       or more).  The value MUST be between zero and what corresponds
       to the buffer size declared by the value of the "interleaving"
       parameter.
    •  For unicast streams that the offerer declares as sendonly, the
       value of the "interleaving" parameter is the size that the
       answerer is RECOMMENDED to use by the offerer.  The answerer
       MAY change it to any allowed value.  The "int-delay" parameter
       value will be the one the offerer intends to use unless the
       answerer reduces the value of the "interleaving" parameter
       below what is needed for that "int-delay" value.  If the
       "interleaving" value in the answer is smaller than the offer's
       "int-delay" value, the "int-delay" value is by default reduced
       to correspond to the "interleaving" value.  If the
       offerer is not satisfied with this, it will need to perform
       another round of offer/answer.  As the answerer will not send
       any media, it doesn't include any "int-delay" in the answer.
    •  For unicast streams that the offerer declares as recvonly, the
       value of "interleaving" in the offer will be the offerer's size
       of the interleaving buffer.  The answerer indicates its
       preferred size of the interleaving buffer for any future round
       of offer/answer.  The offerer will not provide any "int-delay"
       parameter as it is not sending any media.  The answerer is
       recommended to include in its answer an "int-delay" parameter
       to declare what the property is for the stream it is going to
       send.  The answerer is expected to be capable of selecting a
       valid parameter value that is between zero and the declared
       maximum number of slots in the de-interleaving buffer.
    •  For unicast streams that the offer declares as sendrecv
       streams, the value of the "interleaving" parameter in the offer
       will be the offerer's size of the interleaving buffer.  The
       answerer will in the answer indicate the size of its actual
       interleaving buffer.  It is recommended that this value is at
       least as big as the offer's.  The offerer is recommended to
       include an "int-delay" parameter that is selected based on the
       answerer having at least as much interleaving space as the
       offerer if nothing else is known.  As the answerer's
       interleaving buffer size is not yet known, this may fail, in
       which case the default rule is to downgrade the value of the
       "int-delay" to correspond to the full size of the answerer's
       interleaving buffer.  If the offerer isn't satisfied with this,
       it will need to initiate another round of offer/answer.  The
       answerer is recommended in its answer to include an "int-delay"
       parameter to declare what the property is for the stream(s) it
       is going to send.  The answerer is expected to be capable of
       selecting a valid parameter value that is between zero and the
       declared maximum number of slots in the de-interleaving buffer.
 o  In most cases, the parameters "maxptime" and "ptime" will not
    affect interoperability; however, the setting of the parameters
    can affect the performance of the application.  The SDP offer/
    answer handling of the "ptime" parameter is described in
    [RFC3264].  The "maxptime" parameter MUST be handled in the same
    way.
 o  The parameter "max-red" is a stream property parameter.  For
    sendonly or sendrecv unicast media streams, the parameter declares
    the limitation on redundancy that the stream sender will use.  For
    recvonly streams, it indicates the desired value for the stream
    sent to the receiver.  The answerer MAY change the value, but is
    RECOMMENDED to use the same limitation as the offer declares.  In
    the case of multicast, the offerer MAY declare a limitation; this
    SHALL be answered using the same value.  A media sender using this
    payload format is RECOMMENDED to always include the "max-red"
    parameter.  This information is likely to simplify the media
    stream handling in the receiver.  This is especially true if no
    redundancy will be used, in which case "max-red" is set to zero.
 o  Any unknown parameter in an offer SHALL be removed in the answer.
 o  The "b=" SDP parameter SHOULD be used to negotiate the maximum
    bandwidth to be used for the audio stream.  The offerer may offer
    a maximum rate and the answer may contain a lower rate.  If no
    "b=" parameter is present in the offer or answer, it implies a
    rate up to 128 kbps.
 o  The parameter "CBR" is a receiver capability; i.e., only receivers
    that really require a constant bitrate should use it.  Usage of
    this parameter has a negative impact on the possibility to perform
    congestion control; see Section 9.  For recvonly and sendrecv
    streams, it indicates the desired constant bitrate that the
    receiver wants to accept.  A sender MUST be able to send a
    constant bitrate stream since it is a subset of the variable
    bitrate capability.  If the offer includes this parameter, the
    answerer MUST send G.719 audio at the constant bitrate if it is
    within the allowed session bitrate ("b=" parameter).  If the
    answerer cannot support the stated CBR, this payload type must be
    refused in the answer.  The answerer SHOULD only include this
    parameter if the answerer itself requires to receive at a constant
    bitrate, even if the offer did not include the "CBR" parameter.
    In this case, the offerer SHALL send at the constant bitrate, but
    SHALL be able to accept media at a variable bitrate.  An answerer
    is RECOMMENDED to use the same CBR as in the offer, as symmetric
    usage is more likely to work.  If both sides require a particular
    CBR, there is the possibility of communication failure when one or
    both sides can't transmit the requested rate.  In this case, the
    agent detecting this issue will have to perform a second round of
    offer/answer to try to find another working configuration or end
    the established session.  In case the offer contained a "CBR"
    parameter but the answer does not, then the offerer is free to
    transmit at any rate to the answerer, but the answerer is
    restricted to the declared rate.
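 The answerer's handling of an offered "CBR" value can be sketched
 in Python as follows.  This is a simplified illustration with
 hypothetical names.  It assumes that the session limit from "b="
 has already been converted to a codec bitrate in bits per second,
 and it treats an offered rate above that limit as a reason to
 refuse the payload type, a case the text above does not spell out.

```python
DEFAULT_MAX_RATE_BPS = 128_000  # implied when no "b=" is present


def answer_to_offered_cbr(offer_cbr, supported_rates, session_limit_bps=None):
    """Return how the answerer sends media for this payload type.

    offer_cbr         -- "CBR" value from the offer in bit/s, or None
    supported_rates   -- codec bitrates the answerer can produce
    session_limit_bps -- limit derived from "b=", or None if absent
    """
    limit = DEFAULT_MAX_RATE_BPS if session_limit_bps is None else session_limit_bps
    if offer_cbr is None:
        # No constraint from the offerer: any rate up to the limit.
        return ("variable", limit)
    if offer_cbr <= limit and offer_cbr in supported_rates:
        # The answerer sends at exactly the requested constant rate.
        return ("constant", offer_cbr)
    # The stated CBR cannot be honored.
    return ("refuse-payload-type", None)


rates = {32_000, 48_000, 64_000}
print(answer_to_offered_cbr(48_000, rates))    # ('constant', 48000)
print(answer_to_offered_cbr(None, rates))      # ('variable', 128000)
print(answer_to_offered_cbr(96_000, rates))    # ('refuse-payload-type', None)
```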

7.2.2. Declarative SDP Considerations

 In declarative usage, like SDP in the Real Time Streaming Protocol
 (RTSP) [RFC2326] or the Session Announcement Protocol (SAP)
 [RFC2974], the parameters SHALL be interpreted as follows:
 o  The payload format configuration parameters ("interleaving" and
    "channels") are all declarative, and a participant MUST use the
    configuration(s) that is provided for the session.  More than one
    configuration may be provided if necessary by declaring multiple
    RTP payload types; however, the number of types should be kept
    small.
 o  It might not be possible to know the SSRC values that are going to
    be used by the sources at the time of sending the SDP.  This is
    not a major issue as the size of the interleaving buffer can be
    tailored towards the values that are actually going to be used,
    thus ensuring that the default values for "int-delay" are not
    resulting in too much extra buffering.
 o  Any "maxptime" and "ptime" values should be selected with care to
    ensure that the session's participants can achieve reasonable
    performance.
 o  The parameter "CBR" if included applies to all RTP streams using
    that payload type for which a particular CBR is declared.  Usage
    of this parameter has a negative impact on the possibility to
    perform congestion control; see Section 9.

8. IANA Considerations

 One media type (audio/G719) has been defined and registered in the
 media types registry; see Section 7.1.

9. Congestion Control

 The general congestion control considerations for transporting RTP
 data apply; see RTP [RFC3550] and any applicable RTP profile like AVP
 [RFC3551].  However, the multi-rate capability of G.719 audio coding
 provides a mechanism that may help to control congestion, since the
 bandwidth demand can be adjusted (within the limits of the codec) by
 selecting a different encoding bitrate.
 The number of frames encapsulated in each RTP payload highly
 influences the overall bandwidth of the RTP stream due to header
 overhead constraints.  Packetizing more frames in each RTP payload
 can reduce the number of packets sent and hence the header overhead,
 at the expense of increased delay and reduced error robustness.  If
 forward error correction (FEC) is used, the amount of FEC-induced
 redundancy needs to be regulated such that the use of FEC itself does
 not cause a congestion problem.  In other words, a sender SHALL NOT
 increase the total bitrate when adding redundancy in response to
 packet loss, and needs instead to adjust it down in accordance with the
 congestion control algorithm being run.  Thus, when adding
 redundancy, the media bitrate will need to be reduced to provide room
 for the redundancy.
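 The effect of aggregation on header overhead can be illustrated
 with a short Python calculation.  It assumes IPv4 without options,
 UDP, and a 12-byte RTP header without CSRC entries or extensions
 (40 header bytes per packet), a single 2-byte ToC entry, and the
 20 ms frame duration of G.719.  The function name is hypothetical.

```python
FRAME_MS = 20                 # G.719 frame duration in milliseconds
HEADER_BYTES = 20 + 8 + 12    # IPv4 + UDP + RTP (assumed, no options)
TOC_BYTES = 2                 # one ToC entry, non-interleaved


def stream_bitrate(frame_bytes: int, frames_per_packet: int) -> float:
    """Total bitrate in bit/s including the assumed header overhead."""
    packet_bytes = HEADER_BYTES + TOC_BYTES + frame_bytes * frames_per_packet
    packet_ms = frames_per_packet * FRAME_MS
    return packet_bytes * 8 * 1000 / packet_ms


# 32 kbps codec rate, i.e. 80 bytes per frame
for n in (1, 2, 4):
    print(n, stream_bitrate(80, n))
# 1 48800.0
# 2 40400.0
# 4 36200.0
```

 Under these assumptions, going from one to four frames per packet
 lowers the total rate from 48.8 kbps to 36.2 kbps, at the cost of
 60 ms of additional packetization delay.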
 The "CBR" signaling parameter allows a receiver to lock down an RTP
 payload type to use a single encoding rate.  As this prevents the
 codec rate from being lowered when congestion is experienced, the
 sender is constrained to either change the packetization or abort the
 transmission.  Since these responses to congestion are severely
 limited, implementations SHOULD NOT use the "CBR" parameter unless
 they are interacting with a device that cannot support a variable
 bitrate (e.g., a gateway to H.320 systems).  When using CBR mode, a
 receiver MUST monitor the packet loss rate to ensure congestion is
 not caused, following the guidelines in Section 2 of RFC 3551.

10. Security Considerations

 RTP packets using the payload format defined in this specification
 are subject to the security considerations discussed in the RTP
 specification [RFC3550] and in any applicable RTP profile.  The main
 security considerations for the RTP packet carrying the RTP payload
 format defined within this memo are confidentiality, integrity, and
 source authenticity.  Confidentiality is achieved by encryption of
 the RTP payload.  Integrity of the RTP packets is achieved through a
 suitable cryptographic integrity protection mechanism.  Such a
 cryptographic system may also allow the authentication of the source
 of the payload.  A suitable security mechanism for this RTP payload
 format should provide confidentiality, integrity protection, and at
 least source authentication capable of determining if an RTP packet
 is from a member of the RTP session.
 Note that the appropriate mechanism to provide security to RTP and
 payloads following this memo may vary.  It is dependent on the
 application, the transport, and the signaling protocol employed.
 Therefore, a single mechanism is not sufficient, although if
 suitable, usage of the Secure Real-time Transport Protocol (SRTP)
 [RFC3711] is recommended.  Other mechanisms that may be used are
 IPsec [RFC4301] and Transport Layer Security (TLS) [RFC5246] (RTP
 over TCP); other alternatives may exist.
 The use of interleaving in conjunction with encryption can have a
 negative impact on confidentiality for a short period of time.
 Consider the following packets (in brackets) containing frame numbers
 as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular
 continuous diagonal interleaving pattern).  The originator wishes to
 deny some participants the ability to hear material starting at time
 16.  Simply changing the key on the packet with the timestamp at or
 after 16, and denying that new key to those participants, does not
 achieve this; frames 17, 18, and 21 have been supplied in prior
 packets under the prior key, and error concealment may make the audio
 intelligible at least as far as frame 18 or 19, and possibly further.
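 The example above can be checked with a few lines of Python.  The
 function name is hypothetical; the packet contents and the cut-off
 frame are the ones given in the text.

```python
def frames_sent_under_old_key(packets, rekey_index, cutoff_frame):
    """Frames at or after the cut-off that were already sent in
    packets preceding the one where the key changes."""
    leaked = []
    for packet in packets[:rekey_index]:
        leaked.extend(f for f in packet if f >= cutoff_frame)
    return sorted(leaked)


packets = [[10, 14, 18], [13, 17, 21], [16, 20, 24]]
# The key changes on the third packet, whose timestamp is frame 16.
print(frames_sent_under_old_key(packets, rekey_index=2, cutoff_frame=16))
# [17, 18, 21]
```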

 This RTP payload format and its media decoder do not exhibit any
 significant non-uniformity in the receiver-side computational
 complexity for packet processing, and thus are unlikely to pose a
 denial-of-service threat due to the receipt of pathological data.
 Nor does the RTP payload format contain any active content.

11. Acknowledgements

 The authors would like to thank Roni Even and Anisse Taleb for their
 help with this document.  We would also like to thank the people who
 have provided feedback: Colin Perkins, Mark Baker, and Stephen
 Botzko.

12. References

12.1. Normative References

 [ITU-T-G719]  ITU-T, "Specification : ITU-T G.719 extension for 20
               kHz fullband audio", April 2008.
 [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.
 [RFC3264]     Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
               Model with Session Description Protocol (SDP)",
               RFC 3264, June 2002.
 [RFC3550]     Schulzrinne, H., Casner, S., Frederick, R., and V.
               Jacobson, "RTP: A Transport Protocol for Real-Time
               Applications", STD 64, RFC 3550, July 2003.
 [RFC3551]     Schulzrinne, H. and S. Casner, "RTP Profile for Audio
               and Video Conferences with Minimal Control", STD 65,
               RFC 3551, July 2003.
 [RFC4566]     Handley, M., Jacobson, V., and C. Perkins, "SDP:
               Session Description Protocol", RFC 4566, July 2006.
 [RFC5234]     Crocker, D. and P. Overell, "Augmented BNF for Syntax
               Specifications: ABNF", STD 68, RFC 5234, January 2008.
 [RFC5405]     Eggert, L. and G. Fairhurst, "Unicast UDP Usage
               Guidelines for Application Designers", BCP 145,
               RFC 5405, November 2008.

12.2. Informative References

 [RFC2198]     Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
               Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse-
               Parisis, "RTP Payload for Redundant Audio Data",
               RFC 2198, September 1997.
 [RFC2326]     Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
               Streaming Protocol (RTSP)", RFC 2326, April 1998.
 [RFC2974]     Handley, M., Perkins, C., and E. Whelan, "Session
               Announcement Protocol", RFC 2974, October 2000.
 [RFC3711]     Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
               K. Norrman, "The Secure Real-time Transport Protocol
               (SRTP)", RFC 3711, March 2004.
 [RFC3839]     Castagno, R. and D. Singer, "MIME Type Registrations
               for 3rd Generation Partnership Project (3GPP)
               Multimedia files", RFC 3839, July 2004.
 [RFC4288]     Freed, N. and J. Klensin, "Media Type Specifications
               and Registration Procedures", BCP 13, RFC 4288,
               December 2005.
 [RFC4301]     Kent, S. and K. Seo, "Security Architecture for the
               Internet Protocol", RFC 4301, December 2005.
 [RFC4337]     Lim, Y. and D. Singer, "MIME Type Registration for
               MPEG-4", RFC 4337, March 2006.
 [RFC4855]     Casner, S., "Media Type Registration of RTP Payload
               Formats", RFC 4855, February 2007.
 [RFC5109]     Li, A., "RTP Payload Format for Generic Forward Error
               Correction", RFC 5109, December 2007.
 [RFC5246]     Dierks, T. and E. Rescorla, "The Transport Layer
               Security (TLS) Protocol Version 1.2", RFC 5246,
               August 2008.

Authors' Addresses

 Magnus Westerlund
 Ericsson AB
 Torshamnsgatan 21-23
 SE-164 83 Stockholm
 SWEDEN
 Phone: +46 10 7190000
 EMail: magnus.westerlund@ericsson.com
 Ingemar Johansson
 Ericsson AB
 Laboratoriegrand 11
 SE-971 28 Lulea
 SWEDEN
 Phone: +46 10 7190000
 EMail: ingemar.s.johansson@ericsson.com
