Network Working Group                                         D. Hoffman
Request for Comments: 2038                                   G. Fernando
Category: Standards Track                         Sun Microsystems, Inc.
                                                                V. Goyal
                                                  Precept Software, Inc.
                                                            October 1996

              RTP Payload Format for MPEG1/MPEG2 Video

Status of this Memo

 This document specifies an Internet standards track protocol for the
 Internet community, and requests discussion and suggestions for
 improvements.  Please refer to the current edition of the "Internet
 Official Protocol Standards" (STD 1) for the standardization state
 and status of this protocol.  Distribution of this memo is unlimited.

Abstract

 This memo describes a packetization scheme for MPEG video and audio
 streams.  The scheme proposed can be used to transport such a video
 or audio flow over the transport protocols supported by RTP.  Two
 approaches are described. The first is designed to support maximum
 interoperability with MPEG System environments.  The second is
 designed to provide maximum compatibility with other RTP-encapsulated
 media streams and future conference control work of the IETF.

1. Introduction

 ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has
 defined the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard
 (ISO/IEC 13818)[2].  This memo describes a packetization scheme to
 transport MPEG video and audio streams using the Real-time Transport
 Protocol (RTP), version 2 [3, 4].
 The MPEG1 specification is defined in three parts: System, Video and
 Audio.  It is designed primarily for CD-ROM-based applications, and
 is optimized for approximately 1.5 Mbits/sec combined data rates. The
 video and audio portions of the specification describe the basic
 format of the video or audio stream.  These formats define the
 Elementary Streams (ES).  The MPEG1 System specification defines an
 encapsulation of the ES that contains Presentation Time Stamps (PTS),
 Decoding Time Stamps and System Clock references, and performs
 multiplexing of MPEG1 compressed video and audio ES's with user data.

Hoffman, et. al. Standards Track [Page 1] RFC 2038 RTP Payload Format for MPEG1/MPEG2 Video October 1996

 The MPEG2 specification is structured in a similar way. However, it
 hasn't been restricted only to CD-ROM applications. The MPEG2 System
 specification defines two system stream formats:  the MPEG2 Transport
 Stream (MTS) and the MPEG2 Program Stream (MPS).  The MTS is tailored
 for communicating or storing one or more programs of MPEG2 compressed
 data and also other data in relatively error-prone environments. The
 MPS is tailored for relatively error-free environments.
 We seek to achieve interoperability among 4 types of end-systems in
 the following specification. The 4 types are:
      1. Transmitting Interworking Unit (TIU)
         Receives MPEG information from a native MTS system for
         distribution over packet networks using a native RTP-based
         system layer (such as an IP-based internetwork). Examples:
         real-time encoder, MTS satellite link to Internet, video
         server with MTS-encoded source material.
      2. Receiving Interworking Unit (RIU)
         Receives MPEG information in real time from an RTP-based
         network for forwarding to a native MTS environment.
         Examples: Internet-based video server to MTS-based cable
         distribution plant.
      3. Transmitting Internet End-System (TAES)
         Transmits MPEG information generated or stored within the
         internet end-system itself, or received from internet-based
         computer networks.  Example: video server.
      4. Receiving Internet End-System (RAES)
         Receives MPEG information over an RTP-based internet for
         consumption at the internet end-system or forwarding to
         a traditional computer network.  Example: desktop PC or
         workstation viewing training video.
 Each of the 2 types of transmitters must work with each of the 2
 types of receivers.  Because it is probable that the TAES, and
 certain that the RAES, will be based on existing and planned
 internet-connected computers, it is highly desirable for the
 interoperable protocol to be based on RTP.
 Because of the range of applications that might employ MPEG streams,
 we propose to define two payload formats.

 Much interest in the MPEG community is in the use of one of the MPEG
 System encodings, and hence, in Section 2 we propose encapsulations
 of MPEG1 System streams and MPEG2 Transport and Program Streams with
 RTP.  This profile supports the full semantics of MPEG System and
 offers basic interoperability among all four end-system types.
 When operating only among internet-based end-systems (i.e., TAES and
 RAES) a payload format that provides greater compatibility with the
 Internet architecture is desired, deferring some of the system issues
 to other protocols being defined in the Internet community (such as
 the MMUSIC WG).  In Section 3 we propose an encapsulation of
 compressed video and audio data (referred to in MPEG documentation as
 "Elementary Streams" (ES)) complying with either MPEG1 or MPEG2.
 Here, neither of the System standards of MPEG1 or MPEG2 is utilized.
 The ES's are directly encapsulated with RTP.
 Throughout this specification, we make extensive use of MPEG
 terminology.  The reader should consult the primary MPEG references
 for definitive descriptions of this terminology.

2. Encapsulation of MPEG System and Transport Streams

 Each RTP packet will contain a timestamp derived from the sender's
 90 kHz clock reference.  This clock is synchronized to the system
 stream Program Clock Reference (PCR) or System Clock Reference (SCR)
 and represents the target transmission time of the first byte of the
 packet payload.  The RTP timestamp will not be passed to the MPEG
 decoder.  This use of the timestamp differs from normal RTP practice
 in that it is not considered to be the media display or presentation
 timestamp.  The primary purposes of the RTP timestamp will be to
 estimate and reduce any network-induced jitter and to synchronize
 relative time drift between the transmitter and receiver.
 For MPEG2 Transport Streams the RTP payload will contain an integral
 number of MPEG transport packets.  To avoid end system
 inefficiencies, data from multiple small MTS packets (normally fixed
 in size at 188 bytes) are aggregated into a single RTP packet.  The
 number of transport packets contained is computed by dividing RTP
 payload length by the length of an MTS packet (188).
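
 The packet count described above can be sketched as follows.  This
 is a minimal illustration only: the function name
 split_transport_packets is hypothetical, and the payload is assumed
 to be handed over as a byte string with the RTP header already
 removed.

```python
TS_PACKET_SIZE = 188  # fixed MTS packet length given in the text


def split_transport_packets(rtp_payload):
    """Split an RTP payload into the MTS packets it carries.

    The payload must hold an integral number of 188-byte packets.
    """
    count, remainder = divmod(len(rtp_payload), TS_PACKET_SIZE)
    if remainder:
        raise ValueError("payload is not an integral number of MTS packets")
    return [rtp_payload[i * TS_PACKET_SIZE:(i + 1) * TS_PACKET_SIZE]
            for i in range(count)]


# Seven MTS packets (1316 bytes) aggregated into one RTP payload.
packets = split_transport_packets(bytes(7 * TS_PACKET_SIZE))
print(len(packets))  # 7
```
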
 For MPEG2 Program streams and MPEG1 system streams there are no
 packetization restrictions; these streams are treated as a packetized
 stream of bytes.

2.1 RTP header usage

 The RTP header fields are used as follows:
      Payload Type: Distinct payload types should be assigned for
        MPEG1 System Streams, MPEG2 Program Streams and MPEG2
        Transport Streams.  See [4] for payload type assignments.
      M bit:  Set to 1 whenever the timestamp is discontinuous
        (such as might happen when a sender switches from one data
        source to another). This allows the receiver and any
        intervening RTP mixers or translators that are synchronizing
        to the flow to ignore the difference between this timestamp
        and any previous timestamp in their clock phase detectors.
      timestamp: 32-bit 90 kHz timestamp representing the target
        transmission time for the first byte of the packet.
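
 As an illustration of the header usage above, the following sketch
 builds the 12-byte RTP fixed header defined in [3] with no CSRC
 entries.  The function names and the example payload type value are
 assumptions made for this example; actual payload type assignments
 are given in [4].

```python
import struct

RTP_VERSION = 2
CLOCK_HZ = 90_000  # 90 kHz clock reference


def rtp_timestamp(seconds):
    """Convert a target transmission time in seconds to 90 kHz ticks."""
    return int(round(seconds * CLOCK_HZ)) & 0xFFFFFFFF


def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Pack an RTP fixed header (P=0, X=0, CC=0).

    marker should be True when the timestamp is discontinuous,
    for example after the sender switches data sources.
    """
    byte0 = RTP_VERSION << 6
    byte1 = (int(bool(marker)) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


# Payload type 33 is only an example value here; consult [4].
header = build_rtp_header(33, seq=1, timestamp=rtp_timestamp(1.0),
                          ssrc=0x1234, marker=True)
print(len(header))  # 12
```
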

3. Encapsulation of MPEG Elementary Streams

 The following ES types may be encapsulated directly in RTP:
      (a) MPEG1 Video (ISO/IEC 11172-2)
      (b) MPEG2 Video (ISO/IEC 13818-2)
      (c) MPEG1 Audio (ISO/IEC 11172-3)
      (d) MPEG2 Audio (ISO/IEC 13818-3)
 A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and
 MPEG1/MPEG2 Audio, respectively. Further indication as to whether the
 data is MPEG1 or MPEG2 need not be provided in the RTP or MPEG-
 specific headers of this encapsulation, as this information is
 available in the ES headers.
 Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz
 shall be carried in the fixed RTP header. All packets that make up an
 audio or video frame shall have the same time stamp.

3.1 MPEG Video elementary streams

 MPEG1 Video can be distinguished from MPEG2 Video at the video
 sequence header, i.e. for MPEG2 Video a sequence_header() is followed
 by sequence_extension().  The particular profile and level of MPEG2
 Video (MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc) are
 determined by the profile_and_level_indicator field of the
 sequence_extension header of MPEG2 Video.
 The MPEG bit-stream semantics were designed for relatively error-free
 environments, and there is a significant amount of dependency (both
 temporal and spatial) within the stream such that loss of some data
 makes other uncorrupted data useless.  The format as defined in this
 encapsulation uses application layer framing information plus
 additional information in the RTP stream-specific header to allow for
 certain recovery mechanisms.  Appendix 1 suggests several recovery
 strategies based on the properties of this encapsulation.
 Since MPEG pictures can be large, they will normally be fragmented
 into packets of size less than a typical LAN/WAN MTU.  The following
 fragmentation rules apply:
      1. The MPEG Video_Sequence_Header, when present, will always
         be at the beginning of an RTP payload.
      2. An MPEG GOP_header, when present, will always be at the
         beginning of the RTP payload, or will follow a
         Video_Sequence_Header.
      3. An MPEG Picture_Header, when present, will always be at the
         beginning of an RTP payload, or will follow a GOP_header.
 Each ES header must be completely contained within the packet.
 Consequently, a minimum RTP payload size of 261 bytes must be
 supported to contain the largest single header defined in the ES
 (that is, the extension_data() header containing the
 quant_matrix_extension()).  Otherwise, there are no restrictions on
 where headers may appear within packet payloads.
 In MPEG, each picture is made up of one or more "slices," and a slice
 is intended to be the unit of recovery from data loss or corruption.
 An MPEG-compliant decoder will normally advance to the beginning of
 the next slice whenever an error is encountered in the stream.  MPEG
 slice begin and end bits are provided in the encapsulation header to
 facilitate this.
 The beginning of a slice must either be the first data in a packet
 (after any MPEG ES headers) or must follow after some integral number
 of slices in a packet.  This requirement ensures that the beginning
 of the next slice after one with a missing packet can be found
 without requiring that the receiver scan the packet contents.  Slices
 may be fragmented across packets as long as all the above rules are
 met.
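
 One possible way to satisfy the slice placement rule is sketched
 below.  It is a simplified illustration, not a complete packetizer:
 the function name packetize_slices is hypothetical, each slice is
 assumed to be available as a separate byte string, and the sequence,
 GOP and picture headers are ignored.  Each result also carries the
 values the Beginning-of-slice and End-of-slice bits would take.

```python
def packetize_slices(slices, max_payload):
    """Group slices into payloads of at most max_payload bytes.

    Returns a list of (payload, begins_slice, ends_slice) tuples.
    A slice starts a payload or follows whole slices only; a slice
    larger than max_payload is fragmented across its own payloads.
    """
    packets = []
    current = bytearray()  # whole slices gathered for the next payload

    def flush():
        if current:
            packets.append((bytes(current), True, True))
            current.clear()

    for data in slices:
        if len(data) > max_payload:
            flush()
            for start in range(0, len(data), max_payload):
                chunk = data[start:start + max_payload]
                is_last = start + len(chunk) == len(data)
                packets.append((chunk, start == 0, is_last))
        else:
            if len(current) + len(data) > max_payload:
                flush()
            current.extend(data)
    flush()
    return packets


result = packetize_slices(
    [bytes(100), bytes(100), bytes(700), bytes(50)], max_payload=500)
print([len(payload) for payload, _, _ in result])  # [200, 500, 200, 50]
```

 In this sketch the tail fragment of a large slice always ends its
 payload, so the next slice again begins a payload.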
 An implementation based on this encapsulation assumes that the
 Video_Sequence_Header is repeated periodically in the MPEG
 bit-stream.  In practice (though not required by the MPEG standard)
 this is used to allow channel switching and to receive and start
 decoding a continuously relayed MPEG bit-stream at arbitrary points
 in the media stream.  It is suggested that when playing back an MPEG
 stream from a file format (where the Video_Sequence_Header may only
 be represented at the beginning of the stream), the first
 Video_Sequence_Header (preceded by an end-of-stream indicator) be
 saved by the packetizer for periodic injection into the network
 stream.

3.2 MPEG Audio elementary streams

 MPEG1 Audio can be distinguished from MPEG2 Audio from the MPEG
 ancillary_data() header.  For either MPEG1 or MPEG2 Audio, distinct
 Presentation Time Stamps may be present for frames which correspond
 to either 384 samples for Layer-I, or 1152 samples for Layer-II or
 Layer-III.  The actual number of bytes required to represent this
 number of samples will vary depending on the encoder parameters.
 Multiple audio frames may be encapsulated within one RTP packet.  In
 this case, an integral number of audio frames must be contained
 within the packet and the fragmentation header defined in Section 3.5
 shall be set to 0.
 Also, if relatively short packets are to be used, one frame may be so
 large that it may straddle multiple RTP packets.  For example, for
 Layer-II MPEG audio sampled at a rate of 44.1 kHz each frame would
 represent a time slot of 26.1 msec. At this sampling rate if the
 compressed bit-rate is 384 kbits/sec (i.e.  48 kBytes/sec) then the
 average audio frame size would be 1.25 KBytes.  If packets were to be
 500 Bytes long, then each audio frame would straddle 3 RTP packets.
 The audio fragmentation indicator header (See Section 3.5) shall be
 present for an MPEG1/2 Audio payload type to provide for this
 fragmentation.
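
 The arithmetic in the Layer-II example above can be reproduced with
 the short sketch below.  The function name audio_frame_stats is
 hypothetical, and the frame size it returns is an average; actual
 frame sizes depend on the encoder parameters.

```python
import math


def audio_frame_stats(samples_per_frame, sample_rate_hz,
                      bit_rate_bps, packet_bytes):
    """Return (frame duration in s, average frame bytes, packets)."""
    duration_s = samples_per_frame / sample_rate_hz
    frame_bytes = bit_rate_bps / 8 * duration_s
    packets = math.ceil(frame_bytes / packet_bytes)
    return duration_s, frame_bytes, packets


# Layer-II: 1152 samples per frame, 44.1 kHz, 384 kbits/sec,
# 500-byte packets.
duration, size, count = audio_frame_stats(1152, 44_100, 384_000, 500)
print(f"{duration * 1000:.1f} ms, {size:.0f} bytes, {count} packets")
# 26.1 ms, 1254 bytes, 3 packets
```
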

3.3 RTP Fixed Header for MPEG ES encapsulation

 The RTP header fields are used as follows:
      Payload Type: Distinct payload types should be assigned
        for video elementary streams and audio elementary streams.
        See [4] for payload type assignments.
      M bit:  For video, set to 1 on packet containing MPEG frame
        end code, 0 otherwise.  For audio, set to 1 on first packet
        of a "talk-spurt," 0 otherwise.
      PT:  MPEG video or audio stream ID.
      timestamp: 32-bit 90 kHz timestamp representing presentation
         time of MPEG picture or audio frame.  Same for all packets
         that make up a picture or audio frame.  May not be
         monotonically increasing in video stream if B pictures are
         present in stream.  For packets that contain only a video
         sequence and/or GOP header, the timestamp is that of the
         subsequent picture.
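
 The following sketch shows why the timestamp need not increase
 monotonically when B pictures are present.  It assumes, purely for
 illustration, a constant rate of 30 pictures per second and a GOP
 whose first displayed picture has presentation time zero; the
 function name picture_timestamps is hypothetical.

```python
CLOCK_HZ = 90_000  # 90 kHz timestamp clock


def picture_timestamps(temporal_refs, frame_rate, base=0):
    """Map temporal references (in stream order) to RTP timestamps."""
    ticks_per_picture = CLOCK_HZ // frame_rate
    return [(base + tr * ticks_per_picture) & 0xFFFFFFFF
            for tr in temporal_refs]


# Stream order 2I 0B 1B 5P 3B 4B, given as temporal references.
print(picture_timestamps([2, 0, 1, 5, 3, 4], frame_rate=30))
# [6000, 0, 3000, 15000, 9000, 12000]
```
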

3.4 MPEG Video-specific header

 This header shall be attached to each RTP packet after the RTP fixed
 header.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    MBZ    |         TR        |MBZ|S|B|E|  P  | | BFC | | FFC |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                 FBV     FFV
      MBZ: Unused. Must be set to zero in current
         specification. This space is reserved for future use.
      TR: Temporal-Reference (10 bits). The temporal reference of
         the current picture within the current GOP. This value
         ranges from 0-1023 and is constant for all RTP packets of a
         given picture.
      MBZ: Unused. Must be set to zero in current
         specification. This space is reserved for future use.
      S: Sequence-header-present (1 bit). Normally 0 and set to 1 at
         the occurrence of each MPEG sequence header.  Used to
         detect presence of sequence header in RTP packet.
      B: Beginning-of-slice (BS) (1 bit). Set when the start of the
         packet payload is a slice start code, or when a slice start
         code is preceded only by one or more of a
         Video_Sequence_Header, GOP_header and/or Picture_Header.
      E: End-of-slice (ES) (1 bit). Set when the last byte of the
         payload is the end of an MPEG slice.
      P: Picture-Type (3 bits). I (1), P (2), B (3) or D (4). This
         value is constant for each RTP packet of a given picture.
         Value 000B is forbidden and 101B - 111B are reserved to
         support future extensions to the MPEG ES specification.
      FBV: full_pel_backward_vector
      BFC: backward_f_code
      FFV: full_pel_forward_vector
      FFC: forward_f_code
         These values are obtained from the most recent picture
         header and are constant for each RTP packet of a given
         picture. None of them is used for I frames, where all four
         must be set to zero in the RTP header. For P frames only
         FFV and FFC are present, and FBV and BFC must be set to
         zero in the RTP header. For B frames all four values are
         present.
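
 A minimal sketch of packing and parsing this 32-bit header is given
 below.  It assumes the following field widths, which are our reading
 of the header diagram in this section: MBZ (6 bits), TR (10), MBZ
 (2), S, B and E (1 each), P (3), FBV (1), BFC (3), FFV (1) and FFC
 (3).  The function names are hypothetical.

```python
import struct


def pack_video_header(tr, s, b, e, p, fbv=0, bfc=0, ffv=0, ffc=0):
    """Pack the MPEG video-specific header; both MBZ fields stay zero."""
    if not 0 <= tr <= 1023:
        raise ValueError("temporal reference must fit in 10 bits")
    if not 1 <= p <= 4:
        raise ValueError("picture type must be I(1), P(2), B(3) or D(4)")
    word = ((tr << 16) | (int(bool(s)) << 13) | (int(bool(b)) << 12)
            | (int(bool(e)) << 11) | (p << 8) | (int(bool(fbv)) << 7)
            | ((bfc & 0x7) << 4) | (int(bool(ffv)) << 3) | (ffc & 0x7))
    return struct.pack("!I", word)


def unpack_video_header(data):
    """Parse the first four payload bytes into a dict of fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "tr": (word >> 16) & 0x3FF,
        "s": (word >> 13) & 1,
        "b": (word >> 12) & 1,
        "e": (word >> 11) & 1,
        "p": (word >> 8) & 0x7,
        "fbv": (word >> 7) & 1,
        "bfc": (word >> 4) & 0x7,
        "ffv": (word >> 3) & 1,
        "ffc": word & 0x7,
    }


# A P picture (type 2) with temporal reference 5, carrying a whole
# slice, with forward_f_code 3.
raw = pack_video_header(tr=5, s=0, b=1, e=1, p=2, ffc=3)
print(raw.hex())  # 00051a03
```
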

3.5 MPEG Audio-specific header

 This header shall be attached to each RTP packet at the start of the
 payload and after any RTP headers for an MPEG1/2 Audio payload type.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |              MBZ              |          Frag_offset          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Frag_offset: Byte offset into the audio frame for the data
         in this packet.
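
 The use of Frag_offset when a frame straddles several packets can be
 sketched as follows.  The sketch assumes a 16-bit MBZ field followed
 by a 16-bit Frag_offset, which is our reading of the header diagram
 in this section, and the function names are hypothetical.

```python
import struct


def pack_audio_header(frag_offset):
    """Pack the audio-specific header: MBZ (zero) then Frag_offset."""
    if not 0 <= frag_offset <= 0xFFFF:
        raise ValueError("Frag_offset must fit in 16 bits")
    return struct.pack("!HH", 0, frag_offset)


def fragment_audio_frame(frame, max_data):
    """Split one audio frame into payloads of header plus data."""
    return [pack_audio_header(offset) + frame[offset:offset + max_data]
            for offset in range(0, len(frame), max_data)]


# A 1254-byte frame carried in fragments of at most 500 data bytes.
payloads = fragment_audio_frame(bytes(1254), max_data=500)
offsets = [struct.unpack("!HH", p[:4])[1] for p in payloads]
print(offsets)  # [0, 500, 1000]
```
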

Appendix 1. Error Recovery and Resynchronization Strategies.

 The following error recovery and resynchronization strategies are
 intended to be guidelines only.  A compliant receiver is free to
 employ alternative (or no) strategies.
 When initially decoding an RTP-encapsulated MPEG Elementary Stream,
 the receiver may discard all packets until the Sequence-header-
 present bit is set to 1.  At this point, sufficient state information
 is contained in the stream to allow processing by an MPEG decoder.
 Loss of packets containing the GOP_header and/or Picture_Header is
 detected by an unexpected change in the Temporal-Reference and
 Picture-Type values.  Consider the following example GOP sequence:
      In display order: 0B 1B 2I 3B 4B 5P 6B 7B 8P GOP_HDR 0B ...
      In stream order:  2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_HDR 2I ...
 Consider also two counters:
      ref_pic_temp (Reference Picture (I,P) Temporal Reference)
      dep_pic_temp (Dependent Picture (B) Temporal Reference)
 At each GOP beginning, set these counters to the temporal reference
 value of the corresponding picture type. For our example GOP
 sequence, ref_pic_temp = 2 and dep_pic_temp = 0. Keep incrementing
 BOTH counters by unity with each following picture. Ref_pic_temp
 should match the temporal references of the I and P frames, and
 dep_pic_temp should match the temporal references of the B frames.
     dep_pic_temp: -  0  1  2  3  4  5  6  7        8  9
 In stream order:  2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_H 2I 0B 1B ...
     ref_pic_temp: 2  3  4  5  6  7  8  9  10  ^    11
                   --------------------------  |    ^
                              Match            Drop |
                                                    Mismatch
                                                     in ref_pic_temp
 The loss of a GOP header can be detected by matching the appropriate
 counter (based on picture type) to the temporal reference value. A
 mismatch indicates a lost GOP header. If desired, a GOP header can be
 re-constructed using a "null" time_code, repeating the closed_gop
 flag from previous GOP headers, and setting the broken_link flag to
 1.
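
 The two-counter scheme can be sketched as shown below.  This is an
 illustration under simplifying assumptions: pictures arrive as
 (temporal reference, picture type) pairs in stream order, a received
 GOP header appears as the string "GOP", and the counters wrap at
 1024 to match the 10-bit Temporal-Reference.  The function name
 find_lost_gop_headers is hypothetical.

```python
def find_lost_gop_headers(stream):
    """Return indexes of pictures that reveal a missing GOP header."""
    ref_pic_temp = None  # expected temporal reference of I/P pictures
    dep_pic_temp = None  # expected temporal reference of B pictures
    lost = []
    for index, item in enumerate(stream):
        if item == "GOP":
            # A GOP header arrived: counters restart with this GOP.
            ref_pic_temp = dep_pic_temp = None
            continue
        tr, picture_type = item
        if picture_type == "B":
            if dep_pic_temp is not None and dep_pic_temp != tr:
                lost.append(index)
                ref_pic_temp = None  # a new GOP started unnoticed
            dep_pic_temp = tr
        else:
            if ref_pic_temp is not None and ref_pic_temp != tr:
                lost.append(index)
                dep_pic_temp = None  # a new GOP started unnoticed
            ref_pic_temp = tr
        # Increment BOTH counters by unity with each picture.
        if ref_pic_temp is not None:
            ref_pic_temp = (ref_pic_temp + 1) % 1024
        if dep_pic_temp is not None:
            dep_pic_temp = (dep_pic_temp + 1) % 1024
    return lost


gop = [(2, "I"), (0, "B"), (1, "B"), (5, "P"), (3, "B"), (4, "B"),
       (8, "P"), (6, "B"), (7, "B")]
# The GOP header between the two GOPs was dropped in transit.
print(find_lost_gop_headers(gop + gop[:3]))  # [9]
```

 With the GOP header present between the two GOPs, the same function
 returns an empty list.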
 The loss of a Picture_Header can also be detected by a mismatch in
 the Temporal Reference contained in the RTP packet from the
 appropriate dep_pic_temp or ref_pic_temp counters at the receiver.

 After scanning to the next Beginning-of-slice the Picture_Header is
 reconstructed from the P, TR, FBV, BFC, FFV and FFC contained in that
 packet, and from stream-dependent default values.
 Any time an RTP packet is lost (as indicated by a gap in the RTP
 sequence number), the receiver may discard all packets until the
 Beginning-of-slice bit is set.  At this point, sufficient state
 information is contained in the stream to allow processing by an MPEG
 decoder starting at the next slice boundary (possibly after
 reconstruction of the GOP_header and/or Picture_Header as described
 above).
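
 The discard rule for lost packets can be sketched as follows.  The
 sketch assumes packets arrive in order as (sequence number,
 Beginning-of-slice bit) pairs and does not handle reordering; the
 function name resynchronize is hypothetical.

```python
def resynchronize(packets):
    """Yield sequence numbers of packets passed on to the decoder.

    After a gap in the RTP sequence numbers, packets are discarded
    until one arrives with the Beginning-of-slice bit set.
    """
    expected = None
    waiting = False
    for seq, begins_slice in packets:
        if expected is not None and seq != expected:
            waiting = True  # gap detected: at least one packet lost
        expected = (seq + 1) & 0xFFFF  # sequence numbers are 16 bits
        if waiting:
            if not begins_slice:
                continue
            waiting = False
        yield seq


# Packet 12 is lost; 13 and 14 continue the damaged slice.
arrived = [(10, True), (11, False), (13, False), (14, False),
           (15, True), (16, False)]
print(list(resynchronize(arrived)))  # [10, 11, 15, 16]
```
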

References

 [1] ISO/IEC International Standard 11172; "Coding of moving pictures
     and associated audio for digital storage media up to about 1,5
     Mbits/s", November 1993.
 [2] ISO/IEC International Standard 13818; "Generic coding of moving
     pictures and associated audio information", November 1994.
 [3] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
     "RTP: A Transport Protocol for Real-Time Applications",
     RFC 1889, January 1996.
 [4] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
     with Minimal Control", RFC 1890, January 1996.

Authors' Addresses

 Gerard Fernando
 Sun Microsystems, Inc.
 Mail-stop UMPK14-305
 2550 Garcia Avenue
 Mountain View, California 94043-1100
 USA
 Phone: +1 415-786-6373
 EMail: gerard.fernando@eng.sun.com
 Vivek Goyal
 Precept Software, Inc.
 1072 Arastradero Rd,
 Palo Alto, CA 94304
 USA
 Phone: +1 415-845-5200
 EMail: goyal@precept.com
 Don Hoffman
 Sun Microsystems, Inc.
 Mail-stop UMPK14-305
 2550 Garcia Avenue
 Mountain View, California 94043-1100
 USA
 Phone: +1 503-297-1580
 EMail: don.hoffman@eng.sun.com
