MPEG-2 Case

The DVB-T digital television standard forms the core of the new digital terrestrial television landscape in Europe. It brings a number of benefits: higher resolution video, higher quality multichannel audio, and program information embedded in the broadcast stream. Similar moves to digitize television broadcasting are under way across the world. In the US, analogue television broadcasts are to end in 2009, as the FCC has mandated a total switchover to digital television broadcasting (Benoit, 2003, pp. 128-140).

At the core of all these improvements is the change from analogue to digital transmission of video. However, simply changing the representation of broadcast data from analogue to digital does not by itself bring these benefits. The benefits come from the ability to use information technology tools on the broadcast signal. With a digital signal, we can employ error detection and correction schemes to help fight channel effects. More importantly, we can apply compression techniques. It is compression that lets digital television deliver higher resolution video together with high quality sound, because the transmitting station can now send much more information over the same communications channel. By compressing the data being sent, the system effectively enlarges the communications channel, allowing greater traffic over the same frequency and power allocation (Benoit, 2003, pp. 17-29).

At the core of this is MPEG-2, the video compression standard used by digital television systems such as DVB-T and ATSC. MPEG-2 is a standard developed by the Moving Picture Experts Group, a working group of the International Organization for Standardization (ISO). Worldwide, MPEG-2 has found use as the standard video compression for DVDs and for terrestrial digital television systems. This paper takes a closer look at the inner workings and techniques which make up MPEG-2.

The MPEG-2 Standards

MPEG-2 is published by the ISO as ISO/IEC 13818. Its first part, ISO/IEC 13818-1, is also published by the International Telecommunication Union (ITU) as ITU-T Rec. H.222.0. The ISO version of MPEG-2 can be broken down into three main parts. The first part defines the kinds of streams supported by MPEG-2. The second part defines the video compression techniques employed by MPEG-2, while the third part defines the audio coding.

As with all MPEG standards, MPEG-2 is an asymmetric standard wherein the encoder has much greater complexity than the decoder. The standard only defines how a stream should be interpreted by a decoder to create a video image. There are no provisions on how the stream should be created, only on how the stream should be understood. This strategy creates a standard simple decoder while leaving open the possibility of implementing more technologically advanced complex encoders in the future. This is exceptionally beneficial for situations such as broadcast where the decoders vastly outnumber the encoders. Also, having a simple decoder brings down the costs while the few entities owning or operating encoders can deal with the increased complexity (Watkinson, 2001, pp. 1-3).

The first part of the MPEG-2 standard defines two kinds of streams, the transport stream and the program stream. These define two multiplexing schemes for the MPEG file or stream. The program stream is based on the previous MPEG-1 specification and is designed to carry only a single program. Because it is vulnerable to errors, the program stream is intended for relatively error-free environments. MPEG-2 program streams are most commonly used for video hosted on storage media such as DVDs. The second type, the transport stream, is designed to carry multiple programs and is more robust against errors. MPEG-2 transport streams are intended for broadcast use, which is why they must accommodate multiple programs and the errors which may arise from the communications channel. The benefits of the transport stream come at a cost: it is much more complicated to create and decode an MPEG-2 transport stream than a program stream (Sarginson, 1996, p. 2).

Technologies

MPEG-2 compression utilizes several patented technologies. These patents are owned by various technology companies, research labs and academic institutions. Around 640 patents make up the MPEG-2 patent portfolio, covering aspects ranging from encoder design and image interpolation to the audio encoding system in use. Patent holders include Alcatel-Lucent, G.E., Mitsubishi Corporation, Philips, Sony, Sharp and Columbia University (MPEG-LA, 2009).

The large number of patents is one main criticism of MPEG-2. Implementing an MPEG-2 encoder or decoder would involve licensing numerous existing proprietary technologies as opposed to a system which utilized more open standards and technologies. Even though MPEG-2 is an open standard as far as its specifications are open to anyone, it can hardly be called a free standard due to the existing patented technologies it uses.

Compression Techniques

The core of source coding is the removal of redundant information. Redundant information is removed at the encoder and can be reinserted or interpolated at the decoder. This is possible because redundant information is highly correlated with other information in the data stream, so the removed values can be interpolated from the values that were retained. Redundancy in the video stream can be found across time (temporal redundancy) or across the frame (spatial redundancy) (Tudor, 1995).

Another aspect of compression, especially for multimedia formats such as MPEG-2, is the exploitation of the limitations of human perception. MPEG-2 employs lossy compression: the removal of information is permanent. While the decoder may interpolate or reconstruct the removed information, it cannot completely rebuild the original source, so the final decoded video is not identical to the source. The goal, however, is to make sure that this loss stays at a level which cannot be perceived by the viewer. The human psychovisual system can resolve only a limited amount of spatial detail, which gives us a ceiling for how much detail needs to be present in the image. Knowledge of the human psychovisual system also gives us the floor at which artifacts and imperfections introduced by encoding and decoding become perceptible to the viewer (Tudor, 1995).

Frames

A video can be thought of as a rapid sequence of distinct frames presented in order. The rapid sequencing of these static pictures gives the appearance of motion. This fundamental notion underlies cinema and is also present in electronic video systems. Each individual image is called a frame, and the number of frames shown per second is called the frame rate. In the US, TV signals are broadcast at a frame rate of 30 Hz, while European systems use 25 Hz. Current TV broadcasting is interlaced: instead of broadcasting 30 whole frames per second, 60 half-frames called fields are broadcast. Each field contains only the odd or only the even lines of the frame, and the two kinds alternate (Tudor, 1995).

Video frames can be represented either by the RGB color values of each pixel or by the luminance and chrominance system. Luminance carries the brightness of each area of the image, while chrominance carries its color information. Most video systems use luminance and chrominance because of the savings in bandwidth. Experiments have shown that the human eye is less sensitive to chrominance than to luminance. This allows engineers to broadcast the same video with chrominance transmitted at a much lower rate than luminance without significant differences in perception. Video systems use the terms 4:2:2 and 4:2:0 to refer to this subsampling. A value of 4:2:2 means that luminance is sampled at twice the rate of chrominance horizontally, while 4:2:0 means that chrominance is subsampled by two relative to luminance along both the vertical and horizontal axes. If every luminance and chrominance sample is represented with 8 bits, then a video signal of 25 Hz and 720×576 resolution with 4:2:0 chroma subsampling carries about 124 Mbps of information. Using MPEG-2, the final stream can have a bit rate of 3-15 Mbps depending on the amount of compression (Tudor, 1995).
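The 124 Mbps figure can be checked with a short calculation. The sketch below is only an illustration; the function and parameter names are assumptions introduced here, not part of any standard.

```python
def raw_bitrate_bps(width, height, frame_rate, bits_per_sample=8, chroma_fraction=0.25):
    """Uncompressed bit rate of one luminance plane plus two chrominance planes.

    chroma_fraction is the size of each chrominance plane relative to the
    luminance plane: 1/4 for 4:2:0 (halved on both axes), 1/2 for 4:2:2.
    """
    luma_samples = width * height
    chroma_samples = 2 * luma_samples * chroma_fraction
    return (luma_samples + chroma_samples) * bits_per_sample * frame_rate


rate_420 = raw_bitrate_bps(720, 576, 25)
print(f"4:2:0 raw rate: {rate_420 / 1e6:.1f} Mbps")  # 124.4 Mbps

# Compression ratio implied by the 3-15 Mbps range quoted in the text.
for target_mbps in (3, 15):
    print(f"{target_mbps} Mbps -> ratio {rate_420 / (target_mbps * 1e6):.1f}:1")
```

With 4:2:0 subsampling each pixel costs 12 bits on average (8 for luminance plus two quarter-rate 8-bit chrominance samples), which is where the figure comes from.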

Temporal Compression

Temporal redundancy in video is present when the information in a single frame is highly correlated with the contents of the frames preceding or succeeding it. This high correlation can be exploited to achieve temporal compression. Instead of encoding every single frame in full, the encoder can choose not to fully encode frames which are highly correlated with their neighbors. The decoder can rebuild these frames using information from the surrounding frames (Tudor, 1995).

The MPEG-2 video standard uses a method called Motion Compensated Prediction to rebuild frames from the data in the surrounding frames. As the name suggests, the motion of picture content across frames is predicted. This builds upon the temporal correlation of pixels: pixels which move in a certain direction can be expected to keep moving in the same direction over the next frames. Instead of transmitting every frame in full, the encoder transmits a reference frame used as the predictor, a motion vector which indicates the direction of movement, and a prediction error. Additionally, the movement of pixels within the frame itself is highly correlated. This is exploited by assuming that contiguous groups of pixels have the same motion across frames, which reduces the number of motion vectors that the encoder needs to create (Sikora, 1997).
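A minimal sketch of block-based motion estimation follows. It assumes an exhaustive search with a sum-of-absolute-differences cost on tiny synthetic frames; the standard does not prescribe how an encoder searches, and the function names, block size and search range here are illustrative choices only.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def find_motion_vector(reference, current, x, y, size=4, search=2):
    """Search the reference frame for the best match to the block at (x, y)
    in the current frame. Returns ((dx, dy), prediction_error)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidate blocks that would fall outside the frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, reference, x, y, rx, ry, size)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best


def make_frame(square_x, square_y, width=8, height=8, size=4):
    """Synthetic test frame: a bright square on a black background."""
    frame = [[0] * width for _ in range(height)]
    for r in range(square_y, square_y + size):
        for c in range(square_x, square_x + size):
            frame[r][c] = 200
    return frame


reference = make_frame(1, 1)  # square at (1, 1) in the earlier frame
current = make_frame(3, 2)    # the same square, moved right 2 and down 1
vector, error = find_motion_vector(reference, current, 3, 2)
print(vector, error)  # (-2, -1) 0
```

The vector points from the block in the current frame back to its source in the reference frame. Because the match is exact, the prediction error is zero, and only the vector would need to be sent for this block.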

With respect to temporal compression, MPEG-2 defines three frame types: intra (‘I’), predictive (‘P’) and bi-directionally predictive (‘B’) frames. Intra frames serve as reference frames and are encoded completely, undergoing only spatial compression. P frames are built from previous I or P frames and their associated motion vectors. Because P frames combine this prediction with spatial compression, they are much more compressed than I frames. P frames may also be used as references for future predicted frames. Unlike P frames, B frames can use both previous and succeeding I or P frames for motion compensated prediction. Because they depend on succeeding frames, B frames introduce a delay into the decoding process. Using both preceding and succeeding frames gives B frames the highest compression. An I frame is typically three times larger than a P frame, while a P frame is typically two times larger than a B frame (Tudor, 1995).

The sequence in which I, P and B frames are sent is called a “Group of Pictures” or GOP. A GOP structure is defined by two parameters, N and M. N is the number of frames in the GOP, while M is the spacing of P frames. The MPEG-2 standard does not mandate a particular GOP structure, leaving the encoder to decide on the best sequence of frame types depending on the video content (Tudor, 1995).
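The role of N and M can be illustrated with a small sketch. The values N = 12 and M = 3 are chosen only for illustration, and the relative frame sizes are taken from the typical ratios quoted above (I is three times P, P is two times B); the function names are assumptions.

```python
def gop_pattern(n, m):
    """Frame types of one GOP in display order.

    n: number of frames in the GOP
    m: spacing between anchor frames (I or P)
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


# Typical relative sizes from the text, in units of one B frame.
RELATIVE_SIZE = {"I": 6, "P": 2, "B": 1}


def gop_relative_size(pattern):
    """Total size of a GOP in units of one B frame."""
    return sum(RELATIVE_SIZE[t] for t in pattern)


pattern = gop_pattern(12, 3)
print(pattern)                        # IBBPBBPBBPBB
print(gop_relative_size(pattern))     # 20
print(gop_relative_size("I" * 12))    # 72, if every frame were intra coded
```

Under these assumed ratios, the mixed GOP takes roughly 28% of the space of twelve intra frames, which shows why an encoder prefers predicted frames wherever the content allows.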

Lossy Spatial Compression using the Discrete Cosine Transform

Spatial compression relies on removing redundant picture data from within a single frame. To achieve spatial compression, MPEG encoding relies on transforming the frame from its direct pixel representation to a representation in another domain – the discrete cosine transform domain.

The discrete cosine transform is a mathematical operation that represents an original signal as a sum of harmonic cosine functions which are in-phase with the signal. For the case of video, this signal is a two dimensional sampling of the picture which creates a two dimensional matrix of coefficients representing the amplitudes of the resulting cosine harmonics. Higher ordered harmonics represent details in the picture found at higher spatial frequencies while lower ordered harmonics represent details found at lower spatial frequencies. For most signals, the bulk of the information in the signal can be found in the lower harmonics. For the DCT, the highest energy is found at the fundamental lowest harmonic representing the DC coefficient (Benoit, 2003, pp. 37-38).

The video frame is divided into square blocks eight pixels on a side. The DCT of each block is then taken, which produces an 8×8 matrix of coefficients. The DCT is computed on both the luminance and the chrominance representation of the picture. The difference is that the chrominance planes are subsampled relative to the luminance plane according to the video scheme. The resulting 8×8 DCT matrix is arranged such that increasing horizontal and vertical frequencies are located to the right and downwards respectively. Because information is concentrated at the DC coefficient, the upper left corner of the DCT matrix holds the highest valued coefficient (Benoit, 2003, pp. 37-38).
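As a rough illustration of the block transform described above, the sketch below computes a two-dimensional DCT of one 8×8 block directly from the textbook DCT-II formula with orthonormal scaling. Real encoders use fast factorizations; this direct form and its names are assumptions made for clarity.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct 2-D DCT-II of an N x N block (rows = vertical, columns = horizontal)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):        # vertical frequency, increasing downwards
        for u in range(N):    # horizontal frequency, increasing to the right
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (
                        block[y][x]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            coeffs[v][u] = 0.25 * scale(u) * scale(v) * total
    return coeffs


# A uniform block has all of its energy in the DC coefficient (top left).
flat = [[100] * N for _ in range(N)]
flat_dct = dct_2d(flat)
print(round(flat_dct[0][0], 6))  # 800.0

# A block that varies only horizontally has non-zero values only in the top row.
ramp = [[10 * x for x in range(N)] for _ in range(N)]
ramp_dct = dct_2d(ramp)
```

The uniform block shows the concentration of energy at the DC coefficient, and the horizontal ramp shows that horizontal detail appears only along the top row of the coefficient matrix.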

Tests have shown that humans have limited perceptual sensitivity to higher spatial frequencies. This fact is exploited in the process of quantization. During quantization, the coefficients corresponding to lower spatial frequencies are quantized with greater precision than the coefficients of higher frequencies. Even though this introduces inaccuracies in the representation of higher frequencies, the loss in resolution cannot be perceived (Benoit, 2003, pp. 37-38).

Another limitation of human perception is its difficulty in perceiving low energy amplitudes across all frequencies. This is exploited in MPEG-2 encoding by applying thresholding to the results of the DCT. Frequencies whose amplitude falls below a certain threshold are reduced to zero. Again, this introduces irrecoverable loss, because the amplitudes of certain frequencies are discarded, but the loss is not obvious to the end user. Thresholding reduces most of the original 64 coefficients of the DCT to zero. The resulting matrix, which consists mostly of zeroes, is exploited in the succeeding processes of zig-zag scan, run-length coding and Huffman coding.
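The combined effect of quantization and thresholding can be sketched as follows. The step-size rule (a step that grows with frequency), the threshold value and the synthetic coefficient matrix are all assumptions chosen for illustration; MPEG-2 encoders use quantization matrices, and this sketch does not reproduce them.

```python
def quantize_block(coeffs, threshold=10.0, base_step=8, slope=4):
    """Zero out small coefficients, then quantize the rest.

    The quantizer step grows with the frequency index (u + v), so low
    frequencies keep more precision than high frequencies.
    """
    levels = []
    for v, row in enumerate(coeffs):
        out_row = []
        for u, value in enumerate(row):
            if abs(value) < threshold:
                out_row.append(0)              # thresholding
            else:
                step = base_step + slope * (u + v)
                out_row.append(round(value / step))  # coarser at high frequency
        levels.append(out_row)
    return levels


# Synthetic coefficients whose magnitude falls off quickly with frequency,
# mimicking the energy concentration near the DC term.
coeffs = [[800 / (1 + u + v) ** 3 for u in range(8)] for v in range(8)]
levels = quantize_block(coeffs)
zero_count = sum(level == 0 for row in levels for level in row)

print(levels[0][:4])   # [100, 8, 2, 1]
print(zero_count)      # 54 of the 64 values are now zero
```

Only the ten coefficients nearest the top left corner survive in this example, which is the sparse structure that the later lossless stages rely on.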

Source Coding

MPEG-2 also applies lossless source coding techniques to create a more compact representation of the data resulting from spatial compression. At the end of quantization and thresholding, the DCT matrix is mostly composed of zeroes, except for the top left corner, which still holds a large integer because most of the energy is located in the DC component. To convert this matrix into a stream of data, a zig-zag scan is performed along the DCT matrix (see Figure 3). With zig-zag scanning, coefficients enter the stream from the lowest harmonic up to the highest harmonic (Tudor, 1995).
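A minimal sketch of the zig-zag scan follows. It generates the classic diagonal traversal by sorting the matrix positions by diagonal and alternating the direction on each one; the function names and the sample matrix are illustrative assumptions.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n matrix in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Primary key: the anti-diagonal r + c (low frequencies first).
        # Secondary key: run down odd diagonals and up even ones.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def zigzag_scan(matrix):
    """Flatten a square matrix into a list following the zig-zag order."""
    return [matrix[r][c] for r, c in zigzag_order(len(matrix))]


# A quantized block with a few non-zero values near the top left corner.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 100, 8, -3, 2

scanned = zigzag_scan(block)
print(zigzag_order()[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(scanned[:5])         # [100, 8, -3, 0, 2]
```

Because the non-zero values cluster near the top left corner, the scan places them at the front of the list and leaves one long run of zeroes at the end.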

The result of the zig-zag scan is a sequence composed mostly of zeroes, punctuated occasionally by non-zero integers. The large number of zeroes stems from the thresholding performed in the lossy compression stage, and it makes run-length coding effective. Instead of representing every zero, run-length coding records the non-zero values and the number of zeroes found between them. As an example, the sequence 12, 0, 0, 0, 5, 3, 0, 0, 0 can be represented as 12, (0,3), 5, 3, EOB. EOB in this case is a termination symbol which replaces the final run of zeroes. Run-length coding compresses the stream significantly thanks to the large number of zeroes introduced by thresholding (Tudor, 1995).
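The example above can be reproduced with a short sketch. It follows the notation used in the text, with zero runs written as (0, count) and a final EOB marker, rather than the exact syntax of an MPEG-2 bitstream; the function names are assumptions.

```python
def run_length_encode(values):
    """Encode a list of integers, collapsing runs of zeroes."""
    # Trailing zeroes are dropped and replaced by the end-of-block marker.
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1

    encoded = []
    zero_run = 0
    for value in values[:end]:
        if value == 0:
            zero_run += 1
        else:
            if zero_run:
                encoded.append((0, zero_run))
                zero_run = 0
            encoded.append(value)
    encoded.append("EOB")
    return encoded


def run_length_decode(encoded, length):
    """Rebuild the original list; length is the known block size."""
    values = []
    for item in encoded:
        if item == "EOB":
            values.extend([0] * (length - len(values)))
        elif isinstance(item, tuple):
            values.extend([0] * item[1])
        else:
            values.append(item)
    return values


sequence = [12, 0, 0, 0, 5, 3, 0, 0, 0]
encoded = run_length_encode(sequence)
print(encoded)  # [12, (0, 3), 5, 3, 'EOB']
```

The decoder needs to know the block length (64 for an 8×8 block) to expand the EOB marker, and decoding returns the original sequence exactly, so this stage is lossless.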

The final stage of source coding is Huffman encoding. In Huffman encoding, elements which occur most frequently are coded using shorter codewords, while elements which occur least frequently are given longer codewords. Each codeword is unique and instantly decodable, as no codeword is a prefix of another codeword. The benefit is that instead of sending every symbol with a constant eight bits, Huffman encoding creates a stream in which most symbols are represented by codewords significantly shorter than eight bits, while rare symbols are represented by longer codewords (Benoit, 2003, pp. 32-34).
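The idea can be sketched by building a Huffman code from observed symbol counts. This is a generic illustration with assumed names and sample data; MPEG-2 itself uses predefined variable-length code tables rather than a code built per stream.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: codeword} for the given sequence of symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}

    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The tie breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Sample data dominated by one symbol, as after quantization.
data = [0] * 20 + [1] * 6 + [2] * 3 + [5]
code = huffman_code(data)
total_bits = sum(len(code[s]) for s in data)

print({s: len(c) for s, c in sorted(code.items())})  # {0: 1, 1: 2, 2: 3, 5: 3}
print(total_bits, "bits versus", 8 * len(data), "bits at a fixed 8 bits per symbol")
```

The most frequent symbol receives a one-bit codeword, and the 30 symbols take 44 bits in total instead of 240.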

Errors and Noise in Digital Television

The process of broadcast transmission provides the possibility of introducing errors into the MPEG-2 stream. Channel effects such as fading, cross talk and interference may introduce errors into the stream received at the decoder end. To combat this, the MPEG-2 standard also includes basic channel coding in the form of CRC.

Whereas source coding removes redundancies to compress the stream, channel coding introduces redundancy into the system. The added redundant bits help the decoder detect whether the received stream has encountered errors in transmission. These added bits and the method of their addition comprise the forward error correction (FEC) scheme of the communications system. With a sufficiently robust FEC system, the decoder can even detect where in the bitstream an error has occurred and correct it without having to ask the encoder for a retransmission (Benoit, 2003, pp. 105-113).

MPEG-2 applies a cyclic redundancy check (CRC) to ensure the integrity of the bitstream. MPEG-2 transport streams carry individual transport packets, each with a length of 188 bytes. A cyclic redundancy check is a function which takes a bit stream of arbitrary length and produces a fixed-length value called the checksum. This checksum is sent along with the original bitstream. At the decoder end, the CRC is computed again and the result is compared with the checksum which came with the bitstream. In MPEG-2 transport streams, the program information tables embedded in the packets are protected by CRC checksums (Fairhurst, 2001).
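The check described above can be sketched with a bitwise CRC. The parameters used here (polynomial 0x04C11DB7, all-ones initial value, no bit reflection, no final XOR) are those of the variant commonly catalogued as CRC-32/MPEG-2; they are stated as an assumption, since the text above does not give them, and the payload bytes are made up for the example.

```python
def crc32_mpeg2(data: bytes) -> int:
    """Bitwise CRC-32 with polynomial 0x04C11DB7, processed MSB first."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


# Sender side: append the checksum to the payload.
payload = b"example section data"   # hypothetical payload bytes
checksum = crc32_mpeg2(payload)
transmitted = payload + checksum.to_bytes(4, "big")

# Receiver side: running the CRC over payload plus checksum gives zero
# when nothing was corrupted.
print(crc32_mpeg2(transmitted) == 0)   # True

# A single flipped bit is detected because the result is no longer zero.
corrupted = bytearray(transmitted)
corrupted[3] ^= 0x01
print(crc32_mpeg2(bytes(corrupted)) == 0)   # False
```

A CRC only detects the error; correcting it requires the redundancy of a forward error correction scheme or a retransmission.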

Comparison of Available Systems

MPEG-2 is one of three current video compression standards crafted by MPEG. The first standard, MPEG-1, was published in 1992. MPEG-1 was created primarily to support video stored on CD-ROM. Part of its audio specification, MPEG-1 Audio Layer III or MP3, has gained wide acceptance on its own as a standalone compressed music format. MPEG-1 is a medium bandwidth standard with a bit rate of up to 1.5 Mbps for video with a resolution of 352 x 240 at 30 frames per second (Lo, n.d.).

As a video compression standard MPEG-2 is best thought of as a collection of various tools of compression. MPEG-2 builds on the earlier MPEG-1 standard while adding new methods of video compression. To this end, MPEG-2 decoders are backwards compatible with MPEG-1 and can successfully understand and decode MPEG-1 video streams. MPEG-2 offers several possible video resolutions as well as various compression methods. This variety allows system designers flexibility in choosing the balance between the final bit rate, decoder complexity and video resolution. MPEG-2 also introduced support for interlaced video as well as 5.1 channel sound (Benoit, 2006, p. 55).

MPEG-4 is a newer video standard optimized for distribution of video over the web. Its advanced video coding part, MPEG-4 Part 10, is also known as H.264, its designation as an ITU-T recommendation. It promises up to 50% greater compression efficiency over MPEG-2, which has made it an attractive alternative to MPEG-2 for video broadcasting over satellite, cable, or the internet (Benoit, 2006, p. 58).

Further developments

MPEG-2’s video compression combines numerical source coding techniques with the physical limitations of the human sensory system to create a viable system for video compression. MPEG-2 is not an accident: its features, specifications and standards were created by first studying the needs of the end users as well as the usage of the final media. The decision to standardize the decoding of the stream while leaving the encoder open ended provides great incentive for mass market adoption of the standard as it works to become the de facto method for digital TV transmission.

Further developments in MPEG-2 would most certainly be found in direct hardware implementations of the MPEG-2 encoding and decoding process. With the looming mandate of the FCC for digital terrestrial television, there would be increased demand for MPEG-2 compliant hardware. A single IC which can decode an MPEG-2 stream by itself would certainly be a boon to designers of televisions, media players and other devices which may play MPEG-2 video.

Bibliography

Benoit, H. 2006, “Digital Television: Satellite, Cable, Terrestrial, IPTV, Mobile TV in the DVB Framework”, 3rd edn, Elsevier, Paris.

Fairhurst, G. 2001. “MPEG-2 Transmission”. From University of Aberdeen. Retrieved March 9, 2009 from http://www.abdn.ac.uk/~wpe028/research/future-net/digital-video/mpeg2-trans.html

Lo, R. n.d. “A beginner’s guide for MPEG-2 Standard”. From City University of Hong Kong. Retrieved March 9, 2009 from http://www.fh-friedberg.de/fachbereiche/e2/telekom-labor/zinke/mk/mpeg2beg/beginnzi.htm#MPEG Standards

MPEG Licensing Authority. “MPEG-2”. <http://www.mpegla.com/m2/>

Tudor, P.N. 1995. “MPEG-2 Video Compression.” Electronics and Communications Engineering Journal, December 1995: 257-264.

Sarginson, P.A. 1996, “MPEG-2 Research and Development Report.” Research and Development Department, The British Broadcasting Corporation.

Sikora, T. 1997. Digital video coding standards. In Digital Consumer Electronics Handbook, R. K. Jurgen, Ed. McGraw-Hill, Hightstown, NJ, 83-823. URL= http://portal.acm.org/citation.cfm?id=275869.275882

Watkinson, J. 2001, MPEG Handbook, 1st edn, Focal Press Publishing, Boston MA.