Transcoding and codec optimization: tips & tricks

by Khang Nguyen, senior applications engineer, Software and Solutions Group, Intel Corp.

Media transcoding, which enables media interoperation, plays an important role in the digital home. The Intel Networked Media Product Requirements (INMPR) promotes interoperation between networked devices in the digital home. Optimizing the codec engine (the encoder/decoder, the heart of the transcoder) makes the media transcoding process more efficient, which in turn improves the user experience in the digital home. This paper features practical tips and tricks on how to increase the performance of the codec engine. These tips include using Intel® VTune™ Performance Analyzer events, OpenMP for threading, and Prescott New Instructions (Streaming SIMD Extensions 3, or SSE3). We also discuss when to use faster instructions, how to employ different execution units to improve parallelism, and when to use MMX™ instead of SSE for speed. You will also learn when to take advantage of the Intel compiler's optimization switches.

What is Transcoding?
Since content comes in many different formats, transcoding is necessary to tailor the content, converting it from one media format to another before it arrives at the receiving device. The most common way to convert one media format to another is to first decode it to raw data, then encode the raw data to the target format. Since an MPEG stream consists of both audio and video, we need to split it into separate audio and video streams, decode each into raw data, re-encode each to the desired format, and then merge them again.
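
The decode-then-encode flow can be sketched in a few lines of C. This is only a toy illustration, not a real media codec: the source format (run-length pairs), the target format (byte deltas), and the names rle_decode, delta_encode, and transcode are all hypothetical, and the splitting and merging of audio and video is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy source format (hypothetical): (count, value) byte pairs.
   Decodes to raw samples; returns the sample count, or 0 if `raw` is too small. */
size_t rle_decode(const uint8_t *src, size_t src_len, uint8_t *raw, size_t raw_cap)
{
    size_t n = 0;
    assert(src_len % 2 == 0);                 /* input must be whole pairs */
    for (size_t i = 0; i < src_len; i += 2) {
        uint8_t count = src[i], value = src[i + 1];
        if (n + count > raw_cap)
            return 0;
        for (uint8_t k = 0; k < count; k++)
            raw[n++] = value;
    }
    return n;
}

/* Toy target format (hypothetical): each byte is the difference from the
   previous raw sample, with the first sample measured from zero. */
size_t delta_encode(const uint8_t *raw, size_t n, uint8_t *dst)
{
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(raw[i] - prev);    /* wraps modulo 256 */
        prev = raw[i];
    }
    return n;
}

/* Transcode = decode the source to raw data, then encode raw to the target.
   `scratch` holds the raw data; `dst` must hold at least scratch_cap bytes. */
size_t transcode(const uint8_t *src, size_t src_len,
                 uint8_t *scratch, size_t scratch_cap, uint8_t *dst)
{
    size_t n = rle_decode(src, src_len, scratch, scratch_cap);
    return delta_encode(scratch, n, dst);
}
```

The raw buffer in the middle is what makes the two halves independent: either side could be swapped for a different format without touching the other.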


Codec Optimization
A codec (coder/decoder) performs the compression and decompression. It is the heart, or engine, of the transcoder.

Optimizing the codec means reducing the time it takes to encode and/or decode a file or stream. We can also improve the engine by reducing its CPU utilization, which lets us pack more features or data into the same time frame: for example, more voices to represent more people in a game. Finally, we need to cut down the code size for size-sensitive or mobile applications, since media applications exist in desktop, laptop, PDA, and smartphone form factors.

General
The optimization process starts with the following steps:

  • Use better hardware
  • Use the Intel VTune Performance Analyzer to find hotspots
    • Look at the functions that have the highest clock ticks and clock ticks per instruction retired (CPI)
    • Turn on counters for branch misprediction, store forwarding, 64K aliasing, cache split, and trace cache miss
  • Follow general optimization rules
    • Unroll loops, reduce branching, and use SSE2/SSE3
  • Use the Intel compiler
  • Use the Intel Performance Library Suite
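
As an illustration of the loop unrolling mentioned in the list above, here is a minimal sketch in C. The kernel (a sum of absolute differences), the function names, and the unroll factor of four are assumptions chosen for the example; they are not taken from any particular codec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute difference of two bytes. */
static inline uint32_t abs_diff(uint8_t x, uint8_t y)
{
    return (uint32_t)(x > y ? x - y : y - x);
}

/* Straightforward version: one element per iteration. */
uint32_t sad_simple(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        sum += abs_diff(a[i], b[i]);
    return sum;
}

/* Unrolled by four: fewer loop-control branches per element, and four
   independent accumulators so the additions do not form one long chain. */
uint32_t sad_unrolled(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    assert(a != NULL && b != NULL);
    for (; i + 4 <= n; i += 4) {
        s0 += abs_diff(a[i],     b[i]);
        s1 += abs_diff(a[i + 1], b[i + 1]);
        s2 += abs_diff(a[i + 2], b[i + 2]);
        s3 += abs_diff(a[i + 3], b[i + 3]);
    }
    for (; i < n; i++)                        /* leftover elements */
        s0 += abs_diff(a[i], b[i]);
    return s0 + s1 + s2 + s3;
}
```

Whether the unrolled form is actually faster depends on the compiler and the processor, so it is worth measuring both versions in the profiler before keeping the change.
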
Cautions
Keep the following cautions in mind at all times:
  • All the usual pitfalls still apply (cache split, branch misprediction, store forwarding, etc.).
  • Thread at the highest level possible to avoid running out of resources. Because the codec is an engine used by other applications, its functions can be called many times, and those applications are often threaded themselves.
  • Pay attention when threading applications that make use of the Intel performance libraries, since some of their functions are already threaded.
  • Do not unroll loops too far; excessive unrolling can thrash the trace cache.
  • Do not ignore MMX. It can be faster than SSE/SSE2 when an application makes extensive use of 64-bit data and it would take extra effort to rearrange that data to fit into 128-bit registers.
  • Watch out for battery life in mobile applications.
Tips & Tricks
  • Use the Intel compiler switches: /O3, /QaxW, /QaxN, /QaxP, /Qipo, /Qparallel, /Qopenmp. Often you can gain a significant amount of performance just by using the Intel compiler with the right switches.
  • Use special functions such as reciprocal (rcp and rcp_nr) to replace division with multiplication and speed up the application.
  • Use the SSE3 instruction LDDQU instead of MOVDQU for unaligned 128-bit loads whenever possible; it is intended to reduce the penalty when a load crosses a cache-line boundary.
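
The reciprocal tip above can be sketched with SSE intrinsics. This is a hedged example: it assumes rcp corresponds to the RCPPS estimate (the _mm_rcp_ps intrinsic) and rcp_nr to that estimate refined by one Newton-Raphson step. The function name divide_by_reciprocal and the HAVE_SSE macro are hypothetical, and a plain-division fallback is included for targets without SSE.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1   /* hypothetical macro, used only in this sketch */
#endif

/* Computes out[i] = num[i] / den[i] for n floats (n must be a multiple of 4)
   by multiplying with an approximate reciprocal instead of dividing. */
void divide_by_reciprocal(const float *num, const float *den, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
#ifdef HAVE_SSE
        __m128 d  = _mm_loadu_ps(den + i);
        __m128 x0 = _mm_rcp_ps(d);            /* fast, roughly 12-bit estimate of 1/d */
        /* One Newton-Raphson step: x1 = x0 * (2 - d * x0) */
        __m128 x1 = _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(d, x0)));
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(num + i), x1));
#else
        for (size_t k = 0; k < 4; k++)        /* portable fallback: plain division */
            out[i + k] = num[i + k] / den[i + k];
#endif
    }
}
```

The result is close to, but not bit-identical with, a true division, so this substitution only suits code that can tolerate a small rounding difference.
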
Tips & Tricks Using Assembly Language
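  • Prefer faster instructions where an equivalent one exists.
  • Spread work across different execution units to improve parallelism.
  • Use MOVNTxx to store values with a non-temporal hint, which keeps the stored data from polluting the cache.
  • Use combined instructions such as PMADDWD, which multiplies packed 16-bit values and adds adjacent products in a single instruction.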
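
To show what a combined instruction like PMADDWD buys, the sketch below computes a 16-bit dot product through the _mm_madd_epi16 intrinsic, which maps to PMADDWD on 128-bit registers. The function name dot_i16 and the HAVE_SSE2 macro are hypothetical, and a scalar fallback is included for targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1   /* hypothetical macro, used only in this sketch */
#endif

/* Dot product of two int16 arrays; n must be a multiple of 8.
   The caller must keep the values small enough that the 32-bit sums do not overflow. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    assert(n % 8 == 0);
#ifdef HAVE_SSE2
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PMADDWD: eight 16-bit multiplies, adjacent products added into four 32-bit lanes */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));
    }
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    for (size_t i = 0; i < n; i++)            /* portable fallback */
        sum += (int32_t)a[i] * b[i];
#endif
    return sum;
}
```

One instruction here replaces a separate multiply and add per pair of elements, which is the kind of saving the tip above is pointing at.
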
Examples
When to Use Threads
Before:

    /* Four encode calls run one after another. */
    for (i=0; i<4; i++)
        EncodeTest(Mem[i], Blk[i], Chunk[i]);

After:

    /* Each call runs in its own OpenMP section. */
    #pragma omp parallel sections
    {
        #pragma omp section
            EncodeTest(Mem[0], Blk[0], Chunk[0]);
        #pragma omp section
            EncodeTest(Mem[1], Blk[1], Chunk[1]);
        #pragma omp section
            EncodeTest(Mem[2], Blk[2], Chunk[2]);
        #pragma omp section
            EncodeTest(Mem[3], Blk[3], Chunk[3]);
    }

The four EncodeTest calls sit at the outer level and are assumed to be independent of one another, so each can run in its own OpenMP section.

When Not to Use Threads
Before:

    ...
    for (j=0; j<4; j++)
        for (i=0; i<4; i++)
            test[i][j] = list[fr]->img[i][j] + t[s];
    ...

After:

    ...
    /* i is declared outside the loop, so it must be made private */
    #pragma omp parallel for private(i)
    for (j=0; j<4; j++)
        for (i=0; i<4; i++)
            test[i][j] = list[fr]->img[i][j] + t[s];
    ...

At first, this loop seems to be a good candidate for threading. In fact, threading it does improve performance if the loop is at the outermost level. However, if the loop is in a function that is buried many call levels deep, threading it may mean running out of resources. In one case, this loop was implemented within a function that takes only about 8.8% of the total execution time. Threading just two such loops made the whole system 5X slower.
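
One way to follow the "thread at the highest level" advice in a case like this is to leave the small 4x4 loop serial and parallelize the loop that walks over many blocks instead. The sketch below assumes such an outer loop exists; the names add_offset_block and process_blocks and the flat block layout are hypothetical and not taken from the original code.

```c
#include <assert.h>

#define BLK 4   /* block edge length, matching the 4x4 loops above */

/* Serial inner kernel: adds an offset to one 4x4 block stored as 16 ints.
   There is too little work here to be worth a thread of its own. */
static void add_offset_block(int *dst, const int *src, int offset)
{
    for (int k = 0; k < BLK * BLK; k++)
        dst[k] = src[k] + offset;
}

/* Outer level: blocks are independent, so the threading goes here. */
void process_blocks(int *dst, const int *src, int nblocks, int offset)
{
    assert(nblocks >= 0);
    #pragma omp parallel for
    for (int b = 0; b < nblocks; b++)
        add_offset_block(dst + b * BLK * BLK, src + b * BLK * BLK, offset);
}
```

The pragma takes effect only when OpenMP is enabled (/Qopenmp with the Intel compiler, -fopenmp with GCC); otherwise the loop simply runs serially. Each thread then gets whole blocks to work on, so the cost of starting the parallel region is paid once per call rather than once per 4x4 block.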

