Optimize JPEG Decoding for ARM

Registered by Alexander Sack on 2010-10-19

This specification covers porting and optimizing a JPEG decoder library for the ARM Cortex-A9. The plan is to start from the existing open source code base of a popular JPEG library that is most conducive to SIMD optimization, and then add a NEON optimization backend to it. We hope to create a drop-in replacement for IJG’s libjpeg library with higher performance, without breaking compatibility with existing applications.

Blueprint information

Status: Complete
Approver: Alexander Sack
Priority: Undefined
Drafter: Mandeep Kumar
Direction: Needs approval
Assignee: Mandeep Kumar
Definition: Approved
Series goal: Accepted for 1.1.x
Implementation: Implemented
Milestone target: 1.1.0-2011.05
Started by: Mandeep Kumar on 2010-12-15
Completed by: Ilias Biris on 2011-06-30

Whiteboard

Work Items (11.01):
Download, Compile and Test libjpeg-turbo on PC: DONE
Identify, Compile and Run Test code to benchmark performance of default package: DONE
Modify code to disable all Intel SIMD and run non-SIMD and compare performance with SIMD version: DONE
Port Android's NEON enabled JPEG (version 6b) to Linux: DONE
Verify work done by MeeGo: DONE
Port function 1 of Android's NEON routines into the 8b version of libjpeg on Linaro: DONE
Get Oprofile setup working: DONE
Profile 8b NEON accelerated libjpeg codec on i.mx51 processor: DONE
Profile 8b NEON accelerated libjpeg codec on OMAP4: DONE
Profile 8b NEON accelerated libjpeg codec on U8500 processor: DONE
Profile libjpeg-turbo that is hosted on the MeeGo project: DONE
Find preliminary hotspot functions in libjpeg library: DONE
Install DS5 & run simple application on DS5: DONE
Verify function 1 of Android's NEON routines: DONE
Port function 2 of Android's NEON routines: DONE
Verify function 2 of Android's NEON routines: DONE
Port function 3 of Android's NEON routines: DONE
Verify function 3 of Android's NEON routines: DONE
Change Application to output performance numbers for jpeg decoding: DONE
Try MeeGo libjpeg-turbo on Linux: DONE
Run and estimate performance gain obtained from Android's NEON routines: DONE
Compare performance gains of the Android port and MeeGo: DONE

Work Items (11.02):
Study and identify YUV to RGB Conversion requirement for Android: DONE
Discuss and plan for coding of routines along with Mans: DONE
Design color conversion functions using plain C code: DONE
Convert YUV->RGB conversion functions to NEON assembly: DONE
Write Validation code for YUV->RGB conversion routine: DONE
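
The "plain C" step of the YUV->RGB work items above can be illustrated with a minimal per-pixel sketch. It is not the code written for this blueprint: it assumes full-range JFIF YCbCr input and uses the standard JFIF coefficients rounded to 16 fractional bits. The names ycc_to_rgb_pixel, clamp_u8 and round_shift16 are hypothetical.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 range of an 8-bit sample. */
static uint8_t clamp_u8(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Divide by 2^16, rounding to nearest, without right-shifting a
 * negative value (which is implementation-defined in C). */
static int32_t round_shift16(int32_t v)
{
    const int32_t half = (int32_t)1 << 15;
    return v >= 0 ? (v + half) >> 16 : -((-v + half) >> 16);
}

/* Convert one full-range YCbCr pixel to RGB (assumed JFIF convention):
 *   R = Y + 1.402   * (Cr - 128)
 *   G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
 *   B = Y + 1.772   * (Cb - 128)
 * The integer constants are the coefficients scaled by 65536. */
void ycc_to_rgb_pixel(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    const int32_t d_cb = (int32_t)cb - 128;
    const int32_t d_cr = (int32_t)cr - 128;

    rgb[0] = clamp_u8(y + round_shift16(91881 * d_cr));
    rgb[1] = clamp_u8(y - round_shift16(22554 * d_cb + 46802 * d_cr));
    rgb[2] = clamp_u8(y + round_shift16(116130 * d_cb));
}
```

A scalar routine like this is also useful as a reference when validating a NEON version: both can be run over the same input and their outputs compared sample by sample.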

Work Items (11.03):
Measure performance improvement of optimized libjpeg-8b: DONE

Work Items (11.04):
Port all optimisations of libjpeg-8b to libjpeg-turbo: DONE

Work Items (11.05):
Add CMYK->RGB color conversion routine in libjpeg-turbo: DONE
Add YCCK->RGB color conversion routine in libjpeg-turbo: DONE
Add benchmarking support in djpeg application: DONE
Measure performance improvement of optimized libjpeg-turbo codec: DONE
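
For the CMYK->RGB item above, a naive per-pixel conversion could look like the sketch below. It is an illustration only, not the routine added to libjpeg-turbo: it assumes plain CMYK samples where 0 means no ink and 255 means full ink, and it does no colour management. Files that store inverted CMYK values would need the inputs complemented first. The names cmyk_to_rgb_pixel and mul_div255 are hypothetical.

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255,
 * rounding to the nearest integer. */
static uint8_t mul_div255(uint8_t a, uint8_t b)
{
    return (uint8_t)((a * b + 127) / 255);
}

/* Naive CMYK -> RGB for one pixel, assuming 0 = no ink, 255 = full ink:
 *   R = (255 - C) * (255 - K) / 255, and likewise for G and B. */
void cmyk_to_rgb_pixel(uint8_t c, uint8_t m, uint8_t y, uint8_t k,
                       uint8_t rgb[3])
{
    const uint8_t inv_k = (uint8_t)(255 - k);

    rgb[0] = mul_div255((uint8_t)(255 - c), inv_k);
    rgb[1] = mul_div255((uint8_t)(255 - m), inv_k);
    rgb[2] = mul_div255((uint8_t)(255 - y), inv_k);
}
```

A YCCK image would first need its Y, Cb and Cr channels converted back to C, M and Y, with K passed through unchanged, before a step like this one.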

Comments:

Final measured results for a 12 Mpixel image on OMAP4
Command used: djpeg 12mp.jpeg > /dev/null

Non-optimized libjpeg-turbo (5 runs):
     Decoding Time for Run 1: 2022 ms
     Decoding Time for Run 2: 2029 ms
     Decoding Time for Run 3: 2165 ms
     Decoding Time for Run 4: 2027 ms
     Decoding Time for Run 5: 2150 ms
Median Decoding Time: 2029 ms

Linaro's optimized libjpeg-turbo (5 runs):
     Decoding Time for Run 1: 1634 ms
     Decoding Time for Run 2: 1634 ms
     Decoding Time for Run 3: 1636 ms
     Decoding Time for Run 4: 1739 ms
     Decoding Time for Run 5: 1738 ms
Median Decoding Time: 1636 ms

Percentage Improvement: ((2029 - 1636) / 2029 )*100 = 19.37%
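
The median and percentage figures above can be reproduced with a small helper such as the following sketch. The function names median_ms and improvement_percent and the MAX_RUNS limit are hypothetical; the input values are the run times listed above.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_RUNS 32 /* arbitrary upper bound for this sketch */

/* qsort comparator for ints, written to avoid overflow on subtraction. */
static int cmp_int(const void *a, const void *b)
{
    const int x = *(const int *)a;
    const int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Median of n decode times in milliseconds. For an even n, returns the
 * integer mean of the two middle values. Returns -1 if n is 0 or
 * larger than MAX_RUNS. */
int median_ms(const int *runs, size_t n)
{
    int sorted[MAX_RUNS];
    size_t i;

    if (n == 0 || n > MAX_RUNS)
        return -1;
    for (i = 0; i < n; i++)
        sorted[i] = runs[i];
    qsort(sorted, n, sizeof sorted[0], cmp_int);
    if (n % 2 == 1)
        return sorted[n / 2];
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
}

/* Relative reduction in decode time, as a percentage of the baseline. */
double improvement_percent(int baseline_ms, int optimized_ms)
{
    return 100.0 * (double)(baseline_ms - optimized_ms) / (double)baseline_ms;
}
```

Calling median_ms on the two sets of five runs gives 2029 ms and 1636 ms, and improvement_percent(2029, 1636) gives approximately 19.37.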

[asac Dec 16]: work items are not done properly; syntax is wrong. See https://wiki.linaro.org/Process/WorkItemsHowto
[asac Dec 16]: ensure that you have delivery elements for each important step. So for instance there could have been a work item to publish the android port code after the port item etc. Please extend accordingly. Also include platform integration work items for jpeg-turbo ... we want to see this integrated and proofed with one real life application like firefox or something else that uses libjpeg etc.
[asac Dec 16]: update the wiki to include the additional work items from above (e.g. delivery and delivery into platform and testing with real life application)
[asac Dec 16]: set back to review
[abbra Dec 23]: I have now published our ARM Neon optimizations which Nokia has done as part of N900 development (available in N900 PR1.2 onwards) and ported over to libjpeg-turbo at http://maemo.gitorious.org/meego-image-editor/libjpeg-turbo
[abbra Dec 23]: The code works and gives performance boost but there are some regressions as well, compared to old libjpeg6b+same optimizations most probably due to overhead spent in preparing for decoding in libjpeg-turbo when decoding small thumbnails -- the latter case was 2x better in libjpeg6b, so some work is needed.
[asac Dec 28]: Thanks abbra!!
[asac Dec 28]: fixing work items syntax!
[asac Jan 3]: approved to unblock this spec.
[mkum Jan 19]: Updated blueprint with work Items that were achieved during Dallas Sprint
[rajeev feb 01]: Lot of tasks identified for this cycle (Jan month) still not complete. Most of them are for Android which is not yet in main line. Please replan these tasks and intimate it.
[mkum Feb 07]: Updated Task status and changed work items to reflect changed direction for optimization.
