Multimedia SW codecs optimization work on ARM

Registered by Ilias Biris

The codecs targeted for optimization on ARM during the next quarter are:

libpng
      needs NEON optimization; targets Linux and Android. Initial work exists for the Ubuntu LEB, but not for Android.
libav related work
      - realvideo - needs NEON optimization for rv30/rv40 to reach 720p - work to be done in libav
      - ARM v6 optimizations for vp8.
       There are initial patches for armv6 vp8 (article.gmane.org/gmane.comp.video.li...) and some realvideo neon (thread.gmane.org/gmane.comp.video.lib...). Both patches need to be examined, but this should be low-hanging fruit.

      - h264 10bit optimizations

Blueprint information

Status:
Complete
Approver:
Kurt Taylor
Priority:
Medium
Drafter:
None
Direction:
Approved
Assignee:
Mans Rullgard
Definition:
Discussion
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
lcq411
Started by
Tom Gall
Completed by
Tom Gall

Related branches

Sprints

Whiteboard

[ibiris 24Nov] List of ACTION POINTS - current status
----
 #ACTION: mru to check with cyang on how to provide an updated version of the libpng for the Android LEB.
[mansr] This should already be available in Linaro Android builds, or will be in the 11.12 release.

#ACTION: ibiris to check with the Ubuntu folks what the plans are for supporting libpng v1.5 ahead of the LTS Alpha 1 (Dec 1)
DONE - Colin Watson commented: "Our immediate upstream is Debian, and Debian testing/unstable has 1.2.46. libpng 1.5.6 is only in Debian experimental. Generally we don't sync/merge from experimental unless we have an extra-specially good reason and a strong belief that we'll be able to deal with any issues that arise.

The primary focus of Precise is stability, not features, so I think for important libraries such as libpng with enormous numbers of reverse-dependencies we'd like to stick with Debian testing if possible. This strategy makes a real difference to our ability to keep the whole distribution buildable and installable."

#ACTION: mru to review how important the h264 work is and whether it is needed/possible to split it into phases. Initial investigation first to scope the work.
[mansr] Libav has ~2600 lines of NEON code specific to H264, not counting NEON code shared with other codecs. All of this needs to be converted to support high bit depth.

#ACTION: mru to help with the creation of the needed blueprint / identify the work items for libvorbis neon optimisation
[mansr] I discussed this a bit with Monty from the Xiph foundation. He is of the opinion that any such decoder work is best targeted at the Tremor implementation. Originally a fixed-point re-implementation of the decoder, plans have existed for some time to extend it with floating-point and have it replace libvorbis as the primary decoder library. The Tremor source is supposedly cleaner than libvorbis, although that's not saying much.

#ACTION: mru to run preliminary benchmarking for vc1/vp6
[mansr] VP6? That used to be used a lot with Flash but has been mostly replaced by H264. It's quite hard to find VP6 content nowadays. Note https://blueprints.launchpad.net/linaro-multimedia-project/+spec/multimedia-linaro-optimize-vp6-decoding appears to be misnamed, probably really means VP8. An almost identically worded BP for VP8 exists and has been completed (by the time Linaro finished planning, the work had been done upstream, much to upstream's amusement).
VC1 is, although a published standard, also not used much any more. I can still do a profile run and get an estimate of how much work it would be to optimise the libav VC1 decoder and how much of an improvement it might bring.

#ACTION: ibiris and mru to get in touch with <email address hidden> (product manager) for some more details on freerdp
[mansr] I had a quick look at the FreeRDP source code, and there's a lot of room for optimisations there. They have some poorly done NEON intrinsics which may as well not be there. If I were to do any work there, I'd start by rewriting that code properly. The main question here is one of importance. If it is deemed worthy, there are quick gains to be made.

 #ACTION: mru to identify the possible scope for the libv4l work - how important is it?
[mansr] Checking distro packages for dependencies on libv4l should give a rough idea of where it is used. That said, the majority of ARM devices on the market today run Android, and libv4l cannot be used there. This means the importance for us is probably low for the time being.

#ACTION: mru to set a wiki page with the list of most important codecs and their status wrt NEON optimisations
[mansr] Wiki page created. Now comes the hard part of filling it with something useful and keeping it current.

Updated discussion based on the notes from Etherpad - below
 -----
 Discussion of codec optimisations for the Multimedia WG, led by
 Måns Rullgård

 NOTES
 * libpng:
   - Some NEON optimisations are already done - speed-up is
 25-30%. The changes are already upstream.
   - Remaining optimisations relate to checksums and to
 improving memcpy usage - there are cases of excessive memcpy in the code
 (showed up under profiling)
[mansr] upstream developers are already looking at the memcpy() issue.
We should wait and see what they come up with.

   - The most useful thing to do next is automatic NEON detection -
 currently done manually
[mansr] Runtime checking is done.

   - Other notes about png
     + The version used for the optimisations so far is 1.5, which is the
 latest upstream version. Ubuntu uses v1.2 (other distros use v1.5
 already). The difference is that v1.5 has a cleaner API and no longer
 exposes some internal structures through it.
     + Firefox: uses a Mozilla-maintained patch to support apng (animated
 png). The patch has been rejected upstream, so the Mozilla folks are
 maintaining it.
     + Android: still using an old version of libpng

 #ACTION: mru to check with cyang on how to provide an updated version of the
 libpng for the Android LEB.
[mansr] This should already be available in Linaro Android builds, or will
be in next release.
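
As an illustration of the automatic NEON detection discussed in the libpng
notes, here is a minimal sketch. It assumes a Linux system where
/proc/cpuinfo has a "Features" line that lists "neon" on 32-bit ARM kernels
("asimd" on 64-bit ARM kernels). The helper names are made up for this
example and the code is not taken from libpng.

```python
def has_neon(cpuinfo_text):
    """Return True if a 'Features' line in the given cpuinfo text lists NEON.

    Assumption: 32-bit ARM kernels report 'neon', 64-bit ones report 'asimd'.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            flags = value.split()
            if "neon" in flags or "asimd" in flags:
                return True
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path, "r") as f:
            return f.read()
    except OSError:
        return ""
```

A caller would pick the optimised code path once at start-up, for example
with has_neon(read_cpuinfo()), and fall back to the plain C path otherwise.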

 * Realvideo: used mostly for webstreaming in the past (still some small
 percentage of online streaming happens through realvideo), but the most
 significant use case is to support media players in China.
   - There is a patch upstream from Janne Grunau to optimise realvideo -
 works and could be used as a starting point, but it can also be improved
 further. Decoder is not very optimised.
   - Should do benchmarking
   - Should be low hanging fruit
   - Other notes:
     + Not for Android
[mansr] I've started working on this.

 * VP8: an open alternative to h264. Not that widely used yet, but
 strategically important (it does not have a real connection with HTML5,
 but most browsers which support it can get YouTube videos in this format).
 NEON optimisations could improve its performance.
   - There are some patches available on upstream mailing list, which
 should be looked at in order to verify what further improvements could
 take place
[mansr] Note that the pending patches add ARMv6 optimisations targeting Tegra2
and other systems without NEON. Both libav and libvpx already have
good NEON optimisations for VP8.

   - Straightforward task
   - Could go through the upstream review and trickle towards Ubuntu and
 Android LEBs - nothing distribution specific about this work.

 * h264: ordinary h264 video uses 8 bits per component per
 pixel. H264 supports up to 14 bits per component, though not necessarily all
 would be used. Using 10 bits instead of 8 has some benefits. Most
 importantly, 10-bit h264 can achieve the same level of
 quality as 8-bit h264 at a smaller file size (compression is improved
 when encoding with 10 bits instead of 8). Also, for higher bitrate sources
 more quality can be preserved.
   - On the decoder side, however, it needs full NEON optimisation, which
 could be quite a lot of work. It is not an immediate need: decoding
 will still be quite slow at any reasonable resolution on current
 ARM HW (this is expected to improve with next-gen ARM processors).
 It is nevertheless strategically important.
   - The work itself is relatively straightforward; the problem is that
 there is a lot of it. Basically, all the component-based
 operations need to be revised and widened to handle 10 bits instead of 8.
 Some of these functions are less trivial.

 #ACTION: mru to review how important this work is and whether it needs to
 be split into phases. Initial investigation first to scope the work.
[mansr] Libav has ~2600 lines of NEON code specific to H264, not counting NEON
code shared with other codecs. All of this needs to be converted to
support high bit depth.
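
To show what "widening" a component-based operation means in the h264
notes, the sketch below adds a residual to a prediction and clips the
result to the valid sample range, with the bit depth as a parameter. It is
plain Python for readability, not NEON, and the function names are
hypothetical rather than taken from libav.

```python
def clip_sample(value, bit_depth):
    """Clamp a value to the range of an unsigned sample of the given depth."""
    max_value = (1 << bit_depth) - 1  # 255 for 8-bit, 1023 for 10-bit
    return max(0, min(value, max_value))


def add_residual(pred, residual, bit_depth=8):
    """Reconstruct samples as prediction plus residual, clipped per sample."""
    if len(pred) != len(residual):
        raise ValueError("pred and residual must have the same length")
    return [clip_sample(p + r, bit_depth) for p, r in zip(pred, residual)]
```

The arithmetic is the same at both depths, but 10-bit samples no longer fit
in a byte. A 128-bit NEON register holds sixteen 8-bit samples but only
eight 16-bit ones, so the existing 8-bit routines cannot simply be reused.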

 * Other requests:

 * libvorbis NEON optimisations: a vorbis codec exists and is well
 optimised in libav, but that code is LGPL. The work would deliver an
 optimised libvorbis that can be used for commercial/Android
 projects (libvorbis is BSD-licensed).
 #ACTION: mru to help with the creation of the needed blueprint /
 identify the work items for this work
[mansr] I discussed this a bit with Monty from the Xiph foundation. He is
of the opinion that any such decoder work is best targeted at the
Tremor implementation. Originally a fixed-point re-implementation
of the decoder, plans have existed for some time to extend it with
floating-point and have it replace libvorbis as the primary decoder
library. The Tremor source is supposedly cleaner than libvorbis,
although that's not saying much.
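
For readers unfamiliar with the term, "fixed-point" means that sample
values are held as scaled integers instead of floats. A minimal sketch,
assuming a Q31 format (31 fractional bits); the helper names are
illustrative and are not taken from Tremor.

```python
Q = 31  # number of fractional bits (assumed format for this example)


def to_fixed(x):
    """Convert a float in [-1, 1) to a Q31 integer."""
    return int(round(x * (1 << Q)))


def to_float(v):
    """Convert a Q31 integer back to a float."""
    return v / float(1 << Q)


def fixed_mul(a, b):
    """Multiply two Q31 values and rescale the product back to Q31.

    The right shift discards the extra fractional bits (rounds toward
    negative infinity).
    """
    return (a * b) >> Q
```

Integer-only arithmetic of this kind suits cores without a fast
floating-point unit, which is the usual reason for such an implementation.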

 * VC1/VP6: also reverse engineered and supported in libav, but we could
 do some more benchmarking to determine whether anything can be
 optimised further
 #ACTION: mru to run preliminary benchmarking for these codecs
[mansr] VP6? That used to be used a lot with Flash but has been mostly replaced
by H264. It's quite hard to find VP6 content nowadays. Note
https://blueprints.launchpad.net/linaro-multimedia-project/+spec/multimedia-linaro-optimize-vp6-decoding
appears to be misnamed, probably really means VP8. An almost identically
worded BP for VP8 exists and has been completed (by the time Linaro
finished planning, the work had been done upstream, much to upstream's
amusement).

VC1 is, although a published standard, also not used much any more.
I can still do a profile run and get an estimate of how much work
it would be to optimise the libav VC1 decoder and how much of an
improvement it might bring.

 * Flash Video: compiles and runs but is apparently quite slow on ARM
 (Konstantinos commented). Which Flash video codec? There are about a
 dozen: the browser Flash plugin already supports a number of
 alternative formats, for example Sorenson H.263, which is
 close to mp4hd. Flash does not yet support the WebM formats. The need
 is not clear.
[mansr] Need or not, there is nothing we can do. Flash is closed source.

 * FreeRDP: has a set of codecs which are used to render video and audio
 across connected machines. It would be interesting to profile
 the code running on ARM-based Linux platforms, in order to
 confirm what kind of optimisations would be possible. This could also be
 useful for cases like gaming, for example. These are MS codecs but
 based on open specifications made available under the Apache license.
 #ACTION: ibiris and mru to get in touch with <email address hidden>
 (product manager) for some more details
[mansr] I had a quick look at the FreeRDP source code, and there's a lot of
room for optimisations there. They have some poorly done NEON
intrinsics which may as well not be there. If I were to do any work
there, I'd start by rewriting that code properly.

The main question here is one of importance. If it is deemed worthy,
there are quick gains to be made.

 * MJPEG: based on JPEG image processing - possibility to use
 libjpeg-turbo optimised code? The need is not clear.
[mansr] MJPEG used to be common in digital cameras some years ago. MJPEG
video is simply a sequence of JPEG images and can be decoded with
any standard JPEG decoder. This does not seem to warrant our
interest at the moment.
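
As a rough illustration of the point that MJPEG is a sequence of JPEG
images, a raw MJPEG stream can be cut into individual images by scanning
for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. The
sketch assumes a bare concatenation of JPEG images with no container format
and no embedded thumbnails; the function name is made up for this example.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_mjpeg(data):
    """Split a raw MJPEG byte string into a list of complete JPEG images.

    Bytes between images are skipped and a truncated final image is dropped.
    """
    frames = []
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start < 0:
            break
        end = data.find(EOI, start + len(SOI))
        if end < 0:
            break  # last image is incomplete
        frames.append(data[start:end + len(EOI)])
        pos = end + len(EOI)
    return frames
```

Each returned item could then be handed to an ordinary JPEG decoder.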

 * libv4l: sometimes used for webcams which produce unusual image
 formats - really compressed images, but in a format which needs some
 processing before other tools can use it (the libv4l library provides
 userland applications with pixel decoding services meant to be used by
 other programs).
 #ACTION: mru to identify the possible scope for this work - how
 important is it?
[mansr] Checking distro packages for dependencies on libv4l should give a
rough idea of where it is used. That said, the majority of ARM
devices on the market today run Android, and libv4l cannot be used
there. This means the importance for us is probably low for the
time being.
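
The dependency check suggested in the reply above could be scripted. The
sketch below parses the text printed by "apt-cache rdepends <package>" on a
Debian or Ubuntu system. The function name is made up for this example, and
the package name libv4l-0 mentioned afterwards is an assumption about how
the distro names the library package.

```python
def parse_rdepends(output):
    """Return the sorted, de-duplicated package names listed under
    'Reverse Depends:' in the output of 'apt-cache rdepends'."""
    packages = set()
    in_list = False
    for line in output.splitlines():
        if line.strip() == "Reverse Depends:":
            in_list = True
            continue
        if in_list and line.startswith(" "):
            # A leading '|' marks an alternative dependency; drop it.
            name = line.strip().lstrip("|")
            if name:
                packages.add(name)
    return sorted(packages)
```

Feeding it the output of "apt-cache rdepends libv4l-0" would give a first
list of packages to look at.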

 * libav : has support for hundreds of codecs many of which are meant for
 very specific uses (eg some specific format or application - like a
 game). We should list the state of the most important codecs relative to
 their NEON optimisation and prioritise based on that list.

 #ACTION: mru to set a wiki page with the list of most important codecs
 and their status wrt NEON optimisations.
[mansr] Wiki page created. Now comes the hard part of filling it with something
useful and keeping it current.


Work Items