Reducing installation/CD footprint

Registered by Martin Pitt

Evaluate steps to reduce the installation footprint as well as the size of CD images. Examples:

 * Remove unnecessary files (changelogs? other docs?)
 * Remove unnecessary packages (old x.org drivers? printer drivers -> jockey?)
 * Reduce rsyslog duplication
 * Compress files (e.g. apt indexes, see https://wiki.ubuntu.com/ReducingDiskFootprint)
 * Remove/reduce language runtimes (Perl/Python/Erlang)
 * Optimize PNGs and SVGs (https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2010-May/011504.html)
 * Provide dman-like web fallback if a manpage isn't installed locally

Also see

  https://wiki.ubuntu.com/PaulSladen/OpenOfficeL10nCompression
  https://wiki.ubuntu.com/PaulSladen/LangpackCompression
  https://wiki.ubuntu.com/PaulSladen/DLoop

Blueprint information

Status: Complete
Approver: Sebastien Bacher
Priority: High
Drafter: Martin Pitt
Direction: Needs approval
Assignee: Martin Pitt
Definition: Approved
Series goal: Accepted for natty
Implementation: Implemented
Milestone target: None
Started by: Martin Pitt
Completed by: Martin Pitt

Whiteboard

GOAL
====
Things that will increase CD size in Natty:
- GTK 3 stack (~ 20 MB); we'll try hard to port most stuff, but some GTK2 rdepends will most likely stay
- unity (2.5 MB)
- gallium drivers (14 MB)
- banshee instead of Rhythmbox (4.7 MB)
- total: ~ 41 MB

Work items (natty-alpha-2):
[kate.stewart] confirm with legal that we can ship without package changelogs: DONE
Truncate changelog.Debian.gz in pkgbinarymangler (10 MB): DONE
Make "changelogs.ubuntu.com" configurable in apt-changelog: DONE
Remove changelog.gz in pkgbinarymangler (17 MB): DONE
Provide apt-changelog wrapper to fetch it from changelogs.ubuntu.com: DONE
Change apt-listchanges to fall back to using apt-changelog: DONE
Integrate optipng into pkgbinarymangler (MIR is in linked bug): DONE
package scour (MIR is in linked bug): DONE
write dh_scour: DONE
integrate dh_scour into cdbs: DONE
investigate maturity of advancecomp (MIR in linked bug): DONE
integrate advancecomp into pkgbinarymangler for further PNG compression: DONE
investigate maturity of jpegoptim (see below): DONE
package user admin bit of GNOME 3: POSTPONED
(perl removal) Remove gnome-system-tools from seeds, replace with user admin tool from GNOME 3 (GNOME 3 bits postponed from natty): POSTPONED
(perl removal) port debconf from libgnome2-perl to libgtk2-perl: DONE
(perl removal) Ensure gnome-terminal/ncurses debconf fallback if libgnome-perl is missing: DONE
(perl removal) Fix libgnome2-perl recommends of gdebi/synaptic/apturl to libgtk2-perl: DONE

Work items:
Rebuild default install packages with PNG files against optipng mangler (5.5 MB): DONE
Rebuild default install packages with SVG files against cdbs scour'ed build system (7 MB): DONE
Build branding-ubuntu with dh_scour: DONE
Build humanity-icon-theme with dh_scour: DONE
Build shotwell with dh_scour: DONE
Build simple-scan with dh_scour: DONE
Build app-install-data with dh_scour: DONE
Build notify-osd-icons with dh_scour: DONE
Build software-center with dh_scour: DONE
Build ubiquity with dh_scour: DONE
Build ubuntu-mono with dh_scour: DONE
Run jpegoptim on ubuntu-wallpapers: DONE
If we still need more space, drop evolution-couchdb from default install, which drops couchdb and erlang (6.7 MB): DONE
[nataliabidart] If we drop evo-couchdb, install evolution-couchdb when enabling it in the U1 control panel: DONE
[raof] dynamically link DRI mesa drivers (~ 25 MB): DONE
drop perl dependency from binfmt-support: DONE
[broder] drop perl dependency from cups-driver-gutenprint: DONE
drop perl dependency from defoma: DONE
drop perl dependency from doc-base: DONE
drop perl dependency from foomatic-filters: DONE
[broder] drop perl dependency from gstreamer0.10-plugins-base-apps: DONE
drop perl dependency from libnss-mdns: DONE
drop perl dependency from libpurple0: DONE
drop perl dependency from lm-sensors: DONE
drop perl dependency from sgml-base: DONE
drop perl dependency from sgml-data: DONE
(install optimizations) enable compressed apt indexes (too many regressions): DROPPED
(install optimizations) don't create srcpkgcache.bin in apt (still needs work in apt): DROPPED
(install optimizations) configure rsyslog to just log to one single file, to avoid redundancy: DONE
[laney] Investigate areas for banshee diet, notably splitting out less used plugins and dropping sqlite2 dependency: DONE
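
To re-check the PNG numbers in the work items above, a rough sketch along these lines could report the bytes before and after an optimizer run over a build tree. This is not how pkgbinarymangler does it; the function names png_bytes and png_savings are made up for illustration, and the optimizer command is passed in by the caller (e.g. "optipng -quiet").

```shell
#!/bin/sh
# Hypothetical helpers, not taken from pkgbinarymangler.

# Print the total size in bytes of all *.png files below a directory.
png_bytes() {
    find "$1" -type f -name '*.png' -print |
    while IFS= read -r f; do
        wc -c < "$f"
    done |
    awk '{ s += $1 } END { printf "%d\n", s }'
}

# Usage: png_savings DIR OPTIMIZER [ARGS...]
# Runs the optimizer once per PNG (file name appended as the last argument)
# and prints "<bytes before> <bytes after>".
png_savings() {
    dir=$1
    shift
    # Refuse to run without an optimizer command.
    [ "$#" -ge 1 ] || return 2
    before=$(png_bytes "$dir")
    # Optimizer chatter goes to stderr so that stdout only carries the result.
    find "$dir" -type f -name '*.png' -exec "$@" {} \; >&2
    after=$(png_bytes "$dir")
    echo "$before $after"
}
```

Example: png_savings debian/tmp optipng -quiet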

Dropped items for Perl removal (needs further discussion with upstream/Debian); also, AppArmor relies heavily on Perl:
Split out required modules from perl-modules and fix default install packages to only depend on those:
drop perl dependency from apparmor-utils:
drop perl dependency from foomatic-db-engine:
drop perl dependency from libapparmor-perl:
drop perl dependency from libfile-copy-recursive-perl:
drop perl dependency from libhtml-parser-perl:
drop perl dependency from libhtml-tagset-perl:
drop perl dependency from libhtml-tree-perl:
drop perl dependency from libmldbm-perl:
drop perl dependency from librpc-xml-perl:
drop perl dependency from libterm-readkey-perl:
drop perl dependency from liburi-perl:
drop perl dependency from libwww-perl:
drop perl dependency from libxml-parser-perl:

OTHER DISCUSSION
================

Remove Perl (8 MB compressed, 43 MB uncompressed)
 - Lots of work, but worth it, since it won't be missed
 - Ideal: Only use perl-base
 - Fallback: compile list of Perl modules which we need for default system and split it out
 - Major rdepends:
   + gnome-system-tools: need user/groups replacement; pulls in entire glib/gtk/pango/gnome-perl stack (another couple of MB)
   + synaptic uses gtk-perl for shiny debconf/dpkg questions
   + defoma: being obsoleted in Debian, just 5 rdepends; needs dropping dependency and calling update-gsfontmap instead of defoma
   + foomatic: needs input from Till, is perl-base enough?
   + apparmor-utils
   + sgml-data: unnecessary dependency
   + sgml-base: only needs perl-base
   + about 20 installed Perl libraries: check imports, might only need perl-base
   + binfmt-support: C rewrite in progress

OpenOffice.org langpack optimizations (~ 18 MB uncompressed(?) / 5 MB compressed per language; we only ship English on CDs)
 - https://wiki.ubuntu.com/PaulSladen/OpenOfficeL10nCompression
 - nontrivial, needs OO.o maintainer, but definitely worth looking at

Duplicate copies of Unicode data in gucharmap and others (icu, ghostscript, locales, cups)

jpegoptim
---------
We only ship 134 jpg files in a standard installation (7.5 MB). A third of those are in OpenOffice, but they only account for 450 kB. The only packages which could noticeably benefit from jpegoptim are gnome-screensaver and ubuntu-wallpapers, as the jpg files in /usr/share/backgrounds account for 6.7 MB; for gnome-screensaver the improvement turned out to be negligible, though. Packages which ship a lot of jpg files should just call jpegoptim directly instead of us adding it to pkgbinarymangler (where it could potentially break a whole lot more).
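
The counts above can be re-measured with something like the following sketch. The function name jpeg_footprint is made up for illustration; it assumes a find that supports -iname (GNU and BSD find do) and file names without embedded newlines.

```shell
#!/bin/sh
# Hypothetical helper: print "<number of JPEG files> <total bytes>"
# for everything below the given directory.
jpeg_footprint() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print |
    while IFS= read -r f; do
        wc -c < "$f"
    done |
    awk '{ n++; bytes += $1 } END { printf "%d %d\n", n, bytes }'
}
```

Example: jpeg_footprint /usr/share/backgrounds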

2010-11-13 (q-funk): An awful lot of packages ship old changelogs as files separate from the upstream or Debian ones. I narrowed down the regex using this recipe: "find /usr/share/doc/ -name '*old*' | grep changelog | cut -d'/' -f6- | sort | uniq"
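
Building on the recipe above, a sketch like this could show how many bytes of old changelogs each package ships, largest first. The function name old_changelog_usage is made up for illustration, and it assumes the usual /usr/share/doc/<package>/ layout, so the first path component below the doc directory is taken as the package name.

```shell
#!/bin/sh
# Hypothetical helper: print "<bytes> <package>" for old changelog files,
# sorted by size in descending order.
old_changelog_usage() {
    docdir=${1:-/usr/share/doc}
    docdir=${docdir%/}                      # tolerate a trailing slash
    find "$docdir" -type f -name '*old*' -print | grep changelog |
    while IFS= read -r f; do
        rel=${f#"$docdir"/}                 # path relative to the doc dir
        printf '%s %s\n' "$(wc -c < "$f")" "${rel%%/*}"
    done |
    awk '{ bytes[$2] += $1 } END { for (p in bytes) print bytes[p], p }' |
    sort -rn
}
```

Example: old_changelog_usage /usr/share/doc | head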

2010-10-30 (TheMuso): I am concerned about yelp switching to webkit. Could we possibly consider sticking with the status quo if webkit is not sufficiently accessible by feature freeze? I'd be happy to follow up with upstream about the current status of webkit accessibility.

2010-10-30 pitti: ah, good point; I dropped that from the list for now.

2010-01-01 sladen: print out all of the currently compressed files that are the same number of 4KiB sectors compressed or uncompressed: for i in `find /usr/share/doc/ -name \*.gz | grep -v changelog` ; do wc -c $i | awk '{printf "%s %d ", $2, $1}'; gunzip -cd $i | wc -c ; done | awk '{if (int($2/4096) == int($3/4096)) print $1,$2,$3}'
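
A more defensive variant of the one-liner above might look like this sketch. It copes with spaces in file names and rounds block counts up instead of using int(), so a 4096-byte file counts as one block and a 4097-byte file as two. The function name same_block_count is made up for illustration, and the 4 KiB block size is an assumption about the file system.

```shell
#!/bin/sh
# Hypothetical helper: list .gz files (changelogs excluded) whose compressed
# and uncompressed sizes occupy the same number of 4 KiB blocks, i.e. where
# compression saves no disk space. Prints "<file> <compressed> <uncompressed>".
same_block_count() {
    find "$1" -type f -name '*.gz' ! -path '*changelog*' -print |
    while IFS= read -r f; do
        packed=$(wc -c < "$f")
        unpacked=$(gzip -cd "$f" | wc -c)
        # Round up to whole 4096-byte blocks.
        packed_blocks=$(( (packed + 4095) / 4096 ))
        unpacked_blocks=$(( (unpacked + 4095) / 4096 ))
        if [ "$packed_blocks" -eq "$unpacked_blocks" ]; then
            printf '%s %s %s\n' "$f" "$((packed))" "$((unpacked))"
        fi
    done
}
```

Example: same_block_count /usr/share/doc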

2010-10-05 seb128: nice work, approving the blueprint for natty

2010-11-25 mike: http://raphaelhertzog.com/2010/11/15/save-disk-space-by-excluding-useless-files-with-dpkg/

2010-12-18 pitti: Defer perl removal bits for natty; this needs further discussion with upstream and Debian, which is stuck ATM.

