Android Build Service: Past and Future

Registered by Zach Pfeffer

In this session we will review the Android Build System's performance over the last 6 months: what went well, what went badly, which of the LDS11.05 plans were completed and which were not, and how the Android team's recent workflow affects older plans. Then we'll consider improvements to Android Build. These may include tabbed build organization, command-line build launching, easier parameterization, test-result co-location, etc.

Goals:
Create a set of plans and milestones to improve how android-build looks and runs.

Blueprint information

Status:
Complete
Approver:
Zach Pfeffer
Priority:
Undefined
Drafter:
Paul Sokolovsky
Direction:
Needs approval
Assignee:
Paul Sokolovsky
Definition:
Approved
Series goal:
None
Implementation:
Implemented
Milestone target:
None
Started by
Paul Sokolovsky
Completed by
Paul Sokolovsky

Related branches

Sprints

Whiteboard

Etherpad: http://summit.linaro.org/uds-p/meeting/19341/linaro-platforms-lc4.11-android-build/ (see below for backup copy)

Fun stats:
There are currently 1084 builds in the system, and 1390.06 cloud-machine-hours were spent on them. (Keep in mind that old/unused builds are expired.)
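
As a rough sanity check, the two figures above give an average cost per build. This is only a back-of-the-envelope sketch: it assumes the machine-hour total covers exactly the builds currently listed, which may not hold, since expired builds are not counted.

```python
# Figures quoted in the stats above.
total_builds = 1084
total_machine_hours = 1390.06

# Average cloud-machine time per build
# (assumes both figures cover the same set of builds).
avg_hours = total_machine_hours / total_builds
avg_minutes = avg_hours * 60

print(f"{avg_hours:.2f} h/build (~{avg_minutes:.0f} min)")
```

Under that assumption the average comes to about 1.28 hours, or roughly 77 minutes, per build.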

Things done over the last half year:
1. Resolved build slave reuse issues (no issues in months)
2. Reliable update deployment procedure (with easy rollback, for example)
3. Sandboxed development/testing support (especially good for cross-team work)
4. Build archive disk space issue resolved (expected to handle ~1 year's worth of builds)
5. Largely achieved independence from upstream downtime (while the AOSP servers were down for 1+ month, we kept working and served as a hub for community access)
6. LAVA integration (finished builds can be submitted for testing)
7. Gradual UI look and feel improvements (build description/instructions, LAVA test results display, more accessible build logs, fast HTTP downloads, MD5SUMS, etc.)

Things planned but not done during the last half year (and priority reassessment):
1. Build slave linger time control: Jenkins currently has 30 min hardcoded, which wastes resources. But as build times grew beyond 1 hr and builds started to queue, this became less of a concern than initially raised. Still worth pursuing once we optimize build time.
2. Build slave separation at the cloud instance level: don't reuse a slave for two builds. This was largely alleviated by proper slave (re)init. The original user stories for this feature were private builds and community-managed builds, but we have had no requests for either so far. Also, Jenkins supports separation at the slave group/type level; it was broken when we started, but works now. Assessment: nice to have, but low priority until we hit a real use case where nothing else works.
3. Compilation time improvements using ccache. This wasn't expected to be an easy task (complexity of doing it in the cloud; reliability, correctness and security concerns), so it wasn't started. We're actually working on source checkout performance first, as that is the least reliable and least scalable part of the build process. Also, some recent build process changes are not compatible with cached compilation: the "build tip platform with tip toolchain" approach would essentially mean invalidating the cache every day, so we'd get no benefit from it.
4. Generalization. The system was originally known as Linaro Cloud Buildd and was one of the first working cloud build systems in Linaro, but other systems have caught up or been introduced, so there's no urgency to generalize it to handle arbitrary builds.
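
The cache-invalidation concern in item 3 can be illustrated with a toy model. This is not how ccache is implemented; it is a minimal sketch that assumes a compiler cache keys each object on the source content, the flags, and the identity of the compiler. The function name and the toolchain identifiers are hypothetical.

```python
import hashlib

def cache_key(source: str, flags: str, toolchain_id: str) -> str:
    """Toy cache key: any change to source, flags, or compiler changes the key."""
    h = hashlib.sha256()
    for part in (source, flags, toolchain_id):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so adjacent fields cannot run together
    return h.hexdigest()

source = "int main(void) { return 0; }"
flags = "-O2"

# Hypothetical daily tip-toolchain builds: same source, different compiler each day.
day1 = cache_key(source, flags, "toolchain-tip-day1")
day2 = cache_key(source, flags, "toolchain-tip-day2")

# Hypothetical pinned toolchain: same compiler on both days.
pinned_a = cache_key(source, flags, "toolchain-release")
pinned_b = cache_key(source, flags, "toolchain-release")

print(day1 == day2)          # False: a new toolchain means a cache miss
print(pinned_a == pinned_b)  # True: a pinned toolchain can reuse the cache
```

In this model, a toolchain that changes every day produces a fresh key for every object, so every lookup misses even though the sources are unchanged.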

Overall assessment of the current state:
1. The system is good, on par with the industry baseline (which is, well, that there's some random error stream)
2. Good cross-team involvement: the Android and Validation teams regularly provide patches, and the system uses technology and experience from the Toolchain team.
3. As we pump more builds into it, scalability issues get worse.
4. As we do more advanced things with it (automated CI), accountability and reliability issues get worse.

Currently open fronts of work:
1. CI loop (needed changes: better stability, better performance, better classification of build errors)
2. Seeded builds (improves source checkout performance, improves scalability of parallel builds)
3. Stop hiding Jenkins; make use of the advanced features/UI it provides.
4. Small UI improvements here and there (ad hoc).

Ideas and requests for future:
1. Click-through license protection for particular build downloads - requested for Snowball.
2. Email notifications (to a predefined mailing list, to which anyone can subscribe and use mail client filtering to get "interesting" results, or per-job configurable email?)
3. Comprehensive UI restructuring?? (targets: make downloads more visible and easier for the community to deal with)
4. (Further) cloud resource usage optimization.

======================

Backup copy of etherpad:

A historical review of the Android build system.

    It's a young system.
    A lot of builds, which take a lot of machine hours...
    Processes have been established and builds do work.
    It's integrated with LAVA

See the blueprint for bullets about planned but not completed work items.

2. Build slave separation on cloud instance level:
    private builds could possibly be solved using a completely separate system
    click-through licenses are perhaps not needed if ssh keys from, for instance, igloocommunity.org are used.

Improving build times
    ccache: no need for it now
    Zach: improvements should happen in the build config so they're available both locally and in the cloud

Decision: Need to be able to scale to 10 concurrent Android builds
    if more than an acceptable number of builds are submitted, the system should at least not freeze. Queue the rest of the builds?
    git seems to be the bottleneck
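
The "cap concurrency and queue the rest" behaviour can be sketched as follows. This is only an illustration using Python's standard thread pool, not a description of how Jenkins schedules builds; `run_build` and the build names are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_BUILDS = 10  # target from the decision above

_lock = threading.Lock()
_running = 0
peak_running = 0

def run_build(name: str) -> str:
    """Stand-in for a real build; only tracks how many run at once."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.01)  # placeholder for the actual build work
    with _lock:
        _running -= 1
    return name

# Submit more builds than the limit; the extra ones wait in the executor's queue
# instead of being rejected or overloading the system.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_BUILDS) as pool:
    finished = list(pool.map(run_build, [f"build-{i}" for i in range(25)]))

print(len(finished), peak_running)
```

All 25 submissions complete, while the observed peak never exceeds the configured limit.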

No manually triggered builds anymore; the CI automation triggers builds.

Build failures which are caused by infrastructure failures should be filtered out.
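
One way such filtering might look is a pattern match over the build log. This is a minimal sketch under stated assumptions: the log signatures below are hypothetical examples, and a real list would have to come from failures actually observed on the build slaves.

```python
import re

# Hypothetical log signatures; real patterns would come from observed failures.
INFRA_PATTERNS = [
    re.compile(r"fatal: unable to connect", re.IGNORECASE),
    re.compile(r"No space left on device", re.IGNORECASE),
    re.compile(r"Connection timed out", re.IGNORECASE),
]

def classify_failure(log_text: str) -> str:
    """Return 'infrastructure' if any known infra signature appears, else 'build'."""
    for pattern in INFRA_PATTERNS:
        if pattern.search(log_text):
            return "infrastructure"
    return "build"

print(classify_failure("fatal: unable to connect to git server"))
print(classify_failure("error: undefined reference to 'foo'"))
```

Failures classified as "infrastructure" could then be retried or hidden from developers, while "build" failures are reported as genuine.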

Triggering builds on change, rather than on user or machine actions.

The android-build.l.o front end is an important first contact for the users
    Zach wants only the latest releases and tips visible
    instructions for building etc should be available
    "Save this build as a release"

Click through licenses
   We need a popup presented to the user if they request a build which a vendor requires to be protected by a license.
   Separate front ends (origen.android-build.l.o, igloo.android-build.l.o, etc.) so that one vendor's builds are not displayed alongside another's, providing a sense of security by separation

Saving builds
    save the last 5 builds for personal builds
    save the last 10 (exact number being debated) builds for tip
    we do save all releases
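
The retention rules above could be expressed as a small pruning function. This is a sketch, not the system's actual expiry code: the `Build` type is hypothetical, the tip limit was still being debated, and the personal limit is applied across all personal builds for simplicity (it may have been meant per user).

```python
from dataclasses import dataclass

# Limits from the notes above; the tip limit was still being debated.
KEEP_LAST = {"personal": 5, "tip": 10}

@dataclass
class Build:
    name: str
    kind: str    # "personal", "tip", or "release"
    number: int  # higher means newer

def builds_to_expire(builds):
    """Return the builds that fall outside the retention limits."""
    expired = []
    for kind, limit in KEEP_LAST.items():
        same_kind = sorted((b for b in builds if b.kind == kind),
                           key=lambda b: b.number, reverse=True)
        expired.extend(same_kind[limit:])  # everything older than the newest `limit`
    return expired  # releases have no limit, so they are never expired

builds = [Build(f"p{i}", "personal", i) for i in range(1, 8)]
builds += [Build(f"r{i}", "release", i) for i in range(1, 4)]
print(sorted(b.name for b in builds_to_expire(builds)))
```

With seven personal builds and three releases, only the two oldest personal builds are selected for expiry.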

Improving the interface between android-build and LAVA

Parametrization of Android builds.
    build sets for sets of toolchains etc. ("Matrix builds")
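
A matrix build expands a set of parameter axes into one build per combination. The sketch below shows the idea; the axis names and values are placeholders, not actual Linaro toolchains or targets.

```python
from itertools import product

# Hypothetical axes; the names and values are placeholders.
axes = {
    "toolchain": ["gcc-a", "gcc-b"],
    "target": ["board-x", "board-y", "board-z"],
}

def expand_matrix(axes):
    """Yield one configuration dict per combination of axis values."""
    names = list(axes)
    for values in product(*(axes[n] for n in names)):
        yield dict(zip(names, values))

configs = list(expand_matrix(axes))
print(len(configs))
for config in configs:
    print(config)
```

Two toolchains and three targets yield six build configurations; adding an axis multiplies the count, which is worth keeping in mind given the scalability concerns noted earlier.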

Anybody should be able to create builds (Asac's requirement).

"Keep" button to protect builds from being trashed.

"Available for release" button to mark a build as suitable for release

#include for manifests

"Kill a build" feature in the front end.

Need to be able to troubleshoot failed builds: ssh in to see what happened. "Sort of high" priority.


Work Items