Cloud based autopkgtest runs for proposed-migration

Registered by Martin Pitt on 2015-07-08

We want to replace our hackish/brittle lp:auto-package-testing (adt-britney) + jenkins + tons of little rsynced state files + running tests on manually maintained static CI lab servers. All of this should move to the cloud for scalability, reducing the CI lab size, simplifying the structure of proposed-migration, and making it understandable/maintainable again.

Blueprint information

Status:
Complete
Approver:
Steve Langasek
Priority:
Undefined
Drafter:
Martin Pitt
Direction:
Approved
Assignee:
Martin Pitt
Definition:
Approved
Series goal:
Accepted for wily
Implementation:
Implemented
Milestone target:
ubuntu-15.08
Started by:
Martin Pitt on 2015-07-08
Completed by:
Martin Pitt on 2017-10-25

Whiteboard

This has already been discussed about two years ago with Debian: https://wiki.debian.org/debci/DistributedSpec . This went into the design and implementation of Debian's CI system "debci" which is running on http://ci.debian.net . We want to reuse as much as possible of this to get a proper web UI for results browsing.

This document describes the job distribution mechanism (AMQP), the data structure on Swift, and some necessary improvements to debci which happened long ago. However, debci today neither provides a cloud-based backend nor uses Swift; it stores all test logs directly on the debci machine. We don't want that, as we want to be able to tear down/redeploy the entire rollout at any time. So the primary test output data is precious and must live in Swift; any data held by the web results browser, britney, or the test execution controllers should only be a local cache.
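As a sketch of what the Swift result layout could look like, assuming one container per release and objects keyed by release/arch/source-prefix/source-package/run-id (the names and the Debian-style "lib" prefixing here are illustrative assumptions, not the authoritative layout):

```python
def swift_result_path(release, arch, srcpkg, run_id):
    """Build an (assumed) Swift container name and object path for one
    test run. Hypothetical sketch: one container per release, objects
    keyed by release/arch/source-prefix/source-package/run-id."""
    # Debian archive-style source prefix: "libfoo" sorts under "libf",
    # everything else under its first letter.
    prefix = srcpkg[:4] if srcpkg.startswith("lib") else srcpkg[:1]
    container = "autopkgtest-%s" % release
    path = "%s/%s/%s/%s/%s" % (release, arch, prefix, srcpkg, run_id)
    return container, path

container, path = swift_result_path("wily", "amd64", "libpng", "20150820_101530")
print(container)  # autopkgtest-wily
print(path)       # wily/amd64/libp/libpng/20150820_101530
```

With a scheme like this, britney and the web frontend can re-fetch any result from Swift on demand, so their local copies stay disposable caches.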

ScalingStack Resources
------------------------------------
We need the following peak capacity:
40 instances
40 virtual CPUs
80 GB RAM
800 GB disk (this could be reduced to 400 or 200 with a custom flavor)
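A quick back-of-the-envelope check of those peak figures, assuming all 40 instances use one uniform flavor (the flavor sizes are derived from the totals above, not from an actual nova flavor definition):

```python
# Peak capacity figures from the request above.
instances = 40
vcpus_total, ram_gb_total, disk_gb_total = 40, 80, 800

# Per-instance flavor implied by the totals: 1 vCPU, 2 GB RAM, 20 GB disk.
per_instance = (vcpus_total // instances,
                ram_gb_total // instances,
                disk_gb_total // instances)
print(per_instance)  # (1, 2, 20)

# A custom flavor with a smaller root disk shrinks the disk total,
# matching the "400 or 200" figure in the text:
for disk_gb in (10, 5):
    print("%d GB/instance -> %d GB total" % (disk_gb, instances * disk_gb))
```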

This would be required for some days when opening a new release (autosyncs from Debian), and about once a week for half a day when we get a new gcc, python, or similar packages which trigger lots of tests. During most days there are usually no more than 2 to 5 tests running at any given time.

Analysis/tracking of current "tmpfail" issues: http://pad.ubuntu.com/autopkgtest-cloud-tmpfail

Work Items

Work items for ubuntu-15.07:
autopkgtest: adjust ssh nova script to work with ScalingStack: DONE
autopkgtest: create --setup-script to turn a virgin standard cloud image into a suitable testing env: DONE
create PoC worker for receiving AMQP test requests, calling autopkgtest with nova/ssh, and put results into Swift: DONE
debci: allow linking to logs/artifacts directly on swift: DONE
debci: create tool to download results from swift, similar to debci-collector: DONE
deployment: set up DevOps environment with ProdStack (for controller, RabbitMQ, and web frontend) and ScalingStack (for temp testbeds): DONE
deployment: create charms for debci web frontend and worker: DONE
deployment: create script to deploy and configure all charms, plus post-install setup: DONE
deployment: create estimates of necessary CPU/RAM/disk, ask for scaling up ScalingStack: DONE
(IS) deployment: open up firewall rules for rabbit and swift from wolfe and cyclops: DONE
britney: add requesting tests through AMQP (with direct reverse dep calculation) in addition to adt-britney submit: DONE
britney: add collecting results from swift in addition to adt-britney collect: DONE
britney: let the cloud machinery work in parallel for a week or two, fix problems: DONE
deployment: add build-essential to trusty instances for test backwards compat: DONE

Work items for ubuntu-15.08:
create worker for processing requests on LXC: DONE
debci: extend debci-worker to send results to swift instead of through AMQP (upstreaming our tools): POSTPONED
debci: fix broken data/status/*/*/packages.json: DONE
deployment: update worker charm/deploy.sh to create second set of worker instances for the other ScalingStack region: DONE
deployment: add landscape and ksplice charms: DONE
deployment: move armhf/ppc64el autopkgtest slaves from jenkins to amqp/swift: DONE
deployment: add daily container update cronjob to armhf/ppc64el autopkgtest slaves: DONE
deployment: enable armhf in britney: DONE
deployment: fix current OOM killer issues on ppc64el nodes (#1488879): DONE
deployment: enable ppc64el in britney: DONE
britney: set up configurations for SRU testing: DONE
britney: switch authoritative data from adt-britney to cloud: DONE
britney: remove obsolete adt-britney code: DONE
britney: trigger DKMS packages for linux-meta* uploads (linux-meta done, linux-meta-<backport> not yet; will reimplement in terms of the above map): DONE
britney: trigger LXC for kernel uploads: DONE
update https://wiki.ubuntu.com/ProposedMigration: DONE
document the new system on https://wiki.ubuntu.com/ProposedMigration/AutopkgtestInfrastructure : DONE
dkms: backport autopkgtest helper script to precise/trusty (#1489045): DONE
autopkgtest: adjust nova script to get along with precise's cloud-init: DONE
deployment: enable precise tests (mostly for kernel/DKMS): DONE
britney: include triggering packages in AMQP requests: DONE
worker: Pass AMQP trigger to adt-run and tests ($ADT_TEST_TRIGGERS): DONE
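The last two items above (triggers in AMQP requests, passed through as $ADT_TEST_TRIGGERS) could be sketched like this; the JSON field names are hypothetical, as the actual message format is defined by the britney and worker code:

```python
import json

# Hypothetical shape of a britney test request as sent over AMQP;
# the real field names live in the britney/worker implementation.
request = json.dumps({
    "package": "libpng",
    "triggers": ["gcc-5/5.2.1-15ubuntu1", "python3-defaults/3.4.3-5"],
})

def triggers_env(raw_request):
    """Turn the request's trigger list into the value the worker would
    export as $ADT_TEST_TRIGGERS for adt-run and the tests
    (space-separated source/version pairs)."""
    req = json.loads(raw_request)
    return " ".join(req.get("triggers", []))

print(triggers_env(request))
# gcc-5/5.2.1-15ubuntu1 python3-defaults/3.4.3-5
```

Recording the trigger in both the request and the result is what later lets britney decide whether a given result actually answers a given request (the 15.09 item below).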

Work items for ubuntu-15.09:
[apw] port http://people.canonical.com/~kernel/info/dkms/int-matrix.html to pull results from britney/swift instead of jenkins: DONE
worker: Install the kernel given in ADT_TEST_TRIGGER so that we can test drivers against e.g. linux-meta-lts-vivid: DONE
britney: run linux-meta-* triggered dkms packages in separate test requests, as these don't work with the "test against everything in -proposed" approach: DONE
britney: take test triggers into account when evaluating whether a result applies to a request: DONE

Work items for ubuntu-15.10:
[canonical-is-public] deployment: add enough capacity to DevOps environment (RT#84348): DONE
deployment: create mojo scripts for automated deployment: POSTPONED
britney: add uploader email notification on regressions: DONE
