Improvements to the release process of Checkbox

Registered by Ara Pulido

Checkbox is a tool that a number of parties use in their day-to-day work, so they depend on it being reasonably reliable and not breaking often. For this to be the case, Checkbox requires a release process (to indicate when and how it is released) and a test plan indicating which functionality will be tested to ensure that it is working. At the moment it has neither.

== Release Cadence ==

One of the most important aspects of a release process is to have a regular, predictable schedule. This allows users of the software to plan for forthcoming changes and to arrange for resources to be available for proper testing. Also, for testing to be effective there must be a period of stabilisation in which critical bugs found during testing can be fixed. The following is the proposed initial two-week cadence:

Days 1-8: All merges may be accepted into trunk, no restrictions
Days 9-12: Only bug fixes may be accepted into trunk; manual testing begins at this point
Days 13-14: Only fixes for critical bugs identified by testing may be accepted into trunk. Manual testing is rerun to confirm that the fixes are effective and that no further regressions occur.

In addition, on day 13 there will be a release meeting involving the release co-ordinator and a representative from each stakeholder, to go over the major issues found so that everyone is aware of them. Checkbox will be released on day 14 of the cadence.

In order for work on new features and bug fixes to continue during this freeze period, a scheme will be worked out to branch the code at a particular point and make the release from that branch. This will prevent potentially important work from being delayed while maintaining the stability of the release.

Going forward into the next cycle, the plan is to shorten the cadence to as little as one week. This can only be achieved by increasing automated testing and building good confidence in the effectiveness of that testing. In this way 'freeze' periods (which slow development) can be kept to a minimum and improvements brought to users faster.

The ultimate goal would be for everyone involved to have strong enough confidence in the automated testing that they are willing to accept this as assurance that a version of Checkbox is sound enough to use.

== Routine Automated Regression Testing ==

In order to provide a solid foundation on which to perform more thorough testing prior to release, it is important to have a strong automated test suite in place that can make sure the fundamental functions and features of Checkbox are working properly. Since Checkbox is a fairly complex piece of software, heavily geared around user interaction in most cases (although it does support running 'headless'), it will take some effort to fully automate everything. More to the point, in terms of time resources, 'routine automated regression testing' must be kept distinct from automated testing in general. These tests are intended to run pre-merge, on every merge, so there must be a limit on how long they take to run: more than a few minutes and it may be necessary to start splitting the tests out. A sketch of the kind of test this covers is shown below.
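
To make the distinction concrete, the sketch below shows the kind of small, deterministic unit test that would be suitable for this pre-merge suite. The module and function names in it (checkbox.parsers, parse_whitelist) are hypothetical placeholders, not Checkbox's actual API.

{{{
# A minimal sketch of a fast, deterministic test meant to run pre-merge on
# every merge. The imported module and function are hypothetical placeholders.
import unittest


class WhitelistParsingTests(unittest.TestCase):

    def test_comments_and_blank_lines_are_ignored(self):
        from checkbox.parsers import parse_whitelist  # hypothetical import
        jobs = parse_whitelist("# comment\n\ncpu/topology\nmemory/info\n")
        self.assertEqual(jobs, ["cpu/topology", "memory/info"])


if __name__ == "__main__":
    unittest.main()
}}}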

Checkbox does already have something in the way of a test suite that fits this description, but it is still in its infancy and requires more effort. The current strategy for expanding it is to add tests on a case-by-case basis whenever there is a 'test escape' which looks easily automatable. This is a decent strategy, but going forward some measure of coverage and a more structured approach would be desirable. Since Checkbox is a (mainly) Python-based application, there are several tools which will aid in measuring coverage (http://pypi.python.org/pypi/coverage/).
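
As a rough illustration, the coverage tool linked above could be wired into the existing unit test run roughly as follows. The source package name, test directory and API entry point are assumptions (older releases of the tool expose coverage.coverage() rather than coverage.Coverage()).

{{{
# A rough sketch of measuring coverage over the unit tests with the coverage
# package linked above. The "checkbox" package name and the test directory are
# assumptions about the tree layout.
import unittest
import coverage

cov = coverage.Coverage(source=["checkbox"])
cov.start()

suite = unittest.defaultTestLoader.discover("checkbox/tests")
unittest.TextTestRunner(verbosity=1).run(suite)

cov.stop()
cov.save()
cov.report(show_missing=True)  # per-module coverage plus the lines not hit
}}}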

== Pre-release Manual Testing/Extensive Automated Testing ==

The final line of defence before release needs to be thorough and cover as many of the most important use-cases as possible. Since Checkbox is based to a large extent around manual testing and is primarily used through its graphical user interface, most of the more extensive testing will have to be geared around real or mock 'manual' testing - i.e. manipulation of the graphical user interface. The most important thing to get right here is to prioritise the tests properly, so that the test cases whose failure would be considered critical are introduced and run first. A list of potential use-cases can be seen here:

https://docs.google.com/a/canonical.com/spreadsheet/ccc?key=0AhbvF3mVZ2BadG9GRXcwdmdZVTFiOW9JT21wbEY1S1E

Initially this testing will have to be manual, but there are tools available which allow for automation of UI interaction (http://xpresser.com/); a sketch of such a test follows below. Still, this type of testing would need to be kept distinct from automated regression testing, as it would potentially still take a long time to run.
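
As a rough sketch of what such an automated UI test might look like with Xpresser: the reference image names and directory here are hypothetical, and the calls shown should be checked against Xpresser's own documentation.

{{{
# A rough sketch of an Xpresser-driven UI test. The reference image names and
# their directory are hypothetical examples.
from xpresser import Xpresser

xp = Xpresser()
xp.load_images("ui-test-images/")    # directory of reference screenshots

xp.wait("checkbox-welcome-screen")   # block until the welcome screen appears
xp.click("continue-button")          # click wherever the matched image is found
xp.wait("test-selection-screen")     # fail if the next screen never shows up
}}}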

A further point to be considered is where to keep the test case definitions. Even if the test cases are automated with a tool such as Xpresser, it is important to store some kind of formal definition for maintenance purposes, so that if an automated test breaks its intent is not lost. Some liaison should be made with the Platform QA team to see what they recommend, but it would be possible to fall back on a simple spreadsheet.
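
If the spreadsheet fallback is used, keeping a machine-readable export of it would make it straightforward to cross-check the manual definitions against the automated tests. A minimal sketch, assuming a CSV export with hypothetical column headings:

{{{
# A minimal sketch of loading test case definitions from a CSV export of the
# spreadsheet. The file name and column headings ("id", "title") are
# assumptions, not an agreed format.
import csv

def load_test_cases(path="manual-test-cases.csv"):
    with open(path, newline="") as stream:
        return list(csv.DictReader(stream))

if __name__ == "__main__":
    for case in load_test_cases():
        print(case["id"], "-", case["title"])
}}}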

During this cycle the goal will be to implement all of the use cases mentioned in the spreadsheet above (except where a use case is dropped by consensus), initially as manual test cases, with at least half of them automated.

== Versioning/Branch management ==

Not directly related to quality, but still an important aspect of releasing software, is versioning.

Blueprint information

Status: Not started
Approver: Ara Pulido
Priority: Undefined
Drafter: Brendan Donegan
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

== Definition of done ==

- Checkbox is released on a weekly basis with a high level of confidence in its stability
- Checkbox has an automated test suite run on every change which is strong enough that at least one stakeholder feels happy using builds on which only this test suite has been run
- Checkbox has a formalised, repeatable manual test suite which supplements the automated test suite with any tests that cannot be automated
- Checkbox has a clear versioning scheme and strategy for releasing to Ubuntu


Work items:
"As Checkbox maintainer, I want to get a consensus with checkbox stakeholders about the suiteless execution mode, so I can know which modes I have to support - S": TODO
"As The Hardware Certification team, I want all smoke tests to run in headless mode, so I can use a server VM to test them - L": TODO
"As CE QA, I want to deploy/adapt the unit test suites of checkbox to oem-qa-checkbox and keep them synced, so I can use checkbox-oem with confidence - L": TODO
"As Checkbox maintainer, I want to remove the urwid/gtk versions of checkbox, so I can focus maintenance efforts on Qt and CLI versions - M": TODO
"As Checkbox maintainer, I want to measure code coverage for the core of Checkbox so that I can make an informed decision on where to focus automated testing next - M" : TODO
"As Checkbox maintainer, I want to create a list of automated tests that need to be implemented, based on code coverage and test gap analysis, so that I can schedule work to improve the automated regression testing in Checkbox - L": TODO
"As Checkbox maintainer, I want to have automated tests to verify checkbox installs (using dpkg(-offline), apt, upgrade) and uninstalls properly, so I can release checkbox with confidence - L": TODO
"As CE QA, I want to automatically archive my previous test run data, so I can start new tests without losing precious data - M": TODO
"As The Hardware Certification team, I want to use the staging certification website (or setup the new one with the IS charm), so I can automate the results submission tests - L": TODO
"As A lab engineer, I want to have automated tests to validate submission to the certification website, so I can be sure that my results are published and not corrupted - L": TODO
"As A Technical Account Manager, I want to have automated tests to validate the offline submission process to the certification website, so I can go testing on site and use the offline tools to submit later with confidence - L": TODO
"As A Business customer, I want to have automated tests to ensure both XML and XLS reports are properly generated after a test run, so I can share them with checkbox developpers (XML) or customers (XLS) - M": TODO
"As CE QA, I want to extend smoke tests with jobs ordering validation, so I can rely on the checkbox whitelist ordering feature - XL": TODO
"As CE QA, I want to have UI regression tests, so I can test hardware using checkbox with confidence - XL": TODO
"As CE QA, I want to update checkbox-editor to use the checkbox job validator instead of its own (may require a conversion to python3), so I can have just one template validation rules set - L": TODO
"As A Checkbox stakeholder, I want to review the versionning of checkbox, so it can be delivered with a comprehensive version schema - M": TODO