UEC testing - update

Registered by C de-Avillez on 2010-04-28

On Lucid we worked on https://blueprints.launchpad.net/ubuntu/+spec/server-lucid-uec-testing. A test rig was procured and built (cempedak, mabolo, marula, mamoncillo, santol, sapodilla, and soncoya), and 5 different topologies were defined. Our experience revealed limitations in both the process and the hardware; at the same time, access to the test rig uncovered some serious issues not yet reported elsewhere. We need to analyse the results and lessons learned, and redefine and expand the testing effort.

Blueprint information

Status:
Complete
Approver:
Jos Boumans
Priority:
High
Drafter:
C de-Avillez
Direction:
Approved
Assignee:
Dustin Kirkland 
Definition:
Approved
Series goal:
Accepted for maverick
Implementation:
Implemented
Milestone target:
milestone icon ubuntu-10.10-beta
Started by
Jos Boumans on 2010-06-14
Completed by
Dustin Kirkland  on 2010-11-20

Related branches

Sprints

Whiteboard

Status:
in progress

Complexity:
[hggdh2] maverick-alpha-3: 5
[ccheney] maverick-alpha-3: 4
[hggdh2] ubuntu-10.10-beta: 2
[kirkland] ubuntu-10.10-beta: 4
[hggdh2] ubuntu-10.10: 3
[kirkland] ubuntu-10.10: 5

Work items for maverick-alpha-2:
[hggdh2] verify possibility of having Yet Another Test RIG, allowing for simultaneous runs: POSTPONED
[hggdh2] document the need for always using both i386 and AMD64 images on testing: DONE
[hggdh2] announce the availability of the internal UEC for ad-hoc usage (feedback to us): POSTPONED
[hggdh2] discuss with IS access to the internal UEC logs (for monitoring): POSTPONED
[sylvain-pineau] Ensure access to additional rig #1: DONE
[hggdh2] Ensure UEC can be deployed on rig #1: DONE
[sylvain-pineau] Ensure access to additional rig #2 (Sylvain reported issues): POSTPONED
[hggdh2] Ensure UEC can be deployed on rig #2 (depends on rig #2 availability above): POSTPONED
[sylvain-pineau] Ensure hggdh2's test additions work in checkbox: DONE
[hggdh2] Test Eucalyptus SRU of 2010-06-10 (eucalyptus 1.6.2-0ubuntu30.2): DONE
[hggdh2] Test Eucalyptus SRU for Metadata service (first try): DONE
[hggdh2] Test Eucalyptus SRU for Metadata service (To be released): DONE
[kirkland] Identify additional test resources & their availability (Platform:hggdh,ccheney OEM:cgregan,deji,spineau,massimo): DONE
[hggdh2] Train up additional testers (split me out in multiple items): DONE
[hggdh2] verify status and help testing Jaguar: DONE
[hggdh2] refine test criteria, get testing done (with cgregan, spineau, deji): DONE
[hggdh2] verify spineau's volume testing, add in to uec-testing-scripts: POSTPONED
[hggdh2] stress volume testing -- create 512 volumes, try to add one more (will fail), delete all volumes: DONE
[hggdh2] verify status of machines in OEM rig (totara, camucamu, seagrape, ceylon): DONE
[hggdh2] Perform tests on A2 candidate (bug 588861): POSTPONED
[ccheney] uec-testing-scripts: add basic EBS testing -- create/attach to instance/write/detach from instance/attach to new instance/verify: POSTPONED

Work items for maverick-alpha-3:
[hggdh2] verify spineau's volume testing, add in to uec-testing-scripts: POSTPONED
[hggdh2] Perform tests on daily image with new kernel (bug 588861): DONE
[hggdh2] verify possibility of having Yet Another Test RIG, allowing for simultaneous runs (no. We will use what we have): DONE
[hggdh2] announce the availability of the internal UEC for ad-hoc usage (feedback to us): DONE
[hggdh2] discuss with IS access to the internal UEC logs (for monitoring, RT#39727, no response so far): DONE
[hggdh2] uec-testing-scripts: elastic IP testing -- create instance, attach an IP, attach to instance with this IP, detach IP from instance, try again: POSTPONED
[hggdh2] uec-testing-scripts: add euca-run-instances --instance-count=N, 1<N: DONE
[ccheney] uec-testing-scripts: add basic EBS testing -- create/attach to instance/write/detach from instance/attach to new instance/verify: POSTPONED
[ccheney] test all euca2ools binaries, most common options: POSTPONED
[ccheney] uec-testing-scripts: add security group testing -- try to access an instance with a different security group: POSTPONED
[ccheney] uec-testing-scripts: add security group testing -- blocked ports: POSTPONED
[ccheney] uec-testing-scripts: add security group testing -- try to monitor traffic from another instance in a different security group: POSTPONED
[hggdh2] results -- use TAP (Test-Anything-Protocol)? Discuss with QA: POSTPONED
[hggdh2] Perform tests on A3 candidate: DONE
[sylvain-pineau] Ensure access to additional rig #2 (Sylvain reported issues): POSTPONED

Work items for ubuntu-10.10-beta:
[hggdh2] test eucalyptus 2.0 code drops: DONE
[hggdh2] verify spineau's volume testing, add in to uec-testing-scripts (bug 615646): DONE
[hggdh2] results -- use TAP (Test-Anything-Protocol)? Discuss with QA: DONE
[hggdh2] uec-testing-scripts: elastic IP testing -- create instance, attach an IP, attach to instance with this IP, detach IP from instance, try again: POSTPONED
[kirkland] uec-testing-scripts: add basic EBS testing -- create/attach to instance/write/detach from instance/attach to new instance/verify (Sylvain's script): DONE
[kirkland] test all euca2ools binaries, most common options, see https://wiki.ubuntu.com/Euca2oolsTestCoverage : DONE
[kirkland] uec-testing-scripts: add security group testing -- try to access an instance with a different security group: POSTPONED
[kirkland] uec-testing-scripts: add security group testing -- blocked ports: POSTPONED
[kirkland] uec-testing-scripts: add security group testing -- try to monitor traffic from another instance in a different security group: POSTPONED
[sylvain-pineau] Ensure access to additional rig #2 (Sylvain reported issues): DONE

Work items for ubuntu-10.04.1:
[hggdh2] test UEC on topology1 (all-in-one): DONE
[hggdh2] test UEC on topology2 (all separate): DONE

mathiaz review / 20100526:
 * what is the goal of using the TAP protocol?
  * [jib] It generates both human- and machine-readable test output, which can be used for success/failure determination and for summary reports/actions. See http://testanything.org
 * euca2ools binaries: I'd suggest being more specific: which binaries are targeted, in order of priority?

ttx review / 20100527:
 * Missing spec doc, though WI are quite precise
 * No work items on UEC QA runs like we did in lucid -- an omission, or are they taken care of elsewhere?
  * [hggdh2] I had forgotten them; added in.
 * Looks like a lot of work, especially given how much of Mathias's time that spec ate in Lucid -- how much would hggdh be able to commit to it?
  * [hggdh2] I do not know as of now. It *is* intense work, all over.

jib review 20100602:
* Work item list is a copy of what we did for Lucid; please refine for Maverick (I edited it slightly -- ttx)

====

(QA/Server teams)
* Reference testing: we should come up with a standard sequence of tests that can be performed by any group with access to the necessary hardware/packages
* convergence with upstream: review upstream QA/dev tests and try to assimilate, propose, or complement them, allowing for an easier comparison of results
* expand test coverage: add basic and stress storage tests (EBS support); add rebundling; add network security tests.
* stress tests: expand the stress tests to also include real work performed on the instances (whatever that means).
* results: define a standard location for saving the results (actual test results, and logs) -- naming, location, formats
* integration: can the tests be integrated with other existing tools (checkbox, etc.)?
* more hardware? Current tests are time-consuming (setting up new topologies, running the tests) in both engineer and machine time. Perhaps additional rigs could be deployed, minimising the need for reconfiguration (and allowing tests to be run in parallel)

(from the GobbyDoc server-m-uec-testing)

== Pain Points ==
 * Taking a long time to setup topology
 * Takes a long time to run tests
 * Carlos estimates 28 hours to run all tests, all topologies
 * Manual, serial install of machines in some topologies is a pain:
   - use early_command to wait for dependent systems to be installed/operational before proceeding with the installation
     -> One-button install: fire off all systems on the test rig and wait for all systems to install
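The early_command idea above boils down to a small wait loop. The helper below is a hypothetical sketch (the host name is illustrative; 8773 is the CLC's default web-service port) that blocks until a dependent system answers on a TCP port before the node install proceeds:

```shell
#!/bin/bash
# Hypothetical preseed early_command helper: block until a dependent system
# (e.g. the CLC, which listens on 8773 by default) accepts TCP connections,
# or give up after a timeout. Uses bash's /dev/tcp; nc -z would also work.
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-600} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0    # port is answering; safe to continue the install
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 1            # dependent system never came up
}

# Example (host name is illustrative):
# wait_for_port clc.example.internal 8773 600 || exit 1
```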

== Suggestions ==
 * A second test rig would help
 * Focus formal testing on 64-bit
  * test harness could run 2 images (one i386, one amd64) to test i386 guest support
 * need to exercise
  * ebs volumes
    * attach to instance
    * write data
    * detach from instance
    * reattach to instance (verify data)
    * shut down instance
    * attach to new instance
    * create a snapshot
    * create a new volume based on saved snapshot
    * compare new volume and original one (can snapshots be compared?)
  * security groups (currently only testing creation of)
    * create group
    * open port on blocked port in guest
    * verify that guest has port blocked
    * verify inter-instance traffic also is blocked (instances cannot see each other)
  * elastic IP
    * start instance
    * attach IP
    * attach to instance
    * detach IP
  * euca-run-instances --instance-count (smoser has noticed different load on UEC by this)
  * user data tests
  * comprehensive testing of euca2ools' euca-* binaries
    * specifically, look for the euca-* binaries which are not currently called by the testrig
    * add a couple of scripts that additionally exercise those
    * could do some of this testing on the UEC LiveUSB
  * Results
    * need a standard format for results
    * currently saving all results from every run (~300MB uncompressed) -- Carlos pushes to lp:/~hggdh2/+junk/uec-qa
    * TAP (Test Anything Protocol)
      * http://en.wikipedia.org/wiki/Test_Anything_Protocol
  * Test Instances should do "something" useful besides just listening on SSH
    * Perhaps cpu/io intensive workload (like a kernel compile)
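The EBS, security-group, and elastic-IP sequences above could be driven by a script along these lines. The euca-* names are real euca2ools binaries, but every ID, zone, group name, and address below is illustrative; with DRY_RUN=1 (the default) the sketch only prints each step instead of running it:

```shell
#!/bin/sh
# Dry-run sketch of the EBS / security-group / elastic-IP test sequences.
# All IDs and addresses are illustrative; unset DRY_RUN on a live rig.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

INSTANCE=i-12345678 VOL=vol-00000001 IP=192.168.1.200   # illustrative

# EBS: create, attach, (write data in-guest), detach, snapshot
run euca-create-volume --size 1 --zone nova
run euca-attach-volume -i "$INSTANCE" -d /dev/vdb "$VOL"
run euca-detach-volume "$VOL"
run euca-create-snapshot "$VOL"

# security groups: new group with only SSH open; other ports should refuse
run euca-add-group -d "blocked-port test" qa-blocked
run euca-authorize -P tcp -p 22 -s 0.0.0.0/0 qa-blocked

# elastic IP: allocate, associate, (ssh to the new address), disassociate
run euca-allocate-address
run euca-associate-address -i "$INSTANCE" "$IP"
run euca-disassociate-address "$IP"
```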
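The TAP format discussed above is simple enough to emit straight from the test scripts: a plan line "1..N", then one "ok"/"not ok" line per test. A minimal sketch (the test names and results here are illustrative):

```shell
#!/bin/sh
# Minimal TAP (Test Anything Protocol) emitter. Any TAP consumer
# (e.g. prove) can parse this output for pass/fail summaries.
tap_plan()   { echo "1..$1"; }
tap_result() {  # $1 = test number, $2 = exit status (0 = pass), $3 = name
    if [ "$2" -eq 0 ]; then
        echo "ok $1 - $3"
    else
        echo "not ok $1 - $3"
    fi
}

tap_plan 2
tap_result 1 0 "instance reached running state"     # illustrative results
tap_result 2 1 "volume attached within 60s"
```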
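One low-effort workload, short of a full kernel compile, is a write-and-hash loop pushed in as user data; the sketch below exercises both IO and CPU and verifies the data afterwards (sizes are deliberately tiny for illustration -- a real run would write and hash much more):

```shell
#!/bin/sh
# Sketch of an in-instance workload: write random data (IO), hash it twice
# (CPU), and verify the hashes agree. Sizes here are tiny for illustration.
WORK=$(mktemp)
dd if=/dev/urandom of="$WORK" bs=1024 count=256 2>/dev/null
SUM1=$(sha256sum "$WORK" | cut -d' ' -f1)
SUM2=$(sha256sum "$WORK" | cut -d' ' -f1)
if [ "$SUM1" = "$SUM2" ]; then
    echo "workload OK"
else
    echo "workload FAILED"
fi
rm -f "$WORK"
```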

[ACTION] Jos: make people use our internal uec rig


Work Items