Tests involving multiple systems simultaneously in LAVA

Registered by Paul Larson

Testing client/server and distributed systems via LAVA requires the ability to launch and coordinate actions on multiple target machines. We should discuss how people would like to use this, how to specify it sensibly in test jobs, and how it interacts with other LAVA components such as the dispatcher.

Session Notes:
Use cases
* client/server functionality
* performance
 - nuttcp over network topologies
* stress
 - use client systems to generate load on a server
 - requires feedback from both sides
* testing systems that have shared resources, like a SAN
Changes proposed for LAVA
* target specification (for more than one target)
* result reporting separated by target
* inter-client concurrency, synchronization, client actions, data sharing and config
The proposal defines client groups, each with a name for the group and a list of specific targets
* what if a job wants a number of devices of a certain type rather than individual clients?
* the scheduler would need to make sure *all* systems in a client group for a job are available before running
 - or, alternatively, provide a way to request and script jobs for classes/types of machines rather than specific machines
* make actions something that can be installed from an out-of-tree source
* extend results to allow hw/sw context for additional machines?
* which existing actions, such as deploy, would need to be modified to work on groups of machines?
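To make the client-group idea concrete, here is a minimal sketch of how a group could be described and how a scheduler could check that every system in a job is available before starting it. Everything in it is an assumption for illustration: ClientGroup, allocate, the field names and the device names are hypothetical and are not part of any existing LAVA API or job format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClientGroup:
    """Hypothetical client group: a name plus specific targets and/or a device type and count."""
    name: str
    targets: List[str] = field(default_factory=list)  # specific machines, if any
    device_type: Optional[str] = None                 # or: any machines of this type
    count: int = 0                                    # how many of that type


def allocate(groups: List[ClientGroup], idle: Dict[str, str]) -> Optional[Dict[str, List[str]]]:
    """All-or-nothing allocation.

    `idle` maps an idle target name to its device type. Returns
    {group name: [target, ...]} if every group can be satisfied,
    otherwise None so the job stays queued.
    """
    free = dict(idle)
    allocation: Dict[str, List[str]] = {}

    # Reserve explicitly named targets first so by-type requests cannot take them.
    for group in groups:
        chosen = []
        for target in group.targets:
            if target not in free:
                return None
            del free[target]
            chosen.append(target)
        allocation[group.name] = chosen

    # Then satisfy "N devices of type T" requests from what is left.
    for group in groups:
        if group.device_type is None:
            continue
        matches = sorted(t for t, typ in free.items() if typ == group.device_type)
        matches = matches[:group.count]
        if len(matches) < group.count:
            return None
        for target in matches:
            del free[target]
        allocation[group.name].extend(matches)

    return allocation


groups = [
    ClientGroup("server", targets=["panda01"]),
    ClientGroup("clients", device_type="beagle", count=2),
]
idle = {"panda01": "panda", "beagle01": "beagle", "beagle02": "beagle", "beagle03": "beagle"}
allocation = allocate(groups, idle)
```

With the idle devices above, the "server" group gets panda01 and the "clients" group gets two of the idle beagle boards. If only one beagle board were idle, allocate would return None and nothing would be reserved, which is the all-or-nothing behaviour the notes ask of the scheduler.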
Steps to implementing this:
Defining multiple targets
* specified or by type
* scheduler allocation
Dispatcher handling of multi client/context
* does this work with the new client/connection split in trunk?
* synchronization action
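One possible shape for a synchronization action, assuming the dispatcher drives each client from its own thread: a named rendezvous point that every client in the job must reach before any of them continues. SyncPoints and run_job are hypothetical names for this sketch, not dispatcher code, and the per-client steps are stand-ins for real actions.

```python
import threading


class SyncPoints:
    """Named barriers shared by all client threads of one job (hypothetical helper)."""

    def __init__(self, n_clients, timeout=30.0):
        self._n = n_clients
        self._timeout = timeout
        self._lock = threading.Lock()
        self._barriers = {}

    def wait(self, name):
        # Create the barrier for this name on first use, then block until
        # all clients have called wait() with the same name.
        with self._lock:
            barrier = self._barriers.setdefault(name, threading.Barrier(self._n))
        barrier.wait(timeout=self._timeout)  # raises BrokenBarrierError on timeout


def run_job(client_steps):
    """Run one callable per target in its own thread and return the shared log."""
    sync = SyncPoints(len(client_steps))
    log = []
    log_lock = threading.Lock()

    def record(message):
        with log_lock:
            log.append(message)

    threads = [
        threading.Thread(target=step, args=(sync, record))
        for step in client_steps.values()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


def server(sync, record):
    record("server: listening")
    sync.wait("server-up")      # clients may start generating load now
    sync.wait("load-done")      # wait until the client has finished
    record("server: collecting results")


def client(sync, record):
    sync.wait("server-up")      # do not start before the server is ready
    record("client: sending load")
    sync.wait("load-done")


log = run_job({"panda01": server, "beagle01": client})
```

The timeout matters in practice: if one board hangs, the other clients get an error at the sync point instead of waiting forever.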
Test actions
* pluggable actions
* defining generic ones?
* API for defining test results?
Results aggregation
* would see all the results pre-submit, and be able to modify or add to them
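As a rough illustration of results aggregation, the sketch below merges per-target results into a single bundle, keeps them separated by target, and runs an optional hook before submission so a test can modify or add results. The bundle layout here is invented for the example and is not the real dashboard bundle format; aggregate and add_job_verdict are hypothetical names.

```python
def aggregate(per_target_results, pre_submit=None):
    """Merge {target: [result, ...]} into one bundle, one test run per target.

    `pre_submit`, if given, is called with the whole bundle and may change
    or extend it in place before it is returned for submission.
    """
    bundle = {
        "test_runs": [
            {"target": target, "results": list(results)}
            for target, results in sorted(per_target_results.items())
        ]
    }
    if pre_submit is not None:
        pre_submit(bundle)
    return bundle


def add_job_verdict(bundle):
    """Example hook: add a job-level result that needs feedback from both sides."""
    all_passed = all(
        result["result"] == "pass"
        for run in bundle["test_runs"]
        for result in run["results"]
    )
    bundle["test_runs"].append({
        "target": "job",
        "results": [{
            "test_case_id": "all-targets",
            "result": "pass" if all_passed else "fail",
        }],
    })


bundle = aggregate(
    {
        "panda01": [{"test_case_id": "server-stayed-up", "result": "pass"}],
        "beagle01": [{"test_case_id": "load-generated", "result": "fail"}],
    },
    pre_submit=add_job_verdict,
)
```

Here the hook sees both targets' results at once, so it can record a job-level failure even though the server side on its own passed.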

Blueprint information

Status: Complete
Approver: Paul Larson
Priority: Medium
Drafter: None
Direction: Approved
Assignee: None
Definition: Obsolete
Series goal: Accepted for linaro-11.11
Implementation: Unknown
Milestone target: backlog
Completed by: Neil Williams

Whiteboard

[asac, Oct 29, 2011]: I assume splitting the test runs at entry level could be the easiest way. important that all runs deliver results to the same bundle though, so they would all be visible here: http://bit.ly/tQvacV ...just with multiple deploy and boot and gather results etc. for each child job.
[jason-hobbs, Oct 29, 2011]: We've implemented a solution using threading. The subtleties arise in coordinating actions between clients - like waiting for a file to be transferred to check for its existence. I have a short presentation prepared that I will cover at the start of this to go over our use cases and proposed solution.