LAVA Dispatcher Improvements

Registered by Paul Larson

We now have a working dispatcher, but I think we should look at the internals and see how we can do better now that we better understand the overall flow. I also think we are barely scratching the surface of what we can provide here, and would like to brainstorm some ideas for new features.

Blueprint information

Paul Larson
Spring Zhang
Needs approval
Spring Zhang
Series goal:
Milestone target:
Started by: Spring Zhang
Completed by: Neil Williams

Status: Drafting

User stories:
Format like "As a <role>, I want <goal/desire> so that <benefit>"

As an Android tester, I want to use LAVA with Android images so that I can deploy Android, run tests, and submit results.
 - Some of this is already in trunk, but it still needs testing, bug fixing, and improvement.

As a LAVA user, I want to have documentation about the dispatcher so that I can set up a development environment, submit test jobs, and test it locally.

As a job submitter, I want my job to run as far as possible so that I know the root cause if it fails, and can see the test results and logs even if the job cannot finish normally.
 - There should be exceptions to this... for instance, if a deploy step fails, it makes no sense to run tests.
 - The current behavior is to check whether the last action is submit_results; if so, stop executing actions when anything fails and try to run the submit_results action before exiting.
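The "stop on failure but still submit results" behavior above can be sketched roughly as follows. This is an illustrative model only, not the real dispatcher API: the names ActionFailed, run_job, and submit_results are assumptions for the sketch.

```python
# Hypothetical sketch: run actions in order, stop executing on the first
# failure, but still attempt a trailing submit_results action so that
# partial results and logs are preserved.

class ActionFailed(Exception):
    """Raised by an action that could not complete (illustrative)."""

def run_job(actions):
    """actions: list of (name, callable). Returns (names run, failed?)."""
    executed = []
    failed = False
    for name, func in actions:
        if failed and name != "submit_results":
            continue  # skip remaining actions after a failure
        try:
            func()
            executed.append(name)
        except ActionFailed:
            failed = True  # remember the failure, keep going to submit_results

    return executed, failed

# Example: deploy fails, tests are skipped, results are still submitted.
def deploy():
    raise ActionFailed("deploy failed")

def test():
    pass

def submit():
    pass

ran, failed = run_job([("deploy", deploy),
                       ("lava_test_run", test),
                       ("submit_results", submit)])
```

With the deploy step failing, only submit_results runs, which matches the exception noted above: a failed deploy makes running tests pointless, but results should still be submitted.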

Result uploading is sometimes unreliable during testing; it needs to be made predictable and reliable.
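One common way to make result uploading more predictable is to retry the submission a few times with backoff before giving up. This is a generic sketch, not the dispatcher's actual upload code; submit is a stand-in for whatever call performs the upload.

```python
# Retry an unreliable upload with exponential backoff (illustrative).
import time

def submit_with_retries(submit, attempts=3, delay=0.01):
    """Call submit() up to `attempts` times; re-raise the last error."""
    last_error = None
    for i in range(attempts):
        try:
            return submit()
        except IOError as exc:
            last_error = exc
            time.sleep(delay * (2 ** i))  # back off: delay, 2*delay, ...
    raise last_error
```

A transient network hiccup then costs a short delay instead of a lost result bundle, while a persistent failure still surfaces as an error.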

As a LAVA developer, I want a better view of the dispatcher output log so that it is easier to debug and to distinguish it from the serial log.

As a kernel developer, I want to specify a test kernel and packages I compiled myself so that I can validate my changes to the kernel.
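The kernel-deployment work items below (deploy a kernel from a deb package or a tarball) could take a shape roughly like this. Everything here is an assumption for illustration: deploy_kernel is a hypothetical helper, and the real dispatcher would work against a mounted test-image root rather than an arbitrary directory.

```python
# Hypothetical helper: install a custom kernel artifact into a test
# image's root filesystem, either from a .deb (via dpkg --root) or by
# unpacking a tarball of /boot files and modules over the rootfs.
import subprocess
import tarfile

def deploy_kernel(rootfs, artifact):
    if artifact.endswith(".deb"):
        # install the package inside the image using dpkg's --root option
        subprocess.check_call(["dpkg", "--root", rootfs, "-i", artifact])
    elif artifact.endswith((".tar.gz", ".tgz", ".tar.bz2")):
        with tarfile.open(artifact) as tar:
            tar.extractall(rootfs)  # e.g. boot/vmlinuz, lib/modules/...
    else:
        raise ValueError("unsupported kernel artifact: %s" % artifact)
```

The same pattern extends naturally to the related work items: installing arbitrary deb packages into the image and extracting tarballs of custom files such as modules.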

As a user with a slow network or a proxy, I want image preparation and package installation to be faster, and a caching mode that keeps the necessary packages for validating future releases, so that I can see test results sooner.
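The caching mode described above amounts to keying downloads by URL and reusing them on later runs. A minimal sketch, assuming a pluggable fetch function (the real implementation might instead use something like apt-cacher-ng, investigated in the work items below):

```python
# Minimal URL-keyed download cache (illustrative): a file is fetched at
# most once, then reused by every later job that asks for the same URL.
import hashlib
import os

def cached_fetch(url, cache_dir, fetch):
    """fetch(url, dest) downloads url to dest; skipped on a cache hit."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha1(url.encode()).hexdigest()  # stable cache filename
    path = os.path.join(cache_dir, key)
    if not os.path.exists(path):
        fetch(url, path)  # cache miss: pay the network cost once
    return path
```

On a slow link this turns repeated image or package downloads into local file reads after the first job.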

As a test case developer, I want to run commands that are not defined as actions, including general system commands like "free", so that I can organise my tests and test sequence freely.
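A run-out-of-tree-command action like this would need to capture stdout, stderr, and the return code for the result bundle, matching the out-of-tree work items below. A hedged sketch (run_out_of_tree and the result-dict layout are assumptions, not the real action interface):

```python
# Hypothetical run-out-of-tree-command action: execute an arbitrary
# shell command, capture stdout/stderr as result-bundle attachments,
# and record the return code; the command itself names the test run.
import subprocess

def run_out_of_tree(cmd):
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return {
        "test_run": cmd,               # use the command as the run name
        "stdout": out.decode(),        # attachment for the result bundle
        "stderr": err.decode(),        # attachment for the result bundle
        "return_code": proc.returncode,
    }
```

Running a system command such as "free" then produces a named test run whose pass/fail state can be derived from the return code.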

As a tester, I want to run a stress test with no specified duration so that I can find out how long it ran before the system crashed.
- Depends on LAVA being able to judge that the system is unresponsive and/or has crashed, recover the machine, and submit results with the total duration recorded.
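The judgement call above reduces to polling the board for liveness and recording how long it stayed responsive. A sketch under stated assumptions: is_alive is a stand-in for whatever health check is actually used (a serial-console echo, a ping, etc.), and the recovery step is omitted.

```python
# Illustrative watchdog for an open-ended stress test: poll until the
# device stops responding, then report the elapsed time so it can be
# injected into the submitted results as the total duration.
import time

def run_until_crash(is_alive, poll_interval=0.01, max_polls=None):
    start = time.time()
    count = 0
    while is_alive():
        count += 1
        if max_polls is not None and count >= max_polls:
            break  # safety valve so the watchdog itself is bounded
        time.sleep(poll_interval)
    return time.time() - start  # duration before the system went silent
```

The poll interval bounds how precisely the crash time is known, so it would be chosen per test: a coarse interval for day-long soak tests, a fine one for short runs.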

As a normal user, I want to know the approximate waiting time and running time for my job so that I can check the job's progress later.
- For wait time at least, and possibly both, this seems like more of a scheduler item; needs clarification.

Proposed work items for detailed blueprints:
[qzhang] Enhance errors and exceptions classification and their handlers: DONE
[qzhang] Implement a general error handler for submitting test result: InProgress
[qzhang] Implement error handler for Timeout: DONE
[qzhang] Implement error handler for OperationFailed exception: InProgress
[qzhang] Implement error handler for RuntimeError: DONE
Investigate transferring data via serial port: TODO
Implement transferring data via serial port: TODO
Implement error handler for NetworkError: TODO
Implement an action to deploy a new kernel: TODO
Implement interface to install deb package to test image: TODO
Implement interface to extract tarball to test image: TODO
Implement deploying a kernel from deb package: TODO
Implement deploying a kernel from a tarball: TODO
Implement deploying custom files like modules from a tarball: TODO
Implement capturing stdout and stderr from an out-of-tree command as an attachment to the result bundle: TODO
Add the return code of an out-of-tree command to the result bundle: TODO
Replace the "test run" field name with the out-of-tree command name: TODO
Implement an action run-out-of-tree-command: TODO
Investigate apt-cacher-ng: TODO
Make deployment of packages and tests faster: TODO
Summarize the approximate time for every part of a job, such as deployment, test case execution, and result submission: TODO
Define the time-statistics format for a job, aligned between dispatcher and scheduler: TODO
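The first group of work items above (classifying errors and exceptions and giving each a handler) can be pictured as a mapping from exception types to handlers. The exception names mirror the work items; the handler bodies and the handle function are placeholders, not the dispatcher's real classification code.

```python
# Illustrative error classification: map exception types (Timeout,
# OperationFailed, NetworkError, RuntimeError) to handlers so each
# failure class gets consistent treatment, and unknown errors propagate.
class Timeout(Exception): pass
class OperationFailed(Exception): pass
class NetworkError(Exception): pass

HANDLERS = {
    Timeout: lambda e: "timeout: %s" % e,
    OperationFailed: lambda e: "operation failed: %s" % e,
    NetworkError: lambda e: "network error: %s" % e,
    RuntimeError: lambda e: "runtime error: %s" % e,
}

def handle(exc):
    for exc_type, handler in HANDLERS.items():
        if isinstance(exc, exc_type):
            return handler(exc)
    raise exc  # unclassified errors are not swallowed
```

Centralizing the mapping keeps the per-exception work items independent: adding the NetworkError handler is one new entry rather than another try/except scattered through the action code.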


