Benchmarks and baselines in LAVA Dashboard

Registered by Zygmunt Krynicki on 2011-08-29

Additional features enabling tracking benchmark data produced by the Graphics WG (benchmark reports, hardware association, baselines)

Blueprint information

Status:
Not started
Approver:
Paul Larson
Priority:
Medium
Drafter:
Zygmunt Krynicki
Direction:
Approved
Assignee:
Andy Doan
Definition:
Drafting
Series goal:
Accepted for linaro-11.11
Implementation:
Unknown
Milestone target:
None

Related branches

Sprints

Whiteboard

[doanac, 2012-11-06] I think we can close this out now. The graphics view alexandros did seems to satisfy what the Graphics team wanted: https://validation.linaro.org/lava-server/graphics/

The desire is to start tracking benchmark data on a daily basis and produce interesting views of that data.
This would be accomplished by a three-level view, some internal processing, and site configuration data.

The level one view (top level) would present a list of Tests+TestCases pulled from the site configuration data. Keeping the list in configuration would make it customizable and manageable for interested users. This list might be composed of all the distinct values of BenchmarkBaseline.TestCase we have in the database.
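As a rough illustration of the last option, the top-level list could be derived from stored baselines as shown below. This is only a sketch: the `BenchmarkBaseline` record layout (fields `test`, `test_case`, `hardware_bin`, `measurement`, `threshold`) and the `level_one_entries` helper are assumptions made for this example, not an existing LAVA Dashboard API.

```python
from typing import Iterable, List, NamedTuple, Tuple


class BenchmarkBaseline(NamedTuple):
    # Hypothetical record layout; the real model is still to be defined.
    test: str
    test_case: str
    hardware_bin: str
    measurement: float
    threshold: float


def level_one_entries(baselines: Iterable[BenchmarkBaseline]) -> List[Tuple[str, str]]:
    """Return the distinct (test, test_case) pairs, sorted for stable display."""
    return sorted({(b.test, b.test_case) for b in baselines})
```

Deduplicating through a set means a TestCase with baselines on several hardware bins still appears once in the list.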

The second level would focus on a particular TestCase and create a breakdown grid of the latest results of that TestCase.measurement across various possible hardware. Hardware would be matched by running a specific small side program (function) on a hardware context to determine which "bin" the test run should be associated with. In reality we'll probably have one bin per SoC or GPU. This view would present the user with the following columns of data: "hardware bin name", "aggregate measurement" and "computed aggregate status". The first is simply the value produced by the hardware classifier (so for us it might be the board name). The aggregate measurement would be the result of running a custom aggregation function on all of the measurements in that bin. Finally, the computed aggregate status would be a pass/fail classification of the aggregate result based on a baseline run measurement and an allowed threshold (both represented as numbers). Clicking on any hardware-specific aggregation would bring the user to the next view.
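The binning, aggregation and status steps described above might look roughly like the following sketch. Several things here are assumptions: the classifier simply reads a hypothetical "board" key from the hardware context, the default aggregation function is the arithmetic mean, and "pass" is taken to mean that the aggregate lies within plus or minus the threshold of the baseline. The blueprint does not fix any of these choices.

```python
from collections import defaultdict
from statistics import mean


def classify_hardware(hardware_context):
    """Hypothetical classifier: one bin per board name."""
    return hardware_context.get("board", "unknown")


def level_two_rows(runs, baselines, aggregate=mean):
    """Build (bin name, aggregate measurement, aggregate status) rows.

    runs      -- iterable of (hardware_context dict, measurement) pairs
    baselines -- dict mapping bin name to (baseline measurement, threshold)
    aggregate -- custom aggregation function applied to each bin
    """
    bins = defaultdict(list)
    for hardware_context, measurement in runs:
        bins[classify_hardware(hardware_context)].append(measurement)

    rows = []
    for name in sorted(bins):
        value = aggregate(bins[name])
        if name not in baselines:
            status = "unknown"  # no baseline selected for this bin yet
        else:
            baseline, threshold = baselines[name]
            # Assumed rule: symmetric band around the baseline.
            status = "pass" if abs(value - baseline) <= threshold else "fail"
        rows.append((name, value, status))
    return rows
```

Passing the aggregation function as a parameter keeps the "custom aggregation function" idea open; for example `aggregate=max` or `statistics.median` would work without other changes.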

The third level would present the user with a list of results (extracted from a list of test runs) that were used to create the aggregate measurement in the preceding view. Again, this would be rendered as a table of values (plus an optional line chart over time). The table would have the following columns: "time and date of the run", "measurement", "computed status" and unspecified software context data (most likely hardware pack information, but it could be as detailed as kernel/driver version/commit id). Clicking on a result would move the user to the standard test result view.
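A possible shape for the rows of this table is sketched below. The `ResultRow` type, the tuple layout of the input and the free-form `software_context` string are all hypothetical, and the per-result status reuses the assumed "within plus or minus threshold of the baseline" rule.

```python
from datetime import datetime
from typing import NamedTuple


class ResultRow(NamedTuple):
    timestamp: datetime      # time and date of the run
    measurement: float
    status: str              # computed status: "pass" or "fail"
    software_context: str    # e.g. hardware pack name (unspecified in the blueprint)


def level_three_rows(results, baseline, threshold):
    """Turn (timestamp, measurement, software_context) tuples into table rows,
    oldest run first."""
    rows = []
    for timestamp, measurement, software_context in sorted(results, key=lambda r: r[0]):
        status = "pass" if abs(measurement - baseline) <= threshold else "fail"
        rows.append(ResultRow(timestamp, measurement, status, software_context))
    return rows
```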

The optional line chart would show relative progress or regression over time against the baseline. Its points would be ordered by either test run timestamps or some other monotonic property (such as build timestamp or hwpack version, TBD). The chart would show both the actual values and the threshold area (to clearly identify results that are away from the "green zone"). Clicking on an item would bring the user to the standard test result page.
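The data behind such a chart could be prepared along these lines. This is a sketch under stated assumptions: the function name `chart_series` and the output dictionary layout are invented for illustration, the "green zone" is taken to be the band from baseline minus threshold to baseline plus threshold, and the ordering key `x` stands in for whichever monotonic property is eventually chosen.

```python
def chart_series(points, baseline, threshold):
    """Prepare chart data from (x, measurement) pairs.

    x is any monotonic ordering key (run timestamp, build timestamp, ...).
    Returns the green-zone bounds plus one entry per point, ordered by x.
    """
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")

    lower, upper = baseline - threshold, baseline + threshold
    series = []
    for x, measurement in sorted(points, key=lambda p: p[0]):
        series.append({
            "x": x,
            "value": measurement,
            # Fractional change against the baseline (0.2 means +20%).
            "relative_change": (measurement - baseline) / baseline,
            "in_green_zone": lower <= measurement <= upper,
        })
    return {"lower": lower, "upper": upper, "points": series}
```

Returning the band bounds alongside the points lets a front end draw the threshold area and the measured values from a single payload.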


Work Items

Work items:
Store results from glmark2 running daily on the selected hardware: TODO
Add ability to define and store HardwareClassifier: TODO
Add ability to define BenchmarkBaselines: TODO
Add ability to select a particular test result as a BenchmarkBaseline for a particular HardwareClassifier value (manually from the data viewer action, by having appropriate permission): TODO
Create first level view: TODO
Create second level view: TODO
Create third level view: TODO
Create the chart in the third level view: TODO
