Estimating job computation time and Progress Reporting

Registered by Lars Butler on 2011-08-24

DRAFT

--------------------------------------------
Estimating calculation time upfront
--------------------------------------------

We need to be able to estimate the size of an OpenQuake calculation (prior to running it) for the following reasons:

1) Users should be given a reasonable estimate of how long the computation will take to complete.
2) The estimate can be used by the user to decide whether additional resources (such as EC2 nodes) should be added to the worker pool (in the case of extremely large calculations).

Job time could be estimated by mining previous job data. For example, we could use the elapsed time of one or more previous jobs with similar inputs to help us provide estimates for subsequent jobs. Here is some of the relevant information we should consider capturing each time a job is run:

- Total number of sites
- Logic tree samples (Hazard)
- Elapsed time (either as a single value or derived from 'start_timestamp' and 'finish_timestamp')
- Number of seismic sources (Hazard)
- Number of assets in exposure model (Risk)

From this information, we could derive a handful of estimation formulae.

Note: For the first iteration, I think it's reasonable to just capture job start time, stop time, and number of sites (and iterate from there).
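
As a rough illustration of that first iteration, the sketch below records start time, finish time, and number of sites per job, and scales the average time per site from past jobs to a new job. The names JobRecord and estimate_job_seconds are hypothetical (they are not part of OpenQuake), and the linear time-per-site formula is only an assumption; real jobs may scale differently with sources, logic tree samples, or assets.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass
class JobRecord:
    """Hypothetical record of one completed job (not an OpenQuake type)."""
    num_sites: int
    start_timestamp: datetime
    finish_timestamp: datetime

    @property
    def elapsed_seconds(self) -> float:
        # Elapsed time derived from the two captured timestamps.
        return (self.finish_timestamp - self.start_timestamp).total_seconds()


def estimate_job_seconds(history: Sequence[JobRecord], num_sites: int) -> float:
    """Estimate a new job's run time from the average time per site of past jobs.

    Assumes run time grows linearly with the number of sites.
    """
    total_sites = sum(job.num_sites for job in history)
    if total_sites <= 0:
        raise ValueError("no usable job history")
    total_seconds = sum(job.elapsed_seconds for job in history)
    return num_sites * total_seconds / total_sites
```

Weighting by total sites (rather than averaging per-job rates) lets larger past jobs count for more; either choice would be reasonable for a first pass.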

---------------------------------------------------------------------------
Estimating remaining calculation time (during a calculation)
---------------------------------------------------------------------------

The following method was suggested by Damiano Monelli:

Estimating the job time upfront is very difficult given the large number of input parameters, as well as computational resource factors. Thus, for a first iteration, it may be more practical to estimate the remaining time once the calculation is running, in the following manner:

- Start the computation
- Wait for 1 task to complete
- Measure the task time
- Report the estimated time remaining, based on the measured task time and the number of sites remaining

This procedure could be repeated each time a task completes to update the time estimate.

Because it is based on measured task times, this estimate would reflect the system resources actually available to the calculation.
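
The procedure above could be sketched roughly as follows. ProgressEstimator and its methods are hypothetical names, not existing OpenQuake code. As a deliberate variation, the sketch divides the wall-clock time elapsed so far by the number of sites completed, instead of timing a single task, so that tasks running in parallel on several workers are accounted for; it also assumes each task reports how many sites it covered.

```python
import time
from typing import Callable, Optional


class ProgressEstimator:
    """Hypothetical helper that tracks completed sites and estimates time remaining."""

    def __init__(self, total_sites: int,
                 clock: Callable[[], float] = time.monotonic):
        if total_sites <= 0:
            raise ValueError("total_sites must be positive")
        self.total_sites = total_sites
        self.sites_done = 0
        self._clock = clock
        self._started = clock()  # the computation starts now

    def task_completed(self, sites_in_task: int) -> None:
        # Called each time a task finishes; never count more than the total.
        self.sites_done = min(self.total_sites, self.sites_done + sites_in_task)

    def seconds_remaining(self) -> Optional[float]:
        # No estimate is possible until at least one task has completed.
        if self.sites_done == 0:
            return None
        elapsed = self._clock() - self._started
        seconds_per_site = elapsed / self.sites_done
        return seconds_per_site * (self.total_sites - self.sites_done)
```

Calling task_completed() after every task and then seconds_remaining() refreshes the estimate each time, as described above. The clock parameter exists only so the estimator can be exercised with a fake clock.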
